Interfaces and traits in C

Everyone likes interfaces in Go and traits in Rust. Polymorphism without class-based hierarchies or inheritance seems to be the sweet spot. What if we try to implement this in C?

Interfaces in Go • Traits in Rust • Toy example • Interface definition • Interface data • Method table • Method table in implementor • Type assertions • Final thoughts

Interfaces in Go

An interface in Go is a convenient way to define a contract for some useful behavior. Take, for example, the venerable io.Reader:

// Reader is the interface that wraps the basic Read method.
type Reader interface {
    // Read reads up to len(p) bytes into p. It returns the number of bytes
    // read (0 <= n <= len(p)) and any error encountered.
    Read(p []byte) (n int, err error)
}

Anything that can read data into a byte slice provided by the caller is a Reader. Quite handy, because the code doesn't need to care where the data comes from — whether it's memory, the file system, or the network. All that matters is that it can read the data into a slice:

// work processes the data read from r.
func work(r io.Reader) int {
    buf := make([]byte, 8)
    n, err := r.Read(buf)
    if err != nil && err != io.EOF {
        panic(err)
    }
    // ...
    return n
}

We can provide any kind of reader:

func main() {
    var total int
    b := bytes.NewBufferString("hello world")

    // bytes.Buffer implements io.Reader, so we can use it with work.
    total += work(b)
    total += work(b)

    fmt.Println("total =", total)
}

Go's interfaces are structural, which is similar to duck typing. A type doesn't need to explicitly state that it implements io.Reader; it just needs to have a Read method:

// Zeros is an infinite stream of zero bytes.
type Zeros struct{}

func (z Zeros) Read(p []byte) (n int, err error) {
    clear(p)
    return len(p), nil
}

The Go compiler and runtime take care of the rest:

func main() {
    var total int
    var z Zeros

    // Zeros implements io.Reader, so we can use it with work.
    total += work(z)
    total += work(z)

    fmt.Println("total =", total)
}

Traits in Rust

A trait in Rust is also a way to define a contract for certain behavior. Here's the std::io::Read trait:

// The Read trait allows for reading bytes from a source.
pub trait Read {
    // Readers are defined by one required method, read(). Each call to read()
    // will attempt to pull bytes from this source into a provided buffer.
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize>;

    // ...
}

Unlike in Go, a type must explicitly state that it implements a trait:

// An infinite stream of zero bytes.
struct Zeros;

impl io::Read for Zeros {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        buf.fill(0);
        Ok(buf.len())
    }
}

The Rust compiler takes care of the rest:

// Processes the data read from r.
fn work(r: &mut dyn io::Read) -> usize {
    let mut buf = [0; 8];
    match r.read(&mut buf) {
        Ok(n) => n,
        Err(e) => panic!("Error: {}", e),
    }
}

fn main() {
    let mut total = 0;
    let mut z = Zeros;

    // Zeros implements Read, so we can use it with work.
    total += work(&mut z);
    total += work(&mut z);

    println!("total = {}", total);
}

Either way, whether it's Go or Rust, the caller only cares about the contract (defined as an interface or trait), not the specific implementation.

Toy example

Let's make an even simpler version of Reader — one without any error handling (Go):

// Reader is an interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
type Reader interface {
    Read(p []byte) int
}

Usage example:

// Zeros is an infinite stream of zero bytes.
type Zeros struct {
    total int // total number of bytes read
}

// Read reads len(p) bytes into p.
func (z *Zeros) Read(p []byte) int {
    clear(p)
    z.total += len(p)
    return len(p)
}

// work processes the data read from r.
func work(r Reader) int {
    buf := make([]byte, 8)
    return r.Read(buf)
}

func main() {
    z := new(Zeros)
    work(z)
    work(z)
    fmt.Println("total =", z.total)
}

Let's see how we can do this in C!

Interface definition

The main building blocks in C are structs and functions, so let's use them. Our Reader will be a struct with a single field called Read. This field will be a pointer to a function with the right signature:

// An interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
} Reader;

To make Zeros fully dynamic, let's turn it into a struct with a Read function pointer (I know, I know — just bear with me):

// An infinite stream of zero bytes.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
    size_t total;
} Zeros;

Here's the Zeros_Read "method" implementation:

// Reads up to len(p) bytes into p.
size_t Zeros_Read(void* self, uint8_t* p, size_t len) {
    Zeros* z = (Zeros*)self;
    for (size_t i = 0; i < len; i++) {
        p[i] = 0;
    }
    z->total += len;
    return len;
}

The work function is straightforward:

// Does some work reading from r.
size_t work(Reader* r) {
    uint8_t buf[8];
    return r->Read(r, buf, sizeof(buf));
}

And, finally, the main function:

int main(void) {
    Zeros z = {.Read = Zeros_Read, .total = 0};

    Reader* r = (Reader*)&z;
    work(r);
    work(r);

    printf("total = %zu\n", z.total);
}

See how easy it is to turn a Zeros into a Reader: all we need is (Reader*)&z. Pretty cool, right?

Not really. Actually, this implementation is seriously flawed in almost every way (except for the Reader definition).

Memory overhead. Each Zeros instance has its own function pointers (8 bytes per function on a 64-bit system) as "methods", which isn't practical even if there are only a few of them. Regular objects should store data, not functions.

Layout dependency. Converting from Zeros* to Reader* like (Reader*)&z only works if both structures have the same Read field as their first member. If we try to implement another interface:

// Reader interface.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
} Reader;

// Closer interface.
typedef struct {
    void (*Close)(void* self);
} Closer;

// Zeros implements both Reader and Closer.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
    void (*Close)(void* self);
    size_t total;
} Zeros;

Everything will fall apart:

int main(void) {
    Zeros z = {
        .Read = Zeros_Read,
        .Close = Zeros_Close,
        .total = 0,
    };
    Closer* c = (Closer*)&z;  // (X)
    c->Close(c);
}

Closer and Zeros have different layouts, so the type conversion at (X) is invalid and causes undefined behavior.
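To see why the cast breaks, it can help to look at field offsets. The sketch below redeclares the two structs and adds two helper functions that report where Close lives in each layout. The helper names closer_close_offset and zeros_close_offset are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

// Closer interface, as defined above.
typedef struct {
    void (*Close)(void* self);
} Closer;

// Zeros implements both Reader and Closer, as defined above.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
    void (*Close)(void* self);
    size_t total;
} Zeros;

// Hypothetical helper: offset of Close inside Closer.
// Always 0, because Close is the first member.
size_t closer_close_offset(void) {
    return offsetof(Closer, Close);
}

// Hypothetical helper: offset of Close inside Zeros.
// Non-zero, because Read comes first.
size_t zeros_close_offset(void) {
    return offsetof(Zeros, Close);
}
```

The two offsets differ (on a typical 64-bit system the second one is 8). So when a Zeros is viewed through a Closer*, the Close slot at offset 0 lines up with the memory that actually holds Zeros.Read.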

Lack of type safety. Using a void* as the receiver in Zeros_Read means the caller can pass any type, and the compiler won't even show a warning:

int main(void) {
    int x = 42;
    uint8_t buf[8];
    Zeros_Read(&x, buf, sizeof(buf));  // bad decision
}

size_t Zeros_Read(void* self, uint8_t* p, size_t len) {
    Zeros* z = (Zeros*)self;
    // ...
    z->total += len;                   // consequences
    return len;
}

C isn't a particularly type-safe language, but this is just too much. Let's try something else.

Interface data

A better way is to store a reference to the actual object in the interface:

// An interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
    void* self;
} Reader;

We could have the Read method in the interface take a Reader instead of a void*, but that would make the implementation more complicated without any real benefits. So, I'll keep it as void*.

Then Zeros will only have its own fields:

// An infinite stream of zero bytes.
typedef struct {
    size_t total;
} Zeros;

We can make the Zeros_Read method type-safe:

// Reads len(p) bytes into p.
size_t Zeros_Read(Zeros* z, uint8_t* p, size_t len) {
    for (size_t i = 0; i < len; i++) {
        p[i] = 0;
    }
    z->total += len;
    return len;
}

To make this work, we add a Zeros_Reader method that returns the instance wrapped in a Reader interface:

// Returns a Reader implementation for Zeros.
Reader Zeros_Reader(Zeros* z) {
    return (Reader){
        .Read = (size_t (*)(void*, uint8_t*, size_t))Zeros_Read,
        .self = z,
    };
}

The work and main functions remain quite simple:

// Does some work reading from r.
size_t work(Reader r) {
    uint8_t buf[8];
    return r.Read(r.self, buf, sizeof(buf));
}

int main(void) {
    Zeros z = {0};

    Reader r = Zeros_Reader(&z);
    work(r);
    work(r);

    printf("total = %zu\n", z.total);
}

This approach is much better than the previous one:

The Zeros struct is lean and doesn't have any interface-related fields.
The Zeros_Read method takes a Zeros* instead of a void*.
The cast from Zeros to Reader is handled inside the Zeros_Reader method.
We can implement multiple interfaces if needed.
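One caveat about the function pointer cast in Zeros_Reader: strictly speaking, the C standard leaves it undefined to call a function through a pointer to an incompatible function type, and a Zeros* parameter is not compatible with a void* parameter. The cast works on mainstream ABIs, but a more conservative variant is possible. The sketch below is one way to do it, not part of the original design. Zeros_ReadThunk is a hypothetical adapter name.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
    void* self;
} Reader;

typedef struct {
    size_t total;
} Zeros;

// Type-safe method: fills p with zeros.
size_t Zeros_Read(Zeros* z, uint8_t* p, size_t len) {
    for (size_t i = 0; i < len; i++) {
        p[i] = 0;
    }
    z->total += len;
    return len;
}

// Hypothetical adapter with the exact signature the interface expects.
// The void* to Zeros* conversion happens on the data pointer, which is
// well-defined, so no function pointer cast is needed.
static size_t Zeros_ReadThunk(void* self, uint8_t* p, size_t len) {
    return Zeros_Read(self, p, len);
}

// Returns a Reader implementation for Zeros.
Reader Zeros_Reader(Zeros* z) {
    return (Reader){.Read = Zeros_ReadThunk, .self = z};
}

// Does some work reading from r.
size_t work(Reader r) {
    uint8_t buf[8];
    return r.Read(r.self, buf, sizeof(buf));
}
```

The trade-off is one small extra function per method, in exchange for calls that always go through a pointer of the exact declared type.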

Since our Zeros type now knows about the Reader interface (through the Zeros_Reader method), our implementation is more like a basic version of a Rust trait than a true Go interface. For simplicity, I'll keep using the term "interface".

There is one downside, though: each Reader instance has its own function pointer for every interface method. Since Reader only has one method, this isn't an issue. But if an interface has a dozen methods and the program uses a lot of these interface instances, it can become a problem.
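To put a rough number on this, here is a sketch with a hypothetical twelve-method interface. BigIface and big_iface_size are made-up names, and all methods share one signature only to keep the example short.

```c
#include <stddef.h>

// Hypothetical interface with twelve methods stored inline.
// Every instance carries all twelve function pointers plus self.
typedef struct {
    void (*methods[12])(void* self);
    void* self;
} BigIface;

// Size in bytes of a single BigIface instance.
size_t big_iface_size(void) {
    return sizeof(BigIface);
}
```

On a typical 64-bit system, where each pointer takes 8 bytes, one instance comes to 13 × 8 = 104 bytes, of which 96 bytes are function pointers that are identical across all instances of the same implementing type.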

Let's fix this.

Method table

Let's extract interface methods into a separate structure, the method table. The interface references its methods through the mtab field:

// An interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
} ReaderTable;

typedef struct {
    const ReaderTable* mtab;
    void* self;
} Reader;

Zeros and Zeros_Read don't change at all:

// An infinite stream of zero bytes.
typedef struct {
    size_t total;
} Zeros;

// Reads len(p) bytes into p.
size_t Zeros_Read(Zeros* z, uint8_t* p, size_t len) {
    for (size_t i = 0; i < len; i++) {
        p[i] = 0;
    }
    z->total += len;
    return len;
}

The Zeros_Reader method initializes the static method table and assigns it to the interface instance:

// Returns a Reader implementation for Zeros.
Reader Zeros_Reader(Zeros* z) {
    // The method table is only initialized once.
    static const ReaderTable impl = {
        .Read = (size_t (*)(void*, uint8_t*, size_t))Zeros_Read,
    };
    return (Reader){.mtab = &impl, .self = z};
}

The only difference in work is that it calls the Read method on the interface indirectly using the method table (r.mtab->Read instead of r.Read):

// Does some work reading from r.
size_t work(Reader r) {
    uint8_t buf[8];
    return r.mtab->Read(r.self, buf, sizeof(buf));
}

main stays the same:

int main(void) {
    Zeros z = {0};

    Reader r = Zeros_Reader(&z);
    work(r);
    work(r);

    printf("total = %zu\n", z.total);
}

Now the Reader instance always has a single pointer field for its methods. So even for large interfaces, it only uses 16 bytes on a 64-bit system (the mtab and self fields). This approach also keeps all the benefits from the previous version:

Lightweight Zeros structure.
Easy conversion from Zeros to Reader.
Supports multiple interfaces.
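As a sketch of the "multiple interfaces" point, here is how Zeros might also implement a Closer interface with the same pattern. The Closer and CloserTable types, Zeros_Close, Zeros_Closer, the closed field, and the two adapter functions are assumptions added for illustration. The adapters take void* self so that the table entries match the declared signatures exactly.

```c
#include <stddef.h>
#include <stdint.h>

// Reader interface, as above.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
} ReaderTable;

typedef struct {
    const ReaderTable* mtab;
    void* self;
} Reader;

// Hypothetical Closer interface built the same way.
typedef struct {
    void (*Close)(void* self);
} CloserTable;

typedef struct {
    const CloserTable* mtab;
    void* self;
} Closer;

// Zeros with an extra (assumed) flag so Close has something to do.
typedef struct {
    size_t total;
    int closed;
} Zeros;

size_t Zeros_Read(Zeros* z, uint8_t* p, size_t len) {
    for (size_t i = 0; i < len; i++) {
        p[i] = 0;
    }
    z->total += len;
    return len;
}

void Zeros_Close(Zeros* z) {
    z->closed = 1;
}

// Adapters matching the table signatures exactly.
static size_t read_adapter(void* self, uint8_t* p, size_t len) {
    return Zeros_Read(self, p, len);
}

static void close_adapter(void* self) {
    Zeros_Close(self);
}

// One wrapper function per interface; Zeros itself stays free of
// interface-related fields.
Reader Zeros_Reader(Zeros* z) {
    static const ReaderTable impl = {.Read = read_adapter};
    return (Reader){.mtab = &impl, .self = z};
}

Closer Zeros_Closer(Zeros* z) {
    static const CloserTable impl = {.Close = close_adapter};
    return (Closer){.mtab = &impl, .self = z};
}
```

Both interface values point at the same Zeros object, and neither depends on where fields sit inside Zeros, so the layout problem from the first attempt does not come back.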

We can even add a separate Reader_Read helper so the client doesn't have to worry about the r.mtab->Read implementation detail:

// Reads len(p) bytes into p.
size_t Reader_Read(Reader r, uint8_t* p, size_t len) {
    return r.mtab->Read(r.self, p, len);
}

// Does some work reading from r.
size_t work(Reader r) {
    uint8_t buf[8];
    return Reader_Read(r, buf, sizeof(buf));
}

Nice!
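To see the polymorphism at work, a second implementor helps. The sketch below adds a hypothetical Repeat type (not part of the original example) that fills the buffer with a fixed byte, and a hypothetical sum8 function that consumes any Reader. To keep the sketch short, the Read implementations take void* self directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
} ReaderTable;

typedef struct {
    const ReaderTable* mtab;
    void* self;
} Reader;

size_t Reader_Read(Reader r, uint8_t* p, size_t len) {
    return r.mtab->Read(r.self, p, len);
}

// An infinite stream of zero bytes.
typedef struct {
    size_t total;
} Zeros;

static size_t Zeros_Read(void* self, uint8_t* p, size_t len) {
    Zeros* z = self;
    memset(p, 0, len);
    z->total += len;
    return len;
}

Reader Zeros_Reader(Zeros* z) {
    static const ReaderTable impl = {.Read = Zeros_Read};
    return (Reader){.mtab = &impl, .self = z};
}

// Hypothetical second implementor: an infinite stream of one byte value.
typedef struct {
    uint8_t value;
} Repeat;

static size_t Repeat_Read(void* self, uint8_t* p, size_t len) {
    Repeat* rp = self;
    memset(p, rp->value, len);
    return len;
}

Reader Repeat_Reader(Repeat* rp) {
    static const ReaderTable impl = {.Read = Repeat_Read};
    return (Reader){.mtab = &impl, .self = rp};
}

// Sums the first 8 bytes read from r. It has no idea which type is behind r.
unsigned sum8(Reader r) {
    uint8_t buf[8];
    size_t n = Reader_Read(r, buf, sizeof(buf));
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += buf[i];
    }
    return sum;
}
```

Calling sum8(Zeros_Reader(&z)) gives 0, while sum8(Repeat_Reader(&rp)) with rp.value set to 3 gives 24. The caller's code is the same in both cases.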

Alternative: Method table in implementor

There's another approach I've seen out there. I don't like it, but it's still worth mentioning for completeness.

Instead of embedding the Reader method table in the interface, we can place it in the implementation (Zeros):

// An interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
} ReaderTable;

typedef ReaderTable* Reader;

// An infinite stream of zero bytes.
typedef struct {
    Reader mtab;
    size_t total;
} Zeros;

We initialize the method table in the Zeros constructor:

// Returns a new Zeros instance.
Zeros NewZeros(void) {
    static const ReaderTable impl = {
        .Read = (size_t (*)(void*, uint8_t*, size_t))Zeros_Read,
    };
    return (Zeros){
        .mtab = (Reader)&impl,
        .total = 0,
    };
}

work now takes a Reader pointer:

// Does some work reading from r.
size_t work(Reader* r) {
    uint8_t buf[8];
    return (*r)->Read(r, buf, sizeof(buf));
}

And main converts Zeros* to Reader* with a simple type cast:

int main(void) {
    Zeros z = NewZeros();

    Reader* r = (Reader*)&z;
    work(r);
    work(r);

    printf("total = %zu\n", z.total);
}

This keeps Zeros pretty lightweight, only adding one extra mtab field. But the (Reader*)&z cast only works because Reader mtab is the first field in Zeros. If we try to implement a second interface, things will break — just like in the very first solution.

I think the "method table in the interface" approach is much better.

Bonus: Type assertions

Go has an io.Copy function that copies data from a source (a reader) to a destination (a writer):

func Copy(dst Writer, src Reader) (written int64, err error)

There's an interesting comment in its documentation:

If src implements WriterTo, the copy is implemented by calling src.WriteTo(dst). Otherwise, if dst implements ReaderFrom, the copy is implemented by calling dst.ReadFrom(src).

Here's what the function looks like:

func Copy(dst Writer, src Reader) (written int64, err error) {
    // If the reader has a WriteTo method, use it to do the copy.
    // Avoids an allocation and a copy.
    if wt, ok := src.(WriterTo); ok {
        return wt.WriteTo(dst)
    }
    // Similarly, if the writer has a ReadFrom method, use it to do the copy.
    if rf, ok := dst.(ReaderFrom); ok {
        return rf.ReadFrom(src)
    }
    // The default implementation using regular Reader and Writer.
    // ...
}

src.(WriterTo) is a type assertion that checks if the src reader is not just a Reader, but also implements the WriterTo interface. The Go runtime handles these kinds of dynamic type checks.

Can we do something like this in C? I'd prefer not to make it fully dynamic, since trying to recreate parts of the Go runtime in C probably isn't a good idea.

What we can do is add an optional AsWriterTo method to the Reader interface:

// An interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
typedef struct {
    // required
    size_t (*Read)(void* self, uint8_t* p, size_t len);
    // optional
    WriterTo (*AsWriterTo)(void* self);
} ReaderTable;

typedef struct {
    const ReaderTable* mtab;
    void* self;
} Reader;

Then we can easily check if a given Reader is also a WriterTo:

void work(Reader r) {
    // Check if r implements WriterTo.
    if (r.mtab->AsWriterTo) {
        WriterTo wt = r.mtab->AsWriterTo(r.self);
        // Use r as WriterTo...
        return;
    }
    // Use r as a regular Reader...
    return;
}

Still, this feels a bit like a hack. I'd rather avoid using type assertions unless it's really necessary.
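For a fuller picture, here is a sketch of the optional AsWriterTo hook working end to end. The text above does not define WriterTo, so the Writer and WriterTo interfaces, the Counter and Hello types, and the copy_chunk function are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical minimal Writer interface (assumed for this sketch).
typedef struct {
    size_t (*Write)(void* self, const uint8_t* p, size_t len);
} WriterTable;
typedef struct { const WriterTable* mtab; void* self; } Writer;

// Hypothetical WriterTo interface: writes the source's data straight to w.
typedef struct {
    size_t (*WriteTo)(void* self, Writer w);
} WriterToTable;
typedef struct { const WriterToTable* mtab; void* self; } WriterTo;

// Reader with the optional AsWriterTo hook.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);  // required
    WriterTo (*AsWriterTo)(void* self);                  // optional, may be NULL
} ReaderTable;
typedef struct { const ReaderTable* mtab; void* self; } Reader;

// Copies one chunk from src to dst, preferring WriteTo when available.
size_t copy_chunk(Writer dst, Reader src) {
    if (src.mtab->AsWriterTo) {
        WriterTo wt = src.mtab->AsWriterTo(src.self);
        return wt.mtab->WriteTo(wt.self, dst);
    }
    uint8_t buf[8];
    size_t n = src.mtab->Read(src.self, buf, sizeof(buf));
    return dst.mtab->Write(dst.self, buf, n);
}

// Counter is a Writer that only counts the bytes it receives.
typedef struct { size_t count; } Counter;
static size_t Counter_Write(void* self, const uint8_t* p, size_t len) {
    (void)p;
    ((Counter*)self)->count += len;
    return len;
}
Writer Counter_Writer(Counter* c) {
    static const WriterTable impl = {.Write = Counter_Write};
    return (Writer){.mtab = &impl, .self = c};
}

// Zeros is a plain Reader: AsWriterTo is left NULL.
typedef struct { size_t total; } Zeros;
static size_t Zeros_Read(void* self, uint8_t* p, size_t len) {
    memset(p, 0, len);
    ((Zeros*)self)->total += len;
    return len;
}
Reader Zeros_Reader(Zeros* z) {
    static const ReaderTable impl = {.Read = Zeros_Read};
    return (Reader){.mtab = &impl, .self = z};
}

// Hello is a Reader that also implements WriterTo.
typedef struct { int used_write_to; } Hello;
static size_t Hello_Read(void* self, uint8_t* p, size_t len) {
    (void)self;
    size_t n = len < 5 ? len : 5;
    memcpy(p, "hello", n);
    return n;
}
static size_t Hello_WriteTo(void* self, Writer w) {
    ((Hello*)self)->used_write_to = 1;
    return w.mtab->Write(w.self, (const uint8_t*)"hello", 5);
}
static WriterTo Hello_AsWriterTo(void* self) {
    static const WriterToTable impl = {.WriteTo = Hello_WriteTo};
    return (WriterTo){.mtab = &impl, .self = self};
}
Reader Hello_Reader(Hello* h) {
    static const ReaderTable impl = {
        .Read = Hello_Read,
        .AsWriterTo = Hello_AsWriterTo,
    };
    return (Reader){.mtab = &impl, .self = h};
}
```

With a Zeros source, copy_chunk falls back to Read and an intermediate buffer. With a Hello source, it takes the WriteTo path and skips the buffer. The check is a single NULL test on the method table, so it stays static rather than recreating Go's runtime type checks.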

Final thoughts

Interfaces (traits, really) in C are possible, but they're not as simple or elegant as in Go or Rust. The method table approach we discussed is a good starting point. It's memory-efficient, as type-safe as possible given C's limitations, and supports polymorphic behavior.

Here's the full source code if you are interested:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// An interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
} ReaderTable;

typedef struct {
    const ReaderTable* mtab;
    void* self;
} Reader;

// Reads len(p) bytes into p.
size_t Reader_Read(Reader r, uint8_t* p, size_t len) {
    return r.mtab->Read(r.self, p, len);
}

// An infinite stream of zero bytes.
typedef struct {
    size_t total;
} Zeros;

// Reads len(p) bytes into p.
size_t Zeros_Read(Zeros* z, uint8_t* p, size_t len) {
    for (size_t i = 0; i < len; i++) {
        p[i] = 0;
    }
    z->total += len;
    return len;
}

// Returns a Reader implementation for Zeros.
Reader Zeros_Reader(Zeros* z) {
    // The method table is only initialized once.
    static const ReaderTable impl = {
        .Read = (size_t (*)(void*, uint8_t*, size_t))Zeros_Read,
    };
    return (Reader){.mtab = &impl, .self = z};
}

// Does some work reading from r.
size_t work(Reader r) {
    uint8_t buf[8];
    return Reader_Read(r, buf, sizeof(buf));
}

int main(void) {
    Zeros z = {0};

    Reader r = Zeros_Reader(&z);
    work(r);
    work(r);

    printf("total = %zu\n", z.total);
}

Cheers!
