Home

Templates in C

David Priver, July 22nd, 2022

Sometimes I see people on the internet sharing their generic libraries in C. More often than not, they turn out to use some giant macro that generates a huge amount of code on a single source line. I thought I would show a better way of doing it if you need to write some C and don't want to copy-paste yet another dynamic array or hash table.

None of these ideas are new — I am sure people were doing this in the 80s — but not everyone is aware of them.

Note, _Generic has nothing to do with this (althought you could use it to simulate overloads if you generate a lot of code).

Generics in C

There are a few ways of creating generic data structures or algorithms in C:

Void Pointers

If the data you are working with can be coerced to a void pointer, you can side step the problem by writing a single implementation that only works with void pointers or only with raw bytes. When the user actually uses it they have to cast back to the original type and to be careful not to get confused about which data structure is which.

This is easy to implement, but is error-prone and type-unsafe to use. If you require all data to be pointers the user is encouraged to heap allocate a large number of individual items, which is bad for performance.

Function pointers

You write your library or algorithm to work with opaque types and when you need type-specific functionality, you call a function pointer given by the user.

A good example of this in the standard C library is the qsort function. For moving things it just memcpy's a void pointer, but for the actual comparison it calls the comparison function given by the user.

The advantage of this approach is that it takes up very little code space - there is just one qsort.

The disadvantages are that it is type unsafe — there is no way to check that the user's function pointer is the right one and it is annoying to use. The user has to define a (probably trivial) global function just to sort some data.

Inline Macros

Another approach is to not define a function at all and just do the work in a macro. The classic K&R definition of max is an example of this:

#define max(a, b) ((a) > (b)? (a) : (b))

Ignoring the multiple evaluation problem (a or b is evaluated twice, which can be expensive or surprising if it involves side-effects), this is not adequate for more advanced operations. Sans extensions, you can't really define or use local variables, you can't use recursion, you can't get the normal compiler errors when you pass the wrong type to the "function" and this forcibly inlines the code at every usage site.

It is very easy to use though (it looks like a function).

Code Generating Macros

Another approach is to actually generate a function with the actual types and call that generated function. To avoid having to actually copy-paste, you generate the function via a macro.

#define COMBINE(a, b) a##b
#define MAKE_MAX(T, prefix) \
  static inline \
  T COMBINE(prefix, max)(T a, T b){ \
      if(a > b) return a; \
      return b; \
  }
MAKE_MAX(int, int_)

#include <assert.h>
int main(void){
    int x = int_max(1, 2);
    assert(x == 2);
    return 0;
}

This is type safe and you get the benefits of functions (separating implementation details from usage, local variables, early returns, etc.), but it becomes impossible to debug if you need to step through the generated code — the nature of C macros is everything will be on a single line. Checking the result of the macro expansion is also more difficult.

Additionally, customizing the expanded code is difficult: you either need a large number of arguments to the macro or just forgo customization of things like prefixes to function names.

Finally, just writing it is annoying. You need a \ on every line, most editors give up on syntax highlighting it properly, etc.

Source Code Generation

Another alternative is to generate extra source files that then are compiled into your program. The issue with this is that it greatly complicates the building and compiling of your code. Suddenly you need a build system that is aware of the need to generate the code instead of just leveraging the C compiler that you already have. You step farther away from the ideal of just compiling with cc *.c -o program.

Copy Paste

The manual version of source code generation is copy-pasting the data structure and functions every time you need a new one. This is a nightmare for maintainability — a bug copy-pasted 100 times might never get fixed, especially if that code is then altered and diverges.

Template Headers with Multiple Inclusion

This approach is very similar to Code Generating Macros, but with the benefit of it being easy to check the expanded code, it is possible to debug and customization is much more straightforward thanks to #ifdef. The rest of this article will explain how they work.

Template Headers

The idea of a template header is that you have a file that's meant to be included multiple times. Each time the file is included, it generates a new data structure or new functions, specialized to a given type or types, or really on anything you can imagine.

The easiest way to explain this is via an example:

// Darray.h
// Include this header multiple times to implement a
// simplistic dynamic array.  Before inclusion define at
// least DARRAY_T to the type the dynamic array can hold.
// See DARRAY_NAME, DARRAY_PREFIX and DARRAY_LINKAGE for
// other customization points.
//
// If you define DARRAY_DECLS_ONLY, only the declarations
// of the type and its function will be declared.
//

#ifndef DARRAY_HEADER_H
#define DARRAY_HEADER_H
// Inline functions, #defines and includes that will be
// needed for all instantiations can go up here.
#include <stdlib.h> // realloc, size_t

#define DARRAY_IMPL(word) DARRAY_COMB1(DARRAY_PREFIX,word)
#define DARRAY_COMB1(pre, word) DARRAY_COMB2(pre, word)
#define DARRAY_COMB2(pre, word) pre##word

#endif // DARRAY_HEADER_H

// NOTE: this section is *not* guarded as it is intended
// to be included multiple times.

#ifndef DARRAY_T
#error "DARRAY_T must be defined"
#endif

// The name of the data type to be generated.
// If not given, will expand to something like
// `darray_int` for an `int`.
#ifndef DARRAY_NAME
#define DARRAY_NAME DARRAY_COMB1(DARRAY_COMB1(darray,_), DARRAY_T)
#endif

// Prefix for generated functions.
#ifndef DARRAY_PREFIX
#define DARRAY_PREFIX DARRAY_COMB1(DARRAY_NAME, _)
#endif

// Customize the linkage of the function.
#ifndef DARRAY_LINKAGE
#define DARRAY_LINKAGE static inline
#endif

typedef struct DARRAY_NAME DARRAY_NAME;
struct DARRAY_NAME {
    DARRAY_T* items;
    size_t count;
    size_t capacity;
};

#define DARRAY_push DARRAY_IMPL(push)

#ifdef DARRAY_DECLS_ONLY

DARRAY_LINKAGE
void
DARRAY_push(DARRAY_NAME* array, DARRAY_T item);

#else

DARRAY_LINKAGE
void
DARRAY_push(DARRAY_NAME* array, DARRAY_T item){
    if(array->count >= array->capacity){
        size_t old_cap = array->capacity;
        size_t new_cap = old_cap?old_cap*2:4;
        size_t new_size = new_cap * sizeof(DARRAY_T);
        array->items = realloc(array->items, new_size);
        array->capacity = new_cap;
    }
    array->items[array->count++] = item;
}
#endif

// Cleanup
// These need to be undef'ed so they can be redefined the
// next time you need to instantiate this template.
#undef DARRAY_T
#undef DARRAY_PREFIX
#undef DARRAY_NAME
#undef DARRAY_LINKAGE
#undef DARRAY_push
#ifdef DARRAY_DECLS_ONLY
#undef DARRAY_DECLS_ONLY
#endif

Example Usage

// example.c

#include <assert.h>
#include <stdio.h>

#define DARRAY_T int
#define DARRAY_PREFIX i
#define DARRAY_NAME IntArray
// Must be manually instantiated by #including the file
#include "darray.h"

int main(void){
    IntArray ints = {0};
    ipush(&ints, 1);
    ipush(&ints, 2);
    ipush(&ints, 3);
    ipush(&ints, 4);
    ipush(&ints, 5);
    ipush(&ints, 6);
    assert(ints.count == 6);
    assert(ints.items[0] == 1);
    assert(ints.items[1] == 2);
    assert(ints.items[2] == 3);
    assert(ints.items[3] == 4);
    assert(ints.items[4] == 5);
    assert(ints.items[5] == 6);
    assert(ints.capacity == 8);
    for(size_t i = 0; i < ints.count; i++){
        printf("[%zu] = %d\n", i, ints.items[i]);
    }
    free(ints.items);
    return 0;
}

One reason this approach shines is that it is simple to debug the generation of the code. For example, you can simply see what the code expands to by a command like:

$ cc -DDARRAY_T=int darray.h -E -P

(assuming a gcc or clang-like cc).

If you tried to do a similar thing with a Code Generating Macro, it would be a giant mess on a single line.

The biggest benefit though is that if you have a bug in your generated code and an assertion fails or address-sanitizer complains — you have a real source file with a real location. You can step through the code in a debugger and it is readable and reasonable.

The final benefit is that it is simple to implement customization points: check for a macro to be defined and if not use some default behavior. This can get pretty ugly, but it is better than a huge number of arguments to a Code Generating Macro.

Conclusion

By leveraging the ability of the preprocessor to include files multiple times and the limited amount of introspection it provides, you can generate efficient, specialized code without resorting to generating external files, giving up type safety or copy-pasting code around.

Newer languages than C have better solutions to this problem, but there are many situations you still want to write some C and need things like type safe dynamic arrays, hashtables or even sorting functions (this approach can easily be used to specialize a quicksort).

All code in this article is released into the public domain.