Fixing C's Biggest Mistake

Back in 2009, Walter Bright (creator of the D programming language and, before that, the Digital Mars C++ compiler) wrote an article arguing that C's biggest mistake was "Conflating pointers with arrays." In particular, it is impossible to pass an array to a function without it being converted to a pointer, so the array dimension is lost.

He then goes on to state that the fix is to add a fat pointer syntax:

void foo(char a[..])

with semantics similar to those of slices in his D programming language. Such a fat pointer would be a {count, pointer} pair.

Although he doesn't say this directly, on common ABIs such as x86-64 System V and AArch64, a small two-word struct like that is passed in two registers.

So it would end up being ABI-equivalent to:

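#include <stddef.h> // size_t
#if NEWC // hypothetical compiler with fat pointers
  extern void foo(char a[..]);
#elif __STDC_VERSION__ >= 199901L // C99 and later
  extern void foo(size_t dim, char a[dim]);
#else
  extern void foo(size_t dim, char *a);
#endif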
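Until something like this is standardized, the closest you can get in today's C is to spell the pair out as a struct and pass it by value. The following is a minimal sketch; the names `char_slice`, `count_char` and `count_char_parts` are hypothetical and not part of Walter's proposal. On ABIs such as x86-64 System V, both functions receive the count and the pointer in the same two registers.

```c
#include <stddef.h> /* size_t, NULL */

/* Hypothetical fat pointer for char: the {count, pointer} pair, count first. */
typedef struct {
    size_t count;
    char *data;
} char_slice;

/* The "two separate parameters" form: the caller must keep dim and a in sync. */
size_t count_char_parts(size_t dim, char *a, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < dim; i++)
        if (a[i] == c)
            n++;
    return n;
}

/* The fat-pointer form: one by-value struct carries both length and pointer. */
size_t count_char(char_slice a, char c)
{
    return count_char_parts(a.count, a.data, c);
}
```

The struct version cannot be called with a length that belongs to a different buffer, which is the whole point of bundling the two.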

He then fleshes out the rest of the operations.

Fat pointers, slices, views, spans: many languages have a concept like this.

Slices in DrC

I've been writing a C interpreter lately and having fun modernizing C with backwards-compatible extensions. Adding a fat pointer type (I chose to call it a slice type) was the obvious next step. As I write a lot of Python in my day job, I also decided to add slicing, a natural companion to a first-class slice type.

For the slice operation itself, I decided it made the most sense syntactically not to use `..` or `...`, as that makes lexing ambiguous with integer literals: `1..3` looks like `1.` followed by `.3`, two double literals. So I went with Python's `:` syntax. Once I was using `:`, it made sense to use it for the type itself as well, instead of Walter's `..`.
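The exact rules of DrC's `s[lo:hi]` operation aren't spelled out here, so the following is only a standard-C sketch of what a Python-style slice of an `int` slice could lower to. It assumes half-open bounds with `lo <= hi <= count`, no negative indices, and a trap on invalid bounds (Python itself clamps out-of-range bounds instead); `int_slice` and `slice_sub` are hypothetical names, not DrC's implementation.

```c
#include <stddef.h> /* size_t */
#include <stdio.h>  /* fprintf */
#include <stdlib.h> /* abort */

/* Hypothetical {count, data} pair, mirroring int x[:]. */
typedef struct {
    size_t count;
    int *data;
} int_slice;

/* Hypothetical lowering of s[lo:hi]: the half-open range [lo, hi) of s.
   Nothing is copied; the result aliases the same storage as s. */
int_slice slice_sub(int_slice s, size_t lo, size_t hi)
{
    if (lo > hi || hi > s.count) {
        fprintf(stderr, "slice [%zu:%zu] out of bounds for count %zu\n",
                lo, hi, s.count);
        abort();
    }
    int_slice r = { hi - lo, s.data + lo };
    return r;
}
```

In Python, `[10, 20, 30, 40, 50][1:4]` gives `[20, 30, 40]`; here `slice_sub(s, 1, 4)` gives a view of those same three elements, but unlike a Python list slice it aliases the original array rather than copying it.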

#include <assert.h>
#include <stddef.h> // size_t
int slice[:]; // {count, data} pair
int arr[] = {1,2,3};
slice = arr; // arrays implicitly convert to slices
// access length with .count or _Countof
assert(_Countof slice == slice.count);
int sum(int vals[:]){
  int result = 0;
  for(size_t i = 0; i < vals.count; i++){
    result += vals[i];
  }
  return result;
}
// implicit conversion when passed to function, array length is retained!
assert(sum(arr) == 6);
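For comparison, here is roughly what the same example takes in standard C today: the slice is an explicit struct, and the array-to-slice conversion has to be spelled out, here with a hypothetical `SLICE_OF` macro that computes the count with `sizeof`. This is a sketch, not how DrC lowers slices internally.

```c
#include <stddef.h> /* size_t */

/* Hypothetical {count, data} pair for int. */
typedef struct {
    size_t count;
    int *data;
} int_slice;

/* Hypothetical helper: build a slice from a true array (NOT a pointer),
   using sizeof to recover the element count. */
#define SLICE_OF(arr) ((int_slice){ sizeof(arr) / sizeof((arr)[0]), (arr) })

int sum(int_slice vals)
{
    int result = 0;
    for (size_t i = 0; i < vals.count; i++)
        result += vals.data[i];
    return result;
}
```

The macro only works where the array has not yet decayed: applied to a pointer, `sizeof` silently yields the wrong count. That is exactly the decay problem a built-in slice conversion avoids.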

Since slice declarators reuse C's peculiar array declarator syntax, functions returning slices look weird, much like functions returning function pointers.

#include <ctype.h>
#include <stdio.h>
char strip(char s[:])[:]{ // [:] goes after params
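    // cast: isspace is only defined for unsigned char values (and EOF)
    while(s.count && isspace((unsigned char)s[0])){
        s.count--;
        s.data++;
    }
    // also drops the trailing NUL kept from the string literal
    while(s.count && (!s[s.count-1] || isspace((unsigned char)s[s.count-1])))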
        s.count--;
    return s;
}
char stripped[:] = strip(" hello world!   ");
printf("'%.*s'\n", (int)stripped.count, stripped.data); // 'hello world!'
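In standard C the same function has to return an explicit struct instead, which at least avoids the odd declarator. The sketch below uses a hypothetical `char_slice` struct and `strip_slice` name; like the DrC version, it also drops a trailing NUL, so it works on a slice built from a string array with `sizeof`.

```c
#include <ctype.h>  /* isspace */
#include <stddef.h> /* size_t */

/* Hypothetical {count, data} pair for char. */
typedef struct {
    size_t count;
    char *data;
} char_slice;

/* Returns a view of s without leading/trailing whitespace and trailing NULs.
   No copy is made; the result points into s's storage. */
char_slice strip_slice(char_slice s)
{
    while (s.count && isspace((unsigned char)s.data[0])) {
        s.count--;
        s.data++;
    }
    while (s.count && (!s.data[s.count - 1] ||
                       isspace((unsigned char)s.data[s.count - 1])))
        s.count--;
    return s;
}
```

Given `char text[] = " hello world!   ";` and `char_slice s = { sizeof text, text };`, `strip_slice(s)` yields a 12-character view starting at `text + 1`, which prints as `'hello world!'` with the same `printf("'%.*s'\n", ...)` call.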

Unlike Walter's proposal, I don't special-case string literals. In C they are arrays of char whose length includes the terminating NUL, so implicit conversion to a slice retains that NUL.

I think this is the better design: when passing a slice to a legacy C function, you can check whether it is NUL-terminated instead of just hoping it is.
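A minimal standard-C sketch of such a check, using hypothetical names `char_slice` and `slice_is_cstr`. It only looks at the last element: a slice with an embedded NUL earlier on still passes, and the C function simply sees a shorter string.

```c
#include <stdbool.h> /* bool */
#include <stddef.h>  /* size_t, NULL */

/* Hypothetical {count, data} pair for char. */
typedef struct {
    size_t count;
    char *data;
} char_slice;

/* True if the last element of s is NUL, so s.data can be handed to functions
   expecting a NUL-terminated string (strlen, puts, fopen, ...). */
bool slice_is_cstr(char_slice s)
{
    return s.count > 0 && s.data[s.count - 1] == '\0';
}
```

A slice converted from `"hi"` has count 3 and passes; the same slice trimmed to count 2 does not, so the caller knows it must copy and terminate it first.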

My interpreter bounds-checks slice accesses, of course, but if slices were added to the C standard, I think the behavior of out-of-bounds accesses should be implementation-defined. A compiler switch could then control whether an out-of-bounds access traps, is left undefined, and so on.
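One way to picture such a switch in today's C is a macro selected at compile time. In this sketch, the hypothetical `SLICE_AT` macro traps with `abort()` on an out-of-range index unless `SLICE_UNCHECKED` is defined (for example with `-DSLICE_UNCHECKED`), in which case the access is unchecked and out-of-bounds indexing is undefined, just like with plain arrays. The names and the checked-by-default choice are assumptions for illustration.

```c
#include <stddef.h> /* size_t */
#include <stdio.h>  /* fprintf */
#include <stdlib.h> /* abort */

/* Hypothetical {count, data} pair for int. */
typedef struct {
    size_t count;
    int *data;
} int_slice;

#ifdef SLICE_UNCHECKED
/* Unchecked: out-of-bounds indexing is undefined behavior, like raw arrays. */
#define SLICE_AT(s, i) ((s).data[(i)])
#else
/* Checked (default): trap on any index >= count. */
static int *slice_checked_ptr(int_slice s, size_t i, const char *file, int line)
{
    if (i >= s.count) {
        fprintf(stderr, "%s:%d: index %zu out of bounds for count %zu\n",
                file, line, i, s.count);
        abort();
    }
    return &s.data[i];
}
#define SLICE_AT(s, i) (*slice_checked_ptr((s), (i), __FILE__, __LINE__))
#endif
```

Both variants yield an lvalue, so `SLICE_AT(s, 2) = 60;` works either way. In the checked variant, a negative `int` index converts to a huge `size_t` and is caught as well.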

Conclusion

Implementing slices in DrC was surprisingly easy. Ideally such a construct could be standardized so that we don't all have to define our own bounds-checked array types or raw-dog length+ptr APIs anymore.
