C in WASM

David Priver, January 20th, 2023

If you search for information on how to compile C code to WebAssembly, most sources will tell you to use Emscripten. For various reasons (complexity, bloat, etc.), you probably don't want to use Emscripten. This is partially just for my own reference, but here is how you can compile some simple C code to wasm and access it from the browser.

I am assuming you want to target the browser as I don’t see the point of targeting the wasm runtimes that exist outside of the browser — just generate native code. C is already a portable language and with some care you can target any OS used today. It will be much faster as well.

The Compiler

You'll need Clang. Other C compilers, such as GCC or MSVC, do not support compiling to wasm. Note that "Apple Clang" does not support generating WebAssembly. Strangely, it accepts the wasm target but lacks the ability to actually generate the bytecode: it will typecheck, semantically analyze, etc. just fine and then error when it would generate code.

You can get Clang from your package manager or just download it from the LLVM project's GitHub releases.

The Flags

You'll need to pass several flags to clang:

# Wasm
--target=wasm32

# Don't link to libc or assume there's crt1.o.
--no-standard-libraries

# Not a hosted environment.
-ffreestanding

# Don't add normal c include paths.
-nostdinc

# If you want to provide your own libc headers.
-isystem <your replacement libc headers>

Also, enable the wasm extensions:

-mbulk-memory
-mreference-types
-mmultivalue
-mmutable-globals
-mnontrapping-fptoint
-msign-ext

Basically, you want to use the wasm extensions that your targeted browsers will support.

Additionally, more flags need to be passed to the wasm linker (pass them through Clang using the -Wl, prefix):

# export all of our symbols (wasm-ld's --export-dynamic is
# the variant that exports only the non-hidden ones)
-Wl,--export-all

# don't look for _start or _main
-Wl,--no-entry

# allow undefined symbols; they become imports that
# JavaScript provides at instantiation time
-Wl,--allow-undefined

The Standard Library

Almost nothing from the C standard library will be available out of the box. You'll either need to write the functions yourself or find suitable versions to adapt. There are a few exceptions though: notably, memmove, memcpy and memset can be handled by the wasm engine's bulk-memory instructions as long as you enable the bulk-memory extension for wasm (which is available basically everywhere).
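As a minimal sketch of leaning on bulk memory instead of a libc, the helpers below call the compiler builtins directly. The file and function names are hypothetical, and the sketch assumes Clang lowers these builtins to the memory.fill and memory.copy instructions when -mbulk-memory is on; without that flag the compiler may instead emit calls to memset/memmove that you would have to provide yourself. It also assumes <stddef.h> is reachable (with -nostdinc you would supply it through -isystem).

```c
// bulk-memory.c (hypothetical file and function names)
#include <stddef.h>

// Zero a buffer without a libc memset.
static inline
void
zero_bytes(void* dst, size_t n){
    __builtin_memset(dst, 0, n);
}

// Copy a possibly-overlapping range without a libc memmove.
static inline
void
move_bytes(void* dst, const void* src, size_t n){
    __builtin_memmove(dst, src, n);
}
```

It is worth inspecting the generated module (for example with a wasm disassembler) to confirm that no memset or memmove import was left behind.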

In my opinion, it is most useful to treat WebAssembly like an embedded platform: don't try to provide things like printf, file IO, etc. If you want to pretend you're not actually targeting wasm, just use Emscripten.
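In that embedded spirit, a small formatting helper often covers what printf would have been used for. The following is only a sketch with hypothetical names: it turns an unsigned 32-bit value into decimal text in a caller-supplied buffer, which can then be handed to JavaScript as a pointer + length pair. It assumes <stddef.h> and <stdint.h> are available.

```c
// format-int.c (hypothetical file and function names)
#include <stddef.h>
#include <stdint.h>

// Write v as decimal text into buf (no NUL terminator).
// buf must have room for 10 bytes, the most a 32-bit value needs.
// Returns the number of bytes written.
size_t
format_u32(char* buf, uint32_t v){
    char tmp[10];
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while(v);
    // Digits were produced least-significant first; reverse them.
    for(size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    return n;
}
```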

Most of the functions you want to use from libc are not actually that hard to implement. The tough ones are things like scanf that you don’t want to use anyway in a reasonable C lib.
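To give a sense of scale, here is a sketch of three such functions. The mini_ prefix is a hypothetical naming choice so the sketch can also be compiled natively next to a real libc; in a wasm-only build you could drop the prefix. These are straightforward byte-at-a-time versions, not tuned ones, and they assume <stddef.h> is available.

```c
// mini-libc.c (hypothetical file and function names)
#include <stddef.h>

// Length of a NUL-terminated string, not counting the NUL.
size_t
mini_strlen(const char* s){
    const char* p = s;
    while(*p) p++;
    return (size_t)(p - s);
}

// Compare n bytes; negative, zero or positive like memcmp.
int
mini_memcmp(const void* a, const void* b, size_t n){
    const unsigned char* x = a;
    const unsigned char* y = b;
    for(size_t i = 0; i < n; i++){
        if(x[i] != y[i])
            return x[i] < y[i] ? -1 : 1;
    }
    return 0;
}

// Compare two NUL-terminated strings as unsigned bytes.
int
mini_strcmp(const char* a, const char* b){
    while(*a && *a == *b){
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}
```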

You can also always just import functions from JavaScript.

malloc

You're going to want at least a primitive version of malloc so that JavaScript can allocate memory in C land (this will come up for string conversion).

A primitive malloc can be implemented like so:

// primitive_malloc.c

// Use clang extensions to your heart's delight, it's
// your only C compiler.
#pragma clang assume_nonnull begin
typedef typeof(sizeof(1)) size_t;
enum {SIZE_T_SIZE=sizeof(size_t)};

// This symbol is defined by the wasm linker (wasm-ld)
// and marks where the heap may start.
extern unsigned char __heap_base[];

static unsigned char*_base_ptr = __heap_base;

// Useful function to expose to JavaScript.
void
reset_memory(void){
    _base_ptr = __heap_base;
}

// The workhorse function.
// Just a bump allocator.
static inline
void*_Nonnull
alloc(size_t size, size_t alignment){
    if(alignment > SIZE_T_SIZE)
        alignment = SIZE_T_SIZE;
    size_t b = (size_t)_base_ptr;
    if(b & (alignment-1)){
        b += alignment - (b & (alignment-1));
        _base_ptr = (unsigned char*)b;
    }
    void* result = _base_ptr;
    _base_ptr += size;
    return result;
}

void*
malloc(size_t size){
    return alloc(size, 8);
}

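void*
calloc(size_t n_items, size_t item_size){
    // item_size may be zero or not a power of two, so it can't
    // be used as the alignment; align like malloc instead.
    void* result = alloc(n_items*item_size, 8);
    // Provided by bulk memory extension.
    __builtin_memset(result, 0, n_items*item_size);
    return result;
}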

// Just don't do anything.
// You could improve this by maintaining a free list.
void
free(void*_Nullable p){
    (void)p;
}
#pragma clang assume_nonnull end
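
The rounding step in alloc relies on the alignment being a power of two, because it uses alignment-1 as a bit mask. As a sketch of that arithmetic in isolation (the helper names are hypothetical, and <stddef.h> and <stdint.h> are assumed to be available):

```c
// align.c (hypothetical file and function names)
#include <stddef.h>
#include <stdint.h>

// True when x is a power of two (1, 2, 4, 8, ...).
static inline
int
is_pow2(size_t x){
    return x != 0 && (x & (x - 1)) == 0;
}

// Round addr up to the next multiple of alignment.
// alignment must be a power of two.
static inline
uintptr_t
align_up(uintptr_t addr, size_t alignment){
    uintptr_t mask = (uintptr_t)alignment - 1;
    return (addr + mask) & ~mask;
}
```

For example, align_up(13, 8) gives 16, while an already aligned address such as 16 is returned unchanged. A caller could check is_pow2 first and fall back to a default alignment when it fails.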

Strings

You'll probably want to pass strings back and forth between C and JavaScript. Unfortunately, there seems to be no performant way to do this: if you do it too often, it will dominate the runtime of your code. I'll show the simplest way to do it.

Basically, to pass JS strings to C, you have JavaScript malloc a buffer and then use the TextEncoder API to encode its UTF-16 strings into UTF-8. This encoding step is half of what is slow. I can't figure out a fast way to just copy the UTF-16 string directly, which is a shame, as it's not that hard to write your string processing on UTF-16 (as in UTF-8, ASCII characters keep the same values).

To pass C strings to JS, you call a function provided by your JavaScript code that takes a pointer + length pair. JavaScript can then decode that memory into a JS String using the TextDecoder API.

Code demonstrating this:

// js-strings.js
let mem;
let memview;
const decoder = new TextDecoder();
const encoder = new TextEncoder();
function wasm_string_to_js(p, len) {
    const sub = mem.subarray(p, p + len);
    const text = decoder.decode(sub);
    return text;
}
function write4(p, val) {
    memview.setInt32(p, val, true);
}
function js_string_to_wasm(s) {
    const encoded = encoder.encode(s);
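    // malloc is the C function exported by the wasm module;
    // exports is the instance's exports object.
    const p = exports.malloc(encoded.length + 4);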
    write4(p, encoded.length);
    mem.set(encoded, p + 4);
    return p;
}

// js-strings.c

// I've found wrapping APIs this way is easiest
// for me.
typedef struct PrefixedString PrefixedString;
struct PrefixedString {
    size_t length;
    unsigned char data[];
};

static
void some_api(size_t length, const char* txt);

void
some_api_js(PrefixedString* ps){
    some_api(ps->length, (char*)ps->data);
}
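
Casting the buffer to PrefixedString works because size_t is 4 bytes on wasm32 and wasm is little-endian, which matches what write4 stores. If you would rather not depend on that (for instance, to test the same code natively on a 64-bit machine), the prefix can be decoded byte by byte. A minimal sketch, with hypothetical names and assuming the 4-byte little-endian layout written by js_string_to_wasm:

```c
// prefixed-view.c (hypothetical file, type and function names)
#include <stddef.h>
#include <stdint.h>

typedef struct StringView StringView;
struct StringView {
    size_t length;
    const unsigned char* data;
};

// Decode a 4-byte little-endian length followed by
// that many UTF-8 bytes.
static inline
StringView
read_prefixed(const unsigned char* p){
    uint32_t len = (uint32_t)p[0]
                 | (uint32_t)p[1] << 8
                 | (uint32_t)p[2] << 16
                 | (uint32_t)p[3] << 24;
    StringView sv = {len, p + 4};
    return sv;
}
```

The returned view points into the buffer JavaScript allocated, so it is only valid until that memory is reused (for example, after reset_memory).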

Building the Code

You'll end up with a command line that looks like this (put it in a Makefile).

clang mycode_wasm.c -o mycode.wasm \
--target=wasm32 --no-standard-libraries \
-Wl,--export-all -Wl,--no-entry \
-Wl,--allow-undefined -ffreestanding \
-nostdinc -isystem Wasm \
-mbulk-memory -mreference-types \
-mmultivalue -mmutable-globals \
-mnontrapping-fptoint -msign-ext \
-O3

Instantiating the Code

The MDN docs are pretty good, but for completeness, here is an example of how to instantiate a WebAssembly module in JavaScript.

let wasm_inst; // WebAssembly.Instance
let mem; // Uint8Array
let memview; // DataView
let exports; // WebAssembly.Exports
const decoder = new TextDecoder();
const encoder = new TextEncoder();

function wasm_string_to_js(p, len){
    const sub = mem.subarray(p, p+len);
    const text = decoder.decode(sub);
    return text;
}
function write4(p, val){
    memview.setInt32(p, val, true);
}
function js_string_to_wasm(s){
    const encoded = encoder.encode(s);
    const p = exports.malloc(encoded.length+4);
    write4(p, encoded.length);
    mem.set(encoded, p+4);
    return p;
}
const imports = {
    // JavaScript functions you’re exposing to C go here.
    env:{
    },
};
fetch('some_wasm_path')
  .then(response => response.arrayBuffer())
  .then(bytes => WebAssembly.instantiate(bytes, imports))
  .then(x=>{
      wasm_inst = x.instance;
      exports = wasm_inst.exports;
      const m = exports.memory;
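      // grow() takes a count of 64KiB pages, so this adds 64MiB.
      m.grow(1024);
      // Create the views after growing: grow() detaches the old buffer.
      mem = new Uint8Array(m.buffer);
      memview = new DataView(mem.buffer);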
  });

The above example doesn’t end up doing much, but it will instantiate the C code and have you set up for passing strings back and forth.

Debugging

As far as I can tell, debugging wasm code is basically only supported by Chrome. You can step through the bytecode in Firefox and Safari, but mapping the instructions back to your original source code requires Chrome plus a Chrome extension. You'll need this extension.

You'll need to add the -g flag to your compiler command line. That instructs clang to include DWARF debug info in the wasm file.

The extension looks up the source files by absolute path. This means that if you compile on one machine and debug on another, you’ll need to have the same file hierarchy on both machines.

Profiling

Profiling is more or less the same in all the major browsers: you can see how much time is spent at the granularity of function calls. This is somewhat unfortunate if you are used to native profiling tools, which give you timings for each line or even drop down to the assembly level, so you won't be able to do things like optimize control flow. Hopefully this improves in the future.

I personally prefer the Firefox profiler UI, but they are more or less equivalent.

Demo

Below is a complete example of using WASM.

This does very little, but shows all the necessary pieces.

// demo.js
"use strict";
let wasm_inst; // WebAssembly.Instance
let mem; // Uint8Array
let memview; // DataView
let exports; // WebAssembly.Exports
const decoder = new TextDecoder();
const encoder = new TextEncoder();
function wasm_string_to_js(p, len){
    const sub = mem.subarray(p, p+len);
    const text = decoder.decode(sub);
    return text;
}
function write4(p, val){
    memview.setInt32(p, val, true);
}
function js_string_to_wasm(s){
    const encoded = encoder.encode(s);
    const p = exports.malloc(encoded.length+4);
    write4(p, encoded.length);
    mem.set(encoded, p+4);
    return p;
}
const imports = {
    env:{
      write: (p, len) => {
        const s = wasm_string_to_js(p, len);
        document.getElementById('scratch').innerText+=s;
      },
    },
};
document.addEventListener('DOMContentLoaded', ()=>{
  // prog64 is the base64 version of the wasm file.
  const wasm_buffer = Uint8Array.from(
    atob(prog64),
    c => c.charCodeAt(0)
  ).buffer;
  WebAssembly.compile(wasm_buffer)
    .then(x => WebAssembly.instantiate(x, imports))
    .then(x => {
      wasm_inst = x;
      wasm_inst.exports.memory.grow(24);
      exports = wasm_inst.exports;
      mem = new Uint8Array(exports.memory.buffer);
      memview = new DataView(mem.buffer);
      let i = 0;
      document.getElementById('click-me').onclick=()=>{
        exports.clicked(i++);
      };
    });
});

// example.c
#include "primitive_malloc.c"

extern void write(const char*, size_t);

void clicked(int i){
    char c[2] = {(i &0xf)+ 'a', '\n'};
    write(c, sizeof c);
}


All code in this article is released into the public domain.