Subscribe Now
Trending News

Blog Post

News

C99 doesn’t need function bodies, or ‘VLAs are Turing complete’ 

19 Feb 2022


The 1999 revision to the C programming language brought several interesting
changes and additions to the standard. Among those are variable length arrays,
or VLAs for short, which allow array types to have lengths that are determined
at runtime. These can show up in different contexts, but for the purposes of
this post we will be looking at VLAs as parameters to functions. For example:

void f(int n, float v[n]);

Since array parameters decay to their corresponding pointer type, that
declaration is functionally equivalent, and compatible (6.7.5.3p7) to the
following:

void f(int n, float *v);

The length of v in the first declaration is given by the value of a previous
parameter of the function, which can only be known at runtime. However, in a
function prototype (i.e. not a function definition) like the above, the length
of a VLA parameter is never computed and essentially ignored (6.7.5.2p5).
But in a function definition, the length is evaluated and must be greater than
zero. In practice this is rather useless because the sizeof operator doesn’t
work as one might expect it to with array parameters (6.5.3.4-note88):

void f(int n, float param[n]) {
    float local[n];
    int a=sizeof local,
        b=sizeof param;

    printf("n=%dn", n);
    printf("sizeof local=%dn", a); // this will be n * sizeof(float)
    printf("sizeof param=%dn", b); // but this will be sizeof(float *)
}

In other words, using sizeof with a VLA will evaluate the runtime-known length
and calculate the array size based on that, except when that VLA is a function
parameter, then, for whatever reason, it is the size of the corresponding
pointer type (in this example, sizeof local===n * sizeof(float) but sizeof param==sizeof(float *)). The length of a VLA parameter may be used when e.g.
computing indices when accessing multi-dimensional variable length arrays.

Alas, the standard mandates that the variable array length be computed when
the function is called. Of course, the expression in between the square
brackets is not limited to simple expressions like n, so one can write
something like:

void f(int a, int b, int m[(a + b) / 2]) {}

or

void f(int x, int b[abs(x)]) {}

or even

void f(int v[getchar()]) {}

The following program should give you an idea of the kind of constructs
that these rules allow for (try it on compiler explorer):

int main(int argc, char *argv[printf("Hello")])
{
    printf(" world!n");
}

The length expression of the VLA is evaluated before any of the statements in
main. I couldn’t find anywhere in the standard saying whether this evaluation
order is well-defined but it is what clang and gcc do, and luckily, it does not
matter for the sake of this article, as we will see shortly.

Let us refer to the subset of C99 where function bodies must be empty as
disembodied C. You might naturally ask yourself what things can be
accomplished in this subset (though you can probably guess the answer from the
title of this page). In other words, what can you do in C if you are limited to
just evaluating expressions and no statements?

  • Using the comma operator, we can sequence arbitrarily many expressions for
    their side effects, so

    void f() {
        printf("Press enter to confirm: ");
        getchar();
        printf("thanks.n");
    }
    

    becomes

    void f(char _[(
        printf("Press enter to confirm: "),
        getchar(),
        printf("thanks.n"),
        1
    )]) {}
    

    In a comma expression, the operands are evaluated left-to-right and the value
    of the last operand is the resulting value of the whole expression. The 1
    at the end ensures that the evaluated array length is>0 (to avoid UB).

  • Functions in disembodied C are going to need a dummy parameter where the VLA
    length expression evaluation can take place. For consistency, we will denote
    it as char _[...] and give it the value "" (the empty string) when calling
    said functions (note that the value we give it doesn’t actually matter, though
    its size should be at least as big as the computed VLA size to avoid UB).

  • If-else statements can be replaced with ternary conditional expressions, such that

    void f(int n) {
        if (n  0)
            printf("positive!");
        else
            printf("zero!");
    }
    

    becomes

    void f(int n, char _[(
        (n  0) ?
            printf("positive!")
        :
            printf("zero!")
        , 1
    )]) {}
    
  • Remember that the VLA length expression can access previous function arguments,
    so parameter passing is essentially unchanged.

  • We cannot return values, but we can use out parameters by taking advantage of
    the fact that assignments are expressions in C, so instead of

    int imax(int a, int b) {
        return a> b ? a : b;
    }
    

    we can write

    void imax(int *out, int a, int b, char _[
        (*out=a> b ? a : b),
        1
    ]) {}
    
  • We cannot define local variables inside of expressions, but we can just add
    extra function parameters to use as temporaries, rewriting

    void fswapf(float *a, float *b) {
        float tmp=*a;
        *a=*b;
        *b=tmp;
    }
    

    as

    static void fswapf_aux(float *a, float *b, float tmp, char _[(
        tmp=*a,
        *a=*b,
        *b=tmp,
        1
    )]) {}
    
    void fswapf(float *a, float *b, char _[(
        fswapf_aux(a, b, 0, ""), 1
    )]) {}
    

    Alternatively, if re-entrancy and thread-safety are disregarded, we could
    just use global (static) variables.

  • What about loops? Clearly we cannot use while or for inside expressions.
    Thankfully, they are unnecessary thanks to recursion. For example:

    float sum(float *v, size_t n) {
        float sum=0.0f;
        for (size_t i=0; i 

    can be expressed as

    /* the forward declaration is necessary */
    static void sum_aux(float *out, float *v, size_t n, char *);
    static void sum_aux(float *out, float *v, size_t n, char _[(
        (n> 0) ? (
            *out +=*v,
            sum_aux(out, v + 1, n - 1, ""),
            1
        ) : 1
    )]) {}
    
    void sum(float *out, float *v, size_t n, char _[(
        *out=0.0f,
        sum_aux(out, v, n, ""),
        1
    )]) {}
    

    In fact, any imperative-style loop can be turned into an equivalent recursive
    loop (as any functional programmer will be happy to demonstrate), though since
    C lacks closures or any form of anonymous functions, it can get quite unwieldy
    (I hope you like auxiliary functions).

    The astute reader might point out that these two versions of sum are not
    equivalent because the recursive definition may cause a stack overflow for
    large enough values of n. This is unfortunately true, and the major hurdle
    for the practicality of disembodied C, but does not preclude Turing
    completeness (an ideal Turing machine has infinite memory at its
    disposal). Luckily, modern compilers are smart, and if we write our functions
    carefully to be tail recursive, they will often be able to perform tail
    call elimination, removing the risk of a stack overflow. For the above
    example, both gcc and clang are able to optimize the tail call (see on
    compiler explorer
    ).

  • Although not relevant to standard C99, in case you had the idea of using gcc
    statement expressions
    to bypass these limitations, the compiler will stop
    you right on your tracks since it doesn’t allow those outside of function
    bodies.

After all of that, it seems that basically anything can be done in disembodied
C, so, is there something that can’t be expressed in this C99 subset?

With the rules that we’ve laid down so far, yes. In particular:

  • It is not possible in the general case to use APIs that require callbacks.
    For example, look at the standard qsort function:

     void qsort(void *base, size_t nmemb, size_t size,
                int (*compar)(const void *, const void *));
    

    The compar parameter is a pointer to a function that should return the
    result of comparing two values in the given array. Since its parameters are
    void pointers, we can’t have the VLA argument we need to perform computations
    (arrays of void are not valid C), and we also cannot provide an int
    return value. Thus there is no way (that I know of) to use this function in
    standards-compliant disembodied C.

    This is not a dealbreaker since we could just reimplement qsort ;).

  • We cannot access the program arguments. The entry point of a program in
    disembodied C looks like this:

    int main(int argc, char *argv[ /* code here ... */ ]) {}
    

    In the VLA length expression for argv, argc is in scope, but argv is
    not.

    Fortunately there is a way to work around this: we can use the common
    extension where main can take a third argument which is a pointer to the
    environment. According to the standard (J.5.1), this is implemented in
    hosted environments. In practice, POSIX environments and Microsoft both
    support it. Then, our entry point looks like this instead:

    int main(int argc, char **argv, char *envp[/* ... */]) {}
    

    Now argv is in scope within the envp array length expression.

    Side note: even though we can’t return, main is the exception to the rule
    that reaching the closing } of a function returning non-void is verboten,
    and is equivalent to returning zero (5.1.2.2.3). If we need to terminate
    with an exit code other than zero, we can use exit().

It seems that with enough effort anything can be done with these peculiar
restrictions. As an example, I wrote implementations of the MD5 and SHA256
hash functions as well as Conway’s Game of Life using SDL for graphics.
I was also working on a more interesting and “useful” program but I kind of
lost motivation halfway through. I’ll update this article if I ever come back
to finish that.

In case it wasn’t clear enough, there is zero practical reason to ever do any
of this. I hope you’ll agree it’s a fun party trick though :).

Read More

Related posts

© Copyright 2022, All Rights Reserved