
Most C tutorials start with the same line: #include <stdio.h>, but you don't have to. We answer the question: Is C even usable without a Standard Library (libc)?

The answer is yes, but you have to do some work yourself. We restrict ourselves to competitive programming, so that our bad decisions here do not come back to haunt us. First, let's give some motivation.

Why would you do this?

Consider a simple Hello, World! program.

#include <stdio.h>

int main()
{
    puts("Hello, World!");
}

We then produce an executable using gcc -Wall -O2 helloworld.c -o helloworld. For what it does, we find it includes a lot of sections that aren't relevant to what we're doing.

[aten@machine misc]$ size -A helloworld
helloworld  :
section              size    addr
.note.gnu.build-id     36     848
.interp                28     884
.gnu.hash              28     912
.dynsym               168     944
.dynstr               141    1112
.gnu.version           14    1254
.gnu.version_r         48    1272
.rela.dyn             192    1320
.rela.plt              24    1512
.init                  27    4096
.plt                   32    4128
.text                 281    4160
.fini                  13    4444
.rodata                18    8192
.eh_frame_hdr          36    8212
.eh_frame             116    8248
.note.gnu.property     64    8368
.note.ABI-tag          32    8432
.init_array             8   15824
.fini_array             8   15832
.dynamic              480   15840
.got                   40   16320
.got.plt               32   16360
.data                  16   16392
.bss                    8   16408
.comment               27       0
Total                1917

This "overhead" is justified by the scaffolding that you need for memory management, security features, and runtime initialization that occurs before main() even starts. But what if we weren't concerned with such features? The following C binary drops all of that and achieves the same thing in roughly a third of the total section size (689 bytes versus 1917).

[aten@machine nlctest]$ ./hello
Hello, World!
[aten@machine nlctest]$ size -A hello
hello  :
section              size      addr
.text                 218   4198400
.rodata                15   4202496
.note.gnu.property     48   4202512
.bss                  408   4206656
Total                 689

It's worth noting that if we truly wanted the smallest possible binary, we would reach for handwritten assembly. While sites like DMOJ do support NASM x86_64, most competitive programming platforms don't.

Losing the Standard Library

Most of the added sections are a result of linking against glibc. By ditching the standard library, we avoid this altogether. This means we have to implement our own way to do I/O, and find our own way to read ints and strings.

In competitive programming, many believe that manually reading from stdin with getchar() is faster than scanf or cin in C++. In the tips page of dmoj.ca, we find the following snippet.

Finally, if the problem only requires unsigned integral data types to be read, you can prepend this macro to the top of your source:
#define scan(x) do{ \
    int _; \
    while(((x)=getchar()) < '0' && (x) != -1); \
    if((x) != -1) { \
        for((x)-='0'; '0' <= (_=getchar()) && _ <= '9'; (x)=10*(x)+_-'0'); \
    } \
} while(0)

This suggests we only need to provide an implementation of getchar(). After that, we may implement our own custom logic for reading in negative integers, floats, etc.
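
As a rough sketch of that custom logic, signed integers can be parsed on top of any getchar-like function. Here demo_getchar is a hypothetical stand-in that serves bytes from a string so the parser can be tried in isolation; in the libc-free program it would be replaced by our own getchar(). The names demo_input, demo_getchar, and read_int are assumptions for illustration and do not come from the DMOJ tips page.

```c
/* Hypothetical stand-in for the libc-free getchar(): serves bytes from a
   string so the parsing logic can be exercised on its own. */
static const char *demo_input = "";

static int demo_getchar(void) {
    return *demo_input ? (unsigned char)*demo_input++ : -1;
}

/* Reads one optionally signed decimal integer into *out.
   Returns 1 on success, 0 if the input ends before a number starts.
   No overflow checking; a lone '-' with no digits is read as 0. */
static int read_int(long *out) {
    int c = demo_getchar();

    /* Skip everything that cannot start a number. */
    while (c != -1 && c != '-' && (c < '0' || c > '9')) {
        c = demo_getchar();
    }
    if (c == -1) return 0;

    int negative = 0;
    if (c == '-') {
        negative = 1;
        c = demo_getchar();
    }

    long value = 0;
    while (c >= '0' && c <= '9') {
        value = value * 10 + (c - '0');
        c = demo_getchar();
    }

    *out = negative ? -value : value;
    return 1;
}
```

Like the macro above, this consumes the one character that follows the number. Everything here still depends on providing getchar() ourselves.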

To do so, we implement a syscall wrapper, syscall3, to adhere to DRY (don't repeat yourself). Our environment is x86_64 Linux, so the following suffices:

static long syscall3(long number, long arg1, long arg2, long arg3) {
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a" (ret)
        : "0" (number), "D" (arg1), "S" (arg2), "d" (arg3)
        : "cc", "rcx", "r11", "memory"
    );
    return ret;
}

From there, a quick look at the syscall table has us arrive at the following.

#define SYS_read  0

int getchar(void) {
    static unsigned char buf;
    long res = syscall3(SYS_read, 0, (long)&buf, 1);
    if (res <= 0) return -1;
    return buf;
}

Note that in a real implementation, we would use a larger buffer and only make the syscall when our buffer is empty. Without such an optimization, our implementation may actually be slower than scanf.
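
The buffering described above could look roughly like the following sketch. It issues one read syscall per 4096 bytes and hands the bytes out one at a time. The name buffered_getchar, the BUF_SIZE value, and the in_* variables are assumptions for illustration and are not taken from the full program; the wrapper is repeated so the block stands alone on x86_64 Linux.

```c
#define SYS_read 0
#define BUF_SIZE 4096  /* assumed size; tune to taste */

/* Same x86_64 Linux syscall wrapper as above, repeated for completeness. */
static long syscall3(long number, long arg1, long arg2, long arg3) {
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a" (ret)
        : "0" (number), "D" (arg1), "S" (arg2), "d" (arg3)
        : "cc", "rcx", "r11", "memory"
    );
    return ret;
}

static unsigned char in_buf[BUF_SIZE];
static long in_len = 0;  /* number of valid bytes in in_buf */
static long in_pos = 0;  /* index of the next byte to hand out */

/* Returns the next byte from stdin, or -1 on end of input or error. */
static int buffered_getchar(void) {
    if (in_pos >= in_len) {
        /* Buffer exhausted: refill it with a single syscall. */
        in_len = syscall3(SYS_read, 0, (long)in_buf, BUF_SIZE);
        in_pos = 0;
        if (in_len <= 0) {
            in_len = 0;
            return -1;
        }
    }
    return in_buf[in_pos++];
}
```

The buffer lives in .bss, so a larger BUF_SIZE grows that section but not the file on disk.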

Similarly, we can easily implement putchar and pu (print unsigned integer). Let's skip that for now, and implement a simple Hello World! program. Link to the full program.
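
For readers who want something concrete before following the link, here is a minimal sketch of the two output helpers. It is not necessarily what the full program does. The name put_char is a hypothetical stand-in for our putchar (renamed so the block can also be compiled next to a real libc), and pu follows the name used above; the wrapper is repeated so the block stands alone on x86_64 Linux.

```c
#define SYS_write 1

/* Same x86_64 Linux syscall wrapper as above, repeated for completeness. */
static long syscall3(long number, long arg1, long arg2, long arg3) {
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a" (ret)
        : "0" (number), "D" (arg1), "S" (arg2), "d" (arg3)
        : "cc", "rcx", "r11", "memory"
    );
    return ret;
}

/* Writes one byte to stdout (fd 1). Returns the byte, or -1 on failure. */
static int put_char(int c) {
    unsigned char byte = (unsigned char)c;
    long res = syscall3(SYS_write, 1, (long)&byte, 1);
    return res == 1 ? byte : -1;
}

/* Prints an unsigned integer in decimal using a single write syscall. */
static void pu(unsigned long n) {
    char digits[20];  /* 2^64 - 1 has 20 decimal digits */
    int pos = 20;

    /* Fill the buffer from the back, least significant digit first. */
    do {
        digits[--pos] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);

    syscall3(SYS_write, 1, (long)(digits + pos), 20 - pos);
}
```

Calling pu(0), put_char(' '), and pu(1234567890) in sequence writes "0 1234567890" to stdout. Each put_char costs one syscall, so output-heavy programs would want a write buffer analogous to the read buffer.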

...

int main(int argc, char *argv[]) {
    char *s = "Hello, World!\n";
    for (int i = 0; i < 14; i++) {
        putchar(s[i]);
    }
}

Unfortunately, we get a segfault, with or without the return 0;.

[aten@machine nlctest]$ ./hello
Hello, World!
Segmentation fault         (core dumped) ./hello

We are very used to glibc taking care of entry/exit of main. Note that on many competitive programming platforms (citation needed), partial marks/passing is granted despite having UB/segfaults. But for completeness, we will handle this.

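The culprit is that when main returns, its ret instruction pops a return address off the stack and loads it into RIP (the instruction pointer). Normally libc's startup code calls main, so a valid return address is waiting there. Here nothing called main: the kernel jumped straight into our code, and at process entry the top of the stack holds argc rather than a return address. Execution therefore jumps to an invalid address and the program crashes. To keep main clean, we define a _start as the true starting point of the program, within which we call main and exit gracefully with a syscall.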

#define SYS_exit  60

void _start(void) {
    int ret = main(0, 0);  /* argc and argv are not forwarded here */
    syscall3(SYS_exit, (long)ret, 0, 0);
}

Since we are not so concerned about the additional features gcc has, to further reduce the size of the binary, we use the following Makefile:

CC      = gcc

CFLAGS  = -Os -fno-asynchronous-unwind-tables -fno-stack-protector \
          -fno-ident -ffreestanding -nostdlib -static \
          -Isysroot/include

LDFLAGS = -s -Wl,--build-id=none -Wl,--no-dynamic-linker

hello: hello.c
	$(CC) $(CFLAGS) $(LDFLAGS) -o hello hello.c
	@strip -s hello
	@ls -lh hello

clean:
	rm -f hello

One potential optimization is to ditch our putchar altogether, and simply print using a syscall.

void _start() {
    const char msg[] = "Hello\n";
    syscall3(1, 1, (long)msg, 6);  /* write(1, msg, 6): syscall 1, fd 1 is stdout */
    syscall3(60, 0, 0, 0);         /* exit(0): syscall 60 */
}

Both approaches give us our tiny binary, as desired.