A Web Server in C

2021-04-21

Table of Contents
  1. Socking it to them
  2. Binding Our Time
  3. You have to be a good Listener!
  4. Accept Yourself
  5. Learning to Read and Write
  6. The C Programming Language

I'm going to try writing a web server in C using only my copy of The C Programming Language and the man pages on my server as resources. Let's see how it goes. I'm also not going to look up the HTTP protocol; I'll just go off my memory.

Let's see what happens!

Socking it to them

The first thing I need to do is open a socket and listen on it. Once I can listen on a socket hopefully everything will start flowing from there.

And immediately I hit a snag. I checked the index for socket and found nothing. Then I looked for network, open and so on, and found that the book doesn't talk about sockets at all. That was shocking to me; I thought sockets would definitely be in the book.

However, I did read the bit about file descriptors and the open function, as I know Linux will probably treat network handles similarly to file handles. I'll need to use the man pages already.

The next problem is I don't know where the C man pages are. I thought they were generic but nope, it looks like man socket brings up the Perl page. I spammed through all the shells to see if maybe they were hiding the man pages, and I ended up reading man man and learning about the -a option. I came to the conclusion that my system didn't have the Linux syscall man pages.

Now how the hell do I get them? So I read man yum for the first time... I ended up doing a check for gcc and found nothing of note; I had gcc installed. I then used yum search for manual and found man-pages.noarch, which very clearly looks like what I want.

Quite a bit of fun hunting for it.

yum install man-pages.noarch

Voila! I have the Linux Programmer's Manual available to me now.

Okay! Now I can maybe start listening...

Great, man socket now has 3 pages: 1 Linux, 1 POSIX, and 1 Linux.

Yes you read that right.

This looks like it's going to be a bit tough to get the flow. At the very least it means a lot of reading. I can't just say listen on port 8000 on all interfaces and start reading data. It looks like I need a socket, then I need to bind it, then listen on it, then read from it.

The man page mentions there is an example in getaddrinfo.

Skipping that for now and slowly working on the socket call. So at this point I got a file handle coming through, and after reading the man page I figured I'd also handle the errors.

c
#include <stdio.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

int main(void) {
    printf("Hello World!\n");

    /* AF_INET = IPv4, SOCK_STREAM = TCP, 0 = default protocol for that pair */
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1) {
        printf("%d Failed to create socket\n", errno);
        return 1;
    }

    printf("Created a socket: %d\n", fd);

    return 0;
}

The constants are all taken from the man page and 0 is the default protocol. socket takes a domain, which is roughly what area of the internet we want to talk to; in our case it's IPv4. Then it takes a type; we want TCP. Finally it takes a protocol, which I don't know much about, but 0 picks the default.

If the socket call fails we get -1 and exit early; otherwise we get an fd.

This works! We get an fd of 3, which makes sense as stdin, stdout and stderr take 0, 1 and 2. Perfect.

Binding Our Time

Reading the man page for accept makes the flow clear: create a socket, bind it, listen on it, accept on it, and boom, you have a socket to respond on.

I've been stuck on the bind as I keep getting error 22, which I can't translate, and strace is saying invalid. Not sure how to fix it.

bash
bind(3, {sa_family=AF_INET, sa_data="\350\3\0\0\0\0"}, 8) = -1 EINVAL (Invalid argument)
write(1, "22 Failed to create socket\n", 2722 Failed to create socket

I don't even know what I'm looking at. I'm guessing sa_data is wrong. Which makes sense.
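
For what it's worth, an errno value like 22 can be translated without strace. A minimal sketch, assuming the standard strerror from string.h (perror from stdio.h does the same job straight to stderr); describe_errno is a hypothetical helper name, not something from the server code:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format "<message> (errno N: <text>)" into out. */
int describe_errno(char *out, size_t out_size, const char *message, int err) {
    /* strerror turns an errno number into its human-readable description */
    return snprintf(out, out_size, "%s (errno %d: %s)", message, err, strerror(err));
}
```

Calling it with EINVAL, the name behind error 22 in the strace output above, gives a line that says what the number means instead of just the number.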

Careful reading of the man page turned up utilities to convert from ASCII to network byte order, which was cool, and htons, which converts a number to network byte order.
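
To make the byte order concrete, here is a small sketch of what htons does to a port number. port_wire_bytes is a hypothetical helper made up for illustration; htons itself comes from arpa/inet.h:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of a port after htons(), i.e. what ends up in sin_port. */
void port_wire_bytes(uint16_t port, unsigned char out[2]) {
    uint16_t net = htons(port);   /* host order -> network (big-endian) order */
    memcpy(out, &net, sizeof net);
}
```

For port 8070 (0x1F86) the two bytes come out as 0x1F then 0x86. The strace output earlier shows sa_data starting with \350\3, which is 0xE8 then 0x03; that looks like 1000 stored in little-endian host order, in other words a port that never went through htons.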

I finally felt like I had the family name and port right and actually looked at the size.

The size was 8 and I had a feeling it might be wrong. Looked at the man page again. And goddamn it. It wants sizeof the struct, not sizeof the actual thing I created. My first guess was that the object I made was mostly empty so sizeof only picked up the part I was using, but sizeof is fixed by the type, not the contents. The likelier explanation is that I was taking sizeof of a pointer, which is 8 bytes on a 64-bit system. The size of the struct was 16 bytes, which sounds more reasonable.
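
A sketch of the difference, assuming a 64-bit Linux box like the one in this post. make_any_addr, size_of_pointer and size_of_pointee are hypothetical names for illustration; the address is built on the stack here, so no calloc is needed:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: an IPv4 "any interface" address for the given port. */
struct sockaddr_in make_any_addr(uint16_t port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);             /* sizeof the struct itself */
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    return addr;
}

/* sizeof on a pointer measures the pointer, not what it points to. */
size_t size_of_pointer(const struct sockaddr_in *p) { return sizeof p; }
size_t size_of_pointee(const struct sockaddr_in *p) { return sizeof *p; }
```

With an address built this way, the bind call would look like bind(fd, (struct sockaddr *) &addr, sizeof addr), where sizeof addr is the size of the struct because addr is the struct and not a pointer to it.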

c 
#include <stdio.h>
#include <stdlib.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <errno.h>

int main(void) {
    printf("Hello World!\n");

    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1) {
        printf("%d Failed to create socket\n", errno);
        exit(EXIT_FAILURE);
    }

    printf("Created a socket: %d\n", fd);

    /* calloc takes (count, size); a count of 0 gives no usable memory */
    struct in_addr *a = calloc(1, sizeof(struct in_addr));
    a->s_addr = htonl(INADDR_ANY);   /* all interfaces */

    struct sockaddr_in *addr = calloc(1, sizeof(struct sockaddr_in));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(8070);    /* port in network byte order */
    addr->sin_addr = *a;

    /* the length is the size of the struct, not of the pointer */
    int result = bind(fd, (struct sockaddr *) addr, sizeof(struct sockaddr_in));

    if (result == -1) {
        printf("%d Failed to bind socket\n", errno);
        exit(EXIT_FAILURE);
    }

    printf("Binding socket: %d\n", result);

    free(a);
    free(addr);
    return 0;
}

Finally! After hours of slamming into the man pages and reading them more and more carefully, I got a binding to happen.

You have to be a good Listener!

Listen looks to be much simpler. All it does is take a socket handle and mark it as listening.

c
#include <stdio.h>
#include <stdlib.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <errno.h>

int main(void) {
    printf("Hello World!\n");

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        printf("%d Failed to create socket\n", errno);
        exit(EXIT_FAILURE);
    }

    /* calloc takes (count, size); a count of 0 gives no usable memory */
    struct in_addr *a = calloc(1, sizeof(struct in_addr));
    a->s_addr = htonl(INADDR_ANY);

    struct sockaddr_in *addr = calloc(1, sizeof(struct sockaddr_in));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(8070);
    addr->sin_addr = *a;

    int result = bind(fd, (struct sockaddr *) addr, sizeof(struct sockaddr_in));
    if (result == -1) {
        printf("%d Failed to bind socket.\n", errno);
        exit(EXIT_FAILURE);
    }

    /* 5 is the backlog: how many pending connections may queue up */
    result = listen(fd, 5);
    if (result == -1) {
        printf("%d Failed to listen on socket.\n", errno);
        exit(EXIT_FAILURE);
    }

    printf("Listening on socket: %d\n", fd);

    free(a);
    free(addr);
    return 0;
}

Voila! We are now listening on our socket. Now for the crucial step.

Accept Yourself

This is just weird.

c
    printf("Listening on socket: %d\n", fd);

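    /* calloc takes (count, size); a count of 0 gives no usable memory */
    struct sockaddr *peer = calloc(1, sizeof(struct sockaddr));
    socklen_t *peer_length = calloc(1, sizeof(socklen_t));
    /* accept reads *peer_length as the size of the buffer at peer */
    *peer_length = sizeof(struct sockaddr);

    /* &*peer is the same pointer as peer */
    int peer_fd = accept(fd, &*peer, peer_length);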
    if (peer_fd == -1) {
        printf("%d - Failed to accept connection.\n", errno);
        exit(EXIT_FAILURE);
    }

    printf("New Connection  on socket: %d\n", peer_fd);

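So accept is pretty straightforward, however the &* is strange. accept is looking for a pointer to some space, but when I did a strace it told me that peer by itself was not in user-addressable space. When I added the ampersand star, however, it could reach it. In C, &*peer is the same pointer as peer, so that change on its own can't have been the fix. A likelier culprit is the allocation: calloc's first argument is the element count, and if that count is 0 there is no usable memory behind the pointer. Either way, the fact that accept works is amazing!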

Learning to Read and Write

c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <errno.h>

int main(void) {
    printf("Hello World!\n");

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        printf("%d Failed to create socket\n", errno);
        exit(EXIT_FAILURE);
    }

    /* calloc takes (count, size); a count of 0 gives no usable memory */
    struct in_addr *a = calloc(1, sizeof(struct in_addr));
    a->s_addr = htonl(INADDR_ANY);

    struct sockaddr_in *addr = calloc(1, sizeof(struct sockaddr_in));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(8070);
    addr->sin_addr = *a;

    int result = bind(fd, (struct sockaddr *) addr, sizeof(struct sockaddr_in));
    if (result == -1) {
        printf("%d Failed to bind socket.\n", errno);
        exit(EXIT_FAILURE);
    }

    result = listen(fd, 5);
    if (result == -1) {
        printf("%d Failed to listen on socket.\n", errno);
        exit(EXIT_FAILURE);
    }

    printf("Listening on socket: %d\n", fd);

    struct sockaddr *peer = calloc(1, sizeof(struct sockaddr));
    socklen_t *peer_length = calloc(1, sizeof(socklen_t));
    /* accept reads *peer_length as the size of the buffer at peer */
    *peer_length = sizeof(struct sockaddr);

    /* &*peer is the same pointer as peer */
    int peer_fd = accept(fd, &*peer, peer_length);
    if (peer_fd == -1) {
        printf("%d - Failed to accept connection.\n", errno);
        exit(EXIT_FAILURE);
    }

    printf("New Connection  on socket: %d\n", peer_fd);

    /* read up to 20 bytes of whatever the client sent */
    char *buff = calloc(1, 20);
    recv(peer_fd, buff, 20, 0);

    char *buffer = "Hello World\n";
    send(peer_fd, buffer, 12, 0);

    close(peer_fd);
    close(fd);

    free(a);
    free(addr);
    free(peer);
    free(peer_length);
    free(buff);
    return 0;
}

Voila! We have something that can now read from a connection and send output back. This is the very beginning of our server, finally.
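
One thing the program above glosses over is that send and recv return how many bytes they actually moved, which can be fewer than asked for. A minimal sketch of a loop that keeps sending until everything is out; send_all is a hypothetical helper name, and retrying after an interrupted call is left out to keep it short:

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: keep calling send until len bytes are out. Returns 0 or -1. */
int send_all(int fd, const char *data, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        /* send may accept only part of what is left */
        ssize_t n = send(fd, data + sent, len - sent, 0);
        if (n == -1) {
            return -1;
        }
        sent += (size_t) n;
    }
    return 0;
}
```

In the server this would replace the bare send call, for example send_all(peer_fd, buffer, 12).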

It took about 4 hours of slamming into the man pages to get here. I'm pretty proud, as it felt longer. Especially since the first thing I had to do was get the actual man pages, and find out that the book didn't have anything about sockets.

Also, all my C syntax is now new knowledge that I haven't used since first year university.

Now that we can read and write, the next step is to get an entire HTTP request. I ran the server and navigated to the page, and voila, I can see parts of the request coming through. Right now I only see the first 20 bytes. Here I can do 2 things. I can read in a large chunk and then parse it, but I don't always know how big the content is going to be. Or it might be better to read character by character and process the HTTP request while I'm reading. I'm not sure which way I like, as going character by character seems like it would be slow. But it does fit with the idea I read in A Philosophy of Software Design, as I need to go character by character through the HTTP request at some point. If I do it now, then I should also do all my HTTP parsing here as well.

I unfortunately don't remember the structure of the HTTP request, but after taking a glance at a previous HTTP server I wrote, I think the delineation is that 2 newlines in a row (each one a carriage return plus a line feed) signify the end of the headers and the start of the body. The body is empty if there is no Content-Length field. So the first step is to recv from the socket character by character and respond with something, anything, when we get 2 carriage return line feeds.
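
The character-by-character idea can be sketched as a tiny state machine that counts how much of "\r\n\r\n" has been seen so far. header_end_step and find_header_end are hypothetical names, and this is only one way to do it, not necessarily what the finished server does:

```c
/* Number of bytes of "\r\n\r\n" matched so far; start at 0, done at 4. */
int header_end_step(int matched, char c) {
    if (c == '\r') {
        /* a CR either continues "\r\n" + "\r" or starts a fresh match */
        return (matched == 2) ? 3 : 1;
    }
    if (c == '\n') {
        if (matched == 1) return 2;
        if (matched == 3) return 4;
    }
    return 0;
}

/* Offset just past the blank line, or -1 if the headers are not complete yet. */
long find_header_end(const char *buf, long len) {
    int matched = 0;
    for (long i = 0; i < len; i++) {
        matched = header_end_step(matched, buf[i]);
        if (matched == 4) {
            return i + 1;
        }
    }
    return -1;
}
```

header_end_step can be fed one byte at a time straight from recv, while find_header_end shows the same logic applied to a chunk that has already been read.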

Side Note - Do the braces start a new closure that will automatically free up arrays?

Side Note - Do I need to use calloc constantly and what happens if I don't?

Side Note - Memset vs calloc, which one should I be using?

Side Note - Calloc vs defining a struct as struct x = { 1, 2, 3}, is the calloc implicit?

Side Note - Does C have hash maps?
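
A hedged sketch for the allocation questions above: three ways to end up with a zeroed struct. struct pair and the function names are made up for illustration. The first two use ordinary local variables, which go away at the closing brace; only the calloc version lives on the heap and needs a free. As far as the standard C library goes, there is no built-in hash map type.

```c
#include <stdlib.h>
#include <string.h>

struct pair { int key; int value; };   /* hypothetical example type */

/* 1. An initializer: members not listed are set to zero. No calloc involved. */
struct pair zero_by_initializer(void) {
    struct pair p = { 0 };
    return p;
}

/* 2. memset on storage that already exists (here a local variable). */
struct pair zero_by_memset(void) {
    struct pair p;
    memset(&p, 0, sizeof p);
    return p;
}

/* 3. calloc: heap memory, zeroed, that lives until free is called. */
struct pair *zero_by_calloc(void) {
    return calloc(1, sizeof(struct pair));   /* may return NULL */
}
```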

The C Programming Language

Now that I'm at the point where I'm ready to start parsing, I should probably read The C Programming Language. The first chapter has a word count routine which I can use for the parsing: I can keep track of being in a word or not, and then create 2 arrays of headers and values that will act as a hashmap. The problem is this seems kind of inefficient, as if I need a header I first have to search the array for it. Instead it might be good to create a hashmap implementation, maybe. If I could stumble into one.

I'm about halfway through the book and overall it was really simple stuff. A lot of the fundamentals of C are in JavaScript, or really it should be that JavaScript took a lot from C. The biggest thing was static variables and register variables, which are cool. Register variables tell the compiler to stick stuff in registers; this is just a suggestion though. Static variables mean that you can define a variable inside a function and it will remember its value the next time the function is run, almost like a closure.
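
A tiny sketch of the static behaviour described above; next_request_id is a hypothetical name chosen to fit the server theme:

```c
/* The static local keeps its value between calls; it is initialised only once. */
int next_request_id(void) {
    static int counter = 0;
    counter++;
    return counter;
}
```

Each call returns one more than the previous call, even though counter is declared inside the function.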

The book also explained that stuff is deallocated once the final brace is hit, so it looks like that works the way I expected, at least for ordinary local variables; memory from malloc or calloc stays around until it is freed. The next couple of chapters are the meat of C, pointers and structs, so hopefully they give me a good enough base to get started on the HTTP server.

Started reading about pointers and so far pointer arithmetic seems straightforward. The real problem is making sure not to go past the bounds of the array, and so far I haven't seen anything that talks about that.

You need to allocate the correct number of bytes for a pointer.

I finished up the chapter on pointers and damn do they cover a lot. They go into function pointers and 2D arrays, both of which I think I will be using. They also wrote about writing a recursive descent parser, and there's just so much useful stuff, but it's all so short. They basically cover tokenizing and parsing in a few pages, which shows how good the people who made C were. This was the first chapter that really made me feel like there is so much to know and understand.

I can already see how function dictionaries are going to work and how to parse http requests.
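
As a rough sketch of what a function dictionary could look like with function pointers: a table of method names paired with handlers, searched linearly. All the names here (handler_fn, routes, find_handler, the two handlers) are hypothetical, and the handlers just return a status code:

```c
#include <stddef.h>
#include <string.h>

typedef int (*handler_fn)(const char *path);

/* Hypothetical handlers that just return an HTTP status code. */
static int handle_get(const char *path)  { (void) path; return 200; }
static int handle_post(const char *path) { (void) path; return 201; }

struct route { const char *method; handler_fn handler; };

static const struct route routes[] = {
    { "GET",  handle_get  },
    { "POST", handle_post },
};

/* Linear search over the table; returns NULL for an unknown method. */
handler_fn find_handler(const char *method) {
    for (size_t i = 0; i < sizeof routes / sizeof routes[0]; i++) {
        if (strcmp(routes[i].method, method) == 0) {
            return routes[i].handler;
        }
    }
    return NULL;
}
```

The result can be called directly, as in find_handler("GET")("/index.html"), after checking that it is not NULL.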

The chapter on structs is also really good. I finally understand why -> is used: it is for when we have a pointer to a struct. So given a struct we can do struct.member. We can pass the struct as &struct and from there use struct->member to go through the address to the value.
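
The same thing as a short sketch, with a made-up struct request type:

```c
struct request { int status; };   /* hypothetical example type */

/* Takes a pointer, so members are reached with ->, which means (*r).status. */
void mark_not_found(struct request *r) {
    r->status = 404;
}

/* Takes a copy, so . is used and the caller's struct is left untouched. */
int status_of(struct request r) {
    return r.status;
}
```

Given struct request req, the calls would be mark_not_found(&req) and status_of(req).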

The other thing mentioned in this chapter is why malloc returns a generic void pointer that then gets coerced into the right type. I'm curious whether this really matters; maybe the compiler uses the type information of the pointer for stuff like the sizeof operator.

Sizeof is used to get the correct size with gaps for a struct. This is because of alignment issues in memory.
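
A small sketch of those gaps, using a made-up struct with a char followed by an int. The exact numbers depend on the compiler and platform, so the helpers compute them instead of assuming them:

```c
#include <stddef.h>

struct header_flags {   /* hypothetical example type */
    char seen;          /* 1 byte */
    int  length;        /* usually 4 bytes, and usually needs 4-byte alignment */
};

/* Bytes of padding the compiler inserted between and after the members. */
size_t padding_in_header_flags(void) {
    return sizeof(struct header_flags) - (sizeof(char) + sizeof(int));
}

/* Where length actually starts inside the struct. */
size_t offset_of_length(void) {
    return offsetof(struct header_flags, length);
}
```

On a typical 64-bit Linux system I would expect the struct to take 8 bytes with 3 bytes of padding after seen, but that is an assumption about the platform, which is exactly why sizeof is the thing to pass to malloc.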

The chapter also goes over trees really quickly. So far it expects you to be familiar with quicksort, parsing logic, trees, recursion and all sorts of more complex things. I wonder if the average developer at the time was that far advanced or if I'm behind. I only started studying this stuff now. I'm guessing a comp sci degree would have been helpful.

There is also a table lookup implementation. Some of the utilities that end up being written are ls, cat, cp, rm, and a memory allocator. I really liked the one from Dan Luu, but it's really cool to see what the original was like. There is a ton of code to read in the C book and to play with. I think I can now get started working on the HTTP server. I think a fun project would be to get onto a very barebones and empty Linux and write everything from scratch. That would be insanely cool.

2021-05-01

Getting started on the HTTP server after almost a couple of weeks off to read the book. I think there is a bool type because vim highlights it, but I don't remember reading about it in the book so I'm not sure if it's real. I wonder if it really takes up 1 bit, as using an int to hold just 0 or 1 seems wasteful.

bool isn't a built-in type in the C the book describes, so I used int. true isn't defined by default either, so I used 1. (C99 later added bool, true and false through the stdbool.h header, which is presumably what vim is highlighting.)

As I work on the server it's starting to hit me that I am missing something. I think I need to manually write a set of string handling functions. For example, I read in a character from the socket and then send that to a buffer. But the buffer is just a pointer to an address with nothing allocated and set. So I add my character and the null character, but I need to malloc some space for the string. Then if I hit the buffer max I need to double it. There must be something already made with this idea but I can't think of it. So I need to write a few string functions that will malloc space, add to the buffer, check the sizes and re-malloc if needed. Once I have this then hopefully I can start building up strings and handling requests.
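
A minimal sketch of that growable buffer, assuming the standard realloc from stdlib.h does the "get a bigger block and copy" step. struct String, string_push and string_free are hypothetical names and not necessarily what the server ended up with:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: data is always NUL-terminated once allocated. */
struct String {
    char  *data;
    size_t len;   /* characters stored, not counting the NUL */
    size_t cap;   /* bytes allocated */
};

/* Append one character, doubling the buffer when it is full. Returns 0 or -1. */
int string_push(struct String *s, char c) {
    if (s->len + 1 >= s->cap) {
        size_t new_cap = (s->cap == 0) ? 16 : s->cap * 2;
        char *grown = realloc(s->data, new_cap);
        if (grown == NULL) {
            return -1;   /* the old buffer is still valid */
        }
        s->data = grown;
        s->cap = new_cap;
    }
    s->data[s->len++] = c;
    s->data[s->len] = '\0';
    return 0;
}

void string_free(struct String *s) {
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

A String starts out as { NULL, 0, 0 }, gets one string_push per character read from the socket, and is released with string_free. Doubling the capacity keeps the number of reallocations small even for long requests.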

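Began writing a String struct to manage allocations. During that, trying to print an int with %s caused a segfault. Dafuq. Presumably that's because %s makes printf treat the argument as a pointer to a string, so the int's value gets followed as if it were an address.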

2021-05-04

Took another few days off from working on the little server but I got the gist of it working. It will now check the current directory for index.html and send that back.

Took my sweet time doing it, but really this could be done in a couple of days. I'm really happy that I did this without checking the internet. I used only the man pages for C, and even the book wasn't as helpful as I thought it would be. The man pages are more than enough to get started with programming in C. This was a fun little adventure. The plan is to get JavaScript and CSS sent back. At that point I may or may not work on adding support for the usual multimedia types.

I'll need to sit on this project to see how it compares to Rust but overall C feels so much more painful in regards to the allocations that I had to think about and deal with. However it was also way easier and feels like a simple language.

C feels very much like something written by humans, whereas Rust feels mechanical and cold.

2021-05-04

Wired up the stylesheets and JavaScript and set up a default directory to serve everything out of. As always, images and video are next and are a little bit harder. I'm guessing I need to send them out as bytes, as the images contain nulls. I'm guessing I need to escape stuff but I'm not sure how. One idea is to stat the file to get the size and then use fread to read everything in one shot. I could then send out the headers first and then the binary blob I just read in.

The above strategy worked perfectly. The code's a bit of a mess and there are abstractions that I can make that will simplify things, but for now I consider my little server up and working. The major piece that I may get to is form POST requests. There are quite a few glitches, as sometimes the images take a bit long to show up. The other major piece would be to add forking and a function dictionary.
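
For reference, a sketch of the stat-then-fread strategy described above. read_whole_file is a hypothetical helper name; stat, fopen and fread are the standard calls:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: read a whole file into a malloc'd buffer.
   Returns NULL on failure; on success *out_len holds the byte count. */
char *read_whole_file(const char *path, size_t *out_len) {
    struct stat st;
    if (stat(path, &st) == -1) {
        return NULL;
    }
    size_t size = (size_t) st.st_size;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return NULL;
    }
    char *data = malloc(size > 0 ? size : 1);
    if (data == NULL) {
        fclose(fp);
        return NULL;
    }
    /* fread counts bytes, so NUL bytes inside an image are not a problem */
    size_t got = fread(data, 1, size, fp);
    fclose(fp);
    if (got != size) {
        free(data);
        return NULL;
    }
    *out_len = size;
    return data;
}
```

Nothing needs escaping as long as the length travels with the buffer: the size goes into the Content-Length header and the same number is passed to send, instead of relying on strlen, which would stop at the first null byte.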