Thursday | 21 NOV 2024

TCP Can Be Fragmented

Title: TCP Can Be Fragmented
Date: 2023-11-02
Tags:  

Today is a good day to learn something that I should have known. But before I get to that, let me set the stage. I've been working on getting uploads working in my web server, SERAPHIM, and I learned quite a bit about how files are uploaded over HTTP.

Locally everything was fine; my server was handling everything relatively well. When I deployed to the public server, however, I ran into some strange issues. Sometimes an upload would work; other times it would just stop in the middle. I had attributed this to the fact that my production server runs ScarletDME while my local server runs UniVerse.

This pushed me to dig into the ScarletDME code, though I first spent some time porting my big integer logic to Zig. Once I had everything wired up there, I started stabbing at the socket code in ScarletDME.

Debugging there made me realize that my reads were actually ending early. There is no guarantee that recv hands back data in the chunk sizes you ask for. I had assumed that when I set READ.SIZE to 5000, I would get a series of 5000-byte chunks followed by a final chunk that was smaller, and that this smaller last chunk meant I had received everything.
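To make that assumption concrete, here is the heuristic written out as a loop. This is a minimal C sketch, not the web server's actual code: `fake_socket` and `fake_recv` are hypothetical stand-ins for a socket and `recv` that hand back pre-scripted chunk sizes (assumed to be at most `READ_SIZE`), so the behavior is repeatable without a network.

```c
#include <stddef.h>

#define READ_SIZE 5000 /* mirrors READ.SIZE from the post */

/* Hypothetical stand-in for a socket: each fake_recv call returns the
 * next pre-scripted chunk size, the way a real TCP socket returns
 * whatever bytes have arrived so far. Chunks are assumed <= READ_SIZE. */
typedef struct {
    const size_t *chunks; /* bytes available on each successive call */
    size_t count;         /* number of scripted chunks */
    size_t next;          /* index of the next chunk to hand out */
} fake_socket;

/* Hypothetical recv stand-in: returns the next chunk size, or 0 (EOF)
 * once the script is exhausted. */
static size_t fake_recv(fake_socket *s) {
    if (s->next >= s->count) return 0;
    return s->chunks[s->next++];
}

/* The flawed heuristic: keep reading while reads come back full and
 * treat the first short read as the end of the request.
 * Returns the number of bytes consumed before the loop stopped. */
static size_t read_until_short(fake_socket *s) {
    size_t total = 0;
    for (;;) {
        size_t n = fake_recv(s);
        total += n;
        if (n < READ_SIZE) break; /* WRONG: a short read is not the end */
    }
    return total;
}
```

With scripted chunks of 5000, 1448, 5000, 5000 and 312 bytes (16,760 in total), `read_until_short` stops after the second read and returns 6448, silently leaving the remaining 10,312 bytes unread. A short chunk in the middle is what a real socket can return when only part of the data has arrived so far.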

This, however, is not true. recv reads up to 5000 bytes, but it can return fewer at any point, so a short read doesn't mean the last chunk has arrived. TCP delivers a stream of bytes rather than separate messages, and recv simply returns whatever has arrived so far. This didn't really matter for most of the HTTP requests I was already handling, but for uploads, where much more data is sent, it became an issue. A quick test showed that recv was returning fragments and that I was ending my reads too early.
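The usual fix at the socket level is to keep calling `recv` until the number of bytes you actually need has arrived, treating each short read as progress rather than as the end. A minimal C sketch, assuming a POSIX socket API; `recv_exact` is a hypothetical helper name, not something taken from ScarletDME or the web server:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read exactly `len` bytes from `fd` into `buf`.
 * recv() may return fewer bytes than requested on any call, so we loop
 * and advance through the buffer until it is full.
 * Returns 0 on success, -1 on error or if the peer closes early. */
static int recv_exact(int fd, char *buf, size_t len) {
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, buf + got, len - got, 0);
        if (n > 0) {
            got += (size_t)n;  /* short read: just keep going */
        } else if (n == 0) {
            return -1;         /* peer closed before len bytes arrived */
        } else if (errno != EINTR) {
            return -1;         /* real error; EINTR is simply retried */
        }
    }
    return 0;
}
```

This only helps when you already know how many bytes to expect, so the expected length has to come from somewhere, such as the request itself.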

The solution was to update my web server to check for a Content-Length header while building the request. That header gives the length of the expected body, so I can keep reading until I have the headers plus that many bytes, instead of stopping as soon as a read returns less than the buffer size.
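Putting that together, a request reader reads until the blank line that ends the headers, pulls out Content-Length, and then reads until headers plus body have arrived. This is a C sketch under a few stated assumptions: the whole request fits in one caller-supplied buffer, a missing Content-Length means no body, chunked transfer encoding is not handled, and the Content-Length value is not checked for overflow. `header_end`, `content_length` and `read_request` are hypothetical names for illustration.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <strings.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: offset just past the "\r\n\r\n" that ends the
 * headers, or 0 if that terminator has not arrived yet. */
static size_t header_end(const char *buf, size_t len) {
    for (size_t i = 3; i < len; i++) {
        if (buf[i - 3] == '\r' && buf[i - 2] == '\n' &&
            buf[i - 1] == '\r' && buf[i] == '\n')
            return i + 1;
    }
    return 0;
}

/* Hypothetical helper: value of the Content-Length header (matched
 * case-insensitively at the start of a line) within the first hdr_len
 * bytes, or 0 if the header is absent. Overflow is not checked. */
static size_t content_length(const char *buf, size_t hdr_len) {
    static const char name[] = "content-length:";
    const size_t nlen = sizeof name - 1;
    for (size_t i = 1; i + nlen <= hdr_len; i++) {
        if (buf[i - 1] != '\n') continue; /* only at a line start */
        if (strncasecmp(buf + i, name, nlen) != 0) continue;
        const char *p = buf + i + nlen, *end = buf + hdr_len;
        while (p < end && (*p == ' ' || *p == '\t')) p++;
        size_t v = 0;
        while (p < end && isdigit((unsigned char)*p))
            v = v * 10 + (size_t)(*p++ - '0');
        return v;
    }
    return 0;
}

/* Hypothetical reader: fill buf (capacity cap) with one request. It
 * stops based on byte counts (headers + Content-Length), never on the
 * size of a single recv. Returns the request length, or -1 if the peer
 * closes early, recv fails, or the request does not fit in buf. */
static ssize_t read_request(int fd, char *buf, size_t cap) {
    size_t got = 0, need = 0; /* need == 0: headers not complete yet */
    for (;;) {
        if (need == 0) {
            size_t h = header_end(buf, got);
            if (h != 0) need = h + content_length(buf, h);
        }
        if (need != 0 && got >= need) return (ssize_t)need;
        if (got == cap || need > cap) return -1;
        ssize_t n = recv(fd, buf + got, cap - got, 0);
        if (n == 0) return -1;            /* peer closed mid-request */
        if (n < 0) {
            if (errno == EINTR) continue; /* interrupted: retry */
            return -1;
        }
        got += (size_t)n;                 /* any size, including short */
    }
}
```

Because the loop compares byte counts instead of looking at the size of the last read, it returns the same result whether the request arrives in one recv call or in many fragments.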

This required some parsing logic, plus some special handling for UniVerse, as its matching support is lacking. Everything is now working, and my web server properly supports uploading images and documents.

This was a fun realization, and I think it highlights why having the source code for something is nice. I was able to dig all the way down to the C code, and seeing the recv call there made me read up on it. This is the kind of thing that is hard to pin down when you are much higher up in the stack, though I'm sure that if I had refreshed my socket knowledge this would have been obvious.