Strings in C

2021-05-03

    Another problem I ran into was that, because C requires manual memory allocation, string handling was harder than I was used to. There may very well be a string library that makes things easier, but I ended up rolling my own because I couldn't figure out whether there was something that automatically made a char * bigger.

    The problem is that if I allocate a string for 10 characters and then need an 11th, I have to allocate a new memory block and copy the string over. From this idea, I came up with a String struct that keeps track of its size. This feels like a natural progression of having to deal with strings that keep getting munged, but it's also likely that I had seen some reference to it elsewhere. I think Rust calls something like this a fat pointer.
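
    As far as the standard library goes, nothing grows a char * automatically, but realloc from <stdlib.h> does the allocate, copy, and free steps in one call. Here is a minimal sketch of that idea. The helper name ensure_capacity and the doubling policy are assumptions made for illustration, not part of the code in this post.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: make sure buf can hold at least `needed` bytes.
   Returns the (possibly moved) buffer, or NULL if realloc fails, in which
   case the original buffer is still valid and *capacity is unchanged. */
char *ensure_capacity(char *buf, size_t *capacity, size_t needed) {
    assert(capacity != NULL);

    if (needed <= *capacity) {
        return buf;                      /* already big enough */
    }

    /* Double until the request fits, starting from 16 for empty buffers. */
    size_t new_capacity = (*capacity == 0) ? 16 : *capacity;
    while (new_capacity < needed) {
        new_capacity *= 2;
    }

    /* realloc keeps the old contents and frees the old block if it moves. */
    char *grown = realloc(buf, new_capacity);
    if (grown == NULL) {
        return NULL;
    }

    *capacity = new_capacity;
    return grown;
}
```

    Because realloc may move the block, the caller has to keep using the returned pointer rather than the old one.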

    bash
    struct String {
        char *str;
        int size;
    };

    The struct is really bare bones: it contains a pointer and a size. The pointer references the string's characters, and the size records how many bytes are allocated for them. This way I can use strlen to get the length of the string and also know the bounds of the memory block.

    The first thing to write is a constructor. I don't think C has classes, as that was C++'s claim to fame. From what I can tell, this means that the struct needs to be passed into a function and changed there. I'm going to miss class-style initializers and methods.

    bash
    struct String *new_str(int size) {
        struct String *x = (struct String*) calloc(0, sizeof(struct String));
        x->size = size;
        x->str = calloc(0, size);
        return x;
    }

    The constructor takes a size, which I'll use when allocating the char array. The first thing to do, however, is to allocate some memory for the struct itself. I'm going to return a pointer to the struct.

    This means that the struct needs an allocation big enough for both the pointer and the int. Once I have that, I can set the size on the struct and then point the struct's str member at the newly allocated character array.

    There are 2 levels of pointers here. The first pointer is to the struct, which in turn contains a pointer to the actual string elsewhere in memory.
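
    The two levels of pointers also mean two allocations to make and, later, two to release. Below is one possible sketch of the constructor with error checks, paired with a hypothetical free_str that is not in the original code. It assumes calloc's (count, element size) argument order, so it passes 1 for the struct and size for the character array.

```c
#include <assert.h>
#include <stdlib.h>

struct String {
    char *str;
    int size;
};

/* Sketch of new_str: allocate the struct, then the character array.
   Returns NULL if either allocation fails. */
struct String *new_str(int size) {
    assert(size > 0);

    /* First level: one zeroed struct. */
    struct String *x = calloc(1, sizeof *x);
    if (x == NULL) {
        return NULL;
    }

    /* Second level: `size` zeroed chars, so str starts as an empty string. */
    x->str = calloc((size_t)size, sizeof(char));
    if (x->str == NULL) {
        free(x);
        return NULL;
    }

    x->size = size;
    return x;
}

/* Hypothetical companion: release the inner array first, then the struct. */
void free_str(struct String *x) {
    if (x == NULL) {
        return;
    }
    free(x->str);
    free(x);
}
```

    Freeing in the opposite order would read x->str after x has already been released, which is why the inner pointer goes first.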

    Once we have this function, we can write a set string function so we can set up a real string.

    bash
    void set_str(struct String *x, char *buffer) {
        int buffer_length = strlen(buffer);
        if (buffer_length >= x->size-1) {
            char *temp = x->str;
            x->size = buffer_length * 10;
            x->str = calloc(0, x->size);
            free(temp);
        }
        strcpy(x->str, buffer);
    }

    This function takes a String struct and a character array. It first gets the length of the buffer, because we need to allocate enough space for the buffer and its future use.

    We compare the buffer length to the size on the struct. That size is what we set at initialization, but if it is too small, we need to re-allocate space. This is why, inside the if, we change size to the buffer length times 10: to ensure we have space for the buffer and any future use.

    We then allocate a new block of memory and free the old one.

    We then copy the buffer to the newly allocated block.
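
    The steps above can be sketched with a couple of defensive changes. This is a variant under stated assumptions, not the original function: it counts the terminating '\0' explicitly, allocates the new block before freeing the old one so a failed allocation leaves the string intact, and returns a status code. The growth factor of 2 is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct String {
    char *str;
    int size;
};

/* Sketch of set_str. Returns 0 on success, -1 if allocation fails
   (in which case x is left unchanged). */
int set_str(struct String *x, const char *buffer) {
    assert(x != NULL && buffer != NULL);

    size_t needed = strlen(buffer) + 1;      /* +1 for the '\0' */

    if (needed > (size_t)x->size) {
        size_t new_size = needed * 2;        /* leave room for future use */
        char *fresh = malloc(new_size);
        if (fresh == NULL) {
            return -1;
        }
        free(x->str);                        /* old contents are not needed */
        x->str = fresh;
        x->size = (int)new_size;
    }

    memcpy(x->str, buffer, needed);          /* copies the '\0' as well */
    return 0;
}
```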

    Now that we have our set function, I then wrote a function to add a character. This re-uses the logic of the set_str function but now we use strlen to get the last position of the string. The last position is the null character. So when we add a character we will overwrite the null with our character and then set the next position to null. This means that we need to do some bounds checking and re-allocate memory if we need to.

    bash
    void add_char(struct String *x, char c) {
        int last_position = strlen(x->str);
        if(last_position >= x->size-1) {
            char *temp = x->str;
            x->size = x->size * 2;
            x->str = calloc(0, x->size);
            strcpy(x->str, temp);
            free(temp);
        }
        x->str[last_position] = c;
        x->str[last_position+1] = '\0';
    }
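
    For comparison, here is a sketch of the same append written with realloc, which keeps the old contents so the explicit strcpy and free are no longer needed. The return code and the assert are additions for illustration and assume the struct already holds a valid, null-terminated string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct String {
    char *str;
    int size;
};

/* Sketch: append one character, growing the block with realloc.
   Returns 0 on success, -1 if realloc fails (the string is left as it was). */
int add_char(struct String *x, char c) {
    assert(x != NULL && x->str != NULL && x->size > 0);
    size_t len = strlen(x->str);

    /* We need room for the new character plus the terminating '\0'. */
    if (len + 2 > (size_t)x->size) {
        size_t new_size = (size_t)x->size * 2;
        char *grown = realloc(x->str, new_size);
        if (grown == NULL) {
            return -1;
        }
        x->str = grown;
        x->size = (int)new_size;
    }

    x->str[len] = c;          /* overwrite the old '\0' */
    x->str[len + 1] = '\0';   /* and terminate again */
    return 0;
}
```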

    So far this has been some of the more tedious stuff I've worked on. I'm also sure that I'm screwing up memory somewhere but everything does sort of flow from the idea that you need to manually manage memory in C.

    Right now I can add single characters, which is useful as I do need to build the string character by character, but I also need to piece together whole strings, so I need an add_str function.

    The first thing to do is to get the lengths of the 2 strings and make sure that we have enough space for both of them. If we don't, we do the same thing as before: allocate a new array, copy the original string into the new block, and free the old one.

    Next, the logic I use to concatenate the 2 strings is a loop. I could have used strcat or even the code example, but to be honest I only saw it after I had already written my version, and really the two are pretty close to being the same. This seems to be another feature of C, where a lot of things naturally fall into place as you start doing things.

    bash
    void add_str(struct String *x, char *buffer) {
        int string_length = strlen(x->str);
        int buffer_length = strlen(buffer);
        
        int n = string_length + buffer_length;
    
        if(n >= x->size-1) {
            char *temp = x->str;
            x->size = n * 5;
            x->str = calloc(0, x->size);
            strcpy(x->str, temp);
            free(temp);
        }
    
        int i;
        int j = string_length;
    
        for (i=0; i<buffer_length; i++) {
            x->str[j] = buffer[i];
            j++;
        }
    
        x->str[j] = '\0';
    }
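
    Since the lengths are already known at this point, the copy loop can also be expressed as a single memcpy. The following is a sketch along those lines, using realloc for growth. The return code is an assumption added for illustration, and the sketch assumes buffer does not point into x->str, because realloc may move that block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct String {
    char *str;
    int size;
};

/* Sketch of add_str. Returns 0 on success, -1 if realloc fails.
   buffer must not alias x->str. */
int add_str(struct String *x, const char *buffer) {
    assert(x != NULL && x->str != NULL && buffer != NULL);

    size_t old_len = strlen(x->str);
    size_t add_len = strlen(buffer);
    size_t needed = old_len + add_len + 1;   /* both strings plus '\0' */

    if (needed > (size_t)x->size) {
        size_t new_size = needed * 2;
        char *grown = realloc(x->str, new_size);
        if (grown == NULL) {
            return -1;
        }
        x->str = grown;
        x->size = (int)new_size;
    }

    /* Copy buffer, including its '\0', right after the existing text. */
    memcpy(x->str + old_len, buffer, add_len + 1);
    return 0;
}
```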

    I really wish I could declare i inside the for loop so that i disappears after it is used, but that's really a thought about C and not strings.
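
    For what it's worth, C99 and later do allow the counter to be declared in the for statement itself, and it goes out of scope when the loop ends. A compiler running in C89 mode rejects this, in which case a flag such as -std=c99 should help. A small sketch, where count_char is a made-up function for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many times c appears in s. The counter i exists only
   inside the loop (C99 and later). */
size_t count_char(const char *s, char c) {
    assert(s != NULL);

    size_t count = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == c) {
            count++;
        }
    }
    /* i is no longer visible here. */
    return count;
}
```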

    Another strange thing is with calloc. I'm expecting every call to calloc to give me a new block of memory initialized to 0, but it looks like I run into a problem where, if I call calloc, it gives me space that is too small. From what I can tell, it's giving me a block that I used previously.

    17, 10
    1.0x1657190
    2.0x16571b0
    35, 85
    60, 85
    80, 85
    102, 85
    1.0x16571b0
    2.0x1657190
    Segmentation fault

    Here I printed out the addresses of temp, which contains the old address, and x->str after I called calloc. The 190 address is the old one, but when I get back inside my if statement when the size is 102, it shows me the old address properly but apparently calloc gave me the 190 address. I'm assuming that 190 and 1b0 are next to each other and maybe the 190 address is getting picked up incorrectly. Switching the calloc to malloc works fine though.

    I'm misunderstanding something.

    For now, I switched the string allocations to malloc.
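
    A likely explanation, going by the calloc(0, ...) calls in the listings above: calloc's signature is calloc(count, element_size), so a count of 0 asks for zero bytes. The C standard leaves the result implementation-defined (either NULL or a pointer that must not be used to access memory), so any string written there overflows the block and can corrupt the allocator's bookkeeping. That would fit the two addresses being only 0x20 bytes apart, and it would explain why malloc(size) works: it really reserves size bytes. A minimal sketch of the corrected call, with alloc_chars as a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n zero-initialized chars.
   Returns NULL if the allocation fails. */
char *alloc_chars(size_t n) {
    assert(n > 0);
    /* Element count first, element size second. */
    return calloc(n, sizeof(char));
}
```

    With the arguments in this order, calloc keeps its one advantage over malloc here: the block starts out as all zero bytes, so it is already a valid empty string.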