Thursday | 21 NOV 2024

Chumsky

Title:
Date: 2024-11-09
Tags:  

Table of Contents

  1. Tutorial

I'm slowly learning with the eventual goal of creating my own version of Pick. I could probably jump right into it, but I'd bet that I'd make a big mess. Hopefully the work I'm doing to get to that point is helping. We'll see how it shakes out.

This post is about chumsky, a parser library that should make writing parsers much easier. It removes the need to hand-roll my own parsers, which I may still do, just not quite yet. I think my first step should be to start working with languages at a higher level of abstraction.

There's a tutorial for this package so I'm going to give it a shot here and write some notes.

Tutorial

The first step is to create the Rust project:

cargo new --bin foo

Next we update Cargo.toml with:

[dependencies]
chumsky = "0.9.3"

We then update main.rs:

use chumsky::prelude::*;

fn main() {
    let src = std::fs::read_to_string(std::env::args().nth(1).unwrap()).unwrap();

    println!("{src}");
}

This takes a file path from the command line and prints the file's contents. It accesses the arguments a little differently than the Rust book does, and it calls unwrap on everything, so a missing argument or an unreadable file will panic. That's fine here, since the code is written to demonstrate chumsky rather than to teach Rust.
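For comparison, here is a rough sketch of what the same thing could look like without the unwraps. This is my own variation and not part of the tutorial: the read_source helper and the usage message are made up for illustration.

```rust
use std::{env, fs, process};

// Hypothetical helper (not from the tutorial): returns the file contents,
// or a readable error message instead of panicking.
fn read_source(mut args: impl Iterator<Item = String>) -> Result<String, String> {
    // Index 0 is the program name, so nth(1) is the first real argument.
    let path = args.nth(1).ok_or("usage: foo <path>")?;
    fs::read_to_string(&path).map_err(|e| format!("could not read {path}: {e}"))
}

fn main() {
    match read_source(env::args()) {
        Ok(src) => println!("{src}"),
        Err(msg) => {
            // Report the problem and exit with a non-zero status.
            eprintln!("{msg}");
            process::exit(1);
        }
    }
}
```

For a throwaway example the unwrap version is shorter, which is probably why the tutorial uses it.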

The next step is to define the Abstract Syntax Tree (AST) that we need the parser to create. It starts out like this:

enum Expr {
    Num(f64),
    Var(String),
}

The first things we define are numbers, which are just floating-point values, and variables. These are leaf nodes, the primitives that every expression ultimately bottoms out in.

The other variants are recursive because they can hold sub-expressions.

Here are negation and the binary operators, which go inside the same enum:

enum Expr {
    // ...

    Neg(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
}

Reading the book has helped me understand what Box is doing and why it's needed. You can't store an Expr directly inside Expr, because the type is recursive and the compiler can't compute its size: it would be infinitely large. Box heap-allocates the inner Expr and stores only the resulting pointer, which has a fixed size.
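To convince myself of the Box point, here is a small sketch using a trimmed-down copy of the enum. The sample function is a made-up helper that builds the tree for something like -(1 + x) by hand. The derive line is my addition so the tree can be printed and compared.

```rust
use std::mem::size_of;

// Trimmed-down copy of the enum, just enough to show the Box idea.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(f64),
    Var(String),
    Neg(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

// Hypothetical helper (not from the tutorial): the tree for `-(1 + x)`.
fn sample() -> Expr {
    Expr::Neg(Box::new(Expr::Add(
        Box::new(Expr::Num(1.0)),
        Box::new(Expr::Var("x".to_string())),
    )))
}

fn main() {
    // A Box<Expr> is a single pointer, so Expr itself has a fixed, known size
    // no matter how deep the tree it points to goes.
    println!("Box<Expr>: {} bytes", size_of::<Box<Expr>>());
    println!("Expr: {} bytes", size_of::<Expr>());
    println!("{:?}", sample());
}
```

Dropping the Box from Neg or Add should make the compiler reject the enum as a recursive type with infinite size.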

Next we have the definitions of calling functions, assigning variables and creating functions:

    Div(Box<Expr>, Box<Expr>),
    
    Call(String, Vec<Expr>),
    
    Let {
        name: String,
        rhs: Box<Expr>,
        then: Box<Expr>,
    },
    
    Fn {
        name: String,
        args: Vec<String>,
        body: Box<Expr>,
        then: Box<Expr>,
    }

These are a bit more complex. They also look like linked lists: each Let or Fn carries a then field holding the expression that comes after it.
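Here is how I picture the then chaining, as a sketch on a cut-down enum. The eval and chain functions are my own hypothetical helpers, not the tutorial's code, and the "program" in the comment is only my guess at what the source text might look like.

```rust
// Cut-down copy of the enum: only the variants this sketch needs.
#[derive(Debug)]
enum Expr {
    Num(f64),
    Var(String),
    Mul(Box<Expr>, Box<Expr>),
    Let {
        name: String,
        rhs: Box<Expr>,
        then: Box<Expr>,
    },
}

// Hypothetical evaluator: bound variables live on a stack while `then` runs.
fn eval(expr: &Expr, vars: &mut Vec<(String, f64)>) -> Result<f64, String> {
    match expr {
        Expr::Num(n) => Ok(*n),
        // Search from the end so the most recent binding wins.
        Expr::Var(name) => vars
            .iter()
            .rev()
            .find(|(bound, _)| bound == name)
            .map(|(_, value)| *value)
            .ok_or_else(|| format!("unknown variable `{name}`")),
        Expr::Mul(a, b) => Ok(eval(a, vars)? * eval(b, vars)?),
        Expr::Let { name, rhs, then } => {
            let value = eval(rhs, vars)?;
            vars.push((name.clone(), value));
            let result = eval(then, vars); // the "rest of the program"
            vars.pop();
            result
        }
    }
}

// Roughly: let x = 5; let y = x * 2; y
fn chain() -> Expr {
    Expr::Let {
        name: "x".into(),
        rhs: Box::new(Expr::Num(5.0)),
        then: Box::new(Expr::Let {
            name: "y".into(),
            rhs: Box::new(Expr::Mul(
                Box::new(Expr::Var("x".into())),
                Box::new(Expr::Num(2.0)),
            )),
            then: Box::new(Expr::Var("y".into())),
        }),
    }
}

fn main() {
    println!("{:?}", eval(&chain(), &mut Vec::new())); // prints Ok(10.0)
}
```

Each Let points at the next one through then, and the last then is the expression whose value is the result, which is what makes it feel like a linked list.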

These are all the syntax tree variants that the parser now needs to build.