Thursday | 31 OCT 2024

A Gemini Client in Rust

Date: 2020-11-09

The Specification

Hello! In this series we're going to build a Gemini client. What is Gemini, you ask? Before we look at Gemini, it's helpful to understand its beginnings.

It was designed as a souped-up version of Gopher, which in turn is a way to share files across the internet. These files are usually people's thoughts and ideas, not movies and pictures.

You can also start at chapter 3, as the first two chapters are really just setup. It should be fine to start coding first and come back to read chapters 1 and 2 later.

History

Gopher came out of the University of Minnesota at roughly the same time as the WWW (World Wide Web) came out of CERN. Both Gopher and the WWW (today known simply as the web, and often treated as the internet in general) are protocols for requesting files from servers and sending them back. Both go over TCP: Gopher uses port 70 and the WWW uses port 80. Gopher is a very simple protocol with no extensibility, whereas HTTP, the protocol behind the WWW, had extensibility built in, which is probably one reason why the WWW is so popular now. So popular that the internet and the World Wide Web are often seen as the same thing! In reality, though, the internet is much more. The internet could be thought of as the roads in a city, and protocols such as WWW and Gopher are just cars that use those roads. Many things use those roads: e-mail, FTP, SSH, telnet and more all go over the internet in addition to the WWW.

Now the World Wide Web has taken over as the main way people interact with the internet, but there are other spaces that we can visit and take part in. Gopher is one of them, and in my opinion quite an interesting one. Today, no one can build a web browser by themselves and take part in the WWW. But! Gopher is so simple a protocol that you can build your own client and interact with gopherspace completely under your own power. There is something quite seductive about that.

I wrote a series before outlining how I built a Gopher client, but I was muddling my way through the code, so I couldn't write down the steps. In this series I hope to go a bit slower, with less muddling, and show you how to build a client for one of these spaces that I find so interesting.

Gemini is a simple protocol, designed so that the entire spec will fit in your head and a client and server implementation can be made in a weekend. This is very much in the vein of Gopher, and the Gopher spec made it clear that simplicity was king. This is true for Gemini as well.

You can find more information on the Gemini website.

https://gemini.circumlunar.space/

With that being said, I think you should go directly to the source and read Solderpunk's (the creator of the protocol) phlog (gopher blog) posts on Gemini on zaibatsu.circumlunar.space.

> telnet zaibatsu.circumlunar.space 70
Trying 167.88.113.156...
Connected to circumlunar.space.
Escape character is '^]'.
/~solderpunk/gemini

Using Gopher!

Just a quick intro because it is quite fun!

We telnet directly to the server on port 70, then enter the selector string /~solderpunk/gemini to navigate to Solderpunk's gopherhole. Had we not known the selector string, we could have just hit Enter to get a list of gopher items and the selector strings for them.

0(2019-04-07) More on gopher and crypto /~solderpunk/phlog/more-on-gopher-and-crypto.txt        zaibatsu.circumlunar.space                         70
0(2019-03-31) Why gopher needs crypto   /~solderpunk/phlog/why-gopher-needs-crypto.txt  zaibatsu.circumlunar.space     70
0(2019-03-03) Pondering what's inbetween gopher and the web     /~solderpunk/phlog/pondering-whats-inbetween-gopher-and-the-web.txt                zaibatsu.circumlunar.space       70

After entering the above selector, we should see a listing like the one above. It is quite messy, but that's because we are using straight telnet; a gopher client would clean this up drastically.
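The gaps in that listing are tab characters in the raw response: each line is one Gopher menu item whose fields are the item type (a single character) glued to the display text, then the selector, the host and the port (the format described in RFC 1436). A minimal sketch of how a client might split one line, assuming plain tab-separated fields; `MenuItem` and `parse_menu_line` are hypothetical names for illustration:

```rust
/// One entry from a Gopher menu (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct MenuItem {
    item_type: char,
    display: String,
    selector: String,
    host: String,
    port: u16,
}

/// Splits one tab-separated Gopher menu line into its fields.
/// Returns None if a field is missing or the port isn't a number.
fn parse_menu_line(line: &str) -> Option<MenuItem> {
    let line = line.trim_end_matches(&['\r', '\n'][..]);
    let mut fields = line.split('\t');

    let first = fields.next()?;
    let selector = fields.next()?;
    let host = fields.next()?;
    let port = fields.next()?.trim().parse::<u16>().ok()?;

    // The first character is the item type ('0' = text file, '1' = menu),
    // and the rest of the first field is the text shown to the user.
    let mut chars = first.chars();
    let item_type = chars.next()?;

    Some(MenuItem {
        item_type,
        display: chars.as_str().to_string(),
        selector: selector.to_string(),
        host: host.to_string(),
        port,
    })
}

fn main() {
    let line = "0(2019-03-31) Why gopher needs crypto\t/~solderpunk/phlog/why-gopher-needs-crypto.txt\tzaibatsu.circumlunar.space\t70\r\n";
    if let Some(item) = parse_menu_line(line) {
        println!("[{}] {} ({}:{} {})", item.item_type, item.display, item.host, item.port, item.selector);
    }
}
```

With the fields separated, a client can show just the display text and keep the selector, host and port around for when the user picks that item.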

Gopher has no concept of keeping connections open: once the request and response cycle is finished, the connection is closed immediately. This means we'll need to run telnet again with the new selector string.

So to read the first phlog post Solderpunk made, we would run the following:

> telnet zaibatsu.circumlunar.space 70
Trying 167.88.113.156...
Connected to circumlunar.space.
Escape character is '^]'.
/~solderpunk/phlog/pondering-whats-inbetween-gopher-and-the-web.txt

We telnet into the server and pass in the selector we found in the listing above, and the server sends back the text file that selector references, which we can now read! A very strange way to browse the internet, eh? (This is why I built a client....) :)
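What telnet does for us here can be sketched in a few lines of Rust: open a TCP connection to port 70, send the selector followed by CRLF, and read until the server closes the connection. This is a minimal sketch, not the client we build later; `gopher_fetch` is a hypothetical name, and it is written against the `Read` and `Write` traits so that any stream (a real `TcpStream`, or an in-memory stand-in) can be passed in:

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;

/// Sends a Gopher selector and returns the raw response.
/// The server closes the connection when it's done, so read_to_end
/// stops exactly at the end of the response.
fn gopher_fetch<S: Read + Write>(stream: &mut S, selector: &str) -> io::Result<Vec<u8>> {
    // A Gopher request is just the selector followed by CRLF.
    stream.write_all(selector.as_bytes())?;
    stream.write_all(b"\r\n")?;
    stream.flush()?;

    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    Ok(response)
}

fn main() -> io::Result<()> {
    let mut stream = TcpStream::connect("zaibatsu.circumlunar.space:70")?;
    let selector = "/~solderpunk/phlog/pondering-whats-inbetween-gopher-and-the-web.txt";
    let body = gopher_fetch(&mut stream, selector)?;
    println!("{}", String::from_utf8_lossy(&body));
    Ok(())
}
```

Because every request needs a fresh connection, a client would call `TcpStream::connect` again for each selector, just as we ran telnet again.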

Gemini

Now this was a long way to get to the actual point. Gopher is a very simple protocol and WWW is a very complex protocol. Gemini seeks to exist in the middle which is a very interesting idea!

Now before we begin I need to mention something that I find quite funny. You would think Gemini is pronounced Gemini, but it's actually pronounced Gemini. Joking! We, or at least I, pronounce Gemini ending with an 'eye' sound. NASA pronounces it ending with a 'knee' sound. This is quite interesting to me for some reason, and I can't help but smile thinking about it.

On with the show!

I've made some notes on the Gemini spec and I'm going to outline them here. Feel free to leave comments about anything I've missed. We'll be using these notes to write our client; after all, a protocol is just a set of rules for how data is transferred and processed.

Let's get started!

  1. The server will close the connection immediately after sending a response. There is no connection reuse.
  2. Requests made by the client are simply the URL, either absolute or relative, followed by a CRLF (Carriage Return Line Feed, \r\n).
  3. URLs are prefixed with gemini://, and this prefix is also assumed if the URL doesn't have one already. Other URL prefixes such as http:// and https:// can also be sent, and the server may act as a proxy (a translator) for them.
  4. Responses sent by the server are a single header line ending with a CRLF. This can be followed by a response body if the header line indicates success; this is the only time a response body is sent.
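The request rules above translate into very little code. A minimal sketch, assuming we only need to prepend gemini:// when the user types a URL without a scheme; `build_request` is a hypothetical helper, and the check for "://" is a simplification rather than real URL parsing:

```rust
/// Builds the bytes of a Gemini request: the URL followed by CRLF.
/// If the URL has no scheme, gemini:// is assumed.
fn build_request(url: &str) -> Vec<u8> {
    let url = url.trim();
    // Simplification: anything containing "://" is treated as already having a scheme.
    let full = if url.contains("://") {
        url.to_string()
    } else {
        format!("gemini://{}", url)
    };
    let mut request = full.into_bytes();
    request.extend_from_slice(b"\r\n");
    request
}

fn main() {
    let request = build_request("gemini.circumlunar.space/servers/");
    print!("{}", String::from_utf8_lossy(&request));
}
```

Everything else about the exchange (TLS, reading the response) happens around these few bytes.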

Statuses

  5. The header line starts with a two-digit status code, followed by a single space, followed by a string (the META string) which can be an error message or the MIME type of the response body. The line ends with a CRLF.
20 text/plain; charset=utf-8
<RESPONSE BODY formatted as plain text using UTF-8>
40 Failed to Retrieve Page

Here we have two examples: the first is successful and is followed by a response body, and the second is an error consisting of a single header line.
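Splitting a header line like the two above into its status code and META string is a small parsing job. A minimal sketch, assuming the header line has already been read off the connection as a string; `parse_header` is a hypothetical name. Trimming extra spaces before META (as in the second example) is a lenient choice on my part, while anything that isn't two ASCII digits followed by a space is rejected:

```rust
/// Splits a Gemini response header such as "20 text/gemini\r\n"
/// into its two-digit status code and its META string.
fn parse_header(line: &str) -> Option<(u8, String)> {
    // Drop the trailing CRLF (or a bare LF).
    let line = line.trim_end_matches(&['\r', '\n'][..]);

    // The status code is the first two characters and must be digits.
    let status = line.get(..2)?;
    if !status.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }

    // Anything after the status must be separated from it by a space.
    let rest = &line[2..];
    if !rest.is_empty() && !rest.starts_with(' ') {
        return None;
    }

    let code = status.parse::<u8>().ok()?;
    Some((code, rest.trim_start().to_string()))
}

fn main() {
    let headers = ["20 text/plain; charset=utf-8\r\n", "40  Failed to Retrieve Page\r\n"];
    for header in headers.iter() {
        match parse_header(header) {
            // The first digit alone tells us the kind of response.
            Some((code, meta)) => println!("status {} (category {}x): {}", code, code / 10, meta),
            None => println!("malformed header: {:?}", header),
        }
    }
}
```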

  6. Status codes are two digits, but the first digit alone is enough to know what kind of response it is; the second digit just adds a little more detail, mostly useful for logging purposes.
  7. If the status is not 20 (success), the server must not send a response body.
  8. There are six status categories. Because we are building a basic client, we'll only worry about the basic status of each category:
  • Status of 10 - INPUT, the META text is a prompt that should be shown to the user. When they enter their response, the same URL should be requested again with their input attached as the query string.

If we requested gemini://test.gemini/hello-world and received the response below,

10 What is your name?

And the user responded with Alice.

We would then make a request for gemini://test.gemini/hello-world?Alice, since the user's input (percent-encoded) becomes the entire query string.

  • Status of 20 - SUCCESS, a response body will follow the header line, and the META string will be the MIME type of the document.
  • Status of 30 - REDIRECT, the META string is the new URL the client should request to follow the redirect.
  • Status of 40 - TEMPORARY FAILURE, something has gone wrong.
  • Status of 50 - PERMANENT FAILURE. 40 and 50 can be handled together: we can show a generic error message along with the META string, which will be the error message the server wants to show the client.
  • Status of 60 - CLIENT CERTIFICATE REQUIRED - We'll leave this one out for now, as it functions as a way for a client to set up a session in Gemini. I'm not entirely sure how this works yet, but I'll come back and fill this in afterwards. This isn't required for basic clients, so we can safely ignore it.
  9. Response bodies can be anything; the META string in the header is what determines how they are handled. Basic clients can get away with only dealing with text/* and saving everything else to disk. For now that is what we'll implement, but you could, for example, open the data in an image viewer if the MIME type is an image.
  • Text response bodies can use both CRLF (\r\n) and LF (\n) as line breaks, and the client should handle both of them.
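The status rules boil down to looking at the first digit. A minimal sketch of how a basic client might decide what to do next, assuming the status and META string have already been parsed; the `Action` enum, `next_action` and `with_query` are hypothetical names. Per the Gemini spec, the user's input for a 10 response becomes the entire query string rather than a key=value pair; the percent-encoding here keeps only the unreserved URL characters from RFC 3986 as-is, which is a simplification:

```rust
/// What a basic client should do next (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Action {
    Prompt(String),      // 1x INPUT: META is a prompt for the user
    Display(String),     // 2x SUCCESS: a body follows, META is its MIME type
    Redirect(String),    // 3x REDIRECT: META is the new URL
    Error(String),       // 4x/5x FAILURE: META is an error message
    CertificateRequired, // 6x: left out of a basic client
    Unknown(u8),         // anything else is malformed
}

fn next_action(status: u8, meta: &str) -> Action {
    // Only the first digit decides what to do; the second is extra detail.
    match status / 10 {
        1 => Action::Prompt(meta.to_string()),
        2 => Action::Display(meta.to_string()),
        3 => Action::Redirect(meta.to_string()),
        4 | 5 => Action::Error(meta.to_string()),
        6 => Action::CertificateRequired,
        _ => Action::Unknown(status),
    }
}

/// Builds the follow-up URL for an INPUT response: the user's input,
/// percent-encoded, becomes the whole query string.
fn with_query(url: &str, input: &str) -> String {
    let mut encoded = String::new();
    for byte in input.bytes() {
        match byte {
            // Unreserved URL characters (RFC 3986) are sent as-is.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                encoded.push(byte as char)
            }
            // Everything else (spaces, punctuation, UTF-8 bytes) becomes %XX.
            _ => encoded.push_str(&format!("%{:02X}", byte)),
        }
    }
    // Replace any query the URL already had.
    let base = url.split('?').next().unwrap_or(url);
    format!("{}?{}", base, encoded)
}

fn main() {
    let url = "gemini://test.gemini/hello-world";
    if let Action::Prompt(prompt) = next_action(10, "What is your name?") {
        println!("{}", prompt);
        // Pretend the user typed "Alice".
        println!("next request: {}", with_query(url, "Alice"));
    }
}
```

Grouping 40 and 50 into one `Error` variant mirrors the "show a generic error message" approach from the notes; a fuller client could split them back apart.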

Certificates

10. TLS is mandatory! This is maybe my least favorite requirement in the entire spec. TLS is the one place where we will need to rely on a dependency. With Gopher, everything was under our control as programmers and we could build the client with zero dependencies; this isn't true for Gemini. But I get the reason for it! (I think...)

11. TLS must be 1.2 or higher with 1.3 being recommended! Not sure what this means quite yet but hopefully whatever crate we end up using has sane defaults.

12. Clients can handle certificates in any way, including ignoring them! The recommended way is to use TOFU certificate pinning, meaning when we first connect to a server, we will assume that that certificate is the real certificate and will save that and its expiry date on our machine. From that point on, we will validate our connection to the server using that certificate. TOFU stands for Trust on First Use. If for some reason the certificates stop matching and the expiry date on the saved certificate hasn't passed yet, then a warning should be shown to the user.

13. Client certificates can also be generated by the client and used to communicate with a server. It works the same way as TOFU, except the flow goes the other way: the client gets a way of identifying itself to the server, which allows sessions to be set up. We're going to leave this out of our basic client, but I think it would be quite interesting to implement.
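The TOFU rule in item 12 can be sketched as a small lookup table keyed by host. This is a minimal sketch under some loud assumptions: `KnownHosts`, `PinnedCert`, `TofuResult` and `check` are hypothetical names, we compare raw certificate bytes instead of a cryptographic fingerprint, and the caller supplies the certificate's expiry and the current time as Unix timestamps (pulling the expiry out of a real certificate is out of scope here). A real client would also save this table to disk between runs:

```rust
use std::collections::HashMap;

/// Result of checking a server's certificate against our pinned copy.
#[derive(Debug, PartialEq)]
enum TofuResult {
    FirstUse, // never seen this host before: trust and pin the certificate
    Trusted,  // same certificate as the pinned one
    Renewed,  // different, but the pinned one had expired: re-pin it
    Mismatch, // different while the pinned one is still valid: warn the user
}

/// What we remember about a host (hypothetical type).
struct PinnedCert {
    cert: Vec<u8>,   // raw certificate bytes, standing in for a fingerprint
    expires_at: u64, // Unix timestamp supplied by the caller
}

#[derive(Default)]
struct KnownHosts {
    pins: HashMap<String, PinnedCert>,
}

impl KnownHosts {
    fn check(&mut self, host: &str, cert: &[u8], expires_at: u64, now: u64) -> TofuResult {
        // Look up the pinned certificate (if any) without holding a borrow.
        let existing = self
            .pins
            .get(host)
            .map(|pin| (pin.cert.as_slice() == cert, pin.expires_at <= now));

        match existing {
            None => {
                self.remember(host, cert, expires_at);
                TofuResult::FirstUse
            }
            Some((true, _)) => TofuResult::Trusted,
            Some((false, true)) => {
                self.remember(host, cert, expires_at);
                TofuResult::Renewed
            }
            Some((false, false)) => TofuResult::Mismatch,
        }
    }

    fn remember(&mut self, host: &str, cert: &[u8], expires_at: u64) {
        let pin = PinnedCert { cert: cert.to_vec(), expires_at };
        self.pins.insert(host.to_string(), pin);
    }
}

fn main() {
    let mut hosts = KnownHosts::default();
    let now = 1_600_000_000;
    let host = "gemini.circumlunar.space";
    println!("{:?}", hosts.check(host, b"cert-a", now + 1_000, now));
    println!("{:?}", hosts.check(host, b"cert-a", now + 1_000, now));
    println!("{:?}", hosts.check(host, b"cert-b", now + 5_000, now));
}
```

Only `Mismatch` calls for the warning from item 12; the other three outcomes let the connection continue.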

Document

14. Just as HTTP/WWW has HTML as its default format, Gemini uses text/gemini as its default.

15. text/gemini is a very barebones format. Its META string can optionally contain a charset parameter for the encoding and a lang parameter for the language.

text/gemini; charset=utf-8; lang=en
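Pulling those optional parameters out of the META string is a matter of splitting on semicolons. A minimal sketch, assuming parameters are simple key=value pairs without quoting; `Mime` and `parse_mime` are hypothetical names:

```rust
/// The pieces of a META string like "text/gemini; charset=utf-8; lang=en".
#[derive(Debug, PartialEq)]
struct Mime {
    mime_type: String,
    charset: Option<String>,
    lang: Option<String>,
}

fn parse_mime(meta: &str) -> Mime {
    let mut parts = meta.split(';').map(|part| part.trim());
    // The first part is always the type itself, e.g. text/gemini.
    let mime_type = parts.next().unwrap_or("").to_lowercase();
    let mut mime = Mime { mime_type, charset: None, lang: None };

    for part in parts {
        // Each remaining part should look like key=value.
        if let Some(eq) = part.find('=') {
            let key = part[..eq].trim().to_lowercase();
            let value = part[eq + 1..].trim().to_string();
            match key.as_str() {
                "charset" => mime.charset = Some(value),
                "lang" => mime.lang = Some(value),
                _ => {} // ignore parameters we don't know about
            }
        }
    }
    mime
}

fn main() {
    println!("{:?}", parse_mime("text/gemini; charset=utf-8; lang=en"));
    println!("{:?}", parse_mime("text/plain"));
}
```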

16. Links must go on their own lines, and the first (up to) three characters of a line dictate the line's type.

17. The first line type is the link: any line starting with => is a link and should be rendered as such.

=>URL This is the Link Text
=> URL This is the Link Text
=>   URL      This is the Link Text

The URL must come after the link marker and can be surrounded by any amount of whitespace. This works because spaces aren't allowed in URLs, so we can use whitespace to find the end of the URL, and everything after it is the text that we want to show the user.
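Because whitespace can't appear inside the URL, a link line can be split with nothing more than whitespace handling. A minimal sketch; `parse_link` is a hypothetical name, and returning `None` for the text when a line has no link text is a design choice (a client could fall back to showing the URL instead):

```rust
/// Splits a gemtext link line ("=> URL optional text") into URL and text.
/// Returns None if the line isn't a link line or has no URL.
fn parse_link(line: &str) -> Option<(&str, Option<&str>)> {
    // Drop the "=>" marker and any whitespace after it.
    let rest = line.strip_prefix("=>")?.trim_start();
    // The URL runs up to the first whitespace character.
    let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
    let url = &rest[..end];
    if url.is_empty() {
        return None;
    }
    // Whatever follows, minus surrounding whitespace, is the link text.
    let text = rest[end..].trim();
    let text = if text.is_empty() { None } else { Some(text) };
    Some((url, text))
}

fn main() {
    let lines = [
        "=>URL This is the Link Text",
        "=> URL This is the Link Text",
        "=>   URL      This is the Link Text",
    ];
    for line in lines.iter() {
        println!("{:?}", parse_link(line));
    }
}
```

All three of the example lines above come out as the same URL and link text.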

18. There are also preformat toggles, lines starting with three backticks (```), which cause the lines between the toggles to be displayed without any styling. Luckily, because we'll be writing a terminal program, almost everything will look the same anyway!

One thing to note is anything after the leading preformat toggle on the same line as the toggle should not be displayed to the user.

19. We also have #, ## and ### to mark headers, but because we are doing a very basic client we'll just show these lines as usual. Had we done something a little more complex, we could add coloring or font size changes.

20. Lines starting with * are lists.

21. Lines starting with > are quotes.

22. Lines with nothing on them are blank lines and should be rendered as blanks; multiple blank lines shouldn't be collapsed into a single blank line.

23. Lines starting with anything else are regular lines!
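Putting items 16 to 23 together, rendering a text/gemini document is a line-by-line pass with one piece of state: whether we are inside a preformatted block. A minimal sketch; the `Line` enum and `classify` are hypothetical names. It uses Rust's `str::lines`, which splits on both `\n` and `\r\n`, so the CRLF/LF rule from earlier is handled for free. The alt text after an opening toggle is dropped, and heading levels are counted from up to three leading `#` characters:

```rust
/// One classified line of a text/gemini document (hypothetical type).
#[derive(Debug, PartialEq)]
enum Line<'a> {
    Text(&'a str),
    Link { url: &'a str, text: Option<&'a str> },
    Heading { level: usize, text: &'a str },
    ListItem(&'a str),
    Quote(&'a str),
    Preformatted(&'a str),
    Blank,
}

fn classify(body: &str) -> Vec<Line<'_>> {
    let mut out = Vec::new();
    let mut preformatted = false;

    // str::lines splits on "\n" and also strips a trailing "\r".
    for line in body.lines() {
        if line.starts_with("```") {
            // A toggle line flips the mode; its alt text is never shown.
            preformatted = !preformatted;
            continue;
        }
        if preformatted {
            out.push(Line::Preformatted(line));
        } else if let Some(rest) = line.strip_prefix("=>") {
            let rest = rest.trim_start();
            let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
            let text = rest[end..].trim();
            out.push(Line::Link {
                url: &rest[..end],
                text: if text.is_empty() { None } else { Some(text) },
            });
        } else if line.starts_with('#') {
            // Count up to three leading '#' characters for the heading level.
            let level = line.chars().take(3).take_while(|&c| c == '#').count();
            out.push(Line::Heading { level, text: line[level..].trim_start() });
        } else if let Some(rest) = line.strip_prefix('*') {
            out.push(Line::ListItem(rest.trim_start()));
        } else if let Some(rest) = line.strip_prefix('>') {
            out.push(Line::Quote(rest.trim_start()));
        } else if line.is_empty() {
            // Every blank line is kept; they are never collapsed.
            out.push(Line::Blank);
        } else {
            out.push(Line::Text(line));
        }
    }
    out
}

fn main() {
    let doc = "# Hello\r\n\r\n=> gemini://example.org/ Example\n```alt text\n* not a list\n```\n* item\n> quote\nplain";
    for line in classify(doc) {
        println!("{:?}", line);
    }
}
```

For our terminal client, most variants would simply be printed as-is; the enum just makes it easy to add coloring later.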

And with that, we are done! We have the gist of the specification outlined now. The key point is that because we are doing a basic client, we can safely ignore a great deal of things, but we also have the option of adding complexity. I think this is quite valuable: everything is very much opt-in rather than forcing us to do things a certain way.

Alright! In the next chapter we can start coding something up.

A TLS Side Story

Hello! Before we start our little tangent, I do have a caveat. TLS and security is important! Our data is flying everywhere across the internet and TLS (also known as SSL) is what keeps it safe.

This chapter is completely optional, and it may be better to move on to chapter 3, get some context and then come back to read this. This chapter really exists because the rustls crate doesn't have working examples and I wanted to have a place I could refer back to, to get working snippets to play with.

https://docs.rs/rustls/0.18.1/rustls/

I am going to outline 4 examples in total. The first example will be my own explanation and code of the TLS process just to make it clear what is happening when we set up TLS.

The next 2 examples will be working examples from the rustls docs page.

The last example will be what we will ultimately be using in our Gemini client. This is where we will remove the verification of the TLS certificates.

Feel free to try out the examples in a new project.

cargo new tls-test

1. The Flow Example

Before we get started, we need to include some crates.

./Cargo.toml

[dependencies]
rustls = "0.18"
webpki = "0.21"
webpki-roots = "0.20"

Now that we have everything we need, we can get started on the code.

./src/main.rs

use std::sync::Arc;
use std::net::TcpStream;
use std::io::{Write, Read};
use rustls::Session;

fn main() {
    let mut config = rustls::ClientConfig::new();
    config.root_store.add_server_trust_anchors(&webpki_roots::TLS_SERVER_ROOTS);

    let rc_config = Arc::new(config);
    let google = webpki::DNSNameRef::try_from_ascii_str("google.ca").unwrap();

    let mut client = rustls::ClientSession::new(&rc_config, google);

    let request = b"GET / HTTP/1.1\r\nHost: google.ca\r\nConnection: close\r\n\r\n";

    let mut socket = TcpStream::connect("google.ca:443").unwrap();

    println!("1. Request TLS Session");
    client.write_tls(&mut socket).unwrap();

    println!("2. Received Server Certificate");
    client.read_tls(&mut socket).unwrap();

    println!("3. Check certificate");
    client.process_new_packets().unwrap();

    println!("4. Write out request");
    client.write_all(request).unwrap();

    println!("5. Encrypt request and flush");
    client.write_tls(&mut socket).unwrap();

    println!("6. Decrypt response");
    client.read_tls(&mut socket).unwrap();

    println!("7. Check certificate");
    client.process_new_packets().unwrap();

    println!("8. Read data");
    let mut data = Vec::new();
    client.read_to_end(&mut data).unwrap();

    println!("9. Data");
    println!("{}", String::from_utf8_lossy(&data));
}

The first line sets up the configuration object that holds our TLS settings, including how server certificates are verified.

In this configuration object we add our root certificates, which in this case come from the webpki_roots crate. This is a certificate store derived from Mozilla's list of trusted root certificates (the one Firefox uses).

Next, we make sure our hostname, "google.ca", is valid and create a DNSNameRef for it.

We then setup a ClientSession object.

We construct a valid HTTP request. The Host header is required by HTTP/1.1, and Connection: close asks the server to close the connection once it has sent the response, which is how we know the response is really done. That way we don't need to work out the content length or similar information!

Next we set up a socket.

At this point we have 2 things in our little example, we have a TLS session object and we have our socket that we write to. The next part will muddy the two together!

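The first step is to start the TLS handshake. The TCP connection to port 443 is already open at this point (TcpStream::connect took care of that), so the first write_tls call sends the client's opening handshake message, the ClientHello, over the socket.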

The second step is read_tls, which reads the server's reply, including its TLS certificate.

process_new_packets returns an error if the certificate is invalid or can't be verified against our certificate store, webpki_roots, and our unwrap() turns that error into a panic. This function is what verifies our TLS connection.

Now that we have verified the server's certificate, we can write out the request we constructed before.

We then flush this by doing another call to write_tls. This is where the request would get encrypted and sent to the server.

Next we call read_tls as the response the server sends to us will be encrypted.

We call process_new_packets again, which decrypts the data.

Now we can read the data from the TLS session into a vector.

We then convert the vector of u8s into a string and voila! We have used TLS to make a request and got the response!

The next example is very much the same as this one, except that it uses a loop instead of these separate, explicitly ordered reads and writes.

The other thing to note is that we use our TLS session (our client object) to do all of our write and reads, the socket exists but we don't interact with it directly because we need the TLS session to encrypt and decrypt the data.

In the third example we will see the combining of the 2, client and socket behind one interface that we can use much more easily. But before we get there, let's look at the rustls main example.

2. The rustls Example

./src/main.rs

use std::sync::Arc;
use std::net::TcpStream;
use std::io::{Write, Read};
use rustls::Session;

fn main() {
    let mut config = rustls::ClientConfig::new();
    config.root_store.add_server_trust_anchors(&webpki_roots::TLS_SERVER_ROOTS);

    let rc_config = Arc::new(config);
    let google = webpki::DNSNameRef::try_from_ascii_str("google.ca").unwrap();

    let mut client = rustls::ClientSession::new(&rc_config, google);


    let request = b"GET / HTTP/1.1\r\nHost: google.ca\r\nConnection: close\r\n\r\n";

    let mut socket = TcpStream::connect("google.ca:443").unwrap();

    client.write_all(request).unwrap();

    loop {
        if client.wants_write() {
            client.write_tls(&mut socket).unwrap();
        }
        if client.wants_read() {
            client.read_tls(&mut socket).unwrap();
            client.process_new_packets().unwrap();

            let mut data = Vec::new();
            client.read_to_end(&mut data).unwrap();
            if !data.is_empty() {
                println!("{}", String::from_utf8_lossy(&data));
                break;
            }
        }

    }
}

This is the example from the rustls docs page. I made one change here which is that I break the loop after we get one piece of data from the server, otherwise we'd be in an infinite loop in a connection that had already ended.

The break isn't needed had we kept the connection type as something like keep-alive.

The loop removes the need for the explicit labeling of steps that the flow example had!

3. The Stream Example

./src/main.rs

use std::sync::Arc;
use std::net::TcpStream;
use std::io::{Write, Read};

fn main() {
    let mut config = rustls::ClientConfig::new();
    config.root_store.add_server_trust_anchors(&webpki_roots::TLS_SERVER_ROOTS);

    let rc_config = Arc::new(config);
    let google = webpki::DNSNameRef::try_from_ascii_str("google.ca").unwrap();

    let mut client = rustls::ClientSession::new(&rc_config, google);
    let mut socket = TcpStream::connect("google.ca:443").unwrap();

    let mut stream = rustls::Stream::new(&mut client, &mut socket);

    let request = b"GET / HTTP/1.1\r\nHost: google.ca\r\nConnection: close\r\n\r\n";

    stream.write_all(request).unwrap();
    let mut data = Vec::new();
    stream.read_to_end(&mut data).unwrap();
    println!("{}", String::from_utf8_lossy(&data));

}

This example uses the Stream option from rustls which marries our socket and our client together and gives us one easy interface to use!

Much better!
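Because `rustls::Stream` implements the standard `Read` and `Write` traits, everything after the handshake is ordinary byte handling. For a Gemini response read with `read_to_end`, the next step would be separating the header line from the body. A minimal, standard-library-only sketch; `split_response` is a hypothetical name, and it assumes the whole response is already in memory:

```rust
/// Splits a complete Gemini response into its header line (without the
/// CRLF) and the body bytes that follow it.
fn split_response(data: &[u8]) -> Option<(String, &[u8])> {
    // The header ends at the first CRLF pair.
    let end = data.windows(2).position(|pair| pair == &b"\r\n"[..])?;
    let header = String::from_utf8_lossy(&data[..end]).into_owned();
    Some((header, &data[end + 2..]))
}

fn main() {
    let response = b"20 text/gemini\r\n# Hello\r\n";
    match split_response(response) {
        Some((header, body)) => {
            println!("header: {}", header);
            print!("body: {}", String::from_utf8_lossy(body));
        }
        None => println!("no header line found"),
    }
}
```

Keeping the body as bytes rather than a string leaves room for non-text responses, which the notes say should be saved to disk.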

4. TLS but No TLS!

Finally let's look at removing TLS verification completely. We'll have our connection and data go over TLS but won't validate anything.

This is dangerous, so this feature is gated both behind a crate feature flag and within the code itself. Let's start enabling it!

To enable dangerous mode for our TLS we need to update our Cargo.toml.

./Cargo.toml

[dependencies]
rustls = { version="0.18", features=["dangerous_configuration"] }
webpki = "0.21"

Now that we have dangerous_configuration added to our Cargo.toml file we can begin working on our main function.

./src/main.rs

use std::sync::Arc;
use std::net::TcpStream;
use std::io::{Write, Read};
use rustls::{RootCertStore, Certificate, ServerCertVerified, TLSError, ServerCertVerifier};
use webpki::{DNSNameRef};

struct DummyVerifier { }

impl DummyVerifier {
    fn new() -> Self {
        DummyVerifier { }
    }
}

impl ServerCertVerifier for DummyVerifier {
    fn verify_server_cert(
        &self,
        _: &RootCertStore,
        _: &[Certificate],
        _: DNSNameRef,
        _: &[u8]
    ) -> Result<ServerCertVerified, TLSError> {
        Ok(ServerCertVerified::assertion())
    }
}

fn main() {
    let mut cfg = rustls::ClientConfig::new();
    let mut config = rustls::DangerousClientConfig {cfg: &mut cfg};
    let dummy_verifier = Arc::new(DummyVerifier::new());
    config.set_certificate_verifier(dummy_verifier);

    let rc_config = Arc::new(cfg);
    let google = webpki::DNSNameRef::try_from_ascii_str("google.ca").unwrap();

    let mut client = rustls::ClientSession::new(&rc_config, google);

    let request = b"GET / HTTP/1.1\r\nHost: google.ca\r\nConnection: close\r\n\r\n";

    let mut socket = TcpStream::connect("google.ca:443").unwrap();

    let mut stream = rustls::Stream::new(&mut client, &mut socket);

    stream.write_all(request).unwrap();

    let mut data = Vec::new();
    stream.read_to_end(&mut data).unwrap();
    println!("{}",String::from_utf8_lossy(&data));

}

Our original ClientConfig object used the certificate store from webpki_roots. Now that we are removing that dependency, we need to replace it with something; in this case, we're going to write our own certificate verifier.

This is where the DangerousClientConfig comes in. We use this struct to mutate our cfg object so that we can change the default certificate verifier.

We do this by first creating a struct, DummyVerifier, which does nothing except implement the ServerCertVerifier trait. Our implementation of verify_server_cert simply accepts every certificate. We then use set_certificate_verifier to change the certificate verifier on our config object.

With that, our TLS session now uses the certificate verifier that we wrote, which simply says that every certificate is valid. This is pretty dangerous, but we're going to need it for our Gemini client.

The Gemini spec allows the client to do whatever it wants with the certificate so it is valid to just ignore it but the recommendation is to do TOFU pinning. We may end up implementing this in a future chapter but because we are building a very basic client, for now just being able to ignore the server's certificate is enough.

Gemini & rustls

This next part is very flaky: I can see what's going on, but I don't understand why. Unfortunately, this is also the code we're going to be using in our client, which is a bit frustrating.

There is an updated version that I'm much happier with in Chapter 4 - The Visit Function but I used this as the base for it so I'll leave this as is. You should be able to adapt the below code and use the improvements in the later chapter together.

use std::sync::Arc;
use std::net::TcpStream;
use std::io::{Write, Read};
use rustls::Session;
use rustls::{RootCertStore, Certificate, ServerCertVerified, TLSError, ServerCertVerifier};
use webpki::{DNSNameRef};

struct DummyVerifier { }

impl DummyVerifier {
    fn new() -> Self {
        DummyVerifier { }
    }
}

impl ServerCertVerifier for DummyVerifier {
    fn verify_server_cert(
        &self,
        _: &RootCertStore,
        _: &[Certificate],
        _: DNSNameRef,
        _: &[u8]
    ) -> Result<ServerCertVerified, TLSError> {
        Ok(ServerCertVerified::assertion())
    }
}

fn main() {
    let mut cfg = rustls::ClientConfig::new();
    let mut config = rustls::DangerousClientConfig {cfg: &mut cfg};
    let dummy_verifier = Arc::new(DummyVerifier::new());
    config.set_certificate_verifier(dummy_verifier);

    let rc_config = Arc::new(cfg);
    let host = webpki::DNSNameRef::try_from_ascii_str("gemini.circumlunar.space").unwrap();

    let mut client = rustls::ClientSession::new(&rc_config, host);

    let request = b"gemini://gemini.circumlunar.space/servers/\r\n";

    let mut socket = TcpStream::connect("gemini.circumlunar.space:1965").unwrap();

    println!("1. Request TLS Session");
    client.write_tls(&mut socket).unwrap();

    println!("2. Received Server Certificate");
    client.read_tls(&mut socket).unwrap();

    println!("3. Check certificate");
    client.process_new_packets().unwrap();

    println!("4. Write out request");
    client.write_all(request).unwrap();

    println!("5. Encrypt request and flush");
    client.write_tls(&mut socket).unwrap();

    println!("6. Decrypt response");

    loop {
        while client.wants_read() {
            client.read_tls(&mut socket).unwrap();
            client.process_new_packets().unwrap();
        }
        let mut data = Vec::new();
        let _ = client.read_to_end(&mut data);
        println!("{}", String::from_utf8_lossy(&data));
    }
}

From what I can tell, the interaction between the server and rustls is a bit wonky. rustls doesn't seem to wait for responses from the server, so we need to sit inside a loop checking for data. This is why we let read_to_end come back with nothing; if it does, we simply continue looping.

Something noteworthy here is that when we do go to use this logic, we'll need to somehow know when to break our loop but let's deal with that when we get there.
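One possible way to know when to break is to lean on the rule from the specification that the server closes the connection after sending its response: keep reading until a read returns `Ok(0)`, which is how Rust's `Read` trait signals end of stream. A minimal sketch, written against any `Read` (a `rustls::Stream` would be one candidate, since it implements `Read`); `read_response` is a hypothetical name, and also treating `ErrorKind::UnexpectedEof` as the end of the response is an assumption, meant for servers that drop the connection without a clean TLS shutdown:

```rust
use std::io::{self, ErrorKind, Read};

/// Reads from `source` until the peer closes the connection.
fn read_response<R: Read>(source: &mut R) -> io::Result<Vec<u8>> {
    let mut data = Vec::new();
    let mut buf = [0u8; 4096];
    loop {
        match source.read(&mut buf) {
            // Ok(0) means end of stream: the server is done, so we break.
            Ok(0) => break,
            Ok(n) => data.extend_from_slice(&buf[..n]),
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            // Assumption: an abrupt close is treated as the end of the response.
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
    }
    Ok(data)
}

fn main() -> io::Result<()> {
    // An in-memory reader stands in for the TLS stream here.
    let mut fake_connection = io::Cursor::new(b"20 text/gemini\r\n# Hello\r\n".to_vec());
    let data = read_response(&mut fake_connection)?;
    print!("{}", String::from_utf8_lossy(&data));
    Ok(())
}
```

This is close to what `read_to_end` already does; spelling the loop out just makes the stopping condition visible.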

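Hopefully these examples shed some light on how rustls works, and I may in the future sit down and dig into what's going on with the sockets. I had thought the socket was blocking, and std's TcpStream is in fact blocking by default, so the socket itself probably isn't the culprit. A more likely explanation is that reading from the ClientSession only returns data rustls has already decrypted and never touches the socket itself, so it can come back empty while the server is still sending.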

Alright! Our little side adventure is over and now we can actually start working on our code!

Getting Started

Now back to our regularly scheduled content.

Hello! Chapter 2 was completely optional, so we don't really have anything to recap. In this chapter we will get Rust set up and get the smallest kernel of our client working.

A note: we are going to be writing all our code in one file, src/main.rs, because I find it easier to hold everything in my head if it is all in one file. Once you have a handle on the concepts, you can split the code into logical files. In my eyes, the TLS and parsing code could easily be split out, but for now let's stick with one file for everything!

Let's get started!

Installing Rust

Rust, like many other utilities, installs via a shell script that we execute. However, you shouldn't trust every pipe-to-shell command you find on the internet, so let's go directly to the official Rust site to install Rust.

https://www.rust-lang.org/tools/install

On that page, you will get a curl command whose output gets piped to sh, which will install Rust onto your computer.

Of course, this is only true for Linux and macOS; on Windows the site offers an installer (rustup-init.exe) instead. I personally use the Windows Subsystem for Linux, which I really like: the polish of Windows with the programming environment of Linux.

! Good luck with the installation, sometimes this step is the hardest to get working.

Once we have rust installed, we can verify our installation by printing our version.

> rustc --version
rustc 1.47.0 (18bf6b4f0 2020-10-07)

IDE

Unfortunately, I don't have any knowledge about GUI IDEs for Rust. You may want to search for a Rust IDE and/or look into VS Code, which has extensions for Rust language servers such as RLS and rust-analyzer.

The language server is what allows an IDE to tell us all the errors we're making before we compile. Extremely valuable! Especially in the case of using rust via the Windows Subsystem for Linux as compile times are slow on it.

I use vim with YouCompleteMe with the rust language server option. I find that this works really well. I really like vim. (Remember this, you will be tested!)

  • Side note - the fish shell is quite fun to use so if you are in the mood to try a new shell, I would recommend giving it a shot.

https://jvns.ca/blog/2017/04/23/the-fish-shell-is-awesome/
https://fishshell.com/

Getting Started

Alright! Hopefully you have Rust installed and it went smoothly. Let's jump straight into starting our application.

> cargo new gem-client
     Created binary (application) gem-client package

We're going to create a new rust project using cargo and we've called it gem-client.

  • I did look at the Gemini Wikipedia page to steal a name, but I didn't find anything I liked. The closest was 'gusmobile', which is what the NASA guys called the Gemini capsules, named after Gus Grissom.

Now let's make sure everything works!

> cd gem-client/
> cargo run
   Compiling gem-client v0.1.0 (/home/nivethan/bp/gem-client)
    Finished dev [unoptimized + debuginfo] target(s) in 1.21s
     Running target/debug/gem-client
Hello, world!

We first cd into our project, and then we will build and run our project with cargo run.

Voila! Our starter application has compiled and run. Feel free to take a peek at src/main.rs, as that is what we just compiled and the file we will be editing throughout our little adventure.

Making Life Easier

Before we move on to writing some code, we have one quality-of-life improvement that I would say is on par with syntax highlighting: code recompiling on change! (Okay, maybe a bit hyperbolic; I can live without auto recompiles.)

Currently if we write any code, we would have to manually recompile to see our changes. We're going to install cargo-watch and use that to do our recompiling.

> cargo install cargo-watch

Once cargo-watch is installed, we can trigger it with the following command. You will need a shell open and sitting in the project directory for this step. I have two shells open: one contains vim, the other contains cargo watch.

/path/to/gem-client

> cargo watch -x run
[Running 'cargo run']
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running target/debug/gem-client
Hello, world!
[Finished running. Exit status: 0]

Now we can write some code, and cargo watch will immediately recompile and run it every time we save. Wonderful!

Our Gemini Client

Now, now, now we can write some code, I promise.

We are going to write the kernel of our client and in the following chapters, we'll add so much stuff on top that you'll be surprised that we started so small!

./src/main.rs

use std::io;
use std::io::{Write};

fn main() {
    let prompt = ">";
    loop {
        print!("{} ", prompt);
        io::stdout().flush().unwrap();

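        let mut command = String::new();
        // read_line returns Ok(0) at end of input (e.g. Ctrl-D), so stop
        // instead of spinning forever on an empty line.
        if io::stdin().read_line(&mut command).unwrap() == 0 {
            break;
        }
        println!("{}", command.trim());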
    }
}

Our Gemini client is going to be a command line program, so we don't want it to exit after visiting just one page in Gemini space; we want to stay in our client, as we may have other things to do.

This is why we have an infinite loop. We begin by bringing in the modules we will be using. io stands for input/output and is part of Rust's standard library, std. We also need to bring in the Write trait from the io module specifically, because flush() is a method of that trait.

Next we start our main function. This is the function that will get executed when our binary file is executed.

The first step is to set up some sort of prompt. For now it will just be the angle bracket. Next we start our infinite loop.

Inside our loop, we use the print! macro because we want to keep the cursor on the same line. The println! macro would append a \n to whatever we print, so the cursor would end up one line below our prompt.

You can test this out by changing print! to println!.

Next we need to flush the buffer. Rust's standard output is line-buffered: print! places data in the output buffer, but the buffer is only written to the screen when a newline (\n) is printed or when it is flushed. Because print! doesn't add a newline, we need to flush the output manually.

Either way! Once we flush it we can then see it on screen.

The next 2 lines are related: this is how we get input from the user. We read from standard input into command. When the user hits enter, read_line returns and we can move on to the final line in our loop.

The final line of our little loop simply writes out what the user typed in. We use the trim() function to remove the \n the command came in with.
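To make the print!, flush and read_line steps easier to poke at, here is a minimal sketch that pulls them into one function over any reader and writer. The name prompt_and_read is hypothetical, and it adds one thing our loop doesn't do: it returns None when read_line reads 0 bytes, which happens at end of input (for example Ctrl-D), where the loop above would keep printing prompts forever.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: writes the prompt, flushes it, then reads one line.
/// Returns Ok(None) at end of input (read_line returned 0 bytes).
fn prompt_and_read<R: BufRead, W: Write>(
    input: &mut R,
    output: &mut W,
    prompt: &str,
) -> io::Result<Option<String>> {
    // Like print!: no trailing newline, so we flush explicitly.
    write!(output, "{} ", prompt)?;
    output.flush()?;

    let mut command = String::new();
    let bytes_read = input.read_line(&mut command)?;
    if bytes_read == 0 {
        // End of input, e.g. the user pressed Ctrl-D.
        return Ok(None);
    }
    // read_line keeps the trailing newline; trim() removes it.
    Ok(Some(command.trim().to_string()))
}
```

In main you would pass a locked stdin (io::stdin().lock()) as the reader and io::stdout() as the writer, and loop with while let Some(command) = ... so the client exits cleanly at end of input.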

There we have it! The very essence of our client. It doesn't do anything yet but we'll get there.

In the next chapter we will look at responding to our user's input and doing things based on it. We'll also write the core of our Gemini client in the next chapter: the visit function!

The Visit Function

Hello! We just wrote our infinite loop to process user commands. In this chapter we will implement 2 commands: the quit option, so the user can leave our client, and the visit function, which will allow our users to visit gemini space!

Let's jump right in.

Quitting

Pack it up, tutorial is over friends!

Bad joke, moving on. Currently we print our command to the screen; now we're going to do some very light tokenization. Tokenization means breaking the command into sub pieces so we can do different things based on those pieces. Luckily our client is going to be simple: we will split on spaces and the resulting pieces will be our tokens.

./src/main.rs

use std::io;
use std::io::{Write};

fn main() {
    let prompt = ">";
    loop {
        print!("{} ", prompt);
        io::stdout().flush().unwrap();

        let mut command = String::new();
        io::stdin().read_line(&mut command).unwrap();

        let tokens: Vec<&str> = command.trim().split(" ").collect();
        println!("{:?}", tokens);
    }
}

We take our command and do some very light processing. Real tokenization will go character by character and will have rules on how to split a line into sub pieces. We are getting away with splitting on white space because we are writing a basic client.

To learn more, I highly recommend the following book.

http://craftinginterpreters.com/

I used TypeScript for the first half of the book, whereas the author used Java.

Back to Rust! We first trim the command to remove the newline character, then split it on spaces. At this point split has returned an iterator. We then collect everything, and the result is a vector of string slices, Vec<&str>.
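Splitting on a single space has one quirk worth seeing: two spaces in a row produce an empty token. Here is a small comparison sketch against str::split_whitespace, which splits on runs of any whitespace. Both function names are hypothetical.

```rust
/// The tutorial's approach: trim, then split on single space characters.
fn tokenize_on_spaces(command: &str) -> Vec<&str> {
    command.trim().split(' ').collect()
}

/// Alternative: split on runs of any whitespace (spaces, tabs, newlines),
/// so repeated spaces never produce empty tokens.
fn tokenize_on_whitespace(command: &str) -> Vec<&str> {
    command.split_whitespace().collect()
}
```

If you switch to split_whitespace, note that an empty line produces an empty vector, so tokens[0] would panic; tokens.first() returns an Option instead.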

Now for another rule. Our first token will dictate what we will do, so all we need to do now is match against it.

./src/main.rs

use std::io;
use std::io::{Write};

fn main() {
    let prompt = ">";
    loop {
        print!("{} ", prompt);
        io::stdout().flush().unwrap();

        let mut command = String::new();
        io::stdin().read_line(&mut command).unwrap();

        let tokens: Vec<&str> = command.trim().split(" ").collect();

        match tokens[0] {
            "q" => break,
            _ => println!("{:?}", tokens),

        }
    }
}

Voila! All we do is match on the first token. If it is a q, we break out of our infinite loop; main then has nothing left to run, so the program exits. All other tokens will match against our _ arm, which prints the tokens.

Now let's work on the core of our client, the visit function!

Visit Function

Pack your bags, we're going to see grandma!

Another joke...

Similar to our quit option, let's add our visit command:

./src/main.rs

...
fn visit(tokens: Vec<&str>) {
    println!("Attempting to visit....{}", tokens[1]);
}
...
...
            "q" => break,
            "visit" => visit(tokens),
...

First thing: from here on we'll only go over what has changed, so please feel free to leave any bugs or errors in the comments below and I'll update as soon as possible. Thank you!

The first thing to note is that we added a visit command underneath our quit option. It calls our visit function, which currently only prints a message, but we'll fix that shortly.

Rust is a safe language but it isn't idiot proof! If we enter visit all by itself, Rust would panic, because our function references tokens[1], which wouldn't exist.

We need to add some array verification so we don't cause a panic!

./src/main.rs

...
fn visit(tokens: Vec<&str>) {
    if tokens.len() < 2 {
        println!("Nowhere to visit.");
        return;
    }

    let destination = tokens[1];

    println!("Attempting to visit....{}", destination);
}
...

Wonderful! We will now print an error message for the user if they try typing visit without a destination.
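The length check works; another option is slice::get, which returns an Option instead of panicking on an out-of-range index. A minimal sketch, where destination and visit_sketch are hypothetical names and visit_sketch returns its message rather than printing it:

```rust
/// Hypothetical helper: the destination token of a "visit" command, if any.
/// slice::get returns None for an out-of-range index instead of panicking.
fn destination<'a>(tokens: &[&'a str]) -> Option<&'a str> {
    tokens.get(1).copied()
}

/// Hypothetical visit built on destination(); returns the message to print.
fn visit_sketch(tokens: &[&str]) -> String {
    match destination(tokens) {
        Some(dest) => format!("Attempting to visit....{}", dest),
        None => "Nowhere to visit.".to_string(),
    }
}
```

In the command loop tokens is a Vec<&str>, so &tokens coerces to the slice these functions take.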

Now we're going to work on the big step. We're going to connect to a Gemini server and actually get data!

Connecting

The more we connect, the more we hurt each other.

https://en.wikipedia.org/wiki/Hedgehog%27s_dilemma


This next part isn't going to work yet, but it's fun to see how things break just as much as it's fun to see it all work!

./src/main.rs

...
use std::io::{Read, Write};
use std::net::TcpStream;
...
fn visit(tokens: Vec<&str>) {
    if tokens.len() < 2 {
        println!("Nowhere to visit.");
        return;
    }

    let destination = format!("{}:1965", tokens[1]);
    println!("Attempting to visit....{}", destination);

    let mut socket = TcpStream::connect(&destination).unwrap();
    // write_all keeps writing until every byte has been sent
    socket.write_all(tokens[1].as_bytes()).unwrap();

    let mut raw_data: Vec<u8> = vec![];
    socket.read_to_end(&mut raw_data).unwrap();

    let content = String::from_utf8_lossy(&raw_data);
    println!("{}", content);
}
...

We first include the Read trait from the standard library's io module. This lets us call read methods, such as read_to_end, on sockets.

Next we update our destination variable: instead of just where we want to go, we now add the port number as well. Port 1965 is the default port for Gemini servers.

Next we connect to the destination on that port.

We then write out the location we want as that is what the Gemini server will serve us. We need to convert this to bytes.

Next we read the data. We don't know how much data the Gemini server is going to send back, so we set up a vector of bytes and call read_to_end, which keeps growing the vector until the server closes the connection.

Next we convert the data we received into a String using the from_utf8_lossy function. We use the lossy version because we don't know the encoding at this point, so any bytes that aren't valid UTF-8 are replaced with the replacement character (U+FFFD).
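A quick sketch of what from_utf8_lossy does with bytes that aren't valid UTF-8. It returns a Cow<str>: borrowed when the input was already valid, owned when something had to be replaced, which is why later code calls .to_string() on the result. The decode helper is a hypothetical name.

```rust
use std::borrow::Cow;

/// Hypothetical helper: decode bytes from the server, replacing any
/// invalid UTF-8 sequences with U+FFFD.
fn decode(raw: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(raw)
}
```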

Finally we print our data to the screen.

You can see all this in action by trying to visit gemini.circumlunar.space.

Now this function should print just a blank line. This is because of a key part of the specification: Gemini connections must use TLS. Right now we aren't using it, so we aren't really talking to the Gemini server yet.

TLS

SSL is the older predecessor of TLS (Transport Layer Security), and the two names are often used interchangeably, so you can read SSL wherever you see TLS.

Now we will add some crates to our project. We will be using the rustls crate to do our TLS connections.

The first step is to add rustls to our Cargo.toml file.

./Cargo.toml

[dependencies]
rustls = { version="0.18", features=["dangerous_configuration"] }
webpki = "0.21"

I would have really liked to not use any dependencies but that's okay!

rustls will be the crate we rely on, and we also need webpki for the DNSNameRef type that rustls expects when we open a session.

The big thing here is that in our feature flags for rustls, we enable dangerous_configuration. This is because we are about to do something dangerous. The Gemini spec lets clients handle the TLS certificate however they like, including not verifying it at all. That is what we'll implement, so we need to be able to turn off certificate validation in rustls, which is only possible with the dangerous_configuration feature.

Let's get to the code!

We're going to split this up into two sections and go over them separately.

use std::io;
use std::io::{Read, Write};
use std::net::TcpStream;
use std::sync::Arc;
use rustls::Session;
use rustls::{RootCertStore, Certificate, ServerCertVerified, TLSError, ServerCertVerifier};
use webpki::{DNSNameRef};

struct DummyVerifier { }

impl DummyVerifier {
    fn new() -> Self {
        DummyVerifier { }
    }
}

impl ServerCertVerifier for DummyVerifier {
    fn verify_server_cert(
        &self,
        _: &RootCertStore,
        _: &[Certificate],
        _: DNSNameRef,
        _: &[u8]
    ) -> Result<ServerCertVerified, TLSError> {
        return Ok(ServerCertVerified::assertion());
    }
}

fn visit(tokens: Vec<&str>) {
...

Whew! We have a whole slew of things to get through. The first thing to note is that we have a few more crates and utilities added to our little client. We could, and probably should, move all this into a separate file for TLS connections, but I think it's mentally easier if everything is in one file so you can hold it all in your head.

The next thing to note is that we create a struct called DummyVerifier. rustls comes with a default certificate verification utility that will use a certificate root store to verify server certificates. What this means is that rustls has a way of verifying that a server certificate is valid.

In Gemini we don't need this, and in our basic client we can skip certificate validation entirely. To do this we need to override rustls' default behavior, which is why we enabled dangerous_configuration in our Cargo.toml file.

Once we have our DummyVerifier struct, we add the ability to create new instances of it.

Next is the key part of our override: we implement the ServerCertVerifier trait for our dummy. Our verify_server_cert ignores all of its parameters and returns Ok(ServerCertVerified::assertion()). This means that regardless of what certificate the server uses, we will always report that it is valid.

Voila! We now have a verifier that rustls can use, which will ignore TLS certificate issues.

The next step is to use this verifier with rustls to make our connection.

Buckle in!

...
fn visit(tokens: Vec<&str>) {
    if tokens.len() < 2 {
        println!("Nowhere to visit.");
        return;
    }

    let destination = format!("{}:1965", tokens[1]);
    let dns_request = tokens[1];
    let request = format!("gemini://{}/\r\n", tokens[1]);

    println!("Attempting to visit....{}", destination);

    let mut cfg = rustls::ClientConfig::new();
    let mut config = rustls::DangerousClientConfig {cfg: &mut cfg};
    let dummy_verifier = Arc::new(DummyVerifier::new());
    config.set_certificate_verifier(dummy_verifier);

    let rc_config = Arc::new(cfg);
    let dns_name = webpki::DNSNameRef::try_from_ascii_str(dns_request).unwrap();

    let mut client = rustls::ClientSession::new(&rc_config, dns_name);
    let mut socket = TcpStream::connect(destination).unwrap();

    let mut stream = rustls::Stream::new(&mut client, &mut socket);
    stream.write_all(request.as_bytes()).unwrap();

    while client.wants_read() {
        client.read_tls(&mut socket).unwrap();
        client.process_new_packets().unwrap();
    }
    let mut data = Vec::new();
    let _ = client.read_to_end(&mut data);

    let status =  String::from_utf8_lossy(&data);

    println!("{}", status);

    client.read_tls(&mut socket).unwrap();
    client.process_new_packets().unwrap();
    let mut data = Vec::new();
    let _ = client.read_to_end(&mut data);

    let content =  String::from_utf8_lossy(&data);

    println!("{}", content);
}
...

Let's go over everything line by line.

The command we are entering is visit gemini.circumlunar.space.

The first step is to make sure we have a place to visit; this is the if statement.

Next we format our destination in a few ways. The first is the address with our port number appended. This is the destination variable.

Next we need the bare hostname, without a port number. This is dns_request; rustls uses this name when checking the server's certificate and for SNI.

The last variable we need is a gemini URL, which we're calling request. This adds the gemini:// prefix, the closing /, and the carriage return line feed (\r\n).

Now we begin the TLS dance.

We set up cfg, a rustls::ClientConfig, which controls how rustls sets up and validates TLS connections. Normally we would add a root certificate store to this config object, which rustls would in turn use to validate certificates.

In our case, we use DangerousClientConfig, which wraps a mutable reference to cfg and lets us swap in our custom verifier.

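Next we initialize our dummy_verifier inside an Arc. Arc (atomically reference-counted) is a pointer that lets several owners share one value, even across threads; the value is dropped when the last Arc pointing to it goes away. rustls takes the verifier as an Arc so the config can share it with every session it creates.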
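To see what Arc gives us without pulling in rustls, here is a small sketch. The Verifier trait, AcceptAll and Config are hypothetical stand-ins that only mimic the shape of a config holding a shared verifier; cloning an Arc copies the pointer and bumps a reference count, it does not copy the verifier.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a certificate verifier trait.
trait Verifier {
    fn accepts(&self, cert: &[u8]) -> bool;
}

/// Accepts every certificate, like the tutorial's DummyVerifier.
struct AcceptAll;

impl Verifier for AcceptAll {
    fn accepts(&self, _cert: &[u8]) -> bool {
        true
    }
}

/// Hypothetical config that shares one verifier through an Arc.
struct Config {
    verifier: Arc<dyn Verifier>,
}
```

Each Config holds its own Arc, but all of them point at the same AcceptAll value, which lives until the last Arc is dropped.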

Next we use set_certificate_verifier to override rustls' default certificate verifier.

Now we have a config object that won't bother validating certificates!

Now we can start working on the connection.

The next line wraps cfg in an Arc as well, because ClientSession::new expects a reference to an Arc<ClientConfig>.

Then we create a DNSNameRef using the dns_request variable we set up earlier. rustls' ClientSession::new takes the server name as a webpki DNSNameRef, which is why we need the webpki dependency.

Now we will create the TLS connection!

We use ClientSession to start a TLS session using our cfg object and DNSNameRef.

We have our session; next we need our socket! We use TcpStream to connect to our destination variable. This is why we added the port number to destination earlier: TcpStream needs a port number to connect to.

At this point we have a socket and a TLS session! We marry the two using rustls' Stream type.

Now we can use the stream object to write our request variable, converted to bytes. rustls will encrypt our data and send it to the server in one step!

Now, ideally we would be able to use stream.read to simply read the server's response; however, for reasons unknown to me, we can't!

This part is a bit hacky, and unfortunately I can't explain it. It is a bit frustrating to not know why something is working or if it'll even work tomorrow. It very much exists on a prayer.

We start by checking our TLS session for data. This appears to take a variable amount of time, and I haven't figured out what it depends on. Because we don't know when the data will come in, we poll the session for data in a while loop.

Each time we poll, we read TLS data from the socket into our session and process the packets it received.

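process_new_packets processes the TLS records we read, including the handshake, which is where our DummyVerifier gets called to check the certificate. Once the handshake is done and data has arrived, we can use read_to_end() to read the decrypted data from our TLS session.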

We would normally unwrap here so that Rust panics if something breaks, but in this case read_to_end returns an error when the response has no length. We don't want Rust to panic; we need it to keep checking the stream. This is why we assign the result of read_to_end() to an unused variable, _.

Then we read the data we received and convert it to a string using from_utf8_lossy.

The first part is the status line that the Gemini server sends. For now we'll only worry about successful requests, which means a response body follows our status. We then read from our session once more, decrypt the data using process_new_packets(), and read the data into a variable.

We now have the response body as well!

Voila! We have the tiniest hint of a real working Gemini client now! That was a slog, so feel free to bask in it. I apologize that the above explanation isn't very good and may be incorrect; it is based on how I think TLS and sockets work, and the fact that it works feels flukey. I would love to tighten this up. The code feels brittle, but that's okay!

Now if you play with the client a little, you'll quickly find that it isn't very useful: we can only visit the root of a gemini space right now. What we need to do, and will do in the next chapter, is handle URLs! Once we can do that, we'll have the bare minimum needed to traverse gemini space!

See you soon.

Handling URLs

Hello! Welcome back after the mess that was getting our connection set up! I'll hold my thoughts until the end, but I'll just say that it feels very much like there is a better way to connect to Gemini servers.

Anyway, let's get started with this chapter! What we need to do now is clean up our URLs. Currently we can do visit gemini.conman.org and that's it. We can't do something like visit gemini.conman.org/news.txt or visit gemini://gemini.conman.org.

The Gemini specification references RFC 3986 when talking about URLs. The key part we need to worry about is the Syntax section on page 16.

https://tools.ietf.org/html/rfc3986#section-1.1.1

This page highlights what a URL is really made of and gives us the structure we need to parse the URL into.

So what we'll do is create a struct that holds the parts of our URL, and then have it generate the various URLs we need as we need them.

Let's get started!

...
#[derive(Debug)]
struct Url {
    scheme: String,
    address: String,
    port: String,
    path: String,
    query: String,
    fragment: String,
}

impl Url {
    fn new(url: &str) -> Self {
        let mut url_string = url.to_string();

        let scheme: String;
        let scheme_end_index  = url_string.find("://").unwrap_or(0);

        if scheme_end_index == 0 {
            scheme = String::from("gemini");
        } else {
            scheme = url_string.drain(..scheme_end_index).collect();
        }
        url_string = url_string.replacen("://", "", 1);

        let mut address_end_index  = url_string.find(":").unwrap_or(0);

        let address: String;
        let port: String;

        if address_end_index != 0 {
            url_string = url_string.replacen(":", "", 1);
            address = url_string.drain(..address_end_index).collect();

            let port_end_index = url_string.find("/").unwrap_or(url_string.len());
            port = url_string.drain(..port_end_index).collect();
            url_string = url_string.replacen("/", "", 1);

        } else {
            address_end_index = url_string.find("/").unwrap_or(url_string.len());
            address = url_string.drain(..address_end_index).collect();
            url_string = url_string.replacen("/", "", 1);

            match scheme.as_str() {
                "gemini" => port = "1965".to_string(),
                "http" => port = "80".to_string(),
                "https" => port = "443".to_string(),
                _ => port = "".to_string()
            }
        }

        let path_end_index = url_string.find("?").unwrap_or(url_string.len());
        let path: String = url_string.drain(..path_end_index).collect();
        url_string = url_string.replacen("?", "", 1);


        let query_end_index = url_string.find("#").unwrap_or(url_string.len());
        let query: String = url_string.drain(..query_end_index).collect();
        url_string = url_string.replacen("#", "", 1);

        let fragment = url_string;

        Url {
            scheme, address, port, path, query, fragment
        }
    }

}
...

Based on the syntax in RFC 3986, we know we need to capture the following things: the scheme, the authority (which we will break into address and port), the path, the query, and lastly the fragment.

We first create a struct to match that.

Next what we need to do is create a constructor for our Url object that can take a wide variety of input and create a Url object out of it.

We do this by writing a new function for our Url. It looks complex, but really we are just following a simple series of steps. Each portion of a URL is delimited by some character. The first step is to locate that character; everything before it corresponds to one portion of the URL.

Let's go over how to parse out the scheme. We first find where :// occurs. If it doesn't occur, we default the index to 0; this is what our unwrap_or() does. Next we drain our url_string from the beginning up to that index. This means url_string loses those characters, and they go into our scheme variable.

If there was no scheme, we don't drain anything. Lastly we need to remove :// from our URL string. We do this with the replacen function, which removes just the first instance of the delimiter rather than all of them. We shouldn't see this text in the URL again anyway, but I feel safer replacing just the first instance.

Now if we don't find a scheme, then we will default to gemini. This is specified in the Gemini specification.

We then repeat these steps for each portion of the URL.

Each portion of the URL has a delimiter we need to find. We drain everything from the beginning up to the delimiter, then remove the delimiter from our URL and move on to the next portion.

One thing to note here is that if we don't get a port number, we try to default one based on the scheme. If we can't match against the scheme, we leave the port number blank. (This will blow up, but if we don't know the scheme we shouldn't be connecting anyway!)
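For comparison, here is a sketch of the same parsing built on str::split_once, peeling delimiters off from the outside in: fragment, query, scheme, then path and port. It is a hypothetical alternative, not the tutorial's Url::new. One difference: it only looks for a : inside the authority, whereas Url::new takes the first : anywhere after the scheme, so a colon in the path would be read as a port separator there. It still skips parts of RFC 3986 such as user info and IPv6 addresses; a crate such as url handles the full grammar.

```rust
/// Hypothetical alternative to Url::new, with the same fields and default ports.
#[derive(Debug, PartialEq)]
struct ParsedUrl {
    scheme: String,
    address: String,
    port: String,
    path: String,
    query: String,
    fragment: String,
}

fn default_port(scheme: &str) -> &'static str {
    match scheme {
        "gemini" => "1965",
        "http" => "80",
        "https" => "443",
        _ => "",
    }
}

fn parse_url(input: &str) -> ParsedUrl {
    // The fragment comes last in a URL, so peel it off first.
    let (rest, fragment) = input.split_once('#').unwrap_or((input, ""));
    // Then the query.
    let (rest, query) = rest.split_once('?').unwrap_or((rest, ""));
    // Default to the gemini scheme when "://" is absent.
    let (scheme, rest) = rest.split_once("://").unwrap_or(("gemini", rest));
    // The authority runs up to the first '/'; the path excludes that '/'.
    let (authority, path) = rest.split_once('/').unwrap_or((rest, ""));
    // Only look for a port inside the authority.
    let (address, port) = match authority.split_once(':') {
        Some((address, port)) => (address, port.to_string()),
        None => (authority, default_port(scheme).to_string()),
    };

    ParsedUrl {
        scheme: scheme.to_string(),
        address: address.to_string(),
        port,
        path: path.to_string(),
        query: query.to_string(),
        fragment: fragment.to_string(),
    }
}
```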

Now let's write our url formatter functions!

...
impl Url {
    fn new(url: &str) -> Self { ... }
    fn for_tcp(&self) -> String {
        format!("{address}:{port}", address=self.address, port=self.port)
    }
    fn for_dns(&self) -> String {
        format!("{address}", address=self.address)
    }
    fn request(&self) -> String {
        format!("{scheme}://{address}:{port}/{path}\r\n",
            scheme = self.scheme,
            address = self.address,
            port = self.port,
            path = self.path
        )
    }
}
...

Here we have some very simple helper functions: we create a string for our TcpStream, one for the DNS lookup, and lastly our full gemini request.

Something to note here is that the request ends with \r\n; this is part of the gemini spec.
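The Gemini specification also caps the URL in a request at 1024 bytes. A minimal sketch of a request builder that enforces that limit; gemini_request and MAX_URL_BYTES are hypothetical names.

```rust
/// Maximum length of the URL in a Gemini request, in bytes.
const MAX_URL_BYTES: usize = 1024;

/// Hypothetical helper: the absolute URL followed by CRLF,
/// or None if the URL is too long to send.
fn gemini_request(url: &str) -> Option<String> {
    // len() counts bytes, which is what the limit is measured in.
    if url.len() > MAX_URL_BYTES {
        return None;
    }
    Some(format!("{}\r\n", url))
}
```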

Now we can go ahead and update our visit function and main function to use our new Url object!

...
        match tokens[0] {
            "q" => break,
            "visit" => {
                if tokens.len() < 2 {
                    println!("Nowhere to visit.");
                    // continue skips to the next prompt; return would exit main and quit the client
                    continue;
                }

                let url = Url::new(tokens[1]);
                visit(url)
            },
            _ => println!("{:?}", tokens),

        }
...

We moved our length check from our visit function into our command processor. I wanted the Url to be passed into the visit function, so this logic needed to happen before we create the Url.

We use our new constructor to create a Url object and we then pass that to our visit function.

Now let's update our visit function!

...
fn visit(url: Url) {

    println!("Attempting to visit....{}", url.address);
...
    let dns_url = url.for_dns();
    let dns_name = webpki::DNSNameRef::try_from_ascii_str(&dns_url).unwrap();
...
    let mut socket = TcpStream::connect(url.for_tcp()).unwrap();
...
    stream.write_all(url.request().as_bytes()).unwrap();
...

We have changed the parameter we pass into our function, we now will pass in just our Url object. We have also replaced the various URLs we constructed with strings from our Url object!

With that we should be done!

Try visit gemini.conman.org/news.txt.

This should now work! We now have a much better version of our original visit function. Now that we have all the parts of our URL in a struct, we can manipulate them and use them however we like!

We have the core of our gemini client done! We can use it as is to browse gemini space, but we do have some things to fix up. Currently we print out the status and response body every time, but that's not the only thing that can happen in Gemini! In the next chapter we will look at dealing with the various Gemini statuses!

See you soon!

PS. I'm not in love with the way we've done our parsing of the URL, let me know if you have better ideas. I may take a look at some URL parsing crates to see how they do things.

Gemini Statuses

Hello! We now have our Gemini client working reasonably well. We have a visit function that can connect to Gemini pages over TLS, and we have a URL object that can be used to navigate to pages. The next thing we need to do is start handling the various statuses that Gemini can return!

There are 6 status categories for a full Gemini client, but for basic clients we only need to implement 4.
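For reference, the first digit of a Gemini status code selects one of six categories. A small sketch mapping that digit to an enum; StatusCategory and category are hypothetical names, with the variant names following the categories in the specification.

```rust
/// The six status categories, keyed by the first digit of the code.
#[derive(Debug, PartialEq)]
enum StatusCategory {
    Input,                     // 1x
    Success,                   // 2x
    Redirect,                  // 3x
    TemporaryFailure,          // 4x
    PermanentFailure,          // 5x
    ClientCertificateRequired, // 6x
}

/// Maps a code such as "20" or "31" to its category by its first digit.
/// Returns None for an empty code or an unknown first digit.
fn category(code: &str) -> Option<StatusCategory> {
    let category = match code.chars().next()? {
        '1' => StatusCategory::Input,
        '2' => StatusCategory::Success,
        '3' => StatusCategory::Redirect,
        '4' => StatusCategory::TemporaryFailure,
        '5' => StatusCategory::PermanentFailure,
        '6' => StatusCategory::ClientCertificateRequired,
        _ => return None,
    };
    Some(category)
}
```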

So far, we haven't done anything to differentiate the different statuses. We simply check our TLS session for data and print it to the screen. We assume our request will succeed, so we print out the status and we immediately poll our session again to get the response body.

Now we will close our session if the request failed, follow redirect statuses to their new URLs and handle successful requests properly!

Let's get started!

Gemini Responses

Before we start dealing with statuses, we need to first deal with responses. This means that we should construct a response object out of what the Gemini servers send back.

...
#[derive(Debug)]
struct Status {
    code: String,
    meta: String,
}

impl Status {
    fn new(status: String) -> Self {
        // splitn(2, ' ') yields at most 2 pieces; a missing meta field becomes ""
        let mut tokens = status.splitn(2, ' ');
        Status {
            code: tokens.next().unwrap_or("").to_string(),
            meta: tokens.next().unwrap_or("").to_string(),
        }
    }
}

#[derive(Debug)]
struct Response {
    status: Status,
    mime_type: Option<String>,
    charset: Option<String>,
    lang: Option<String>,
    body: Option<String>,
}

impl Response {
    fn new(data: String) -> Self {
        // At most 2 pieces: the status line, and whatever followed the first "\r\n"
        let tokens: Vec<&str> = data.splitn(2, "\r\n").collect();
        let status = Status::new(tokens[0].to_string());

        // Match on the first digit only; an empty code falls through to the catch-all
        match status.code.chars().next() {
            Some('2') => {
                let mime_type = Some("text/gemini".to_string());
                let charset = Some("utf-8".to_string());
                let lang = Some("en".to_string());

                // tokens.get(1) is None if no "\r\n" arrived, so we never index past the end
                let body = match tokens.get(1) {
                    Some(rest) if !rest.is_empty() => Some(rest.to_string()),
                    _ => None,
                };

                Response { status, mime_type, charset, lang, body }
            },
            _ => {
                Response { status, mime_type: None, charset: None, lang: None, body: None }
            }
        }
    }
}
...

This may look like a lot but it really is straightforward. The first thing we need to look at is our Response struct. Here you can see the key pieces of information we need. We have a status object, a mime type, charset, language and finally the body.

Our status object, in turn, is just the status line from the Gemini server, split into its code and meta fields.

When we construct a Response, we take in a String.

Now the next line is key! Due to the way rustls is interacting with the Gemini server, we sometimes get just the status line and sometimes the status line and the body together.

Depending on something I don't know, the server sometimes sends the status line and response body in one group, and other times it sends the status line and the response body comes moments after.

This is why the first thing we do is split the String on the carriage return line feed, using splitn so we get at most 2 pieces.
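Because the status line and body can arrive in separate reads, another approach is to keep appending received bytes to one buffer until the CRLF that ends the status line shows up. A minimal sketch, assuming the caller stops calling it once it has returned the status line; push_chunk is a hypothetical name. It also copes with the \r and the \n landing in different chunks.

```rust
/// Hypothetical helper: appends a received chunk to `buf` and, once the first
/// CRLF has arrived, removes and returns the status line. Bytes after the CRLF
/// stay in `buf` as the start of the body. Returns None until then.
fn push_chunk(buf: &mut Vec<u8>, chunk: &[u8]) -> Option<String> {
    buf.extend_from_slice(chunk);
    // Position of the first "\r\n" in everything received so far.
    let end = buf.windows(2).position(|w| w == b"\r\n")?;
    // Drain the status line plus its CRLF out of the buffer.
    let header: Vec<u8> = buf.drain(..end + 2).collect();
    Some(String::from_utf8_lossy(&header[..end]).into_owned())
}
```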

We make a Status object using the first token. Had we not received a response body, the second token would be an empty string, "".

Now the next step is to process the status code.

For now we are only going to handle success codes, the ones starting with 2. When we match on 2 we set the various parts of the response. For now we hard code the values; later on we will actually parse the status line.

Gemini status codes are 2 digits but because we are working on a basic client, we'll focus on just the first digit of statuses. This is enough information to get going.

The next step is to figure out whether we received the body. If the second element in the tokens array is blank, we didn't receive a body; otherwise we did.

We then return the Response back.

If the status starts with anything other than 2, we set the other response fields to None and only fill in the status.
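Here is a sketch of the same status and body split written without indexing, so a missing meta field or a missing \r\n can't cause a panic. Status, parse_status and parse_response are hypothetical stand-ins for the tutorial's types; status.code.chars().next() would then give the first digit to match on.

```rust
/// A status line split into its two-digit code and meta field.
#[derive(Debug, PartialEq)]
struct Status {
    code: String,
    meta: String,
}

/// Parses "<code> <meta>"; a missing meta field becomes "".
fn parse_status(line: &str) -> Status {
    let (code, meta) = line.split_once(' ').unwrap_or((line, ""));
    Status { code: code.to_string(), meta: meta.to_string() }
}

/// Splits raw response text into the status and an optional body.
fn parse_response(data: &str) -> (Status, Option<String>) {
    let (status_line, body) = data.split_once("\r\n").unwrap_or((data, ""));
    let body = if body.is_empty() { None } else { Some(body.to_string()) };
    (parse_status(status_line), body)
}
```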

Now let's look at how we use the Response in our visit function.

Status of 2x - Success

Let's first implement the handling of a status starting with 2. This means that we have a successful response and we should be getting a response body. Based on the mime type we may do different things, but for now we will just print out what we get to the screen.

The test case I used for this was Solderpunk's Gemini page, but feel free to use any Gemini page to play with!

> visit gemini.circumlunar.space
...
fn visit(url: Url) {
    ...
    while client.wants_read() {
        client.read_tls(&mut socket).unwrap();
        client.process_new_packets().unwrap();
    }
    let mut data = Vec::new();
    let _ = client.read_to_end(&mut data);

    let mut response =  Response::new(String::from_utf8_lossy(&data).to_string());

    match response.status.code.chars().next().unwrap() {
        // chars().next() yields a char, so we match on '2' rather than the string "2"
        '2' => {
            if response.body == None {
                client.read_tls(&mut socket).unwrap();
                client.process_new_packets().unwrap();
                let mut data = Vec::new();
                let _ = client.read_to_end(&mut data);

                response.body =  Some(String::from_utf8_lossy(&data).to_string());
            }

            println!("{}", response.body.unwrap_or("".to_string()))
        },
        _ => println!("Error - {} - {}", response.status.code, response.status.meta)
    }
}
...

Our updated visit function will now process the first response it gets from the server and will make it into a Response object. Next we match against it to decide what we want to do. The first case is the easiest, if we get a code starting with 2 we have a successful status.

If the request was successful, we need to check the body. We may have received the body with the status or we may need to check our TLS session for more data.

If the response.body is None, this means we need to check our session and once we have the data we update the Response object with that information.

If there is something in the response.body then we can simply print the page.

The catch all in our match statement is for our errors, feel free to try requesting a page that doesn't exist and the Gemini server should respond with a not found error message.

Status of 3x - Redirect

Now that we have our 2s working, let's add our 3s. Statuses starting with a 3 mean that the page has moved and that this is a redirect. The meta field in the status is the new URL we need to request.

> visit zaibatsu.circumlunar.space/spec-spec.txt
Error - 31 - gemini://gemini.circumlunar.space/docs/spec-spec.txt

Currently this is what happens when we request a page that has moved: we see the status code starts with a 3, and the meta field is the URL we should request.

Let's handle the redirects!

...
        '2' => { ... },
        '3' => visit(Url::new(&response.status.meta)),
        _ => println!("Error - {} - {}", response.status.code, response.status.meta)
...

In this case handling redirects is very simple: if we get a status starting with 3, we call the visit function again, passing in the meta field of the status.
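One caveat: because visit calls itself for every 3x status, a server that keeps redirecting (for example, to itself) would make our client recurse until the stack overflows. Here is a hedged sketch of a capped loop instead. fetch is a hypothetical stand-in for a real request returning (code, meta), and the limit of 5 is an assumption.

```rust
/// Assumed cap on consecutive redirects (the exact number is a judgment call).
const MAX_REDIRECTS: usize = 5;

/// Hypothetical helper: follows 3x redirects using `fetch`, which returns
/// (status code, meta). Returns the final URL and its status code, or an
/// error if more than MAX_REDIRECTS redirects were followed.
fn follow_redirects<F>(start: &str, mut fetch: F) -> Result<(String, String), String>
where
    F: FnMut(&str) -> (String, String),
{
    let mut url = start.to_string();
    // One initial request plus up to MAX_REDIRECTS redirects.
    for _ in 0..=MAX_REDIRECTS {
        let (code, meta) = fetch(&url);
        if code.starts_with('3') {
            url = meta; // for 3x statuses, meta holds the new URL
        } else {
            return Ok((url, code));
        }
    }
    Err(format!("too many redirects, last URL: {}", url))
}
```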

Now we should be able to try the visit command again and this time our Gemini client will follow the redirect!

Status of 1x - Input

Now that we've seen how redirects work, we can look at the final status we need to worry about. Statuses starting with 1 mean that the Gemini server is expecting input. The meta field is the prompt, and once the user answers we add the answer to the URL we are currently on as a query parameter.

The generic way of adding a parameter is to append a question mark followed by the value.

> visit gemini.conman.org/hilo/1078?50

Here we are submitting 50 to the Gemini page located at hilo/1078.

There is a guessing game at gemini.conman.org/hilo/ that is very helpful to test our Input status type.

Let's get started!

...
    fn for_dns(&self) -> String {
        format!("{address}", address=self.address)
    }
    fn input_request(&self, input: String) -> String {
        format!("{scheme}://{address}:{port}/{path}?{input}\r\n",
            scheme = self.scheme,
            address = self.address,
            port = self.port,
            path = self.path,
            input = input
        )
    }
...

Inside our Url object's methods, we add a new formatter, input_request, which appends the user's input to our URL as a query string. This way we can generate URLs that pass the input back to the server.
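One thing input_request skips: the Gemini specification expects the user's input to be percent-encoded before it goes into the query, so that spaces and characters like ? or # don't break the request. A minimal hand-rolled encoder might look like this, assuming we keep only the RFC 3986 "unreserved" characters as-is (the name `percent_encode` is hypothetical, not part of the tutorial's code):

```rust
/// Percent-encodes `input` for use as a URL query component.
/// Unreserved characters (RFC 3986: A-Z a-z 0-9 - . _ ~) pass through;
/// every other byte of the UTF-8 encoding becomes %XX.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}
```

With this in place the caller could pass `percent_encode(answer.trim())` to input_request instead of the raw answer.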

The next step is to update our request function in our url object.

...
    fn request(&self) -> String {
        if self.query.is_empty() {
            format!("{scheme}://{address}:{port}/{path}\r\n",
                scheme = self.scheme,
                address = self.address,
                port = self.port,
                path = self.path
            )
        } else {
            format!("{scheme}://{address}:{port}/{path}?{query}\r\n",
                scheme = self.scheme,
                address = self.address,
                port = self.port,
                path = self.path,
                query = self.query,
            )
        }
    }
...

Now we check whether we have a query. It would probably be better to make these attributes of the Url into Options, so that we could check against None instead of an empty string, but for now this is fine. If we don't have a query, we generate the URL as usual. If we do have one, we append a ? followed by our query.
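Here is a minimal sketch of the Option idea. It assumes a simplified, hypothetical GeminiUrl struct with the same fields as the tutorial's Url (scheme, address, port, path), but with `query: Option<String>`; the port type is also an assumption.

```rust
struct GeminiUrl {
    scheme: String,
    address: String,
    port: u16,
    path: String,
    query: Option<String>, // None means "no query", not an empty string
}

impl GeminiUrl {
    /// Builds the request line sent to the server, CRLF included.
    fn request(&self) -> String {
        let base = format!("{}://{}:{}/{}", self.scheme, self.address, self.port, self.path);
        match &self.query {
            // Only add "?" when there actually is a query.
            Some(query) => format!("{}?{}\r\n", base, query),
            None => format!("{}\r\n", base),
        }
    }
}
```

The match forces us to handle both cases, and the shared prefix is built only once.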

Now let's look at how we use these 2 functions.

...
    match response.status.code.chars().next().unwrap() {
        '1' => {
            print!("{prompt} ", prompt=response.status.meta);
            io::stdout().flush().unwrap();

            let mut answer = String::new();
            io::stdin().read_line(&mut answer).unwrap();

            let dest = url.input_request(answer.trim().to_string());
            visit(Url::new(&dest))
        },
...

Inside our visit function we have now added logic to handle Gemini responses with a status starting with 1. In this case we first print out the meta field, as that is the prompt the user should see.

We then wait for input from the user and use it to create a request with the input added as the query parameter.

Finally we call our visit function again.

The next time our visit function runs, when we get to the following line:

...
stream.write(url.request().as_bytes()).unwrap();
...

The request() function will see that we have a query available so it will create a request with the query parameter incorporated in it.

Voila! We have inputs working in Gemini! At this point we should be able to play the guessing game at gemini.conman.org/hilo/.

> visit gemini.conman.org/hilo/1100
Guess a number 10
Higher 50
Lower 25
Higher 40
Lower 35
Lower 32
Higher 33

Congratulations!  You guessed the number!

=> /hilo/1093 Try again?
=> / Nah, take me back home

>

The request for when we entered 10 would have looked like:

gemini://gemini.conman.org:1965/hilo/1100?10

Statuses of 4x, 5x, and 6x - General Errors

Currently the remaining statuses fall into the catch-all arm of the match statement inside our visit function.

...
        _ => println!("Error - {} - {}", response.status.code, response.status.meta)
...

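We're going to leave this as is, since the Gemini specification allows it and we are building a basic client. (For reference, 4x statuses are temporary failures, 5x are permanent failures, and 6x mean the server wants a client certificate, which this client doesn't support.)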
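If you later want the catch-all to say more, the first digit is enough to classify a response. Here is a sketch of that mapping using a hypothetical `StatusKind` enum; the category names follow the Gemini specification's groups.

```rust
#[derive(Debug, PartialEq)]
enum StatusKind {
    Input,                     // 1x
    Success,                   // 2x
    Redirect,                  // 3x
    TemporaryFailure,          // 4x
    PermanentFailure,          // 5x
    ClientCertificateRequired, // 6x
    Unknown,
}

/// Classifies a two-digit Gemini status code by its first digit.
fn classify(code: &str) -> StatusKind {
    match code.chars().next() {
        Some('1') => StatusKind::Input,
        Some('2') => StatusKind::Success,
        Some('3') => StatusKind::Redirect,
        Some('4') => StatusKind::TemporaryFailure,
        Some('5') => StatusKind::PermanentFailure,
        Some('6') => StatusKind::ClientCertificateRequired,
        _ => StatusKind::Unknown,
    }
}
```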

We are done! We now handle inputs, redirects, successes and errors. There is still one hacky bit left: when creating our Response object we hard-coded the mime type, charset and lang values. We're going to leave this as is for now; once we finish the processing of a Gemini page, we can circle back and fix it!

In the next chapter we're going to add a few more commands to our client.

See you soon!

Caching Pages

Hello! In this chapter we're not going to do anything major; we're going to add some quality of life features. Currently we need to visit each page directly: if we want to see a page we went to before, we need to reconnect to that Gemini server and request the page again. Instead we should save that page, so we can show it to the user from our cache instead of going across the internet. This also means we can keep a history of all the pages we traverse!

Let's get started!

...
struct Page {
    url: Url,
    content: String
}

impl Page {
    fn new(url: Url, content: String) -> Self {
        Page { url, content }
    }
}
...

We're going to start by creating a Page struct that will store the content and the URL.

Next we need to update our main function.

...
fn main() {
    let prompt = ">";
    let mut cache: Vec<Page> = vec![];
...
            "ls" => {
                if cache.len() > 0 {
                    println!("{}", cache.last().unwrap().content);
                } else {
                    println!("Nothing cached.");
                }
            }
            "v" | "visit" => {
                if tokens.len() < 2 {
                    println!("Nowhere to visit.");
                    continue;
                }

                let url = Url::new(tokens[1]);
                let content = visit(&url);

                println!("{}", content);
                cache.push(Page::new(url, content));
            },
...

The first thing to note is that we now have a variable called cache, which is a vector of Page objects. As we visit more and more Gemini pages, we keep adding them to cache.

The next thing to look at is the updated visit arm of our match. I added "v" as an option because I wanted a shorthand, and I also added a whole slew of commands to exit our client. (I had typed each one at some point and didn't get kicked out, so I added them in! :))

Inside our visit arm, we now create a Page object from the url and content. We are passing ownership of the url to the cache, so we need to update the way our visit function works. Previously the url was moved into visit; now that we want our cache to own it, we need to give visit a borrow instead.

Our visit function now uses the & symbol to show that it is only borrowing url.

...
fn visit(url: &Url) -> String {
...
'1' => {
...
visit(&Url::new(&dest))
},
...
'3' => visit(&Url::new(&response.status.meta)),
...

Anywhere we call our visit function will also need to use borrows instead of transferring ownership.
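To see why the signature change matters, here is a tiny standalone sketch. `Url`, `Page` and `visit` are simplified, hypothetical stand-ins (this visit just returns a string instead of talking to a server), and `fetch_page` is a made-up helper: visit only borrows the Url, so afterwards we still own it and can move it into the Page.

```rust
struct Url {
    address: String,
}

struct Page {
    url: Url,
    content: String,
}

/// Borrows the Url; the caller keeps ownership.
fn visit(url: &Url) -> String {
    format!("content of {}", url.address)
}

fn fetch_page(address: &str) -> Page {
    let url = Url { address: address.to_string() };
    let content = visit(&url); // the borrow ends when visit returns
    Page { url, content }      // so `url` can now be moved into the Page
}
```

If visit took `url: Url` instead, the last line would fail to compile with a "use of moved value" error.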

The next thing we need to look at is our ls option. I like the idea of Gopher being a file system, so I'm using some file system vocabulary for Gemini as well. If that doesn't make sense to you, feel free to use whatever does. A good word might be view or even last.

Inside our ls match we make sure we have something in our cache and if we do we simply show the last item in the cache.

Now we can use ls to view the last page we accessed!

There's a reason we store a vector of Pages: because it's a vector, we can now display the history and access any page we visited by going through the cache!

...
            "h" | "history" => {
                for (index, page) in cache.iter().enumerate() {
                    println!("{}. {}", index, page.url.request().trim());
                }
            },
...

This is a very simple loop where if we type h or history, we will print out what's currently in the cache.

Now that we can see our cache we need to access it.

...
            "h" | "history" => {
                for (index, page) in cache.iter().enumerate() {
                    println!("{}. {}", index, page.url.request().trim());
                }
            },
            _ if tokens[0].starts_with("h") => {
                let option = tokens[0][1..].to_string().parse::<i32>().unwrap_or(-1);
                if option < 0 || option >= cache.len() as i32 {
                    println!("Invalid history option.");
                } else {
                    println!("{}", cache[option as usize].content);
                }
            },
...

Now we add another match arm, one that runs if the token starts with h. This is because we want to access our history by typing h1, which means accessing element 1 of the cache vector.

The first thing we do is remove the first character from our token and parse the rest into a number. If the user entered something after the h that can't be parsed into a number, we get -1 instead. This is why we parse it into an i32.

Next we check that the option the user entered is within the range of our cache vector. cache.len() returns a usize, so we need to cast it to an i32. If the input is in range, we print the page stored at that index of our cache!
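An alternative sketch that avoids the i32 round-trip: parse straight into a usize, which already rejects negative numbers, and let slice::get do the bounds check, since it returns None for an out-of-range index. The helper name `select` is hypothetical.

```rust
/// Turns a token like "h3" into the item at index 3, if there is one.
fn select<'a, T>(items: &'a [T], token: &str, prefix: char) -> Option<&'a T> {
    // "h3" -> "3"; tokens without the prefix yield None.
    let number = token.strip_prefix(prefix)?;
    // parse::<usize>() fails on "-1", "x" and the empty string.
    let index = number.parse::<usize>().ok()?;
    // get() returns None instead of panicking when index is out of range.
    items.get(index)
}
```

In the h arm this could read `match select(&cache, tokens[0], 'h') { Some(page) => println!("{}", page.content), None => println!("Invalid history option.") }`.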

Something very similar to the history option that would be good to implement is a reload: use the cache to pick a previous page, but instead of reading it from the cache, fetch it again from the Gemini server. This could be a command like r3, meaning reload element 3 of the cache vector.

For now however we are done! We have our history and our ls command working to show Gemini pages now. Now let's move to the next chapter where we will clean up the displaying of Gemini pages and allow for links in a Gemini page to be traversable.

See you soon!

Parsing Gemini Pages

Hello! Let's do a recap of what we currently have done. Our client now has a visit function that can travel to a Gemini server, it can handle URLs, so we can visit any Gemini page, and we have statuses so we can correctly handle the different types of responses Gemini can send. We also have caching and we can select pages we want to view from the cache.

So far we've been just displaying everything that the server sends to the user. If there are links or headings in a page, we leave them as is. This is really cumbersome, especially for links! The user needs to copy and paste the link to get to it.

In this chapter we're going to add some pizzazz!

Let's get started!

The Gemini type - text/gemini

We've currently hardcoded text/gemini as the mime type for our responses. For now we'll leave this in, but we can change the behavior of our client based on the mime type, for instance if it is anything other than text/*, we can save the file or save the file and open the correct program like an image viewer or browser.

text/gemini is a special format made for Gemini. It has a very simple rule set where, at most, the first 3 characters of a line decide the line's type.

  1. Lines starting with => are links; the first space after the link separates the link from the user-visible name
  2. 3 backticks are the preformat toggle
  3. Lines starting with #, ## or ### are headings
  4. Lines starting with * are bullets
  5. Lines starting with > are quotes
  6. Everything else is a regular line

We are making a basic command line client, so the only lines we really care about are the links. Everything else we can get away with printing directly to the screen!
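The rules above can be sketched as a small classifier. The enum `LineType` and function `classify_line` are hypothetical (the client in this chapter only acts on links), and the sketch ignores the preformat toggle's state, which a fuller parser would need to track so that lines inside a preformatted block aren't interpreted.

```rust
#[derive(Debug, PartialEq)]
enum LineType {
    Link,
    PreformatToggle,
    Heading,
    ListItem,
    Quote,
    Text,
}

// Three backticks, written as escapes so they don't clash with Markdown.
const PREFORMAT: &str = "\u{60}\u{60}\u{60}";

/// Classifies one text/gemini line by its leading characters.
fn classify_line(line: &str) -> LineType {
    if line.starts_with("=>") {
        LineType::Link
    } else if line.starts_with(PREFORMAT) {
        LineType::PreformatToggle
    } else if line.starts_with('#') {
        // Covers "#", "##" and "###" headings.
        LineType::Heading
    } else if line.starts_with("* ") {
        LineType::ListItem
    } else if line.starts_with('>') {
        LineType::Quote
    } else {
        LineType::Text
    }
}
```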

Let's look at some code!

...
fn display(page: &Page) {
    let lines: Vec<&str> = page.content.split("\n").collect();

    for line in lines {
        println!("{}", line.trim());
    }
}
...

The first thing we need to do is move all of our printing logic into one place. We currently print in our hx option, our ls option and our visit option; now we'll use one general-purpose display function!

This function splits the page on newline characters, since Gemini allows lines to end with either \r\n or \n; trimming each line also strips any leftover \r. We then print each line to the screen. This is where we will add the parsing logic.
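As an alternative, Rust's str::lines already splits on both \n and \r\n, so the stray \r is removed without trimming. That also preserves leading whitespace, which matters for preformatted text. A minimal sketch (the helper name `split_lines` is hypothetical):

```rust
/// Splits page content into lines, accepting both "\n" and "\r\n" endings.
fn split_lines(content: &str) -> Vec<&str> {
    content.lines().collect()
}
```

In the display function above, `page.content.lines()` could replace the split and trim pair.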

But before that, let's update our main function to use our new display function.

...
            _ if tokens[0].starts_with("h") => {
                let option = tokens[0][1..].to_string().parse::<i32>().unwrap_or(-1);
                if option < 0 || option >= cache.len() as i32 {
                    println!("Invalid history option.");
                } else {
                    display(&cache[option as usize]);
                }
            },
            "ls" => {
                if cache.len() > 0 {
                    display(&cache.last().unwrap());
                } else {
                    println!("Nothing cached.");
                }
            }
            "v" | "visit" => {
                if tokens.len() < 2 {
                    println!("Nowhere to visit.");
                    continue;
                }

                let url = Url::new(tokens[1]);
                let content = visit(&url);
                let page = Page::new(url, content);
                display(&page);
                cache.push(page);
            },
...

We have changed our println! statements to calls to our display function passing in the Page object.

Now we should be able to visit a Gemini page and still have everything displaying as text.

Let's work on the link item type first!

Link Item Type

The Gemini specification says that a link is any line starting with =>, optionally followed by whitespace, then a relative or absolute URL, then optionally whitespace and some text. If there is text, that text should be shown to the user and the link hidden; if there is no text, the link itself should be shown.

Now the first thing we need to do is change the way our Page struct works. Currently we have a url field and a content field. Now we will remove the content field and instead save a list of lines.

Let's see the code!

...
struct Link {
    text: String,
    link: String,
}

struct Page {
    url: Url,
    lines: Vec<String>,
    links: Vec<Link>
}

impl Page {
    fn new(url: Url, content: String) -> Self {
        let content: Vec<&str> = content.split("\n").collect();

        let mut links: Vec<Link> = vec![];
        let mut lines: Vec<String> = vec![];

        for line in content {
            match line {
                _ if line.starts_with("=>") => {
                    let mut link_line = line.replacen("=>", "", 1);
                    link_line = link_line.replace("\t", " ");
                    let tokens: Vec<&str> = link_line.trim().splitn(2, " ").collect();

                    let link: String;
                    let text: String;

                    if tokens[0].starts_with("/") {
                        link = format!("{url}{relative_path}",
                            url=url.address,
                            relative_path=tokens[0].to_string()
                        );

                    } else if tokens[0].starts_with("gemini://")
                        || tokens[0].starts_with("http://")
                        || tokens[0].starts_with("https://")
                    {
                        link = tokens[0].to_string();

                    } else {
                        link = format!("{url}{relative_path}",
                            url=url.request().trim(),
                            relative_path=tokens[0].to_string()
                        );
                    }

                    if tokens.len() <= 1 {
                        text = tokens[0].to_string();
                    } else {
                        text = tokens[1].trim().to_string();
                    }

                    links.push(Link { text, link });
                    lines.push(line.to_string());
                },
                _ => { lines.push(line.to_string()); }
            }
        }
        Page { url, lines, links }
    }
}
...

The first thing to note is that we now have a new struct for links. A Link only contains the text we show the user and the underlying link that the text points to.

Next we update the Page struct so that instead of content it has a Vector of strings for lines and it has a vector for links.

The biggest change is in our constructor. We still pass in the same arguments, which means the places where we create a Page don't need to be updated.

Let's go through what our constructor is doing.

We first split the content by new line characters. We then set up a list of links and lines. As we go through each line in the content, we will add it to these buckets.

We loop through each line in the content.

We then match the line. The base case is that we simply add the line to our lines variable.

The link case is when a line starts with =>.

First we will remove the => characters.

Next we replace any tabs with spaces.

We then tokenize the link line. We split the line into 2 on the first space we find and collect this into a vector.

Next we check to see if this is a relative path or if this is an absolute path.

If it is a relative path starting with /, we prepend the address of the page we are on.

If it is an absolute URL starting with gemini://, http:// or https://, we leave it as is.

Anything else is a relative path, relative to our current location.

Next we check how many tokens we got when we split on the first space. If the link line didn't contain any user-visible text, our tokens would just be the link, and in that case we set the text the user sees to the link itself. If we do have a second part in tokens, we use that as the text we show the user.

We finally add the link to our list of links.

We also add the line into our lines. This is so when we go to display everything the display routine can figure out when to use the links variable.
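A caveat on the "anything else" case: appending the link to the current request URL only works when that URL ends in a /. For a page at gemini://host:1965/docs/index.gmi, a link to faq.gmi should resolve to gemini://host:1965/docs/faq.gmi, because RFC 3986 replaces the last path segment when merging relative paths. Here is a simplified sketch, assuming `base` is a full URL without a query and ignoring ./ and ../ segments; `resolve_link` is a hypothetical helper.

```rust
/// Resolves a link found on the page at `base` (a full gemini:// URL).
/// Simplified: no query/fragment handling and no "." or ".." removal.
fn resolve_link(base: &str, link: &str) -> String {
    if link.contains("://") {
        // Already an absolute URL.
        return link.to_string();
    }
    // The path starts at the first '/' after "scheme://".
    let after_scheme = base.find("://").map(|i| i + 3).unwrap_or(0);
    let path_start = base[after_scheme..]
        .find('/')
        .map(|i| after_scheme + i)
        .unwrap_or(base.len());
    if link.starts_with('/') {
        // Host-relative: keep scheme and authority, replace the whole path.
        format!("{}{}", &base[..path_start], link)
    } else {
        match base[path_start..].rfind('/') {
            // Keep the directory part of the path, including its trailing '/'.
            Some(i) => format!("{}{}", &base[..path_start + i + 1], link),
            // No path at all: the link hangs off the root.
            None => format!("{}/{}", base, link),
        }
    }
}
```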

Let's now take a look at our updated display routine!

...
fn display(page: &Page) {
    let mut link_counter = 0;

    for line in &page.lines {
        match line {
            _ if line.starts_with("=>") => {
                println!("{link_counter} => {text}",
                    link_counter=link_counter,
                    text=page.links[link_counter].text
                );
                link_counter = link_counter + 1;
            },
            _ => println!("{}", line)
        }
    }
}
...

Instead of looping through page.content, we now loop through page.lines.

We match the line against the link type identifier, =>, at which point if we do hit that, instead of printing the line we print the user visible text.

To know which link we need to print, we need to know the index of the link, this is where the link counter comes in. We keep track of how many => we've seen, which will tell us which link we need to display for the user.

  • This may be a bit weird, but make sure you understand this part: in our Page constructor we added links in the order we found them, so when we display the page, we can find each link in the links list by counting how many links we've seen so far!

The base case in our match statement prints the line directly to the screen.

We also print the link_counter out, as we'll use that number to follow a link.

Following Links!

Whew! We have links getting formatted properly now. Let's add the ability to follow a link. We'll do something like cd x, where x is the number of the link we want to go to. I really enjoy thinking of everything as a file system, so using similar syntax is easier for me. The great thing is that we can modify the code ever so slightly for any sort of syntax!

(I had originally had this with vx syntax but I didn't like it as much so I changed it.)

Let's get started!

...
            "cd" => {
                if tokens.len() < 2 {
                    println!("Didn't specify a destination.");
                    continue;
                }

                if cache.is_empty() {
                    println!("Nothing cached.");
                    continue;
                }

                let page = &cache.last().unwrap();
                let option = tokens[1].to_string().parse::<i32>().unwrap_or(-1);
                if option < 0 || option >= page.links.len() as i32 {
                    println!("Invalid link option.");
                } else {
                    let url = Url::new(&page.links[option as usize].link);
                    let content = visit(&url);
                    let new_page  = Page::new(url, content);
                    cache.push(new_page);
                }
            },
...

This option mixes a few different things that we have already done. The first thing we do is make sure we are on a page, we can do that by making sure we have something in the cache. We also need to make sure the number of tokens the user entered is correct.

Next we get the last page in the cache as that is where we really are. The last item in the cache is the last place we visited.

Next we parse the number that the user entered: cd 3 would mean that the user wants to follow the link numbered 3 on the current page.

We validate the option and then do the same steps as our visit command. We get the link associated with that option and create a Url object from it. We then call the visit function on that url to get the content, create a new_page from the url and content, and push this new page onto the cache.

Voila! We can now handle links in Gemini! We have an honest to goodness Gemini client now. We can read Gemini pages and follow links as we find them.

One thing to note here is that when we use our cd command, we don't immediately display the page, whereas our visit command does. Neither is objectively better than the other, but I like having to type ls to display the page.

Let's bring our visit command in line with our cd command.

...
            "v" | "visit" => {
                if tokens.len() < 2 {
                    println!("Nowhere to visit.");
                    continue;
                }

                let url = Url::new(tokens[1]);
                let content = visit(&url);
                let page = Page::new(url, content);
                cache.push(page);
            },
...

Now we don't display the page immediately, we simply add it to our cache!

Back Option

Let's do a couple more things before we end this chapter. We should implement a back option so we can return to the previous page. This is actually very simple! Our cache is a stack, so if we pop from it, we're immediately back one page!

...
            "cd" => {
                if tokens.len() < 2 {
                    println!("Nowhere to visit.");
                    continue;
                }

                if cache.is_empty() {
                    println!("Nothing cached.");
                    continue;
                }

                if tokens[1] == ".." {
                    cache.pop();
                    if cache.len() > 0 {
                        println!("Back to {}", cache.last().unwrap().url.request().trim());
                    }
                } else {

                    let page = &cache.last().unwrap();
                    let option = tokens[1].to_string().parse::<i32>().unwrap_or(-1);
                    if option < 0 || option >= page.links.len() as i32 {
                        println!("Invalid link option.");
                    } else {
                        let url = Url::new(&page.links[option as usize].link);
                        let content = visit(&url);
                        let new_page  = Page::new(url, content);
                        cache.push(new_page);
                    }
                }
            },
...

Now we can go back by doing cd ..! (Originally I had implemented this as its own command, back, but I didn't like it as much.)

Now that we have the back option, we should also implement a where option. This way we can tell what page we currently have as the last item in the cache.

            "where" => {
                if cache.len() > 0 {
                    println!("{}", cache.last().unwrap().url.request().trim());
                } else {
                    println!("Nowhere.");
                }
            },

Another simple little command that adds a bit of life to our client. We make sure we have something in our cache and if we do we print out the url request we made to get that page.

Now with where, cd, history and ls we have a functioning client that can do all sorts of things! History is a little wonky right now, because it isn't really a history: it's just the current path we took to get to where we are, and every cd modifies it. Ideally we would save the history of all the places we visit and rename the current history to pwd. Let's leave that out for now, but it's something to think about. :)
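Here is a minimal sketch of that idea, assuming a hypothetical `Navigator` type that stores URLs as Strings: `path` plays the role of the current cache (what cd .. pops), while `history` only ever grows.

```rust
#[derive(Default)]
struct Navigator {
    path: Vec<String>,    // where we are: cd pushes, cd .. pops
    history: Vec<String>, // everywhere we've been, never popped
}

impl Navigator {
    /// Records a visit in both the current path and the full history.
    fn go(&mut self, url: &str) {
        self.path.push(url.to_string());
        self.history.push(url.to_string());
    }

    /// Steps back one page; returns the page we land on, if any.
    fn back(&mut self) -> Option<&String> {
        self.path.pop();
        self.path.last()
    }

    /// The page we are currently on.
    fn pwd(&self) -> Option<&String> {
        self.path.last()
    }
}
```

After visiting a, then b, going back and visiting c, path is [a, c] while history is [a, b, c].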

Now! We are done with this chapter! We hooked up links in Gemini and we can now traverse Gemini space comfortably. We can follow links and go back. We can use the ls command to reprint pages.

In the next chapter we will add some pagination to our pages, that way we don't have to manually scroll up and down!

See you soon!

Pagination

Hello! Now that we can traverse Gemini space, we should deal with very long Gemini pages. Currently we print everything to the screen, which works fine, but it'd be nicer to display only what fits on the screen and let the user hit Enter for the next page.

Let's get started!

Page Height

The first thing we need to do is get our terminal's height. Unfortunately there doesn't seem to be a way to do this purely in Rust, so we need to use libc. Please let me know below if there is a way to get the height through Rust!

Many of the solutions to getting the terminal's height involved getting a crate but I didn't want to add a dependency. So I went into the source code and luckily it was simple enough to steal!

https://docs.rs/crate/term_size/0.3.2/source/src/platform/unix.rs

This is the crate I stole the terminal size logic from. As I am building very specifically for myself, we can get away with using just the parts applicable to us. I have no plans to use this on Windows, so I didn't steal that part.

We still need to add a dependency to libc so let's go ahead and do that. (Relying on libc, I feel much more comfortable with than a full crate that in turn relies on libc)

./Cargo.toml

[dependencies]
rustls = { version="0.18", features=["dangerous_configuration"] }
webpki = "0.21"
libc = "0.2"

We now have 3 dependencies!

Now for our Rust code.

./src/main.rs

...
use webpki::{DNSNameRef};
use libc::{STDOUT_FILENO, c_int, c_ulong, winsize};
use std::mem::zeroed;
static TIOCGWINSZ: c_ulong = 0x5413;

extern "C" {
    fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
}

unsafe fn get_dimensions_out() -> (usize, usize) {
    let mut window: winsize = zeroed();
    ioctl(STDOUT_FILENO, TIOCGWINSZ, &mut window);
    (window.ws_col as usize, window.ws_row as usize)
}

fn term_dimensions() -> (usize, usize) {
    unsafe { get_dimensions_out() }
}
...

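We include a few items from libc and set a magic constant, TIOCGWINSZ, which is the ioctl request that asks the terminal for its window size (0x5413 is its value on Linux; other Unix systems such as macOS use a different number). I did steal this code, so I don't fully understand it, but I know what it does and for now that is enough!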

We now have a function term_dimensions() that returns a tuple of our window's width and height.
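One thing worth guarding against: get_dimensions_out ignores ioctl's return value, so when stdout isn't a terminal (for example when output is piped to a file) the zeroed winsize comes back as-is and the height is 0. In the display function below, current_pos + window_height - 1 would then underflow a usize, which panics in debug builds. A small, hypothetical helper that picks a fallback might look like this; the 24-row default is an assumption, borrowed from the classic 80x24 terminal.

```rust
/// Converts the row count reported by term_dimensions() into the number
/// of content lines to print per screen.
fn lines_per_screen(reported_rows: usize) -> usize {
    // Assumption: treat 0 (no terminal, or ioctl failed) as a 24-row terminal.
    let rows = if reported_rows == 0 { 24 } else { reported_rows };
    // Keep one row for the "[Press Enter for Next page]" prompt, but never
    // return 0, or paging would make no progress.
    rows.saturating_sub(1).max(1)
}
```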

Now let's get to the important part where we use this function!

Pagination

In our display function, we now get the height of the window and display only what can fit. This means that we'll need another input loop here so that the user can hit Enter to get the next page.

...
fn display(page: &Page) {
    let (_, window_height) = term_dimensions();

    let mut link_counter = 0;
    let mut current_pos = 0;
    let mut done = false;

    while !done {
        let max = current_pos + window_height - 1;

        for i in current_pos..max {
            if i >= page.lines.len()  {
                done = true;
                break;
            }
            let line = &page.lines[i];
            match line {
                _ if line.starts_with("=>") => {
                    println!("{link_counter} => {text}",
                        link_counter=link_counter,
                        text=page.links[link_counter].text
                    );
                    link_counter = link_counter + 1;
                },
                _ => println!("{}", line)
            }

            if i == max - 1 {
                current_pos = i;
                if line.starts_with("=>") {
                    link_counter = link_counter - 1;
                }
            }
        }

        if !done {
            print!("\x1b[92m[Press Enter for Next page]\x1b[0m");
            io::stdout().flush().unwrap();

            let mut command = String::new();
            io::stdin().read_line(&mut command).unwrap();
            if command.trim() == "q" {
                done = true;
            }
        }
    }
}
...

Our display function has now become a little bit more complex but the core idea is simple. We have some number of lines to show on the screen, if our page has more lines than that then we need to chunk our page so it fits on the screen.

We now have an outer loop that keeps track of whether we're still paging through our Gemini page.

We set max based on how many lines we can show. So if we have a 50-line terminal, we can display at most 50 lines.

We keep track of the current line we're on by updating our current_pos variable. We only need to update this variable when we get to the bottom of the screen.

So we loop from our current location up to max, which is current_pos + window_height - 1. We subtract one so that we leave space for our prompt.

Next we check to see if we've gone past the length of our page.lines. If we have we are done and can break out of our loop.

If we aren't done we then do our logic to display the line. If it is a link we need to show the user the link text and update the link_counter.

If the line is anything else, we just display it.

The next thing we do is check whether we're at the bottom of the screen. If we are, we set current_pos to this line, so that the next time the loop starts, the top of the screen will be the last line of the previous one.

We also need to check whether we are breaking on a link line. If we are, the next iteration of the loop will display that link again, so we need to decrement our link_counter to keep our link numbering correct.
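If the one-line overlap and the link_counter decrement feel fiddly, an alternative is to let slice::chunks cut the page into non-overlapping screens and number links across the whole page as we go. This is only a sketch: `paginate` is hypothetical, returns the rendered screens instead of printing them, and numbers the raw link line rather than looking up the Link text.

```rust
/// Splits `lines` into screens of at most `per_screen` lines, numbering
/// link lines ("=>") consecutively across the whole page.
fn paginate(lines: &[&str], per_screen: usize) -> Vec<Vec<String>> {
    let mut screens = Vec::new();
    let mut link_counter = 0;
    // chunks() panics on a size of 0, so always ask for at least one line.
    for chunk in lines.chunks(per_screen.max(1)) {
        let mut screen = Vec::new();
        for line in chunk {
            if line.starts_with("=>") {
                screen.push(format!("{} {}", link_counter, line));
                link_counter += 1;
            } else {
                screen.push(line.to_string());
            }
        }
        screens.push(screen);
    }
    screens
}
```

Because no line is shown twice, the counter never has to be rolled back.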

Now the next section will handle our prompt. If we haven't reached the end of our Gemini document, then we display a prompt. The funny escape codes are colour codes. I like green but you can find all sorts of terminal colour codes online.

We need to set the colour of the text first and then reset it at the end; that is why we have two fancy colour codes. We then read some input from the user. Currently any input other than q, including just pressing Enter, takes us to the next page.

If we press q we will immediately leave the page.

Voila! We have pagination working now!

...
            "ls" | "more" => {
                if cache.len() > 0 {
                    display(&cache.last().unwrap());
                } else {
                    println!("Nothing cached.");
                }
            },
...

I updated our ls command to accept more as a keyword as well. In my head, more makes more sense. :)

Now we have a much easier time reading Gemini pages! In the next chapter let's implement bookmarks so we don't have to keep typing out full gemini paths!

See you soon!

PS. Just to be fancy, let's update our regular prompt to be green as well.

...
fn main() {
    let prompt = "\x1b[92m>\x1b[0m";
    let mut cache: Vec<Page> = vec![];
...

The colour codes are longer than the text itself!

Bookmarks

Hello! At this point we have a functional client, and we can move on to features and enhancements! Let's focus on the biggest enhancement we can make: bookmarks! Currently we need to type in the places we want to go, and we have no real way of saving a place.

In this chapter we will add our bookmarking functionality and also save it to a file.

Let's get started!

Bookmarks

We'll first write our bookmarking logic, and once we have that wired up, we'll work on saving our bookmarks to disk.

We're going to reuse the logic from our history functionality.

...
    let prompt = "\x1b[92m>\x1b[0m";
    let mut cache: Vec<Page> = vec![];
    let mut bookmarks: Vec<String> = vec![];
...

We add a new bookmarks list that we will push things onto.

...
            "add" => {
                if cache.len() > 0 {
                    bookmarks.push(cache.last().unwrap().url.request().trim().to_string());
                } else {
                    println!("Nothing to bookmark.");
                }
            },
...

We then implement the add option, which simply adds the URL of the last page in the cache, the page we're currently on, to our bookmarks.

...
            "b" => {
                for (index, bookmark) in bookmarks.iter().enumerate() {
                    println!("{}. {}", index, bookmark);
                }
            },
...

We implement another handler for the b option, which prints out our bookmarks along with their index numbers.

...
            "b" => {
                for (index, bookmark) in bookmarks.iter().enumerate() {
                    println!("{}. {}", index, bookmark);
                }
            },
            _ if tokens[0].starts_with("b") => {
                // "b2" -> 2; anything that isn't a valid index is rejected.
                match tokens[0][1..].parse::<usize>() {
                    Ok(option) if option < bookmarks.len() => {
                        let url = Url::new(&bookmarks[option]);
                        let content = visit(&url);
                        let page = Page::new(url, content);
                        cache.push(page);
                    },
                    _ => println!("Invalid bookmark option."),
                }
            },
...

Lastly we add the ability to enter bx where x is the bookmark number we want to go to. Voila! We now have bookmarking functionality.
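The bx parsing can also be pulled out into a function that is easy to test on its own. This is a sketch with a hypothetical name, parse_bookmark_option: it uses strip_prefix and parses straight to usize, so negative numbers, non-numbers and out-of-range indexes all come back as None.

```rust
/// Hypothetical helper: turn a command like "b2" into a bookmark index.
/// Returns None for anything that isn't "b" followed by a valid index.
fn parse_bookmark_option(token: &str, bookmark_count: usize) -> Option<usize> {
    // Must start with 'b'; the rest must be a non-negative integer.
    let digits = token.strip_prefix('b')?;
    let index: usize = digits.parse().ok()?;
    // Only accept indexes that point at an existing bookmark.
    if index < bookmark_count {
        Some(index)
    } else {
        None
    }
}
```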

We were able to do this quickly because the major structures of our client are done, so we can just focus on the tweaks.

Now let's look at writing the bookmarks out to the hard drive and then loading them back in. This way our bookmarks will be persistent!

Persistent Bookmarks

The first thing we need to do is save bookmarks to a file.

...
use std::io;
use std::fs::{OpenOptions, File};
use std::io::{Read, Write};
...
fn save_in_file(path: &str, text: &str) {
    let mut file_handle = OpenOptions::new()
        .read(true)
        .append(true)
        .create(true)
        .open(path)
        .unwrap();

    writeln!(file_handle, "{}", text).unwrap();
}
...
fn main() {
...
    let bookmark_path = "/home/nivethan/gemini.bookmarks";
...
            "add" => {
                if cache.len() > 0 {
                    let page = cache.last().unwrap().url.request().trim().to_string();
                    save_in_file(bookmark_path, &page);
                    println!("Added {}", page);
                    bookmarks.push(page);

                } else {
                    println!("Nothing to bookmark.");
                }
            },
...

We include the fs module and write a save function that takes in a path to a file and a string, and appends the string to the file as a new line. Our bookmark file will be just a list of strings that we want to be able to quickly access.

In our main function we set up the bookmark path. Make sure to create the bookmark file yourself, as Rust didn't seem to do this automatically for me when the file didn't already exist. I'm not sure why, as the create option in our save_in_file function is set to true. One thing worth ruling out is the directory: create(true) creates the file but not any missing parent directories. It may also be a weird interaction with Windows Subsystem for Linux. Let me know in the comments!

Next, we update our add option so that along with saving the page in our bookmark list, we also write it out to the file.
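If the file still isn't being created, the parent directory is worth checking, since OpenOptions::create(true) makes the file but open returns an error if its directory is missing. Here is a sketch of a variant of save_in_file that creates the directory first with std::fs::create_dir_all. The name save_in_file_creating_dirs is hypothetical, and it returns io::Result instead of calling unwrap.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical variant of save_in_file: create any missing parent
/// directories, then append `text` to the file as one line.
fn save_in_file_creating_dirs(path: &Path, text: &str) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        // Succeeds without doing anything if the directory already exists.
        fs::create_dir_all(parent)?;
    }
    let mut file_handle = OpenOptions::new().append(true).create(true).open(path)?;
    writeln!(file_handle, "{}", text)
}
```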

Now let's look at loading bookmarks in!

fn load_file(path: &str) -> Vec<String> {
    let mut lines: Vec<String> = vec![];

    // read lets us read what is already there; append + create means the
    // file is created (empty) if it doesn't exist yet.
    let mut file_handle = OpenOptions::new()
        .read(true)
        .append(true)
        .create(true)
        .open(path)
        .unwrap();

    let mut data = String::new();
    file_handle.read_to_string(&mut data).unwrap();

    // One bookmark per line; skip blank lines such as the one after the final newline.
    for text in data.split('\n') {
        if !text.is_empty() {
            lines.push(text.to_string());
        }
    }

    lines
}

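Our load function takes in a path and opens the file for reading (the append and create options just make sure the file gets created if it doesn't exist yet). We read all the data in the file, split it on newlines, and then loop through the lines, adding each non-empty one to our vector of lines.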
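Since save_in_file and load_file have to agree on the file format (one URL per line), a quick round trip is a good way to check them. This sketch repeats the two functions so it stands alone, takes the text as &str, and is meant to be run against a temporary file rather than the real bookmark path.

```rust
use std::fs::OpenOptions;
use std::io::{Read, Write};

/// Same shape as save_in_file: append one line, creating the file if needed.
fn save_in_file(path: &str, text: &str) {
    let mut file_handle = OpenOptions::new().append(true).create(true).open(path).unwrap();
    writeln!(file_handle, "{}", text).unwrap();
}

/// Same shape as load_file: read the whole file and keep the non-empty lines.
fn load_file(path: &str) -> Vec<String> {
    let mut file_handle = OpenOptions::new()
        .read(true)
        .append(true)
        .create(true)
        .open(path)
        .unwrap();
    let mut data = String::new();
    file_handle.read_to_string(&mut data).unwrap();
    data.split('\n')
        .filter(|line| !line.is_empty())
        .map(|line| line.to_string())
        .collect()
}
```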

...
fn main() {
    let prompt = "\x1b[92m>\x1b[0m";
    let mut cache: Vec<Page> = vec![];

    let bookmark_path = "/home/nivethan/gemini.bookmarks";
    let mut bookmarks = load_file(bookmark_path);
...

Now instead of initializing our bookmarks with an empty list, we use our load_file function to initialize our bookmarks.

Voila! We now have bookmarks that are persistent. Test it out!

We will add one more option in the next chapter, we're almost done!

Saving Gemini Pages

Hello! Our journey is coming to an end! We are almost done now and I hope you'll build on our client as you use it. I think there is something very powerful about being able to modify your software to fit your very specific needs.

One such need for myself is to e-mail what I find in Gemini space. I can't exactly give someone a URL! So in this chapter we will implement a save function that will simply save the current page to a text file. If we wanted to get fancy, we could have the mailing happen right in our client!

Saving a Gemini Page

...
            "save" => {
                if cache.len() > 0 {
                    let page = cache.last().unwrap();
                    let file_name = format!("/home/nivethan/gemini/{}.txt",
                        page.url.request().trim().replace("/", "-"));
                    let mut f = File::create(&file_name).unwrap();
                    for line in &page.lines {
                        writeln!(f, "{}", line).unwrap();
                    }
                    println!("Saved {}!", file_name);

                } else {
                    println!("Nothing to save.");
                }
            },
...

Here we have a very simple save command for the page we're currently on. We get the page from the cache and create a file named after the page's URL (though with each / replaced with -). We then write the lines of the Gemini page out to our newly created file.
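The naming rule is easy to check on its own. page_file_name is a hypothetical helper, and it assumes, as the trim in the save command suggests, that url.request() returns the URL followed by a line ending.

```rust
/// Hypothetical helper mirroring the save command's naming rule:
/// trim the request, swap every '/' for '-', then add ".txt".
fn page_file_name(dir: &str, request: &str) -> String {
    format!("{}/{}.txt", dir, request.trim().replace('/', "-"))
}
```

The colon from gemini: survives, which is fine on Linux but is not allowed in Windows file names, so you may want to replace it as well there.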

Once again, you'll need to make sure the path exists!

Now that we have our client working the way we want it to, we still have one glaring piece we need to sort out. We have currently hard coded the mime types to be text/gemini. We did all our page processing and caching logic on that premise which may not always be true!

Let's fix that!

Handling Mimetypes

Hello! We are now almost done, with just one piece left to implement. Currently we have hardcoded that every response we get from a Gemini server is text/gemini.

In this chapter we are going to parse the Gemini response properly and then do different things based on the mime type. If the mime type is text/gemini, we do what we currently do, which is parse the page and display links properly.

If the mime type is text/*, we are going to just display the text on the screen with no formatting.

Anything else and we will save the contents of that file to the disk and leave it up to the user to deal with.
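The three rules above can be written as a small decision function. This is a sketch, not the tutorial's code: the Handling enum and the classify function are hypothetical names, and the rule that a missing mime type counts as text/gemini comes from the Gemini specification.

```rust
/// Hypothetical: what the client should do with a successful response.
#[derive(Debug, PartialEq)]
enum Handling {
    /// Parse as a Gemini page (links and so on).
    Gemini,
    /// Print the text as-is with no formatting.
    PlainText,
    /// Save the body to disk for the user to deal with.
    SaveToDisk,
}

/// Decide how to handle a response from its mime type, if it has one.
fn classify(mime_type: Option<&str>) -> Handling {
    match mime_type {
        // No mime type means text/gemini.
        None | Some("text/gemini") => Handling::Gemini,
        Some(mime) if mime.starts_with("text/") => Handling::PlainText,
        Some(_) => Handling::SaveToDisk,
    }
}
```

With our Response struct this would be called as classify(response.mime_type.as_deref()). Returning an enum rather than a message string is one way to keep status messages separate from page content.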

Let's get started!

Handling Mime Types

We have a few different places we can handle mime types but the place that makes the most sense to me is in our visit function.

...
fn visit(url: &Url) -> String {
...
        '2' => {
            // The body may not have arrived with the header; if not, poll the TLS session for it.
            if response.body.is_none() {
                client.read_tls(&mut socket).unwrap();
                client.process_new_packets().unwrap();
                let mut data = Vec::new();
                let _ = client.read_to_end(&mut data);

                response.body = Some(String::from_utf8_lossy(&data).to_string());
            }

            match response.mime_type {
                Some(mime) => match mime.as_str() {
                    // Gemini text: return the body so it can be parsed into a Page.
                    "text/gemini" => response.body.unwrap_or_default(),
                    // Other text: print it as-is and return a short note instead.
                    _ if mime.starts_with("text/") => {
                        println!("{}", response.body.unwrap_or_default());
                        format!("Requested page was {}.", mime)
                    },
                    // Anything else: write the body to disk for the user to deal with.
                    _ => {
                        let file_name = format!("/home/nivethan/gemini/unknown/{}.unknown",
                            url.request().trim().replace("/", "-"));
                        let mut f = File::create(&file_name).unwrap();
                        // write_all, unlike write, keeps going until every byte is written.
                        f.write_all(response.body.unwrap_or_default().as_bytes()).unwrap();
                        println!("Saved {}!", file_name);
                        format!("Requested page was {}.", mime)
                    }
                },
                // No mime type: the spec says to treat the response as text/gemini.
                None => response.body.unwrap_or_default(),
            }
        },
...

In our visit function, we had set it so that on success we poll the TLS session for any data if we're missing the response body. Once we have the response.body, we return that back to wherever we called our visit function.

Now we are going to check the mime type instead of immediately passing the data back. If the mime type is anything other than text/gemini, we send back a message explaining that the response wasn't a Gemini page and that we did something else with it.

The first thing we do is check whether our mime type is None. If it is, we assume the mime type is text/gemini, which is the default the Gemini specification calls for when the meta field is empty.

Next we match mime against the various types.

First we match against text/gemini, here we return the response.body so that the page can be parsed later on.

Next we match against text/*, any mime type starting with text/ will be printed directly to the screen. We send back a message letting the user know that we did something else. This message will now appear when the user enters ls or more.

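The catch-all case is that we save the data directly to the hard drive, with the extension .unknown. We could map the various mime types to their proper extensions, but we'll stick with this to keep things simple. One caveat: by this point the body has already gone through String::from_utf8_lossy, which replaces any bytes that aren't valid UTF-8 with the replacement character U+FFFD, so binary files such as images will not be saved byte-for-byte.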
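To see the lossy conversion problem concretely, and one way around it, here is a small sketch. The helper names save_raw and lossy_round_trip are hypothetical, and the fix it implies (keeping the Vec<u8> from read_to_end instead of converting it straight to a String) is an assumption about how you might restructure visit, not what the tutorial does. A PNG file starts with the byte 0x89, which is not valid UTF-8 on its own.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: write a response body exactly as it came off the wire.
fn save_raw(path: &Path, body: &[u8]) -> io::Result<()> {
    fs::write(path, body)
}

/// What happens to the body today: bytes -> lossy String -> bytes.
/// Invalid UTF-8 sequences come back as U+FFFD instead of the original bytes.
fn lossy_round_trip(body: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(body).into_owned().into_bytes()
}
```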

With that we have the various mime types being handled! Not well but handled!

Now let's parse the Status!

Parsing the Status

Now that we have the handling of mime types done, we just need to parse the status information into the Response object.

...
impl Response {
    fn new(data: String) -> Self {
        let tokens: Vec<&str> = data.splitn(2, "\r\n").collect();
        let status = Status::new(tokens[0].to_string());

        match status.code.chars().next().unwrap() {
            '2' => {
                let meta_tokens: Vec<&str> = status.meta.split(";").collect();

                let mut mime_type = None;
                let mut charset = None;
                let mut lang = None;

                for meta_token in meta_tokens {
                    if meta_token.contains("/") {
                        mime_type = Some(meta_token.trim().to_string());
                    }
                    if meta_token.contains("charset") {
                        charset = Some(meta_token.trim().to_string());
                    }
                    if meta_token.contains("lang") {
                        lang = Some(meta_token.trim().to_string());
                    }
                }

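                // The body is whatever followed the header line, if anything did.
                // tokens.get(1) avoids a panic if the header had no "\r\n".
                let body = match tokens.get(1) {
                    Some(text) if !text.is_empty() => Some(text.to_string()),
                    _ => None,
                };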

                Response { status, mime_type, charset, lang, body }
            },
            _ => {
                Response { status, mime_type: None, charset: None, lang: None, body: None }
            }
        }
    }
}
...

The key part is that the meta field can have 3 parts to it, separated by a semi-colon. We can have the mime type, charset and lang parameter. We can also have none of these!

We initialize these attributes to None first and we split the meta field on the semi-colon.

We loop through the meta_tokens and we check to see if we have the mime type, charset or lang parameter. If we do, we update our variables.
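A standalone version of the meta parsing is handy for testing without a network connection. This is a sketch with a hypothetical parse_meta function and Meta struct, and it differs from Response::new in two deliberate ways: it matches the parameters by their charset= and lang= prefixes instead of using contains, so a mime type such as text/x-golang is not mistaken for a lang parameter, and it stores only the parameter values (utf-8 rather than charset=utf-8).

```rust
/// Hypothetical: the three optional parts of a success response's meta field.
#[derive(Debug, Default, PartialEq)]
struct Meta {
    mime_type: Option<String>,
    charset: Option<String>,
    lang: Option<String>,
}

/// Split the meta field on ';' and sort each part into mime type,
/// charset or lang. Unrecognised parts are ignored.
fn parse_meta(meta: &str) -> Meta {
    let mut parsed = Meta::default();
    for token in meta.split(';') {
        let token = token.trim();
        if let Some(value) = token.strip_prefix("charset=") {
            parsed.charset = Some(value.to_string());
        } else if let Some(value) = token.strip_prefix("lang=") {
            parsed.lang = Some(value.to_string());
        } else if token.contains('/') {
            parsed.mime_type = Some(token.to_string());
        }
    }
    parsed
}
```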

Voila! We have processed the Gemini meta line now.

We can test all of this by hard coding a mime type and splitting that instead of status.meta:

...
                let mime = "text/";
                let meta_tokens: Vec<&str> = mime.split(";").collect();
...

We should see our gemini responses being printed directly to the screen!

With that we are now done our basic client!

We have a Gemini client that can visit Gemini servers, display them properly, follow links, bookmark pages, save pages and handle the various mime types!

Whew! That was a trek. In the next chapter, let's debrief.

See you soon!

Conclusion

Hello! We are now on our final chapter! We aren't going to be doing anything here, so settle in and relax. We'll just look back over the tutorial and I'll share some of my thoughts about it.

Thanks for sticking with me!

Thoughts on the Gemini Client

We now have a very basic Gemini client and we can traverse Gemini space easily. We also now know exactly how our client works, so when we run into something we don't like in its behavior, we can fix it! This is a powerful ability.

Surprisingly, I ended up with around 550 lines of code for the Gemini client and about 500 lines for the Gopher client. I had expected the difference to be larger. The Gemini client felt much harder: dealing with TLS, parsing Gemini pages and handling the statuses were all cognitively harder to get a hold of, but the line count doesn't seem to show that. Gemini is a more complex protocol than Gopher, but it is still relatively simple. You can definitely hold the entire spec in your head!

The hardest part of Gemini was indeed the TLS logic. It likely took longer than I expected because we used Rust and rustls. I'm curious how other languages and crates deal with TLS, and whether I made it more difficult for myself.

One thing I wish I had done was handle mime types earlier. Full disclosure, I had forgotten about them! If I had handled them earlier, I think I could have found a better place for them; currently it is a bit of a hack. We do our processing in the visit function and then send back a message via the response.body. I don't like reusing parts of an object to send back messages, as that isn't what that part of the object was intended for.

Some enhancement ideas, feel free to leave your own below as well!

  • Support for arrow keys to go through a command history
  • Command completion
  • Different colours for valid and invalid commands
  • Ability to follow a link and then when we do cd .. to go back, to have the page display from where we left off.
  • Display location in the prompt
  • Word wrapping. I really like Gopher's 80 character style.
  • Instead of showing the next page, allow using the arrow keys to scroll line by line
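For the word wrapping idea, a greedy wrapper is enough for plain text lines. This is a sketch with a hypothetical wrap function; it counts chars rather than display columns, so wide characters will throw the width off, and it leaves over-long words such as URLs unbroken. Preformatted blocks in a Gemini page are meant to be shown as-is, so you would skip those lines.

```rust
/// Greedy word wrap: break `text` into lines of at most `width` chars,
/// splitting only at whitespace. A word longer than `width` gets its own line.
fn wrap(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();

    for word in text.split_whitespace() {
        let current_len = current.chars().count();
        let word_len = word.chars().count();
        // Start a new line if adding " word" would go past the width.
        if !current.is_empty() && current_len + 1 + word_len > width {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }

    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

Calling wrap(line, 80) on each text line would give the Gopher-style 80 character layout.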

Making the Gopher client and the Gemini client was fun. The Gopher client was done in a weekend; the Gemini client would take maybe 2 weekends, or 3 if you run into problems with TLS. I still like Gopher more. I resonate with the idea of everything being part of the filesystem, whereas Gemini feels a little too web-like. But I need to give Gemini more of a chance.

I am enjoying working with Rust, and I think I have a light handle on string handling and pattern matching. File IO is straightforward and even web scraping is doable. It's not as intuitive as Python or JavaScript, but I think spending more time with Rust as a primary language could be good.

Thank you for reading!