
Forking in ScarletDME

Title: Forking in ScarletDME
Date: 2023-11-13
Tags:  scarletdme, zig

Terrifying! I have successfully managed to **** things up enough that I have forking working in ScarletDME and accessible via the FORK() function.

The reason why this is so cursed is that I got it working by ignoring all the stuff that goes into closing the qm kernel. The hope is that the parent is doing all the clean up and that the child is free to make a mess.

Let's step back though and I'll walk you through how I got this hare-brained idea and why I added it.

History

The year is 2022 and I'm writing a web server in Pick BASIC for D3. The server works and I'm happy with it as an API endpoint. 2023 rolls around and I do a full rewrite of the web server for UniVerse. I add support for D3 and ScarletDME, though it's only really tested on UniVerse and ScarletDME. The big problem is the traditional web server problem. Take an endpoint, put a sleep in it and make a request. Now make another one. That second request has to wait for the first to finish before it gets handled.

The solution to this problem is some form of threading or events system. You want to be able to handle multiple connections in an application like a web server.

Phantoms

Pick and BASIC don't have the concept of threading. Everything is by and large single threaded and if you really need to spin off a job, you call PHANTOM and give it a program to run. PHANTOM under the hood is a fork and execl which means that you can only pass things that can be serialized.
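To make the fork and execl point concrete, here is a minimal C sketch of a PHANTOM-like launch. This is not ScarletDME's code. The helper name spawn_and_wait is hypothetical, and a real phantom would not be waited on; I only wait here so the result can be observed. The thing to notice is that the new program receives nothing from the caller except the argument string.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `command` in a fresh process image.
 * Only the argument string crosses over; none of the caller's
 * in-memory state survives the exec. */
int spawn_and_wait(const char *command) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                      /* fork failed */
    }
    if (pid == 0) {
        /* Child: replace this process with /bin/sh running the command. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                     /* only reached if execl failed */
    }
    /* Parent: wait so the sketch can report the child's exit code. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything richer than a string, such as an open socket, has to be written somewhere the new program can find it, which is exactly the limitation described above.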

An example of this is sending out 500 e-mails. You can generate the entire e-mail, write it out to a file and then call your mailing routine with the PHANTOM keyword. The mailing routine will pick the first thing in the file and mail it. If you call 10 phantoms then you can send 10 e-mails at the same time. However, all the information is passed through the file.

This doesn't work for sockets. You can't serialize a socket and pick it up in another context somewhere else.

I originally came up with a design where I have a server socket listening for connections. When a client connects, you fork the client connection off and let the child process handle the client. The parent process whirrs along waiting for the next client to connect. In Pick, however, you can't do that. Instead, what I did was close the server socket when the client connected and then call PHANTOM on the web server. This way the parent process would continue processing the client while the phantom process started the server and listened for new requests.
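For reference, the fork-per-client design I originally wanted looks roughly like this in C. It is a sketch under assumptions: both helper names are made up for illustration, the child just sends a fixed greeting, and the parent waits for the child only to keep the example deterministic. A real server would go straight back to accept.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: listen on 127.0.0.1 with an OS-assigned port. */
int listen_on_loopback(unsigned short *port_out) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the OS choose a port */

    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Accept one client and hand it to a forked child. The parent closes
 * its copy of the client socket and keeps the listener open. */
int serve_one_client(int listen_fd) {
    int client_fd = accept(listen_fd, NULL, NULL);
    if (client_fd < 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(client_fd);
        return -1;
    }
    if (pid == 0) {
        close(listen_fd);                   /* the child never accepts */
        const char reply[] = "hello from child\n";
        ssize_t n = write(client_fd, reply, sizeof reply - 1);
        close(client_fd);
        _exit(n < 0 ? 1 : 0);
    }
    close(client_fd);                       /* parent is done with this client */
    waitpid(pid, NULL, 0);                  /* sketch only: a real server would not block here */
    return 0;
}
```

The client socket never has to be serialized because the child inherits the open descriptor when the process is cloned.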

This worked well enough but it meant the server would go down and be unavailable for moments. This required me to have Caddy in front of my server so that it would hold on to requests that came in while the server was down and retry them once the server came back up.

This strategy worked and I used it for some time.

Secure Sockets

Then came SSL. I wanted to do the SSL termination in my web server and that required secure sockets. UniVerse and D3 have secure sockets but QM and ScarletDME don't. This meant that I would first need to implement secure sockets in ScarletDME and then I could make my web server handle SSL connections.

This was a relatively involved process of adding mbedtls and making sure the SSL context that gets generated is properly passed around the different layers of ScarletDME. However once I got it working, it was actually a very easy update to add support for SSL inside my web server.

This is where I hit the snag that leads to this post. Now that I have a context that gets passed from the server socket to the client socket, my strategy of shutting down the server and phantoming a new one no longer works. When the server socket is shut down, so is the SSL context that it is holding. This means the client's context is also lost and the connection is screwed.

In the single threaded mode my server works perfectly but when the phantoming is turned on, everything starts to break because the context is getting lost.

This ultimately drove me to remove the phantoming logic as it simply doesn't work for secure sockets. I was happy to see the code go as I wasn't happy with the way I was doing the phantoming. It required the use of Caddy to hold requests and so I always considered this solution a stop-gap until I found the real solution. Now that I had secure sockets, this seemed like the perfect time to see if I could get something better.

In my eyes, the real solution was to somehow add threading to Pick. This is going to make my server very much specific to ScarletDME but I think that is a fair trade off to be able to run a web server properly from Pick BASIC.

Multithreading

Before I get to my solution, I should note that OpenQM now supports both of the previously mentioned styles of concurrency. You can pass a socket in through a phantom by setting the mode of a socket to inheritable. This is limited though.

You can also use select to handle multiple sockets where an event will be sent when something is ready to be processed.
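I don't have OpenQM's BASIC-level interface to show here, so as an illustration of the idea, this is what select-style multiplexing looks like at the C system call level. The helper name first_readable is hypothetical. It waits on several descriptors at once and reports which one has data ready.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Return the index of the first descriptor in fds[] that is readable,
 * or -1 if none became readable before the timeout (or on error). */
int first_readable(const int *fds, int count, int timeout_ms) {
    fd_set readable;
    FD_ZERO(&readable);

    int max_fd = -1;
    for (int i = 0; i < count; i++) {
        FD_SET(fds[i], &readable);
        if (fds[i] > max_fd) max_fd = fds[i];
    }

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* select() blocks until a descriptor is ready or the timeout expires. */
    if (select(max_fd + 1, &readable, NULL, NULL, &tv) <= 0) {
        return -1;
    }
    for (int i = 0; i < count; i++) {
        if (FD_ISSET(fds[i], &readable)) return i;
    }
    return -1;
}
```

A single process can serve many sockets this way, but every handler still runs one after the other, so a slow handler still blocks everyone else.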

Neither of these options, however, exists in the open source version. The other thing is that both of these options are very specifically for sockets. They aren't general purpose solutions, so you still can't create threads.

With that out of the way, my solution was to simply expose the fork system call directly to BASIC. I had originally looked at exposing pthreads by exposing the Zig standard library's Thread implementation but that looked complex. In my eyes, the problem is neatly solved by cloning the process, memory and all, and letting it just run wild. It also helped that fork and checking the pid is a very easy thing to reason about. Forget the footguns and pitfalls. Let's pretend the world is as simple as I'm saying.

I want to write something like the following in BASIC:

   PID = FORK()
*
   IF PID = 0 THEN
      SLEEP 2
      PRINT 'THIS IS THE CHILD PROCESS'
*
   END ELSE
      PRINT 'THIS IS THE PARENT PROCESS'
   END
*
   END

This is pretty much what you would write in C or in Zig. There isn't a real abstraction over the underlying fork.
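For comparison, here is roughly the same program in C. This is a sketch: the function name fork_and_report and the exit code 3 are arbitrary choices of mine, and I left out the sleep so it finishes immediately.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork once. Returns the child's exit code as seen by the parent,
 * or -1 on error. */
int fork_and_report(void) {
    fflush(stdout);                 /* avoid duplicating buffered output in the child */

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        printf("THIS IS THE CHILD PROCESS\n");
        fflush(stdout);
        _exit(3);                   /* arbitrary code so the parent can recognise it */
    }

    printf("THIS IS THE PARENT PROCESS\n");
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The shape is the same as the BASIC version: one call, two return paths, and the pid tells you which side you are on.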

I implemented this quickly as all this requires is adding a new function, which I have done a few times now, and then wiring it up in BASIC. Once I got the FORK function working, I could see that everything was working properly.

Then something reared its ugly head. ScarletDME. This program would completely break the environment and sometimes even stopping and starting scarletdme wasn't enough. I had to reboot my server to get everything to a clean state.

ScarletDME is a database and an environment. It isn't just a program. When my FORK function got called and the fork system call ran, it wasn't cloning just the BASIC environment. It was cloning the entire process: everything from the Linux process state, to the database, to the program.

This meant that when the child process ended, it wasn't just ending the program and releasing the memory that it had. It was also exiting the database. When the kernel, the core of ScarletDME, exits, it runs some clean up functions. This clean up process was wiping out shared memory segments, shared memory segments that the parent process still needed.

I realized that I needed to add an extra special exit. I needed to somehow trigger an exit that would skip the clean up portion that happens after the kernel exits.

This involved wiring in a new exit, and also updating the core kernel of ScarletDME so that when it exits, it knows if it is exiting a child process or if it's exiting a regular process.

Surprisingly, especially to me, this worked. On an intuitive level, it makes sense. If the child process does no clean up, then the parent process will still have access to everything it had access to before.
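The same effect can be reproduced in plain C, which helped me convince myself of the logic. This is an analogy and not ScarletDME's code: a temp file stands in for the shared memory segments, an atexit handler stands in for the kernel's clean up, and all the names are made up. A child that leaves through exit() runs the clean up and destroys the resource, while a child that leaves through _exit() skips it.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a shared resource (the shared memory segments in the post). */
static char shared_path[64];

/* Stand-in for the clean up that runs when the process shuts down. */
static void cleanup_shared(void) {
    if (shared_path[0] != '\0') unlink(shared_path);
}

/* Create the resource, fork, and let the child end with exit() (clean up
 * runs) or _exit() (clean up skipped). Returns 1 if the parent can still
 * see the resource afterwards, 0 if it is gone, -1 on error. */
int resource_survives_child(int child_skips_cleanup) {
    static int registered = 0;

    snprintf(shared_path, sizeof shared_path, "/tmp/fork_cleanup_demo_%ld",
             (long)getpid());
    int fd = open(shared_path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0) return -1;
    close(fd);

    if (!registered) {
        atexit(cleanup_shared);     /* inherited by any child forked later */
        registered = 1;
    }

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (child_skips_cleanup) _exit(0);  /* no atexit handlers run */
        exit(0);                            /* runs cleanup_shared() in the child */
    }
    waitpid(pid, NULL, 0);

    int survived = (access(shared_path, F_OK) == 0);
    unlink(shared_path);            /* tidy up the demo file ourselves */
    shared_path[0] = '\0';
    return survived;
}
```

Calling it with 1 leaves the resource intact for the parent and calling it with 0 does not, which is the difference between a child that skips the clean up and one that doesn't.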

   PID = FORK()
*
   IF PID = 0 THEN
      SLEEP 2
      PRINT 'THIS IS THE CHILD PROCESS'
      EXIT.CHILD
*
   END ELSE
      PRINT 'THIS IS THE PARENT PROCESS'
   END
*
   END

I added the EXIT.CHILD statement and this lets the kernel know that a child process is being closed out. I still need to do something to check if a process is a child as I don't want this statement to be accidentally triggered in a parent process. That would result in the memory not being released and I imagine that would have all sorts of implications.

The Process

I'll outline the changes I made to get the EXIT.CHILD statement to work. You can read the other posts about adding a function to see how the FORK function was made but that one was simple.

ScarletDME - Adding a Function

The first step was to write the Zig code that is going to be called when EXIT.CHILD is hit:

./src/op_misc.zig

export fn op_exitchild() void {
    qm.k_exit_cause = qm.K_EXIT_CHILD;
}

All this function does is set the cause of the exit. It takes no parameters and it doesn't return anything.

Update the opcodes:

./gplsrc/opcodes.h

_opc_(0xCFFA, OP_FORK,      "FORK",      op_fork,      OPCODE_BYTE, -1)
_opc_(0xCFFB, OP_EXITCHILD, "EXITCHILD", op_exitchild, OPCODE_BYTE,  0)
_opc_(0xCFFC, OP_CFFC,      "OPCFFC",    op_illegal2,  OPCODE_BYTE,  0)
_opc_(0xCFFD, OP_CFFD,      "OPCFFD",    op_illegal2,  OPCODE_BYTE,  0)
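If you haven't seen a table of macro calls like this before, it is usually consumed with the X-macro pattern, where the same rows are expanded more than once with different macro definitions. The sketch below shows the general idea only. Whether ScarletDME expands its table exactly this way is an assumption on my part, and the trimmed four-column rows, the stub handlers and the dispatch function are all hypothetical.

```c
#include <stddef.h>

/* Stub handlers that just record which one ran. */
static int last_op = 0;
static void op_fork(void)      { last_op = 1; }
static void op_exitchild(void) { last_op = 2; }
static void op_illegal2(void)  { last_op = -1; }

/* Hypothetical in-file copy of four rows; the real table lives in
 * opcodes.h and carries more columns. */
#define OPCODE_TABLE(X)                               \
    X(0xCFFA, OP_FORK,      "FORK",      op_fork)      \
    X(0xCFFB, OP_EXITCHILD, "EXITCHILD", op_exitchild) \
    X(0xCFFC, OP_CFFC,      "OPCFFC",    op_illegal2)  \
    X(0xCFFD, OP_CFFD,      "OPCFFD",    op_illegal2)

/* Expansion 1: symbolic names for the opcode numbers. */
#define AS_ENUM(code, key, name, func) key = code,
enum { OPCODE_TABLE(AS_ENUM) };

/* Expansion 2: a lookup table from opcode number to handler. */
typedef struct {
    int code;
    const char *name;
    void (*handler)(void);
} opcode_entry;

#define AS_ENTRY(code, key, name, func) { code, name, func },
static const opcode_entry opcode_entries[] = { OPCODE_TABLE(AS_ENTRY) };

/* Run the handler for `code`. Returns the marker the handler recorded,
 * or 0 if the opcode is not in the table. */
int dispatch(int code) {
    size_t count = sizeof opcode_entries / sizeof opcode_entries[0];
    for (size_t i = 0; i < count; i++) {
        if (opcode_entries[i].code == code) {
            opcode_entries[i].handler();
            return last_op;
        }
    }
    return 0;
}
```

The appeal is that adding an opcode means touching one row, and every expansion picks it up automatically.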

Copy the opcodes.h file to /usr/qmsys/gplsrc/. Then we start up ScarletDME and re-generate the OPCODES.H file in GPL.BP. This is done by running the OPGEN routine in GPL.BP. The key thing here is that you need to be in the system account.

cd /usr/qmsys
qm -Internal
RUN BP OPGEN

This will create the OPCODES.H file which is included in BCOMP, the BASIC compiler.

This takes care of the Zig side of the code. We also need to wire up BCOMP so that the BASIC compiler knows what to do with EXIT.CHILD.

The first thing is to update the list of statements. BASIC functions are updated in the list of intrinsics while the statements have their own variable:

./qmsys/GPL.BP/BCOMP

   statements := @fm:"ECHO":@fm:"ELSE":@fm:"END":@fm:"ENTER":@fm:"ERRMSG"
   statements := @fm:"EXECUTE":@fm:"EXIT":@fm:"EXIT.CHILD"
   statements := @fm:"FILE":@fm:"FILELOCK":@fm:"FILEUNLOCK":@fm:"FIND"

Next we add the statement to the switch:

./qmsys/GPL.BP/BCOMP

         locate u.token.string in statements<1> setting i then
            if debug then gosub generate.debug
            on i gosub st.abort,
                 ...
                 st.errmsg,
                 st.execute,
                 st.exit,
                 st.exitchild,
                 st.file,
                 st.filelock,

This switch will trigger an internal subroutine for each statement. In our case when EXIT.CHILD is hit, it will go to st.exitchild.

st.exitchild looks like:

./qmsys/GPL.BP/BCOMP

*****************************************************************************
* ST.EXITCHILD -  EXIT.CHILD statement

st.exitchild:
   opcode.byte = OP.EXITCHILD
   gosub emit.simple
   
   return
   
*****************************************************************************
* ST.CONVERT  -  CONVERT statement

st.convert:
   gosub exprf                  ;* Characters to replace
   
   if u.look.ahead.token.string # "TO" then goto err.to
   gosub get.token             ;* Skip TO
   gosub exprf                  ;* Replacement characters
   if u.look.ahead.token.string # "IN" then goto err.in
   gosub get.token             ;* Skip IN
   gosub simple.lvar.reference  ;* Variable to update
   opcode.byte = OP.CONVERT ; gosub emit.simple
   
   return
   
*****************************************************************************

This sets the opcode to OP.EXITCHILD which matches what we put in opcodes.h.

Now we are almost done. We need to add the new exit cause, K_EXIT_CHILD.

We need to update the kernel.h file:

./gplsrc/kernel.h

/* Normal exit causes (some also in BP int$keys.h) */
#define K_RETURN 0x0001         /* Return to caller */
#define K_STOP 0x0002           /* STOP or Q from break actions */
#define K_ABORT 0x0003          /* ABORT or A from break actions */
#define K_CHAIN 0x0004          /* Chain a new CPROC command, not a PROC */
#define K_EXIT_RECURSIVE 0x0005 /* RETURN from recursive code */
#define K_LOGOUT 0x0006         /* Logout process on final return */
#define K_TOGGLE_TRACER 0x0007  /* Enter or leave trace mode */
#define K_CHAIN_PROC 0x0010     /* Chain to a PROC */
#define K_EXIT_CHILD 0x0011     /* Exit a forked child */

Now that we have a new exit, we can update kernel.c. In kernel.c we need to check the exit cause in two places. We first check it in k_run, and if we get this cause then we want to exit k_run and go back to the kernel. Then in the kernel we check the exit cause one more time.

./gplsrc/kernel.c

      case K_ABORT:
      case K_LOGOUT:
      case K_TERMINATE:
        if (my_uptr->lockwait_index)
          clear_lockwait();
        Element(process.syscom, SYSCOM_ITYPE_MODE)->data.value = 0;
        longjmp(k_exit, k_exit_cause);
        break;
        
      case K_EXIT_CHILD:
        longjmp(k_exit, k_exit_cause);
        break;
        

In the case of K_EXIT_CHILD, we longjmp straight to the k_exit jump point.

This chunk of code gets us out of k_run, which is what runs BASIC programs.

Once we exit the program, we still need to exit the database. This is where we want to skip all the clean up.

./gplsrc/kernel.c

      case K_TERMINATE: /* Forced logout of process */
        txn_abort();
        /* Cast off all but bottom level process (which must be a command
           processor), decrementing command level for each stacked processor */
           
        while (process.call_depth > 1) {
          if (process.program.flags & HDR_IS_CPROC)
            cproc_level--;
          k_return();
        }
        process.k_abort_code = 3; /* Set @ABORT.CODE */
        break;
        
      case K_LOGOUT: /* Immediate termination of process */
        txn_abort();
        kill_process();
        goto exit_kernel;
        
      case K_EXIT_CHILD: /* Immediate termination of forked child */
        is_forked = 1;
        goto exit_kernel;

Here we set the is_forked flag to 1.

The big change here is that the kernel() function is currently void but we need to make it return an int. I want to return is_forked higher up so that the function that starts the kernel can handle is_forked.

./gplsrc/qm.c

int main(int argc, char *argv[]) {
  ...
  
  int is_forked = kernel(); /* Run the command processor */
  
  if (is_forked == 0) {
    s_free_all(); /* Only really needed for MEMTRACE */
    status = exit_status;
    clean_stop();
  }
  
  return status;
}

Here we update the main function such that it gets the is_forked value and based on that it triggers the clean up.
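To see the whole exit path in one place, here is a stripped-down C model of it. Everything in it is a simplified stand-in that I wrote for this sketch: run_program plays the role of k_run, mini_kernel plays kernel(), mini_main plays main(), and a counter replaces the real clean up calls. Only the two exit cause values are taken from kernel.h.

```c
#include <setjmp.h>

/* The two exit causes this sketch needs, with the values from kernel.h. */
#define K_LOGOUT     0x0006
#define K_EXIT_CHILD 0x0011

static jmp_buf k_exit;
static int cleanup_runs = 0;    /* counts how often the clean up ran */

/* Stand-in for k_run(): the running program ends with the given cause. */
static void run_program(int cause) {
    longjmp(k_exit, cause);     /* control resumes at setjmp() below */
}

/* Stand-in for kernel(): returns 1 when leaving a forked child. */
int mini_kernel(int cause) {
    switch (setjmp(k_exit)) {
        case 0:
            run_program(cause); /* never returns normally */
            return 0;
        case K_EXIT_CHILD:
            return 1;           /* forked child: the caller must skip clean up */
        default:
            return 0;           /* every other cause exits the usual way */
    }
}

/* Stand-in for main(): only clean up when this is not a forked child. */
int mini_main(int cause) {
    int is_forked = mini_kernel(cause);
    if (is_forked == 0) {
        cleanup_runs++;         /* stand-in for s_free_all() and clean_stop() */
    }
    return is_forked;
}
```

With K_LOGOUT the counter goes up, and with K_EXIT_CHILD it stays where it was, which is the whole trick.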

With that we are done! ScarletDME can now be built and it will have the FORK function that spins off a child process and an EXIT.CHILD statement that will forego cleaning things up.

Now the above program should work and it shouldn't break anything in the ScarletDME environment.

Closing Thoughts

My biggest issue with my implementation is that the child process does no clean up. I can already see this being a problem as I'm going to be forking from my web server and the goal is to have the parent process running indefinitely. This means that whatever the child processes pollute will never get cleaned up.

This is probably going to result in a memory leak.

The other thing is that I have FORK working in the general case. I can write a BASIC program and call FORK and this is going to be wild. What happens when there is a MATREADU before a FORK? Two processes are going to have exclusive access to a record. This implementation of FORK is dangerous and it is very much up to the programmer to handle it with care and to think through the repercussions. Knowing myself, this is a bad idea.
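I haven't tested what ScarletDME's lock table does across a FORK, but the operating system has the same hazard with its own file locks, so here is a C analogy. This is only an analogy: it uses flock on a temp file, not ScarletDME record locks, and the function name is made up. The parent takes an exclusive lock, forks, and the child releases the lock through the descriptor it inherited.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child was able to release the parent's
 * exclusive lock, 0 if the lock held, -1 on error. */
int child_can_drop_parent_lock(void) {
    char path[64];
    snprintf(path, sizeof path, "/tmp/fork_lock_demo_%ld", (long)getpid());

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return -1;
    if (flock(fd, LOCK_EX) < 0) {           /* parent takes the lock */
        close(fd);
        unlink(path);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        flock(fd, LOCK_UN);                 /* child unlocks via the inherited descriptor */
        _exit(0);
    }
    waitpid(pid, NULL, 0);

    /* A separate open of the same file: can it take the lock the
     * parent believes it still holds? */
    int other = open(path, O_RDWR);
    int dropped = (other >= 0 && flock(other, LOCK_EX | LOCK_NB) == 0);

    if (other >= 0) close(other);
    close(fd);
    unlink(path);
    return dropped;
}
```

On Linux this returns 1, because parent and child share one lock rather than holding one each. The parent never unlocked anything, yet its exclusive access is gone.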

However, I did use this logic in my web server and now I have a truly multithreaded server written in Pick BASIC. At this point I think I could serve my blog out fully with my web server with no intermediary!

I still need to think about what a perfect solution might be and hopefully this post will become outdated.