Monday | 15 APR 2024
[ previous ]
[ next ]

# Notes - Shunting Yard Algorithm

Title:
Date: 2023-11-26
Tags:  algorithms

The shunting yard algorithm is an algorithm to take an infix expression and convert it to reverse polish notation. This way we can then use a stack to process the expression and make evaluation much simpler.

I've written this algorithm before but usually I take what is on wikipedia and translate it. I'm hoping that writing it down this time will help intuit what is going on with the stack juggling and taking an infix expression and converting it.

I also want to extend this algorithm to handle functions, array subscripts and string subscripts for BASIC specifically.

My main resources are the wikipedia pages for the Shunting Yard Algorithm and the Operator Precedence page.

This was much simpler to write and reason about once I wrote my stack implementation.

The first step is to set up two stacks. The first stack will be our output stack, this is what will hold the final reverse polish notation. The second stack is the operator stack. This is where operators will move in an out of.

``````   DIM OUT.STACK(STACK.SIZE)
MAT OUT.STACK = ''
*
DIM OP.STACK(STACK.SIZE)
MAT OP.STACK = ''
*
PARSE.TOKENS.LEN = DCOUNT(EXPRESSION,@AM)
*
``````

Now that we have everything set up, we can then loop through a list of tokens and process them. The first thing we need to do is get the current token and check if it is an operator or a regular value. If it is a regular value, we will push it on to the output stack. If it's an operator we will push it to the operator stack.

``````      LOCATE(PARSE.TOKEN,OPERATORS,1;OP.POS) ELSE
CALL STACK.PUSH(MAT OUT.STACK,STACK.SIZE,PARSE.TOKEN)
CONTINUE
END
*
``````

This is for illustration purposes, the full code is more involved and is below. This get's point across though. If the token isn't an operator, then we add it to the output stack.

Now if the token is an operator we have something to do. We will add the token to the operator stack. However before we do, we'll check if there are any operators in the stack that have a higher precedence than the token we are currently processing. If there is a token with a higher precedence, we will move it to the output stack. We will keep doing this until we get to a token with a lower precedence.

Once we have a token left on the operator stack that is lower than the new token, we will then add the new token to the operator stack.

That is the core of the shunting yard algorithm.

``````      LOOP
IF OP.STACK(1) > 1 THEN
CALL STACK.PEEK(MAT OP.STACK,STACK.SIZE,VALUE)
CURRENT.PRECEDENCE = VALUE<1,2>
END ELSE
CURRENT.PRECEDENCE = 0
END
*
IF NEW.ASSOC = RIGHT THEN
POP.FLAG = CURRENT.PRECEDENCE <= NEW.PRECEDENCE
END ELSE
POP.FLAG = CURRENT.PRECEDENCE < NEW.PRECEDENCE
END
*
UNTIL POP.FLAG DO
CALL STACK.POP(MAT OP.STACK,STACK.SIZE,VALUE)
CALL STACK.PUSH(MAT OUT.STACK,STACK.SIZE,VALUE<1,1>)
REPEAT
*
VALUE = PARSE.TOKEN : @VM : NEW.PRECEDENCE
CALL STACK.PUSH(MAT OP.STACK,STACK.SIZE,VALUE)
``````

Here we loop over the operator stack and move an operator if it has a higher precedence than the new token we are adding.

We also have some logic to check the associativity. If it is right associative like exponents, we want to move the operator if it equal precedence, if it's left associative then we want it to be when the operator precedence is less than the new token we are adding.

We follow this logic for all the tokens in the expression. At the end we will have an output stack and an operator stack.

The final step is to move everything in the operator stack to the output stack.

``````   FOR OP.STACK.CTR = OP.STACK(1) TO 2 STEP -1
CALL STACK.POP(MAT OP.STACK,STACK.SIZE,VALUE)
CALL STACK.PUSH(MAT OUT.STACK,STACK.SIZE,VALUE<1,1>)
NEXT OP.CTR
*
``````

Voila! With that the infix expression is now made into a postfix expression that will be easier to evaluate.

The full code is below and this handles the [] substring operator, <> the array operator and the () function operators. This is better than my previous attempt at parsing and evaluating expressions as this moves the logic into the postfix notation rather than having more parsing logic deeper down in the evaluate function.

``````*
EQU DEBUG.STACK TO 1
*
EQU TRUE TO 1
EQU FALSE TO 0
*
EQU LEFT TO 0
EQU RIGHT TO 1
*
FUNCTION.PRECEDENCE = 15
*
OPERATORS = ''
OPERATORS<1> = '+ - * / ** [] <>'
OPERATORS<2> = '12 12 13 13 14 15 15'
OPERATORS<3> = '0 0 0 0 1 0'
*
CONVERT ' ' TO @VM IN OPERATORS<1>
CONVERT ' ' TO @VM IN OPERATORS<2>
CONVERT ' ' TO @VM IN OPERATORS<3>
*
EQU STACK.SIZE TO 15
*
EXPRESSION = ''
*
EXPRESSION<-1> = '3'
EXPRESSION<-1> = '+'
EXPRESSION<-1> = '4'
EXPRESSION<-1> = '*'
EXPRESSION<-1> = '2'
EXPRESSION<-1> = '/'
EXPRESSION<-1> = '('
EXPRESSION<-1> = '1'
EXPRESSION<-1> = '-'
EXPRESSION<-1> = '5'
EXPRESSION<-1> = ')'
EXPRESSION<-1> = '**'
EXPRESSION<-1> = '2'
EXPRESSION<-1> = '**'
EXPRESSION<-1> = '3'
*
PRINT 'Expression: ' : EXPRESSION
*
GOSUB PARSE.EXPRESSION
*
CALL PRINT.STACK(MAT OUT.STACK,STACK.SIZE)
*
STOP
*
*********************  S U B R O U T I N E  *********************
*
PARSE.EXPRESSION:NULL
*
DIM OUT.STACK(STACK.SIZE)
MAT OUT.STACK = ''
*
DIM OP.STACK(STACK.SIZE)
MAT OP.STACK = ''
*
PARSE.TOKENS.LEN = DCOUNT(EXPRESSION,@AM)
*
FOR PARSE.CTR = 1 TO PARSE.TOKENS.LEN
IF DEBUG.STACK THEN
PRINT 'Output:' :
CALL PRINT.STACK(MAT OUT.STACK,STACK.SIZE)
*
PRINT 'Operator:' :
CALL PRINT.STACK(MAT OP.STACK,STACK.SIZE)
PRINT
END
*
PARSE.TOKEN = EXPRESSION<PARSE.CTR>
NEXT.TOKEN = EXPRESSION<PARSE.CTR+1>
*
IF PARSE.TOKEN = ',' THEN
CONTINUE
END
*
IF PARSE.TOKEN = '(' THEN
CALL STACK.PUSH(MAT OP.STACK,STACK.SIZE,PARSE.TOKEN)
CONTINUE
END
*
IF PARSE.TOKEN = ')' THEN
LOOP
CALL STACK.POP(MAT OP.STACK,STACK.SIZE,VALUE)
UNTIL VALUE<1,1> = '(' DO
CALL STACK.PUSH(MAT OUT.STACK,STACK.SIZE,VALUE<1,1>)
REPEAT
CONTINUE
END
*
IF PARSE.TOKEN = '[' THEN
VALUE = PARSE.TOKEN : @VM : 0
CALL STACK.PUSH(MAT OP.STACK,STACK.SIZE,VALUE)
CONTINUE
END
*
IF PARSE.TOKEN = ']' THEN
LOOP
CALL STACK.POP(MAT OP.STACK,STACK.SIZE,VALUE)
UNTIL VALUE<1,1> = '[' DO
CALL STACK.PUSH(MAT OUT.STACK,STACK.SIZE,VALUE<1,1>)
REPEAT
CALL STACK.PUSH(MAT OUT.STACK,STACK.SIZE,'[]')
CONTINUE
END
*
IF PARSE.TOKEN = '<' THEN
VALUE = PARSE.TOKEN : @VM : 0
CALL STACK.PUSH(MAT OP.STACK,STACK.SIZE,VALUE)
CONTINUE
END
*
IF PARSE.TOKEN = '>' THEN
LOOP
CALL STACK.POP(MAT OP.STACK,STACK.SIZE,VALUE)
UNTIL VALUE<1,1> = '<' DO
CALL STACK.PUSH(MAT OUT.STACK,STACK.SIZE,VALUE<1,1>)
REPEAT
CALL STACK.PUSH(MAT OUT.STACK,STACK.SIZE,'<>')
CONTINUE
END
*
LOCATE(PARSE.TOKEN,OPERATORS,1;OP.POS) ELSE
*
IS.FUNCTION = FALSE
IF NEXT.TOKEN = '(' THEN
IS.FUNCTION = TRUE
END
*
IF IS.FUNCTION THEN
VALUE = PARSE.TOKEN : @VM : FUNCTION.PRECEDENCE
CALL STACK.PUSH(MAT OP.STACK,STACK.SIZE,VALUE)
END ELSE
CALL STACK.PUSH(MAT OUT.STACK,STACK.SIZE,PARSE.TOKEN)
END
*
CONTINUE
END
*
NEW.PRECEDENCE = OPERATORS<2,OP.POS>
NEW.ASSOC = OPERATORS<3,OP.POS>
*
LOOP
IF OP.STACK(1) > 1 THEN
CALL STACK.PEEK(MAT OP.STACK,STACK.SIZE,VALUE)
CURRENT.PRECEDENCE = VALUE<1,2>
END ELSE
CURRENT.PRECEDENCE = 0
END
*
IF NEW.ASSOC = RIGHT THEN
POP.FLAG = CURRENT.PRECEDENCE <= NEW.PRECEDENCE
END ELSE
POP.FLAG = CURRENT.PRECEDENCE < NEW.PRECEDENCE
END
*
UNTIL POP.FLAG DO
CALL STACK.POP(MAT OP.STACK,STACK.SIZE,VALUE)
CALL STACK.PUSH(MAT OUT.STACK,STACK.SIZE,VALUE<1,1>)
REPEAT
*
VALUE = PARSE.TOKEN : @VM : NEW.PRECEDENCE
CALL STACK.PUSH(MAT OP.STACK,STACK.SIZE,VALUE)
NEXT PARSE.CTR
*
FOR OP.STACK.CTR = OP.STACK(1) TO 2 STEP -1
CALL STACK.POP(MAT OP.STACK,STACK.SIZE,VALUE)
CALL STACK.PUSH(MAT OUT.STACK,STACK.SIZE,VALUE<1,1>)
NEXT OP.CTR
*
RETURN
*
* END OF PROGRAM
*
END
*
``````