I want to write a GREP function for BASIC, there is a SEARCH keyword but it doesn't let you do wildcards and so it is quite limited. I think having a regex search would be helpful, especially when I usually already know what I'm looking for.
I translated Rob Pike's famous example of a regex to BASIC and it was pretty straightforward. The most frustrating thing however is that I don't understand it intuively. The translation looks good and seems to be worknig but the recursive elements are escaping me even now. I may have gotten dumber!
Here is Brian Kernighan's Exegesis which shows and explains Rob Pike's regular expression matcher. I want to understand this truly at some point, I want to be able to re-invent it now that I know it exists.
https://www.cs.princeton.edu/courses/archive/spr09/cos333/beautiful.html
A possible future goal would be to see if I can collapse this recursive element into a single file so that I can use it in my editor. Currently I do straight string matches but it would be cool to be able to do a regex from with in my editor.
This the the MATCH function. This is what starts of the regex routines.
SUBROUTINE MATCH(REGEXP,TEXT,RESULT)
*
IF REGEXP[1,1] = '^' THEN
CALL MATCH.HERE(REGEXP[2,999],TEXT,RESULT)
RETURN
END
*
TEXT.LEN = LEN(TEXT)
*
FOR I = 1 TO TEXT.LEN
CALL MATCH.HERE(REGEXP,TEXT[I,999],MH.RESULT)
*
IF MH.RESULT = 1 THEN
RESULT = 1
RETURN
END
NEXT I
*
RESULT = 0
*
RETURN
*
* END OF PROGRAM
*
END
*
This is the MATCH.HERE function and this is the core engine of the regex logic.
SUBROUTINE MATCH.HERE(REGEXP,TEXT,RESULT)
*
IF REGEXP[1,1] = '' THEN
RESULT = 1
RETURN
END
*
IF REGEXP[2,1] = '*' THEN
CALL MATCH.STAR(REGEXP[1,1],REGEXP[3,999],TEXT,RESULT)
RETURN
END
*
IF REGEXP[1,1] = '$' AND LEN(REGEXP) = 1 THEN
IF TEXT = '' THEN
RESULT = 1
RETURN
END ELSE
RESULT = 0
RETURN
END
END
*
IF TEXT # '' AND (REGEXP[1,1] = '.' OR REGEXP[1,1] = TEXT[1,1]) THEN
CALL MATCH.HERE(REGEXP[2,999],TEXT[2,999],RESULT)
RETURN
END
*
RESULT = 0
*
RETURN
*
* END OF PROGRAM
*
END
*
Lastly we have the MATCH.STAR function to handle the wildcards in regex.
SUBROUTINE MATCH.STAR(C,REGEXP,TEXT,RESULT)
*
TEXT.LEN = LEN(TEXT)
*
FOR I = 1 TO TEXT.LEN
CALL MATCH.HERE(REGEXP,TEXT[I,999],MH.RESULT)
*
IF MH.RESULT = 1 THEN
RESULT = 1
RETURN
END
*
IF TEXT[I,1] # C AND C # '.' THEN
EXIT
END
NEXT I
*
RESULT = 0
*
RETURN
*
* END OF PROGAM
*
END
*
I wired up the matcher to my GREP program and everything works! Too bad it is stupid slow. The recursive calls are taking too much time from a preliminary test and I'll need to figure out why. It might be that calling subroutines is pretty expensive.
I wired up the matcher in an internal subroutine so I could quickly switch between using the INDEX function and compare it to the matcher function. The difference is huge. I think an improvement will be to remove the recursve aspect and see if I can do it all within this one program.
*
GIT.FILENAME = 'GREP'
GIT.REPO = 'https://github.com/Krowemoh/TCL-Utilities.git'
*
* VERSION
*
VERSION = '1'
*
@USER1 = 'GREP'
@USER2 = 'GREP'
*
EQU TRUE TO 1
EQU FALSE TO 0
*
CALL GET.ARGUMENTS(ARGUMENTS)
*
DICT = ''
*
IF ARGUMENTS<2> = 'DICT' THEN
ARGUMENTS = DELETE(ARGUMENTS,2)
DICT = 'DICT'
END
*
ARGS.LEN = DCOUNT(ARGUMENTS,@AM)
*
IF ARGS.LEN = 1 THEN
PRINT 'GREP - Regular Expression Search'
PRINT
PRINT 'GREP [DICT] {FILE} {RECORD.ID} {REGEX}'
PRINT
STOP
END
*
IF ARGS.LEN = 2 THEN
PRINT 'Invalid number of args.'
STOP
END
*
FILE.NAME = ARGUMENTS<2>
*
OPEN DICT,FILE.NAME TO FILE ELSE
PRINT 'Unable to open file: ' : FILE.NAME
STOP
END
*
IF ARGS.LEN = 3 THEN
REGEXP = ARGUMENTS<3>
SELECT FILE
*
END ELSE IF ARGS.LEN = 4 THEN
RECORD.ID = ARGUMENTS<3>
REGEXP = ARGUMENTS<4>
SELECT RECORD.ID
END
*
IF REGEXP[1,1] = '"' AND REGEXP[LEN(REGEXP),1] = '"' THEN
REGEXP = REGEXP[2,LEN(REGEXP)-2]
END
*
DIM CONTENT(10000)
MAT CONTENT = ''
*
LOOP
READNEXT ITEM.ID ELSE ITEM.ID = ''
UNTIL ITEM.ID = '' DO
MATREAD CONTENT FROM FILE,ITEM.ID ELSE MAT CONTENT = ''
NUMBER.OF.LINES = INMAT()
*
FOR I = 1 TO NUMBER.OF.LINES
*
TEXT = CONTENT(I)
*
GOSUB MATCH.INDEX
*
IF RESULT THEN
PRINT ITEM.ID : ': ' : I : ': ' : TEXT
END
NEXT I
REPEAT
*
********************* S U B R O U T I N E *********************
*
MATCH.INDEX:NULL
*
RESULT = FALSE
*
IF INDEX(TEXT,REGEXP,1) # 0 THEN
RESULT = TRUE
END
*
RETURN
*
********************* S U B R O U T I N E *********************
*
MATCH.RECURSE:NULL
*
RESULT = FALSE
*
CALL MATCH(REGEXP,TEXT,RESULT)
*
RETURN
*
* END OF PROGRAM
*
END
*