
Thoughts on RAG

Date: 2024-02-29
Tags:  AI

RAG stands for Retrieval-Augmented Generation. The idea is that when prompting, you can fill the unused tokens in the context window with useful context that lets the AI cheat its way to an answer.

The proverbial example is asking for the current date. The AI model has a cutoff date stamped into it, as it doesn't have any information past that date. So when you ask for the date, it can't know what the current date really is. However, you can add the current date to the prompt when asking for it. Now the model can respond with the current date.
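
The date trick can be sketched in a few lines of plain Python. This is only an illustration: `build_prompt` is a hypothetical helper name, and the exact wording of the injected sentence is an assumption, not something any particular model requires.

```python
from datetime import date


def build_prompt(question: str, today: date) -> str:
    """Put the current date in front of the question (hypothetical helper)."""
    context = f"Today's date is {today.isoformat()}."
    return f"{context}\n\nQuestion: {question}"


# The resulting string is what would be sent to the model.
print(build_prompt("What is the date today?", date.today()))
```

The model never learns the date; it just reads it back out of the prompt.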

It's hacky, but everything about AI looks to be hacky. RAG takes this to the extreme. The core idea is to take a base of knowledge and make it searchable. Right before the prompt goes to the AI, search the knowledge base for something similar to the prompt and add it before the prompt. Then send the combined prompt to the AI. This way you can give the AI a good base to start from rather than letting it start from scratch.
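
A toy version of that search-then-prepend step, in raw Python, might look like the sketch below. It swaps real embeddings for simple word counts with cosine similarity, so it only shows the shape of the idea. The names `vectorize`, `retrieve`, and `augment` are made up for this example.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


def augment(query, docs, k=1):
    """Put the retrieved text in front of the question."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


knowledge_base = [
    "The office is closed on public holidays.",
    "Expense reports are due on the fifth of each month.",
]
print(augment("When are expense reports due?", knowledge_base))
```

A real setup would replace `vectorize` with an embedding model and the list with a vector database, but the flow is the same: search, prepend, send.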

It's relatively straightforward to implement in LangChain, an LLM framework, and it also looks simple to implement in raw Python. I haven't tried the raw Python method yet, but the LangChain version was very quick.

I first took a PDF, split it into chunks, and pumped them into a vector database. I then used LangChain to search the vector database and added the retrieved context to the prompt.
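
The splitting step is roughly the following, shown here as a plain-Python sketch rather than the LangChain splitter itself. `chunk_text`, the character-based sizes, and the overlap value are all assumptions for illustration. Real splitters usually have more options.

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap slightly."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


# Each chunk would then be embedded and stored in the vector database.
for chunk in chunk_text("abcdefghij", size=4, overlap=1):
    print(chunk)
```

The overlap means the end of one chunk is repeated at the start of the next, which softens, but doesn't remove, the effect of cutting at an arbitrary position.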

I did this with a coding manual and found RAG was a terrible fit for it. I realize now that the chunking and embedding approach doesn't work well when code depends so heavily on the text and code around it. A chunk boundary that falls in the middle of a code example is probably going to cause more problems than anything. I can see RAG working much better where you are getting information from a paper or an article, but for something like code, RAG looks to be pretty terrible.
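
To make the problem concrete, here is a small sketch of a fixed-size splitter cutting a code example in half. The `MANUAL` text and the chunk size of 60 characters are invented for the demonstration. They aren't taken from the actual manual.

```python
# A made-up snippet of a coding manual: prose, a code example, more prose.
MANUAL = (
    "To add two numbers, define a function:\n"
    "\n"
    "def add(a, b):\n"
    "    total = a + b\n"
    "    return total\n"
    "\n"
    "Call it as add(2, 3).\n"
)


def split_fixed(text, size):
    """Cut text every `size` characters, ignoring what is being cut."""
    return [text[i:i + size] for i in range(0, len(text), size)]


chunks = split_fixed(MANUAL, 60)
for n, chunk in enumerate(chunks):
    print(f"--- chunk {n} ---")
    print(chunk)
```

The first chunk ends partway through the function body, so a search that retrieves only that chunk hands the model a function with no `return`.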

I had wanted to use RAG with a PDF I was comfortable with, but I think I'll need to use it against a book or an essay instead. That might give better results.

My biggest issue has been the fact that I can run the same program twice and get two different answers. That is really strange for a program.
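
One likely cause, assuming the model is sampling its output rather than always picking the top choice, is the temperature setting. The sketch below is a toy, not a real model: `sample_token` and the scores are invented, and it only shows why a non-zero temperature gives varying answers while a temperature of zero does not.

```python
import math
import random


def sample_token(scores, temperature, rng):
    """Pick one candidate; temperature 0 means always take the best score."""
    if temperature == 0:
        return max(scores, key=scores.get)
    # Higher scores get higher weights, but every candidate can be chosen.
    weights = [math.exp(s / temperature) for s in scores.values()]
    return rng.choices(list(scores), weights=weights, k=1)[0]


scores = {"Paris": 2.0, "Lyon": 1.5, "Nice": 1.0}  # made-up candidate scores
rng = random.Random()

print([sample_token(scores, 1.0, rng) for _ in range(5)])  # varies run to run
print([sample_token(scores, 0, rng) for _ in range(5)])    # always the same
```

Many LLM APIs expose a temperature parameter, and setting it low makes answers more repeatable, though it may not make them perfectly identical.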