Retrieval Augmented Generation (RAG) Workflow for Chatbots with Featureform
The retrieval augmented generation workflow pulls information that’s relevant to the user’s query and feeds it into the LLM via the prompt. That information might be similar documents pulled from a vector database, or features looked up from an inference store.
To start using Featureform for RAG, we should register the documents that we plan to use as context.

CSV File
For a CSV file on our local system, we can do the following:
```python
episodes = local.register_file(
    name="mlops-episodes",
    path="data/files/podcast1.csv",
    description="Transcripts from recent MLOps episodes",
)
```
Directory of Text Files
For a directory of text files on our system, we can do:
```python
episodes = local.register_directory(
    name="mlops-episodes",
    path="data/files",
    description="Transcripts from recent MLOps episodes",
)
```
Our text files may be imperfect to use as context. We need to choose the right size and density of data to maximize the information we provide to our final prompt. If our text is too long, we may choose to chunk it. If it's too short, we may choose to concatenate multiple documents into one. In almost every situation, there is other data cleaning that may need to be done.

Chunking
We can chunk by splitting on periods to create sentences or on new lines to create paragraphs. Langchain also has a set of text chunkers that can be used. A minimal chunking sketch appears after these techniques.

Concatenation
To concatenate, we can join together texts that are relevant to each other. For example, we can choose to append all of the messages in a Slack thread into one document.

Cleaning
Data is never clean. We may want to remove formatting and other imperfections using transformations.
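As an example, here is a minimal chunking sketch. It assumes the registered episodes table has "PK" and "Text" columns (the same columns used in the feature definitions later) and uses Featureform's localmode dataframe-transformation decorator; adjust the names to match your data.

```python
# A rough sketch of chunking long transcripts into sentence-sized rows.
# Assumes the registered table has "PK" and "Text" columns.
@local.df_transformation(variant="v1", inputs=[episodes])
def chunked_episodes(episodes):
    import pandas as pd

    rows = []
    for _, row in episodes.iterrows():
        # Naive chunking: split on periods to get sentence-level chunks.
        for i, chunk in enumerate(row["Text"].split(".")):
            chunk = chunk.strip()
            if chunk:
                rows.append({"PK": f"{row['PK']}-{i}", "Text": chunk})
    return pd.DataFrame(rows)
```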
Now that we have our text cleaned up, we need to index it for retrieval. To do so, we'll create an embedding of the text. We've written a long-form article on embeddings; for this situation, you can simply think of them as a specialized index for similarity search. Text that's similar according to the embedding model will be near each other in N-dimensional space. We'll use a vector database for retrieval.
We can use the ADA model from OpenAI to embed our documents. Note that this is a different model than GPT. Its purpose is to convert text into embeddings for nearest-neighbor lookup.
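As a rough sketch, embedding a single piece of text with ADA could look like the following. It uses the pre-1.0 `openai` client (matching the completion call at the end of this guide) and assumes your API key is already configured.

```python
import openai

# Assumes OPENAI_API_KEY is set in the environment.
def embed_with_ada(text):
    # text-embedding-ada-002 returns a 1536-dimensional vector.
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=[text],
    )
    return response["data"][0]["embedding"]
```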
sentence_transformers from Hugging Face is our recommended way to embed documents. It's fast and free! The quality of the embeddings will likely be slightly worse than ADA's, though.
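Below is a hedged sketch of a transformation producing the vectorize_comments table referenced in the feature definitions later. It assumes the all-MiniLM-L6-v2 model (which outputs 384-dimensional vectors, matching dims=384 below) and the localmode dataframe-transformation decorator.

```python
@local.df_transformation(variant="v1", inputs=[episodes])
def vectorize_comments(episodes):
    from sentence_transformers import SentenceTransformer

    # all-MiniLM-L6-v2 produces 384-dimensional embeddings.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(episodes["Text"].tolist())
    episodes["Vector"] = embeddings.tolist()
    return episodes
```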
Defining infrastructure providers like vector databases
In localmode, we'll often want to use an external vector database to index and retrieve our vectors. Even though our embeddings will be computed locally (or via API if using ADA), the final embeddings will be written to a vector database.

Pinecone
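A registration along these lines gives us the pinecone handle used in the feature definitions below. Treat the parameter names as an approximation of Featureform's Pinecone registration; check your version's API, and keep real credentials out of source control.

```python
pinecone = ff.register_pinecone(
    name="pinecone",
    project_id="<your-project-id>",  # placeholder
    environment="<your-environment>",  # placeholder, e.g. a Pinecone region
    api_key="<your-api-key>",  # placeholder; load from a secret store in practice
)
```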
Now that we have our vector database registered, we can specify the columns that make up our embeddings and our features. Note that embeddings are used to retrieve entity IDs, and that the actual text should be registered as a separate feature.
```python
@ff.entity
class Speaker:
    comment_embeddings = ff.Embedding(
        vectorize_comments[["PK", "Vector"]],
        dims=384,
        vector_db=pinecone,
        description="Embeddings created from speakers' comments in episodes",
        variant="v1",
    )
    comments = ff.Feature(
        speaker_primary_key[["PK", "Text"]],
        type=ff.String,
        description="Speakers' original comments",
        variant="v1",
    )
```
In your final prompt, you may also want to retrieve data by key. For example, if you're providing a recommendation for someone, you might want to grab the last N items that they looked at from your feature store. This is the traditional way to use Featureform, and you can learn more in our other documents.
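For illustration, a key-based lookup might look like the sketch below; the feature and entity names are hypothetical.

```python
# Hypothetical feature and entity names, purely for illustration.
last_viewed = client.features(
    [("last_viewed_items", "v1")],
    {"user": "user_123"},
)
```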
There are many tricks to prompt construction that may be useful.

Summarization
By starting prompts with "Summarize the following in four sentences", we can use an LLM to summarize.

Diffs
If building an application to improve a piece of text, we may write something like: "Improve the following text and show a diff of your recommendation and the original text."

Playing a role
We can tell the LLM to "think" like a specific role. For example: "Answer the following as if you are an MLOps expert."

Explain it like I'm 5
You can tell LLMs to explain their reasoning or explain things to you like you're five years old. This often results in easier-to-follow explanations when we don't need all of the details.

Formatting
We can tell an LLM how to format its answer: "Answer the following question with five bullet points."
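These tricks compose. A simple helper along the following lines (a hypothetical function, not part of Featureform) combines a role, a formatting instruction, and the retrieved context.

```python
def build_prompt(context, question):
    # Combine a role, a formatting instruction, and the retrieved context.
    return (
        "Answer the following as if you are an MLOps expert. "
        "Answer with five bullet points.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```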
There is a finite amount of text that we can put in a prompt. Our goal is to maximize the amount of relevant information so the LLM can be as accurate as possible while fitting within our boundaries. We should choose the right text-chunking strategy (mentioned above in pre-processing) and the right number of nearest neighbors to achieve this.

Summarization
A trick to fit more information in a context window is to use an LLM to summarize each relevant document that we retrieve. We can use an LLM to increase the information density of what we pass into our LLM! It's inception :D
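The next step references a final-prompt on-demand feature. A hedged sketch of what that could look like is below: it assumes Featureform's on-demand feature decorator and a nearest-neighbor lookup on the serving client, and it re-embeds the query with the same model used for the documents. Exact method names and signatures may differ in your version.

```python
@ff.ondemand_feature(variant="v1")
def contextualized_prompt(client, params, entity):
    from sentence_transformers import SentenceTransformer

    # Embed the incoming query with the same model used for the documents.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    search_vector = model.encode(params["query"])

    # Assumed nearest-neighbor API: returns the keys of the most similar comments.
    neighbor_keys = client.nearest("comment_embeddings", "v1", search_vector, k=3)

    prompt = "Use the following snippets from our podcast to answer the question.\n"
    for key in neighbor_keys:
        # Look up the original text feature for each retrieved key (entity name assumed).
        comment = client.features([("comments", "v1")], {"speaker": key})
        prompt += f"- {comment}\n"
    prompt += f"Question: {params['query']}\n"
    return prompt
```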
Now that we have defined our final-prompt on-demand feature, we can feed it into an LLM like OpenAI's to retrieve our response.
```python
prompt = client.features(
    [contextualized_prompt],
    {},
    params={"query": "What should I know about MLOps for Enterprise"},
)
print(openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
)["choices"][0]["text"])
```