All LLM workflows revolve around the same operation: a prompt goes into the LLM, and the LLM returns an output.

To achieve our desired output, we can do one of two things: improve the prompt, especially by adding additional context (Retrieval-Augmented Generation), or fine-tune the LLM.

Although there are prompt-construction tricks that achieve better results in practice, the most reliable and most impactful way to improve a prompt is to add more context when it’s available. Fine-tuning essentially provides this same context through a different API than the prompt.

So, in short, the best way to improve your LLM system’s performance is to feed in data as context! Even with this new ML paradigm, it all comes down to data.
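To make "adding context to the prompt" concrete, here is a minimal sketch in Python. The function name `build_prompt`, the instruction wording, and the sample refund snippet are all made up for illustration; they are not taken from any particular library or from a specific project.

```python
def build_prompt(question: str, context_snippets: list[str]) -> str:
    """Prepend context snippets to the user's question (hypothetical helper)."""
    if not context_snippets:
        # No context available: fall back to the bare question.
        return question
    context_block = "\n".join(f"- {snippet}" for snippet in context_snippets)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )


# Illustrative data only.
prompt = build_prompt(
    "What is our refund window?",
    ["Refunds are accepted within 30 days of purchase."],
)
print(prompt)
```

The resulting string is what you would send to the LLM in place of the bare question.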

When to use Retrieval-Augmented Generation and when to use Fine-Tuning

There are two ways to make your LLM system better using your private data: Retrieval-Augmented Generation (RAG) and fine-tuning. There are a few things to take into consideration when deciding between them:

Mimicking Style or Using Information

In practice, a fine-tuned model does not easily “memorize” new information. It often ends up mimicking the style of the content it was fine-tuned on. The typical use case of RAG, on the other hand, is to supply relevant information that helps the LLM formulate a better response.

Hallucination Susceptibility

For the same reason, if you’re using contextual information to try to reduce hallucination, RAG is likely to work better, since the information placed in the prompt will be heavily weighted. With fine-tuning, unless you’ve used a massive dataset, the model is unlikely to memorize the information and is likely to still hallucinate.

Size of Data

Fine-tuning requires much more data to achieve desired results. RAG can work with sparse data or small data sets since it only pulls relevant data at inference time.
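As a rough illustration of "only pulls relevant data at inference time", the sketch below ranks a tiny document set against a query and keeps the best matches. It uses plain word overlap (Jaccard similarity) as a stand-in for the embedding-based search a real RAG system would more likely use. The names `tokenize` and `retrieve` and the sample documents are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents ranked by Jaccard word overlap with the query."""
    query_tokens = tokenize(query)
    scored = []
    for doc in documents:
        doc_tokens = tokenize(doc)
        union = query_tokens | doc_tokens
        score = len(query_tokens & doc_tokens) / len(union) if union else 0.0
        if score > 0:
            # Documents with no overlap are never sent to the LLM.
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


# Illustrative data only: three documents are enough for retrieval to work.
documents = [
    "Refund requests are accepted within 30 days of purchase.",
    "The office closes on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
top = retrieve("How long are refund requests accepted?", documents, k=1)
print(top)
```

Nothing here needs training, which is why this approach still works when the data set is small.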

Security and Governance

Depending on the nature of your data, you may be very sensitive to revealing it. With fine-tuning, all the data you trained on has the potential to be revealed given the right prompt. With RAG, on the other hand, only the data that you feed into a specific query may be revealed. If you’re using a user’s own data as context, this might not be a problem. RAG dramatically narrows the scope of data that may be revealed to just what’s in the prompt.
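One way to picture this scoping is to filter the document store by owner before anything reaches the prompt. This is a sketch under assumed names: the `Document` class, the `owner_id` field, and `documents_visible_to` are hypothetical, and a real system would rely on its own access-control layer.

```python
from dataclasses import dataclass


@dataclass
class Document:
    owner_id: str  # hypothetical field identifying who may see the text
    text: str


def documents_visible_to(user_id: str, store: list[Document]) -> list[str]:
    """Keep only the documents owned by the requesting user."""
    return [doc.text for doc in store if doc.owner_id == user_id]


# Illustrative data only.
store = [
    Document(owner_id="alice", text="Alice's invoice total is $120."),
    Document(owner_id="bob", text="Bob's invoice total is $75."),
]
context = documents_visible_to("alice", store)
print(context)
```

Because only `context` is placed in the prompt, a cleverly worded question from Alice cannot surface Bob's data through the LLM.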

Complexity

RAG is a much more complex workflow than fine-tuning. Although it often works far better in practice, it adds a lot of moving parts, such as a store of documents and a retrieval step that runs at inference time.

Use RAG for information and fine-tuning for style

We typically see far better results when using RAG. It’s much cleaner and more controllable. The one exception is when you are trying to match a writing style. In that case, fine-tuning often works much better. For example, if you want to use an LLM to respond to questions on Slack, you can fine-tune it on all the answers you’ve given in the past so that it sounds like you.
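Preparing that kind of style data usually means turning past question/answer pairs into training records. The sketch below writes them as JSON Lines with `prompt` and `completion` keys. Those key names are an assumption for illustration; each fine-tuning provider defines its own format, so check the one you use. The sample messages are made up.

```python
import json


def to_training_records(qa_pairs: list[tuple[str, str]]) -> list[str]:
    """Serialize (question, answer) pairs as JSON Lines, skipping empty answers."""
    lines = []
    for question, answer in qa_pairs:
        if not answer.strip():
            # An empty answer carries no style to learn from.
            continue
        # Hypothetical record layout; adapt the keys to your provider's format.
        record = {"prompt": question.strip(), "completion": answer.strip()}
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


# Illustrative data only.
past_answers = [
    ("Is the staging server down?", "Yep, restarting it now. Back in 5!"),
    ("Can you review my PR?", ""),
]
jsonl = "\n".join(to_training_records(past_answers))
print(jsonl)
```

Each line of `jsonl` is one training example that pairs a question with an answer in your own voice.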

So, in short: use fine-tuning to match a writing style, and use RAG to inject context or information.

Where to go from here

Check out our deep dive into Retrieval-Augmented Generation!

Or check out a full project on GitHub!