What is RAG?

I've been working to build out resources for people to learn more about retrieval-augmented generation (RAG) — and to learn more about it myself. This post covers what RAG is, what I found and suggestions for how to tackle the resources.

What is RAG?

Over the last few weeks, I've been working to build out resources for people to learn more about retrieval-augmented generation (RAG) — and to learn more about it myself. This post covers what RAG is, what I found and suggestions for how to tackle the resources.

Just want the notebooks to implement RAG? Head to the bottom of this post ⤵️

Let’s dig in!

When I started exploring this topic, I did a google search for "RAG is dead" and found that there were over five pages of results with that title or some variation of it.

So I set out to figure out why so many people were writing about this or if it's just click bait.

Spoiler alert: it is clickbait.

Because RAG really isn't dead. And it isn't just a buzzword. But it has evolved over the last year or so, because so have we, so have our AI use cases, and so has the tech behind it all. What I learned was that it's still the bridge between a generic AI app and one that is ready for your business, your private and proprietary data, your rapidly changing and growing data.

And here's what I found:
▪️Agent workloads have exploded and they only work if they are grounded in accurate and relevant data
▪️ Often data needs to be isolated, per agent, per user – no commingling Jenna's data with Joe Schmoe's data
▪️ Large context windows create a "lost in the middle" problem (check out this paper), increase costs linearly, and reduce accuracy compared to a targeted retrieval approach
▪️ Building or fine-tuning models requires an investment that not everyone is able to make and even if they are, RAG helps them be more effective

But that doesn't mean that you drop everything and only do RAG! RAG is just one tool in your toolbox and you'll likely even pair it up with other approaches, in order to create an even more effective system.

So what is RAG?

What is RAG?

Retrieval-augmented generation, or RAG, is a technique that uses authoritative, external data to improve the accuracy, relevancy, and usefulness of a model’s output. There are four core components making up a RAG system:

  1. Ingestion: authoritative data like your company's trade secrets or confidential data is loaded into a data source, like a Pinecone vector database
  2. Retrieval: relevant data is retrieved from an external data source based on a user query
  3. Augmentation: the retrieved data and the user's query are combined into a prompt (known as the context) and sent to the model for the generation step
  4. Generation: the model generates output based on the augmented prompt, using the context to drive a more accurate and relevant response.

Here's what a simple traditional RAG pipeline looks like:

Diagram showing traditional RAG from user query to output

With agents growing in popularity, they are now orchestrators of the core RAG components and an agent or team of agents execute operations as part of a larger plan. They can:

  • construct more effective queries
  • access additional retrieval tools
  • evaluate the accuracy and relevance of the retrieved context
  • apply reasoning to validate retrieved information, to trust or discard it.

Here's what that could look like:

Diagram of agentic RAG from user query through tool use and generation to response.

They can now make better decisions and take more informed actions, including more accurate and relevant output for your users.

Want to read more about what RAG is and why it’s important? I also wrote this: https://jenna.link/rag

Implement RAG

As part of this, I created a couple of Jupyter notebooks to better understand different implementations.

Simple RAG

This notebook shows you how to implement a traditional RAG pipeline and uses a form of hybrid search during retrieval and Anthropic Claude models for text generation.

📓 https://jenna.link/p88

Agentic RAG

This notebook shows how to implement a simple genetic RAG flow using tool use with Anthropic Claude models to retrieve data through a simple web search tool and a semantic search tool that searches over a Pinecone index.

📓 https://jenna.link/zx3

More RAG notebooks you can take for a spin.

These notebooks use LangChain, LangGraph, and OpenAI to show different ways to implement.

📓 https://jenna.link/dfb

Ready for more?

I hope this has been helpful. If it has, please share this with your friends 👯 or drop a comment below 💬. And if you're ready to learn more about RAG, agentic workflows, and retrieval or have suggestions on content that would help you, let me know.

Stay tuned here for more updates or follow along with me on Instagram or LinkedIn where I share short bits of what I'm working on and learning as I go.

Get the goods. In your inbox. Very very infrequently.