
Introduction

In this blog post, I'll discuss Contextual Retrieval and walk through its implementation. I consulted several resources while writing it.
Contextual Retrieval, a method proposed by Anthropic, significantly enhances the retrieval step in RAG systems. It uses two sub-techniques: "contextual embeddings" and "contextual BM25." This approach reduces retrieval failures by 49%, and when combined with reranking, by 67%.

Traditional RAG

To begin, let's examine the conventional RAG process. It comprises three essential steps:
  1. Segmentation of the knowledge base (referred to as the "corpus" of documents) into manageable text fragments, typically limited to a few hundred tokens in length
  2. Utilization of an embedding model to transform these text fragments into vector embeddings that capture semantic meaning
  3. Integration of these embeddings into a vector database, facilitating efficient semantic similarity searches
When a user submits a query, the vector database identifies the most relevant text chunks based on their semantic similarity to the query. These highly relevant chunks are then incorporated into the prompt sent to the Large Language Models.
By leveraging this technique, LLMs can provide more comprehensive and accurate results. However, this traditional setup runs into problems when individual chunks lack sufficient context.
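The steps above can be sketched end to end. Everything here is illustrative: the toy corpus, the query, and especially the embed function, which is a bag-of-words stand-in for a real embedding model, and a plain array stands in for a vector database:

```python
import numpy as np

# Toy corpus standing in for the chunked knowledge base.
corpus = ["The cat sat on the mat.", "Dogs love to play fetch.", "Cats nap in the sun."]

def tokenize(text):
    return text.lower().replace(".", "").replace("?", "").split()

# Placeholder "embedding model": bag-of-words over a fixed vocabulary.
vocab = {w: i for i, w in enumerate(sorted({w for c in corpus for w in tokenize(c)}))}

def embed(text):
    vec = np.zeros(len(vocab))
    for w in tokenize(text):
        if w in vocab:
            vec[vocab[w]] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n else vec

chunk_vecs = np.stack([embed(c) for c in corpus])  # 2. embed each chunk
query_vec = embed("Where do cats sleep?")          # embed the user's query
scores = chunk_vecs @ query_vec                    # vectors are unit-norm, so dot = cosine
best = corpus[int(np.argmax(scores))]              # 3. most similar chunk
print(best)  # -> "Cats nap in the sun."
```

The retrieved chunk would then be inserted into the LLM prompt alongside the query.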
So, that’s why we need Contextual Retrieval.

Contextual Retrieval

Put simply, Contextual Retrieval prepends chunk-specific explanatory context to each chunk before embedding it and before building the BM25 index.
Now, let's implement Contextual Retrieval using the blog written by Dario Amodei, "Machines of Loving Grace", as the source document.
The whole process will be as follows:

Data Processing

The first step is to load the document file. For ease of implementation, I use Jina Reader to extract the blog content and convert it to .txt format.
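The loading code isn't shown above; a minimal sketch, assuming the Jina Reader proxy (the r.jina.ai prefix and the output filename are assumptions), could look like:

```python
from pathlib import Path
from urllib.request import Request, urlopen

READER_PREFIX = "https://r.jina.ai/"  # Jina Reader proxy (assumed)

def reader_url(url: str) -> str:
    """Build the Jina Reader URL that returns a page as plain text."""
    return READER_PREFIX + url

def fetch_as_text(url: str, out_path: str) -> str:
    """Fetch a page as plain text through Jina Reader and save it to disk."""
    request = Request(reader_url(url), headers={"User-Agent": "Mozilla/5.0"})
    text = urlopen(request).read().decode("utf-8")
    Path(out_path).write_text(text, encoding="utf-8")
    return text

# Example (requires network access):
# document = fetch_as_text("https://darioamodei.com/machines-of-loving-grace", "essay.txt")
```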
 
The output appears as follows:
 
Next, we need to segment our document into chunks. Here's the create_chunks function to accomplish this:
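The original create_chunks isn't reproduced here; a minimal sketch that splits on words with a size limit and an overlap between neighbors (both parameter values are assumptions) might look like:

```python
def create_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into word-based chunks of roughly chunk_size words,
    repeating `overlap` words between consecutive chunks so sentences
    cut at a boundary keep some surrounding context."""
    words = text.split()
    if not words:
        return []
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```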
 
Let's apply this function on the document.
 
For simplicity's sake, I'll only show the output of the first chunk here.
 

Generating Contextual Chunks

Now, I will use claude-3-5-sonnet-20241022 to generate context for each chunk.
First, let's initialize the client:
Then, I use the prompt adapted from the Anthropic official guide.
Let's concatenate the prompt with each chunk we generated.
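A sketch of this step, paraphrasing the prompt published in Anthropic's contextual retrieval guide (the exact wording, max_tokens value, and helper names here are assumptions):

```python
CONTEXT_PROMPT = """\
<document>
{document}
</document>

Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>

Please give a short succinct context to situate this chunk within the
overall document for the purposes of improving search retrieval of the
chunk. Answer only with the succinct context and nothing else."""

def build_context_prompt(document: str, chunk: str) -> str:
    return CONTEXT_PROMPT.format(document=document, chunk=chunk)

def generate_context(document: str, chunk: str) -> str:
    """Ask Claude for a short situating context for one chunk.
    Requires ANTHROPIC_API_KEY in the environment."""
    import anthropic  # lazy import so the sketch loads without the SDK
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=200,
        messages=[{"role": "user", "content": build_context_prompt(document, chunk)}],
    )
    return message.content[0].text

# The contextual chunk is the generated context prepended to the raw chunk:
# contextual_chunk = generate_context(document, chunk) + "\n\n" + chunk
```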
 
Now, let's generate context for the first chunk.
Here's the result:
Great! We can apply this process to each chunk we've generated.

Embedding

I employ bge-large-en-v1.5 as the embedding model to encode the context chunks.
To begin, let's set up the TogetherAI client for utilizing the embedding model.
Now, I define a function called generate_embeddings.
After obtaining the context embeddings, we can apply the same process to the user's query. For this example, let's consider the user's query to be "What does the author expect in the future?"
Now, I calculate the cosine similarity between the query embedding and the context embeddings to determine their relationship.
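This step can be sketched as follows; the vectors below are tiny illustrative stand-ins for real bge-large-en-v1.5 embeddings:

```python
import numpy as np

def cosine_similarity(query_embedding, chunk_embeddings):
    """Cosine similarity between one query vector and a matrix of chunk vectors."""
    q = np.asarray(query_embedding, dtype=float)
    m = np.asarray(chunk_embeddings, dtype=float)
    return (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))

# Toy 2-d vectors in place of real embeddings.
chunk_embeddings = np.array(
    [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [0.9, 0.1], [0.2, 0.9], [0.3, 0.8]]
)
query_embedding = np.array([1.0, 0.1])

sims = cosine_similarity(query_embedding, chunk_embeddings)
top_5_indices = np.argsort(sims)[::-1][:5]  # the 5 most similar chunks
# Mapping these indices back into the chunk list gives top_5_chunks.
```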
For brevity, I won't display the contents of top_5_chunks here, but you can print them to examine the results.
Now, let's encapsulate the steps above into a function called vector_retrieval.
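A sketch of that function, assuming the TogetherAI Python SDK for the embedding call (the model string and client usage are assumptions; embed_fn is an added hook so the retrieval logic can be exercised without an API key):

```python
import numpy as np

def generate_embeddings(texts, model="BAAI/bge-large-en-v1.5"):
    """Embed a list of texts via the TogetherAI embeddings endpoint.
    Requires TOGETHER_API_KEY in the environment."""
    from together import Together  # lazy import so the sketch loads without the SDK
    client = Together()
    response = client.embeddings.create(model=model, input=texts)
    return np.array([item.embedding for item in response.data])

def vector_retrieval(query, chunks, chunk_embeddings, k=5, embed_fn=None):
    """Return indices of the k chunks most similar to the query."""
    embed = embed_fn or generate_embeddings
    q = np.asarray(embed([query])[0], dtype=float)
    m = np.asarray(chunk_embeddings, dtype=float)
    sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    return [int(i) for i in np.argsort(sims)[::-1][:k]]
```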

BM25 Search

Now, let's apply the same query to retrieve the top 5 results.
Let's examine the first result:
I can also encapsulate these steps into a function called bm25_retrieval.
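A from-scratch sketch of that function is below; the k1 and b values are standard BM25 defaults, and a library such as rank_bm25 would work just as well:

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_retrieval(query, chunks, k=5, k1=1.5, b=0.75):
    """Score chunks against the query with BM25 and return the top-k indices."""
    docs = [tokenize(c) for c in chunks]
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                 # term frequency within this chunk
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return sorted(range(N), key=lambda i: scores[i], reverse=True)[:k]
```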

Vector and BM25 Fusion

Let’s combine the list using Reciprocal Rank Fusion.
We now have indices for each context chunk, allowing us to retrieve the corresponding text.
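The fusion step can be sketched as follows; k=60 is the commonly used RRF constant (an assumption here):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk indices with Reciprocal Rank Fusion.
    Each item's fused score is the sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, idx in enumerate(ranking):
            scores[idx] = scores.get(idx, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse the vector and BM25 result lists, then keep the top 3.
# fused_indices = reciprocal_rank_fusion([vector_indices, bm25_indices])[:3]
```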

Reranking to Improve Relevance

Reranking is a crucial step in RAG systems that significantly enhances the relevance and quality of retrieved information. I use Llama-Rank-V1 as the reranker model to improve performance.
Let’s add the top 3 chunks to a string.
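These two steps can be sketched as below, assuming TogetherAI's rerank endpoint (the model string and client usage are assumptions; requires TOGETHER_API_KEY):

```python
def rerank_chunks(query, chunks, top_n=3):
    """Rerank candidate chunks with Llama-Rank-V1 and return the top_n chunks."""
    from together import Together  # lazy import so the sketch loads without the SDK
    client = Together()
    response = client.rerank.create(
        model="Salesforce/Llama-Rank-V1", query=query, documents=chunks, top_n=top_n
    )
    return [chunks[result.index] for result in response.results]

def join_chunks(chunks):
    """Concatenate the reranked chunks into one context string for the prompt."""
    return "\n\n".join(chunks)
```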

Call claude-3-5-sonnet-20241022 to answer

Now, let's feed the top 3 chunks to claude-3-5-sonnet-20241022 to generate our final answer.
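A sketch of the final call; the prompt wording and max_tokens value are assumptions (requires ANTHROPIC_API_KEY):

```python
def build_answer_prompt(query: str, context: str) -> str:
    """Wrap the reranked chunks and the user query into one prompt."""
    return (
        "Answer the question using only the context below.\n\n"
        f"<context>\n{context}\n</context>\n\n"
        f"Question: {query}"
    )

def answer_with_context(query: str, context: str) -> str:
    """Generate the final answer with claude-3-5-sonnet-20241022."""
    import anthropic  # lazy import so the sketch loads without the SDK
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": build_answer_prompt(query, context)}],
    )
    return message.content[0].text
```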
Let's examine the response generated for this query.

Conclusion

In this blog, I've shared my notes on Contextual Retrieval.
It provides a powerful approach to enhance the retrieval step in RAG systems, significantly reducing failures and improving overall performance. By combining contextual embeddings, contextual BM25, and reranking techniques, we can achieve more accurate and relevant results. This method shows great promise for advancing the field of RAG.