Introduction
In this blog post, I'll discuss Contextual Retrieval and explain its implementation. I consulted several resources while writing it.
Contextual Retrieval, a method proposed by Anthropic, significantly enhances the retrieval step in RAG systems. It combines two sub-techniques: "contextual embeddings" and "contextual BM25." According to Anthropic, this approach reduces retrieval failures by 49%, and by an impressive 67% when combined with reranking.
Traditional RAG
To begin, let's examine the conventional RAG process. It comprises three essential steps:
- Segmentation of the knowledge base (referred to as the "corpus" of documents) into manageable text fragments, typically limited to a few hundred tokens in length
- Utilization of an embedding model to transform these text fragments into vector embeddings that capture semantic meaning
- Integration of these embeddings into a vector database, facilitating efficient semantic similarity searches
When a user submits a query, the vector database identifies the most relevant text chunks based on their semantic similarity to the query. These highly relevant chunks are then incorporated into the prompt sent to the Large Language Model (LLM).
By leveraging this technique, the LLM can provide more comprehensive and accurate results. However, this traditional approach runs into problems when individual chunks lack sufficient context.
That is exactly the problem Contextual Retrieval addresses.
Contextual Retrieval
Put simply, Contextual Retrieval prepends chunk-specific explanatory context to each chunk before embedding it and before building the BM25 index.
Now, let's implement Contextual Retrieval using Dario Amodei's essay "Machines of Loving Grace".
The whole process will be as follows:
Data Processing
The first step is to load the document. For ease of implementation, I use Jina Reader to extract the blog content and save it in `.txt` format. The output appears as follows:
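The extraction step can be sketched as follows. This is a hypothetical implementation (the post's own extraction code isn't shown): Jina Reader returns a clean text rendering of a web page when the page URL is prefixed with its endpoint, and the essay URL and output filename below are assumptions.

```python
# Hypothetical sketch of the extraction step (the post's code isn't shown).
# Jina Reader returns a plain-text/markdown rendering of a web page when the
# page URL is prefixed with the reader endpoint.
import urllib.request

READER_ENDPOINT = "https://r.jina.ai/"

def reader_url(page_url: str) -> str:
    """Build the Jina Reader URL for a page."""
    return READER_ENDPOINT + page_url

def fetch_as_text(page_url: str) -> str:
    """Fetch the page through Jina Reader and return its text."""
    with urllib.request.urlopen(reader_url(page_url)) as resp:
        return resp.read().decode("utf-8")

# Usage (network access and the essay's URL are assumptions):
# text = fetch_as_text("https://darioamodei.com/machines-of-loving-grace")
# open("machines_of_loving_grace.txt", "w", encoding="utf-8").write(text)
```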
Next, we need to segment our document into chunks. Here's the `create_chunks` function to accomplish this. Let's apply this function to the document.
For simplicity's sake, I'll only show the output of the first chunk here.
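A minimal sketch of `create_chunks` (the function name comes from the post; the word-based splitting, `chunk_size`, and `overlap` parameters are my assumptions):

```python
# A sketch of create_chunks: split on whitespace and group roughly
# `chunk_size` words per chunk, with a small overlap between neighbors
# so that sentences cut at a boundary still appear in full in one chunk.
def create_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks
```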
Generating Contextual Chunks
Now, I will use `claude-3-5-sonnet-20241022` to generate context for each chunk. First, let's initialize the client:
Then, I use a prompt adapted from Anthropic's official guide.
Let's concatenate the prompt with each chunk we generated.
Now, let's generate context for the first chunk and examine the result.
Great! We can apply this process to each chunk we've generated.
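The steps above can be sketched as follows. The prompt wording is adapted from Anthropic's Contextual Retrieval guide; the helper names and `max_tokens` value are my assumptions, and the call requires the `anthropic` package plus an API key.

```python
# Prompt adapted from Anthropic's Contextual Retrieval guide.
CONTEXT_PROMPT = """<document>
{document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>
Please give a short succinct context to situate this chunk within the overall \
document for the purposes of improving search retrieval of the chunk. \
Answer only with the succinct context and nothing else."""

def build_contextual_prompt(document: str, chunk: str) -> str:
    return CONTEXT_PROMPT.format(document=document, chunk=chunk)

def generate_context(document: str, chunk: str) -> str:
    """Ask claude-3-5-sonnet-20241022 to situate `chunk` within `document`."""
    from anthropic import Anthropic  # deferred: needs the SDK and an API key
    client = Anthropic()
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=150,  # assumed value; the context is meant to be short
        messages=[{"role": "user", "content": build_contextual_prompt(document, chunk)}],
    )
    return response.content[0].text

# Usage: prepend the generated context to every chunk.
# context_chunks = [generate_context(doc_text, c) + "\n\n" + c for c in chunks]
```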
Embedding
I employ `bge-large-en-v1.5` as the embedding model to encode the context chunks. To begin, let's set up the TogetherAI client to use the embedding model.
Now, I define a function called `generate_embeddings`:
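A sketch of `generate_embeddings` using the TogetherAI SDK; the full model id `BAAI/bge-large-en-v1.5` and the batch-call shape follow Together's OpenAI-style embeddings endpoint, but treat them as assumptions:

```python
# Sketch of generate_embeddings via TogetherAI's embeddings endpoint.
# Requires the `together` package and a TOGETHER_API_KEY environment variable.
def generate_embeddings(texts: list[str],
                        model: str = "BAAI/bge-large-en-v1.5") -> list[list[float]]:
    """Embed a batch of texts and return one vector per input text."""
    from together import Together  # deferred so the module loads without the SDK
    client = Together()
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]
```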
After obtaining the context embeddings, we can apply the same process to the user's query. For this example, let's consider the user's query to be "What does the author expect in the future?"
Now, I calculate the cosine similarity between the query embedding and the context embeddings to determine their relationship.
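The similarity step can be sketched like this; `query_embedding` and `context_embeddings` are assumed to be the vectors produced above, and the helper name is my own:

```python
import numpy as np

def top_k_by_cosine(query_embedding, context_embeddings, k=5):
    """Return (indices, scores) of the k chunks most similar to the query."""
    q = np.asarray(query_embedding, dtype=float)
    m = np.asarray(context_embeddings, dtype=float)
    # Cosine similarity: dot product divided by the product of the norms.
    sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    top = np.argsort(sims)[::-1][:k]  # highest similarity first
    return top.tolist(), sims[top].tolist()
```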
For brevity, I won't display the contents of `top_5_chunks` here, but you can print them to examine the results. Now, let's encapsulate the steps above into a function called `vector_retrieval`.
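A sketch of `vector_retrieval` (the name comes from the post). To keep the retrieval logic testable, this version takes the embedding step as an `embed_fn` parameter; in the post it would call the TogetherAI embedding model.

```python
import numpy as np

def vector_retrieval(query, chunks, embed_fn, k=5):
    """Return the indices of the k chunks most similar to the query.

    embed_fn maps a list of strings to a list of embedding vectors
    (e.g. the generate_embeddings function defined earlier).
    """
    query_vec = np.asarray(embed_fn([query])[0], dtype=float)
    chunk_mat = np.asarray(embed_fn(chunks), dtype=float)
    sims = (chunk_mat @ query_vec) / (
        np.linalg.norm(chunk_mat, axis=1) * np.linalg.norm(query_vec)
    )
    return np.argsort(sims)[::-1][:k].tolist()
```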
BM25 Search
Next, let's build a BM25 index over the context chunks and apply the same query to retrieve the top 5 results.
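A minimal pure-Python Okapi BM25 sketch of this step; the post's own code likely uses a package such as `rank_bm25`, and `k1`/`b` below are the standard parameter names with customary defaults:

```python
import math
from collections import Counter

class SimpleBM25:
    """Minimal Okapi BM25 over pre-tokenized documents."""

    def __init__(self, corpus_tokens, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(doc) for doc in corpus_tokens]
        self.doc_lens = [len(doc) for doc in corpus_tokens]
        self.avgdl = sum(self.doc_lens) / len(corpus_tokens)
        n = len(corpus_tokens)
        df = Counter(term for doc in self.docs for term in doc)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def get_scores(self, query_tokens):
        scores = []
        for tf, dl in zip(self.docs, self.doc_lens):
            s = 0.0
            for term in query_tokens:
                if term not in tf:
                    continue
                freq = tf[term]
                # Okapi BM25 term score with document-length normalization.
                s += self.idf[term] * freq * (self.k1 + 1) / (
                    freq + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                )
            scores.append(s)
        return scores

def bm25_top_k(query, chunks, k=5):
    """Return indices of the k best-scoring chunks for the query."""
    bm25 = SimpleBM25([c.lower().split() for c in chunks])
    scores = bm25.get_scores(query.lower().split())
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:k]
```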
Let's examine the first result:
I can also encapsulate these steps into a function called `bm25_retrieval`.
Vector and BM25 Fusion
Let's combine the two ranked lists using Reciprocal Rank Fusion. We now have indices for each context chunk, allowing us to retrieve the corresponding text.
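Reciprocal Rank Fusion can be sketched as follows: each document's fused score is the sum of 1/(k + rank) over every ranked list in which it appears (k = 60 is the customary constant; the function name is my own):

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_n=5):
    """Fuse several ranked lists of document ids into one ranking."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            # rank is 0-based, so the top item contributes 1 / (k + 1).
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Usage with the two retrievers' outputs:
# fused_indices = reciprocal_rank_fusion([vector_indices, bm25_indices])
```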
Improving Results with a Reranker
Reranking is a crucial step in RAG systems that significantly enhances the relevance and quality of retrieved information. I use `Llama-Rank-V1` as the reranker model to improve performance. Let's add the top 3 chunks to a string.
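A sketch of this step using TogetherAI's rerank endpoint; the full model id `Salesforce/Llama-Rank-V1` and the response fields follow Together's API but should be treated as assumptions, and `join_chunks` is a hypothetical helper:

```python
def rerank_top_n(query, chunks, n=3):
    """Rerank chunks against the query and return the n most relevant.

    Requires the `together` package and a TOGETHER_API_KEY environment variable.
    """
    from together import Together  # deferred: needs the SDK and an API key
    client = Together()
    response = client.rerank.create(
        model="Salesforce/Llama-Rank-V1",
        query=query,
        documents=chunks,
        top_n=n,
    )
    # Each result carries the index of the document in the input list.
    return [chunks[r.index] for r in response.results]

def join_chunks(chunks):
    """Concatenate the top chunks into a single context string."""
    return "\n\n".join(chunks)
```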
Call claude-3-5-sonnet-20241022 to answer
Now, let's feed the top 3 chunks to `claude-3-5-sonnet-20241022` to generate our final answer. Let's examine the response generated for this query.
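The final generation step can be sketched like this; the answer-prompt template and `max_tokens` value are my assumptions, not the post's exact wording:

```python
# Assumed prompt template for the final answer step.
FINAL_PROMPT = """Answer the question based on the context below.

Context:
{context}

Question: {question}"""

def build_answer_prompt(context: str, question: str) -> str:
    return FINAL_PROMPT.format(context=context, question=question)

def answer_question(context: str, question: str) -> str:
    """Generate the final answer with claude-3-5-sonnet-20241022."""
    from anthropic import Anthropic  # deferred: needs the SDK and an API key
    client = Anthropic()
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": build_answer_prompt(context, question)}],
    )
    return response.content[0].text

# Usage with the reranked context string:
# answer = answer_question(top_3_context, "What does the author expect in the future?")
```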
Conclusion
In this blog post, I've taken some notes on Contextual Retrieval.
It provides a powerful approach to enhance the retrieval step in RAG systems, significantly reducing failures and improving overall performance. By combining contextual embeddings, contextual BM25, and reranking techniques, we can achieve more accurate and relevant results. This method shows great promise for advancing the field of RAG.
- Author: Chengsheng Deng
- URL: https://chengshengddeng.com/article/contextual-retrieval
- Copyright: Unless otherwise stated, all articles in this blog are licensed under the BY-NC-SA agreement. Please indicate the source!