Commit 5d1a577

WIP: RAG setup example (#1431)
1 parent f90894a commit 5d1a577

6 files changed: +147 additions, -0 deletions
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
# Source ChatGPT

from transformers import DPRContextEncoder, DPRContextEncoderTokenizer

# Load the pretrained DPR context encoder and its tokenizer
tokenizer = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

# Example documents
documents = [
    "The capital of France is Paris.",
    "Python is a programming language.",
    "Hugging Face is a popular platform for NLP models.",
    "The Eiffel Tower is located in Paris."
]

# Tokenize and encode documents
encoded_inputs = tokenizer(documents, padding=True, truncation=True, return_tensors="pt")
document_embeddings = encoder(**encoded_inputs).pooler_output

# Print embeddings
print(document_embeddings)
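DPR is trained so that question and passage embeddings are scored by inner (dot) product rather than by Euclidean distance. A toy numpy sketch of that scoring, with made-up low-dimensional vectors standing in for the real 768-dimensional DPR outputs:

```python
import numpy as np

# Toy stand-ins for DPR pooler_output rows (real DPR embeddings are 768-dim)
doc_embs = np.array([[0.2, 0.9], [0.8, 0.1]], dtype=np.float32)
query_emb = np.array([0.7, 0.2], dtype=np.float32)

# DPR ranks passages by inner product with the question embedding
scores = doc_embs @ query_emb
best = int(scores.argmax())
print(best)  # 1 (the second document scores highest)
```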
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# Source ChatGPT

from sentence_transformers import SentenceTransformer

# Load the pre-trained model (Sentence-BERT)
model = SentenceTransformer('all-MiniLM-L6-v2')  # or use a model like DPR for better retrieval

# Example documents
documents = [
    "The capital of France is Paris.",
    "Python is a programming language.",
    "Hugging Face is a popular platform for NLP models.",
    "The Eiffel Tower is located in Paris."
]

# Convert documents to embeddings
document_embeddings = model.encode(documents)

# Print the embeddings (this will be a list of vectors)
print(document_embeddings)
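Sentence-BERT embeddings are usually compared with cosine similarity. A minimal numpy sketch of that comparison, using small made-up vectors in place of real `model.encode` output:

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between row vectors."""
    emb = np.asarray(embeddings, dtype=np.float32)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return emb @ emb.T

# Toy stand-ins for model.encode(documents) output
toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(cosine_similarity_matrix(toy))
```

The diagonal is 1 (each vector is identical to itself), and orthogonal vectors score 0.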
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# Source ChatGPT

import numpy as np
from transformers import RagRetriever, RagTokenizer, RagSequenceForGeneration

# Initialize the tokenizer, retriever, and RAG model
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq")
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq")

# Example query
query = "Where is the Eiffel Tower located?"

# Tokenize the input query
inputs = tokenizer(query, return_tensors="pt")

# Retrieve relevant documents from your FAISS index.
# `embedding_model`, `index`, and `documents` come from the earlier scripts;
# embed the query with the same embedding model used for the documents.
query_embedding = embedding_model.encode([query])
D, I = index.search(np.array(query_embedding).astype('float32'), 5)  # top-5

# Use the indices to get the top-k relevant documents
retrieved_docs = [documents[i] for i in I[0]]

# Convert the retrieved documents into the appropriate format for the RAG model
retrieved_docs_input = tokenizer(retrieved_docs, padding=True, truncation=True, return_tensors="pt")

# Generate the response using RAG with the retrieved context
generated_output = model.generate(input_ids=inputs["input_ids"], context_input_ids=retrieved_docs_input["input_ids"])

# Decode and print the response
answer = tokenizer.decode(generated_output[0], skip_special_tokens=True)
print(answer)
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# Source ChatGPT

import numpy as np

# Let's say you have a new query to retrieve similar documents
query = "Where is the Eiffel Tower?"
query_embedding = model.encode([query])  # Assuming you're using Sentence-BERT

# Perform the search for the top-k most similar documents
k = 2  # Number of similar documents to retrieve
D, I = index.search(np.array(query_embedding).astype('float32'), k)

# D contains the distances (lower is more similar), and I contains the indices of the retrieved documents
print(f"Distances: {D}")
print(f"Indices of retrieved documents: {I}")

# Retrieve the documents based on the indices
retrieved_documents = [documents[i] for i in I[0]]
print(f"Retrieved documents: {retrieved_documents}")
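One detail worth knowing: if embeddings are unit-normalized before indexing, the L2 ranking from an IndexFlatL2 search matches a cosine-similarity ranking, since for unit vectors ||a - b||² = 2 - 2·cos(a, b). A quick numpy check with random stand-in vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 8)).astype(np.float32)  # stand-ins for document embeddings
b = rng.normal(size=8).astype(np.float32)       # stand-in for a query embedding

# Unit-normalize the rows and the query
a /= np.linalg.norm(a, axis=1, keepdims=True)
b /= np.linalg.norm(b)

sq_l2 = ((a - b) ** 2).sum(axis=1)
cos = a @ b

# ||a - b||^2 == 2 - 2*cos(a, b) for unit vectors
print(np.allclose(sq_l2, 2 - 2 * cos, atol=1e-5))  # True
```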
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
RAG Architecture
-----------------
RAG stands for Retrieval Augmented Generation.

In Generative AI tasks like Question-Answering, RAG setups
retrieve documents that are relevant to the query, and use them
as context when generating the answer.

The steps involved in a RAG setup are:
1. Choose an embedding model (embedding-model-dpr.py)
2. Store document embeddings in a Vector database (store-documentembeddings-vectordb.py)
3. Search for documents relevant to a query (search-documents.py)
4. Integrate with the RAG setup (rag-integration.py)
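The four steps above can be sketched end-to-end in a few lines. This toy sketch uses a bag-of-words embedding as a stand-in for a real embedding model, and a numpy array as a stand-in for a vector database; all names here are illustrative:

```python
import numpy as np

def embed(text, vocab):
    """Toy bag-of-words embedding over a fixed vocabulary (stand-in for a real model)."""
    words = text.lower().replace(".", "").replace("?", "").split()
    vec = np.array([float(words.count(w)) for w in vocab], dtype=np.float32)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Steps 1+2: embed documents and "store" them
documents = [
    "The capital of France is Paris.",
    "Python is a programming language.",
]
vocab = sorted({w for d in documents for w in d.lower().replace(".", "").split()})
doc_matrix = np.stack([embed(d, vocab) for d in documents])

# Step 3: search for the document closest to the query (squared L2, lower is better)
query_vec = embed("What is the capital of France?", vocab)
dists = ((doc_matrix - query_vec) ** 2).sum(axis=1)
best = int(dists.argmin())

# Step 4: the retrieved document would be passed as context to a generator
print(documents[best])  # The capital of France is Paris.
```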
Consider an enterprise that wants to deploy chatbots for answering questions related to different
aspects of the company, such as HR questions, internal coding standards used within the company, and social events.
Imagine that information about these is available on various internal tools such as Workday,
JIRA/Confluence, internal social media apps like Signal, etc.
The chatbots need to use all this information, distributed across different document repositories,
when answering questions.

The architecture for such a system can look as follows:
We create one Vector database which is loaded with all the documents from the different internal document sources
(Workday, Confluence, Signal, etc.).
We create different chatbots for different tasks (e.g.: askhr - for HR questions/answers,
codr - for coding questions/answers, tgif - for social events). Each chatbot can have special prompts
relevant to its task. All the chatbots use the same Vector database as part of their RAG setup.
A single Vector database shared by all the chatbots ensures that a single team can work on setting up
this database. Also, a single database can correctly answer questions that span multiple categories
(e.g.: What are the coding standards used during internal Hackathons? Here the question spans two categories -
coding standards and social events.)
We will have one Helm chart representing the Vector database, and separate Helm charts
for the chatbots. We will deploy one instance of the Vector database CRD, which will run in its own namespace.
Each chatbot will also be deployed as a single instance, and will run in its own namespace.
In order to allow the chatbots to use the Vector database in the RAG process, we will need to enable cross-namespace
communication between each chatbot's namespace and the Vector database namespace.
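One common way to express such a cross-namespace allowance is a Kubernetes NetworkPolicy on the Vector database's namespace. A hedged config sketch only: the namespace name, labels, and port below are illustrative, and enforcement requires a CNI plugin that supports NetworkPolicy:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-chatbots
  namespace: vectordb            # illustrative namespace for the Vector database
spec:
  podSelector: {}                # applies to all pods in the vectordb namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              app-role: chatbot  # label each chatbot namespace with this key/value
      ports:
        - protocol: TCP
          port: 8000             # illustrative service port of the Vector database
```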
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
# Source ChatGPT

import faiss
import numpy as np

# Convert the document embeddings to numpy arrays (FAISS requires numpy arrays);
# if they are torch tensors, detach first: document_embeddings.detach().numpy()
document_embeddings = np.array(document_embeddings).astype('float32')

# Create a FAISS index for similarity search
index = faiss.IndexFlatL2(document_embeddings.shape[1])  # Using L2 distance (Euclidean)

# Add the document embeddings to the index
index.add(document_embeddings)

# Now, the index can be used to search for similar documents
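IndexFlatL2 performs exact, brute-force search, and it reports squared L2 distances. What `index.search` computes can be reproduced in plain numpy (toy vectors below are illustrative):

```python
import numpy as np

def flat_l2_search(query_vecs, doc_vecs, k):
    """numpy equivalent of faiss.IndexFlatL2.search: squared L2 distances, ascending."""
    q = np.asarray(query_vecs, dtype=np.float32)
    d = np.asarray(doc_vecs, dtype=np.float32)
    # squared L2 distance between every query and every document
    dists = ((q[:, None, :] - d[None, :, :]) ** 2).sum(axis=-1)
    I = np.argsort(dists, axis=1)[:, :k]
    D = np.take_along_axis(dists, I, axis=1)
    return D, I

docs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], dtype=np.float32)
D, I = flat_l2_search(np.array([[0.9, 0.1]]), docs, k=2)
print(I[0])  # [1 0] -- the nearest two document indices
```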
