Multi-Tenancy RAG with LlamaIndex
In this notebook, you will build a multi-tenancy RAG system using LlamaIndex, covering the following steps:
- Setup
- Download Data
- Load Data
- Create Index
- Create Ingestion Pipeline
- Update Metadata and Insert documents
- Define Query Engines for each user
- Querying
Ensure that llama-index and pypdf are installed.
!pip install llama-index pypdf
Set OpenAI Key
import os
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI API KEY"
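If you prefer not to hard-code the key in the notebook, a minimal alternative (a sketch using Python's standard getpass module) is to prompt for it at runtime:

import getpass

# Prompt for the key instead of embedding it in the notebook source
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")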
from llama_index.core import VectorStoreIndex
from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from IPython.display import HTML
Download Data
We will use the papers An LLM Compiler for Parallel Function Calling and Dense X Retrieval: What Retrieval Granularity Should We Use? for the demonstrations.
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2312.04511.pdf" -O "llm_compiler.pdf"
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2312.06648.pdf" -O "dense_x_retrieval.pdf"
--2024-01-15 14:29:26--  https://arxiv.org/pdf/2312.04511.pdf
Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.67.42, 151.101.3.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 755837 (738K) [application/pdf]
Saving to: ‘llm_compiler.pdf’

llm_compiler.pdf    100%[===================>] 738.12K  --.-KB/s    in 0.004s

2024-01-15 14:29:26 (163 MB/s) - ‘llm_compiler.pdf’ saved [755837/755837]

--2024-01-15 14:29:26--  https://arxiv.org/pdf/2312.06648.pdf
Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.67.42, 151.101.3.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1103758 (1.1M) [application/pdf]
Saving to: ‘dense_x_retrieval.pdf’

dense_x_retrieval.p 100%[===================>] 1.05M   --.-KB/s    in 0.005s

2024-01-15 14:29:26 (208 MB/s) - ‘dense_x_retrieval.pdf’ saved [1103758/1103758]
Load Data
reader = SimpleDirectoryReader(input_files=["dense_x_retrieval.pdf"])
documents_jerry = reader.load_data()
reader = SimpleDirectoryReader(input_files=["llm_compiler.pdf"])
documents_ravi = reader.load_data()
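As an optional sanity check, you can inspect what the reader produced. SimpleDirectoryReader typically returns one Document per PDF page, so the exact counts below depend on the papers:

# Illustrative only: counts and metadata keys depend on the PDFs
print(f"Jerry's documents: {len(documents_jerry)}")
print(f"Ravi's documents: {len(documents_ravi)}")
print(documents_jerry[0].metadata)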
Create an Empty Index
index = VectorStoreIndex.from_documents(documents=[])
Create Ingestion Pipeline
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=20),
    ]
)
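To see what the pipeline does on its own, here is an illustrative check on a hypothetical in-memory Document (not part of the original notebook); the same SentenceSplitter transformation runs during ingestion below:

from llama_index.core.schema import Document

# Hypothetical sample text, repeated so it exceeds a single 512-token chunk
sample = Document(text="LlamaIndex splits long text into overlapping chunks. " * 200)
sample_nodes = pipeline.run(documents=[sample])
print(f"Sample was split into {len(sample_nodes)} nodes")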
Update Metadata and Insert Documents
for document in documents_jerry:
    document.metadata["user"] = "Jerry"

nodes = pipeline.run(documents=documents_jerry)
# Insert nodes into the index
index.insert_nodes(nodes)

for document in documents_ravi:
    document.metadata["user"] = "Ravi"

nodes = pipeline.run(documents=documents_ravi)
# Insert nodes into the index
index.insert_nodes(nodes)
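Since tagging and ingestion are identical for every tenant, the two blocks above can be folded into a small helper. This is a sketch; ingest_for_user is a hypothetical name, not a LlamaIndex API:

def ingest_for_user(user: str, documents) -> None:
    # Tag every document with the tenant identifier used for filtering later
    for document in documents:
        document.metadata["user"] = user
    # Chunk the documents and insert the resulting nodes into the shared index
    index.insert_nodes(pipeline.run(documents=documents))

# Equivalent to the two blocks above:
# ingest_for_user("Jerry", documents_jerry)
# ingest_for_user("Ravi", documents_ravi)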
Define Query Engines
Define query engines for both users with the necessary metadata filters.
# For Jerry
jerry_query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(
                key="user",
                value="Jerry",
            )
        ]
    ),
    similarity_top_k=3,
)

# For Ravi
ravi_query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(
                key="user",
                value="Ravi",
            )
        ]
    ),
    similarity_top_k=3,
)
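The two engines differ only in the filter value, so a per-tenant factory keeps the code from growing with the number of users. A sketch; query_engine_for_user is a hypothetical helper:

def query_engine_for_user(user: str):
    # Restrict retrieval to nodes whose "user" metadata matches this tenant
    return index.as_query_engine(
        filters=MetadataFilters(
            filters=[ExactMatchFilter(key="user", value=user)]
        ),
        similarity_top_k=3,
    )

# Equivalent to the definitions above:
# jerry_query_engine = query_engine_for_user("Jerry")
# ravi_query_engine = query_engine_for_user("Ravi")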
Querying
# Jerry has the Dense X Retrieval paper and should be able to answer the following question.
response = jerry_query_engine.query(
    "what are propositions mentioned in the paper?"
)
# Print response
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
The paper mentions propositions as an alternative retrieval unit choice. Propositions are defined as atomic expressions of meanings in text that correspond to distinct pieces of meaning in the text. They are minimal and cannot be further split into separate propositions. Each proposition is contextualized and self-contained, including all the necessary context from the text to interpret its meaning. The paper demonstrates the concept of propositions using an example about the Leaning Tower of Pisa, where the passage is split into three propositions, each corresponding to a distinct factoid about the tower.
# Ravi has the LLMCompiler paper
response = ravi_query_engine.query("what are steps involved in LLMCompiler?")

# Print response
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
LLMCompiler consists of three key components: an LLM Planner, a Task Fetching Unit, and an Executor. The LLM Planner identifies the execution flow by defining different function calls and their dependencies based on user inputs. The Task Fetching Unit dispatches the function calls that can be executed in parallel after substituting variables with the actual outputs of preceding tasks. Finally, the Executor executes the dispatched function calling tasks using the associated tools. These components work together to optimize the parallel function calling performance of LLMs.
# This should not be answered, as Jerry does not have information about LLMCompiler
response = jerry_query_engine.query("what are steps involved in LLMCompiler?")

# Print response
display(HTML(f'<p style="font-size:20px">{response.response}</p>'))
The steps involved in LLMCompiler are not mentioned in the given context information.