Skip to content

Advanced RAG with temporal filters using LlamaIndex and KDB.AI vector store

KDB.AI is a powerful knowledge-based vector database and search engine that allows you to build scalable, reliable AI applications, using real-time data, by providing advanced search, recommendation and personalization.

This example demonstrates how to use KDB.AI to run semantic search, summarization and analysis of financial regulations around some specific moment in time.

To access your end point and API keys, sign up to KDB.AI here.

To set up your development environment, follow the instructions on the KDB.AI pre-requisites page.

The following examples demonstrate some of the ways you can interact with KDB.AI through LlamaIndex.

In order to successfully run this sample, note the following steps depending on where you are running this notebook:

-Run Locally / Private Environment: The Setup steps in the repository’s README.md will guide you on prerequisites and how to run this with Jupyter.

-Colab / Hosted Environment: Open this notebook in Colab and run through the cells.

!pip install llama-index llama-index-llms-openai llama-index-embeddings-openai llama-index-readers-file llama-index-vector-stores-kdbai
!pip install kdbai_client pandas
from getpass import getpass
import re
import os
import shutil
import time
import urllib
import datetime
import pandas as pd
from llama_index.core import (
Settings,
SimpleDirectoryReader,
StorageContext,
VectorStoreIndex,
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.kdbai import KDBAIVectorStore
import kdbai_client as kdbai
OUTDIR = "pdf"
RESET = True

Set OpenAI API key and choose the LLM and Embedding model to use:

Section titled “Set OpenAI API key and choose the LLM and Embedding model to use:”
# os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
os.environ["OPENAI_API_KEY"] = (
os.environ["OPENAI_API_KEY"]
if "OPENAI_API_KEY" in os.environ
else getpass("OpenAI API Key: ")
)
import os
from getpass import getpass
# Set OpenAI API
if "OPENAI_API_KEY" in os.environ:
KDBAI_API_KEY = os.environ["OPENAI_API_KEY"]
else:
# Prompt the user to enter the API key
OPENAI_API_KEY = getpass("OPENAI API KEY: ")
# Save the API key as an environment variable for the current session
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
EMBEDDING_MODEL = "text-embedding-3-small"
GENERATION_MODEL = "gpt-4o-mini"
llm = OpenAI(model=GENERATION_MODEL)
embed_model = OpenAIEmbedding(model=EMBEDDING_MODEL)
Settings.llm = llm
Settings.embed_model = embed_model
# vector DB imports
import os
from getpass import getpass
import kdbai_client as kdbai
import time

To use KDB.AI Cloud, you will need two session details - a URL endpoint and an API key. To get these you can sign up for free here.

You can connect to a KDB.AI Cloud session using kdbai.Session and passing the session URL endpoint and API key details from your KDB.AI Cloud portal.

If the environment variables KDBAI_ENDPOINTS and KDBAI_API_KEY exist on your system containing your KDB.AI Cloud portal details, these variables will automatically be used to connect. If these do not exist, it will prompt you to enter your KDB.AI Cloud portal session URL endpoint and API key details.

# Set up KDB.AI endpoint and API key
KDBAI_ENDPOINT = (
os.environ["KDBAI_ENDPOINT"]
if "KDBAI_ENDPOINT" in os.environ
else input("KDB.AI endpoint: ")
)
KDBAI_API_KEY = (
os.environ["KDBAI_API_KEY"]
if "KDBAI_API_KEY" in os.environ
else getpass("KDB.AI API key: ")
)
session = kdbai.Session(endpoint=KDBAI_ENDPOINT, api_key=KDBAI_API_KEY)

To use KDB.AI Server, you will need download and run your own container. To do this, you will first need to sign up for free here.

You will receive an email with the required license file and bearer token needed to download your instance. Follow instructions in the signup email to get your session up and running.

Once the setup steps are complete you can then connect to your KDB.AI Server session using kdbai.Session and passing your local endpoint.

# session = kdbai.Session()

!!! Note: The ‘dims’ parameter in the embedding column must reflect the output dimensions of the embedding model you choose.

  • OpenAI ‘text-embedding-3-small’ outputs 1536 dimensions.
schema = [
{"name": "document_id", "type": "bytes"},
{"name": "text", "type": "bytes"},
{"name": "embeddings", "type": "float32s"},
{"name": "title", "type": "str"},
{"name": "publication_date", "type": "datetime64[ns]"},
]
indexFlat = {
"name": "flat_index",
"type": "flat",
"column": "embeddings",
"params": {"dims": 1536, "metric": "L2"},
}
KDBAI_TABLE_NAME = "reports"
database = session.database("default")
# First ensure the table does not already exist
for table in database.tables:
if table.name == KDBAI_TABLE_NAME:
table.drop()
break
# Create the table
table = database.create_table(
KDBAI_TABLE_NAME, schema=schema, indexes=[indexFlat]
)
INPUT_URLS = [
"https://www.govinfo.gov/content/pkg/PLAW-106publ102/pdf/PLAW-106publ102.pdf",
"https://www.govinfo.gov/content/pkg/PLAW-111publ203/pdf/PLAW-111publ203.pdf",
]
METADATA = {
"pdf/PLAW-106publ102.pdf": {
"title": "GRAMM–LEACH–BLILEY ACT, 1999",
"publication_date": pd.to_datetime("1999-11-12"),
},
"pdf/PLAW-111publ203.pdf": {
"title": "DODD-FRANK WALL STREET REFORM AND CONSUMER PROTECTION ACT, 2010",
"publication_date": pd.to_datetime("2010-07-21"),
},
}
%%time
CHUNK_SIZE = 512 * 1024
def download_file(url):
print("Downloading %s..." % url)
out = os.path.join(OUTDIR, os.path.basename(url))
try:
response = urllib.request.urlopen(url)
except urllib.error.URLError as e:
logging.exception("Failed to download %s !" % url)
else:
with open(out, "wb") as f:
while True:
chunk = response.read(CHUNK_SIZE)
if chunk:
f.write(chunk)
else:
break
return out
if RESET:
if os.path.exists(OUTDIR):
shutil.rmtree(OUTDIR)
os.mkdir(OUTDIR)
local_files = [download_file(x) for x in INPUT_URLS]
local_files[:10]
Downloading https://www.govinfo.gov/content/pkg/PLAW-106publ102/pdf/PLAW-106publ102.pdf...
Downloading https://www.govinfo.gov/content/pkg/PLAW-111publ203/pdf/PLAW-111publ203.pdf...
CPU times: user 52.6 ms, sys: 1.2 ms, total: 53.8 ms
Wall time: 7.86 s
%%time
def get_metadata(filepath):
return METADATA[filepath]
documents = SimpleDirectoryReader(
input_files=local_files,
file_metadata=get_metadata,
)
docs = documents.load_data()
len(docs)
CPU times: user 8.22 s, sys: 9.04 ms, total: 8.23 s
Wall time: 8.23 s
994

Setup LlamaIndex RAG pipeline using KDB.AI vector store

Section titled “Setup LlamaIndex RAG pipeline using KDB.AI vector store”
%%time
# llm = OpenAI(temperature=0, model=LLM)
vector_store = KDBAIVectorStore(table)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
docs,
storage_context=storage_context,
transformations=[SentenceSplitter(chunk_size=2048, chunk_overlap=0)],
)
CPU times: user 3.67 s, sys: 31.9 ms, total: 3.7 s
Wall time: 22.3 s
table.query()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
document_id text embeddings title publication_date
0 b'272d7d24-c232-41b6-823e-27aa6203c100' b'PUBLIC LAW 106\xc2\xb1102\xc3\x90NOV. 12, 19... [0.034452137, 0.03166917, -0.011892043, 0.0184... GRAMM–LEACH–BLILEY ACT, 1999 1999-11-12
1 b'89e3f2ee-f5a6-4e40-bb81-0632f08341f0' b"113 STAT. 1338 PUBLIC LAW 106\xc2\xb1102\xc3... [0.02164333, 1.0030156e-05, 0.0028665832, 0.02... GRAMM–LEACH–BLILEY ACT, 1999 1999-11-12
2 b'56fbe82a-5458-4a4a-a5ed-026d9399151d' b'113 STAT. 1339 PUBLIC LAW 106\xc2\xb1102\xc3... [0.01380091, 0.026945233, 0.02838467, 0.043132... GRAMM–LEACH–BLILEY ACT, 1999 1999-11-12
3 b'b6bf9e48-51b6-45d9-9259-b6346f93831f' b'113 STAT. 1340 PUBLIC LAW 106\xc2\xb1102\xc3... [0.0070182937, 0.014063503, 0.026525516, 0.040... GRAMM–LEACH–BLILEY ACT, 1999 1999-11-12
4 b'f398b133-b4f5-4a34-94d1-9a97fdb658e5' b"113 STAT. 1341 PUBLIC LAW 106\xc2\xb1102\xc3... [0.025041763, 0.01968024, 0.030940715, 0.02899... GRAMM–LEACH–BLILEY ACT, 1999 1999-11-12
... ... ... ... ... ...
989 b'8e84d1d5-d87d-4351-b7eb-5d569fdb8d9c' b'124 STAT. 2219 PUBLIC LAW 111\xe2\x80\x93203... [0.024505286, 0.015549232, 0.0536601, 0.028532... DODD-FRANK WALL STREET REFORM AND CONSUMER PRO... 2010-07-21
990 b'0c47f590-050c-4374-bf8c-2a4502dc980f' b'124 STAT. 2220 PUBLIC LAW 111\xe2\x80\x93203... [0.014071382, -0.0044553108, 0.03662071, 0.035... DODD-FRANK WALL STREET REFORM AND CONSUMER PRO... 2010-07-21
991 b'63a2235f-d368-43b8-a1a9-a5a11d497245' b'124 STAT. 2221 PUBLIC LAW 111\xe2\x80\x93203... [0.0005448305, 0.013075933, 0.044821188, 0.031... DODD-FRANK WALL STREET REFORM AND CONSUMER PRO... 2010-07-21
992 b'bac4d75e-4867-4d89-a71e-09a6762bf3c4' b'124 STAT. 2222 PUBLIC LAW 111\xe2\x80\x93203... [0.032077603, 0.016817383, 0.04507993, 0.03376... DODD-FRANK WALL STREET REFORM AND CONSUMER PRO... 2010-07-21
993 b'e262e4da-f6e1-4b9d-9232-77fc3f0c81a7' b'124 STAT. 2223 PUBLIC LAW 111\xe2\x80\x93203... [0.0387719, -0.025150038, 0.030345473, 0.04303... DODD-FRANK WALL STREET REFORM AND CONSUMER PRO... 2010-07-21

994 rows × 5 columns

%%time
# Using gpt-4o-mini, the 128k tokens context size can take 100 pages.
K = 15
query_engine = index.as_query_engine(
similarity_top_k=K,
vector_store_kwargs={
"index": "flat_index",
"filter": [["<", "publication_date", datetime.date(2008, 9, 15)]],
"sort_columns": "publication_date",
},
)
CPU times: user 512 μs, sys: 23 μs, total: 535 μs
Wall time: 550 μs
%%time
result = query_engine.query(
"""
What was the main financial regulation in the US before the 2008 financial crisis ?
"""
)
print(result.response)
The main financial regulation in the US before the 2008 financial crisis was the Gramm-Leach-Bliley Act, enacted in 1999. This act facilitated the affiliation among banks, securities firms, and insurance companies, effectively repealing parts of the Glass-Steagall Act, which had previously separated these financial services. The Gramm-Leach-Bliley Act aimed to enhance competition in the financial services industry by providing a framework for the integration of various financial institutions.
CPU times: user 61.8 ms, sys: 0 ns, total: 61.8 ms
Wall time: 4.24 s
%%time
result = query_engine.query(
"""
Is the Gramm-Leach-Bliley Act of 1999 enough to prevent the 2008 crisis. Search the document and explain its strenghts and weaknesses to regulate the US stock market.
"""
)
print(result.response)
The Gramm-Leach-Bliley Act of 1999 aimed to enhance competition in the financial services industry by allowing affiliations among banks, securities firms, and insurance companies. Its strengths include the repeal of the Glass-Steagall Act, which had previously separated commercial banking from investment banking, thereby enabling financial institutions to diversify their services and potentially increase competition. This diversification could lead to more innovative financial products and services.
However, the Act also has notable weaknesses. By allowing greater affiliations and reducing regulatory barriers, it may have contributed to the creation of "too big to fail" institutions, which posed systemic risks to the financial system. The lack of stringent oversight and the ability for financial holding companies to engage in a wide range of activities without adequate regulation may have led to excessive risk-taking. Additionally, the Act did not sufficiently address the complexities of modern financial products, such as derivatives, which played a significant role in the 2008 financial crisis.
In summary, while the Gramm-Leach-Bliley Act aimed to foster competition and innovation in the financial sector, its regulatory framework may have inadvertently facilitated the conditions that led to the financial crisis, highlighting the need for a more robust regulatory approach to oversee the interconnectedness and risks within the financial system.
CPU times: user 45.7 ms, sys: 255 μs, total: 46 ms
Wall time: 21.9 s
%%time
# Using gpt-4o-mini, the 128k tokens context size can take 100 pages.
K = 15
query_engine = index.as_query_engine(
similarity_top_k=K,
vector_store_kwargs={
"index": "flat_index",
"filter": [[">=", "publication_date", datetime.date(2008, 9, 15)]],
"sort_columns": "publication_date",
},
)
CPU times: user 171 μs, sys: 0 ns, total: 171 μs
Wall time: 175 μs
%%time
result = query_engine.query(
"""
What happened on the 15th of September 2008 ?
"""
)
print(result.response)
On the 15th of September 2008, Lehman Brothers, a major global financial services firm, filed for bankruptcy. This event marked one of the largest bankruptcies in U.S. history and was a significant moment in the financial crisis of 2007-2008, leading to widespread panic in financial markets and contributing to the global economic downturn.
CPU times: user 51.4 ms, sys: 0 ns, total: 51.4 ms
Wall time: 3.6 s
%%time
result = query_engine.query(
"""
What was the new US financial regulation enacted after the 2008 crisis to increase the market regulation and to improve consumer sentiment ?
"""
)
print(result.response)
The new US financial regulation enacted after the 2008 crisis to increase market regulation and improve consumer sentiment is the Dodd-Frank Wall Street Reform and Consumer Protection Act, which was signed into law on July 21, 2010. This legislation aimed to promote financial stability, enhance accountability and transparency in the financial system, and protect consumers from abusive financial practices.
CPU times: user 43.7 ms, sys: 0 ns, total: 43.7 ms
Wall time: 4.55 s
%%time
# Using gpt-4o-mini, the 128k tokens context size can take 100 pages.
K = 20
query_engine = index.as_query_engine(
similarity_top_k=K,
vector_store_kwargs={
"index": "flat_index",
"sort_columns": "publication_date",
},
)
CPU times: user 227 μs, sys: 10 μs, total: 237 μs
Wall time: 243 μs
%%time
result = query_engine.query(
"""
Analyse the US financial regulations before and after the 2008 crisis and produce a report of all related arguments to explain what happened, and to ensure that does not happen again.
Use both the provided context and your own knowledge but do mention explicitely which one you use.
"""
)
print(result.response)
The analysis of U.S. financial regulations before and after the 2008 financial crisis reveals significant changes aimed at preventing a recurrence of such a crisis.
Before the crisis, the regulatory framework was characterized by a lack of comprehensive oversight, particularly for nonbank financial institutions. The regulatory environment allowed for excessive risk-taking, inadequate capital requirements, and insufficient transparency in financial transactions. This environment contributed to the housing bubble and the subsequent collapse of major financial institutions, leading to widespread economic turmoil.
In response to the crisis, the Dodd-Frank Wall Street Reform and Consumer Protection Act of 2010 was enacted. This legislation introduced several key reforms:
1. **Creation of the Financial Stability Oversight Council (FSOC)**: This body was established to monitor systemic risks and coordinate regulatory efforts across different financial sectors. It has the authority to recommend heightened standards and safeguards for financial activities that could pose risks to financial stability.
2. **Enhanced Regulatory Oversight**: Dodd-Frank imposed stricter regulations on bank holding companies and nonbank financial companies, particularly those with significant assets. This includes requirements for stress testing, capital planning, and the submission of resolution plans to ensure orderly wind-downs in case of failure.
3. **Consumer Protection Measures**: The establishment of the Consumer Financial Protection Bureau (CFPB) aimed to protect consumers from predatory lending practices and ensure transparency in financial products.
4. **Volcker Rule**: This provision restricts proprietary trading by banks and limits their investments in hedge funds and private equity funds, thereby reducing conflicts of interest and excessive risk-taking.
5. **Increased Transparency and Reporting Requirements**: Financial institutions are now required to disclose more information regarding their risk exposures and financial health, which enhances market discipline and investor confidence.
The arguments for these reforms center around the need for a more resilient financial system that can withstand economic shocks. The reforms aim to address the systemic risks that were prevalent before the crisis, ensuring that financial institutions maintain adequate capital buffers and engage in prudent risk management practices.
In conclusion, the regulatory landscape has shifted significantly since the 2008 crisis, with a focus on preventing excessive risk-taking, enhancing transparency, and protecting consumers. These measures are designed to create a more stable financial environment and mitigate the likelihood of future crises.
CPU times: user 180 ms, sys: 437 μs, total: 180 ms
Wall time: 10.5 s

Once finished with the table, it is best practice to drop it.

table.drop()

We hope you found this sample helpful! Your feedback is important to us, and we would appreciate it if you could take a moment to fill out our brief survey. Your input helps us improve our content.

Take the Survey