Kùzu Graph Store
This notebook walks through configuring Kùzu
to be the backend for graph storage in LlamaIndex.
%pip install llama-index
%pip install llama-index-llms-openai
%pip install llama-index-graph-stores-kuzu
%pip install pyvis
# My OpenAI Key
import os
os.environ["OPENAI_API_KEY"] = "API_KEY_HERE"
Prepare for Kùzu
Section titled “Prepare for Kùzu”
# Clean up all the directories used in this notebook
import shutil

shutil.rmtree("./test1", ignore_errors=True)
shutil.rmtree("./test2", ignore_errors=True)
shutil.rmtree("./test3", ignore_errors=True)
import kuzu
db = kuzu.Database("test1")
Using Knowledge Graph with KuzuGraphStore
Section titled “Using Knowledge Graph with KuzuGraphStore”
from llama_index.graph_stores.kuzu import KuzuGraphStore
graph_store = KuzuGraphStore(db)
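Conceptually, a graph store only needs to support a small triplet interface: upsert a `(subject, relation, object)` triplet and look up triplets by subject. A minimal in-memory sketch of that contract (a hypothetical toy class, not LlamaIndex's actual `KuzuGraphStore` implementation):

```python
from collections import defaultdict


class InMemoryGraphStore:
    """Toy stand-in for a graph store: maps subject -> [(relation, object)]."""

    def __init__(self):
        self._rel_map = defaultdict(list)

    def upsert_triplet(self, subj, rel, obj):
        # Skip duplicates so repeated inserts are idempotent ("upsert").
        if (rel, obj) not in self._rel_map[subj]:
            self._rel_map[subj].append((rel, obj))

    def get(self, subj):
        # Return all (relation, object) pairs stored for a subject.
        return self._rel_map[subj]


store = InMemoryGraphStore()
store.upsert_triplet("Interleaf", "made", "software")
store.upsert_triplet("Interleaf", "made", "software")  # duplicate, ignored
print(store.get("Interleaf"))  # → [('made', 'software')]
```

`KuzuGraphStore` plays the same role, but persists the triplets in the embedded Kùzu database created above instead of a Python dict.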
Building the Knowledge Graph
Section titled “Building the Knowledge Graph”
from llama_index.core import SimpleDirectoryReader, KnowledgeGraphIndex
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from IPython.display import Markdown, display
import kuzu
documents = SimpleDirectoryReader(
    "../../../examples/data/paul_graham"
).load_data()
# define LLM
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.llm = llm
Settings.chunk_size = 512
from llama_index.core import StorageContext
storage_context = StorageContext.from_defaults(graph_store=graph_store)
# NOTE: can take a while!
index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
)

# To reload from an existing graph store without recomputing each time, use:
# index = KnowledgeGraphIndex(nodes=[], storage_context=storage_context)
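During `from_documents`, the LLM is prompted on each chunk to emit up to `max_triplets_per_chunk` triplets, which are then parsed and written to the graph store. A rough sketch of the parsing step, assuming a simple one-triplet-per-line `(subject, relation, object)` output format (LlamaIndex's actual prompt and parser differ in detail):

```python
import re


def parse_triplets(llm_output: str, max_triplets: int = 2):
    """Parse lines like '(subject, relation, object)' from an LLM completion."""
    triplets = []
    for line in llm_output.splitlines():
        m = re.match(r"\((.+?),\s*(.+?),\s*(.+?)\)", line.strip())
        if m:
            triplets.append(tuple(part.strip() for part in m.groups()))
        if len(triplets) >= max_triplets:
            break  # honor the max_triplets_per_chunk cap
    return triplets


sample = """(author, worked on, writing)
(author, worked on, programming)
(author, studied, painting)"""
print(parse_triplets(sample))
# → [('author', 'worked on', 'writing'), ('author', 'worked on', 'programming')]
```

This also explains why building the index "can take a while": it is one LLM call per chunk over the whole document set.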
Querying the Knowledge Graph
Section titled “Querying the Knowledge Graph”
First, we can query and send only the triplets to the LLM.
query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)
display(Markdown(f"<b>{response}</b>"))
Interleaf was involved in making software, added a scripting language, was inspired by Emacs, taught what not to do, built impressive technology, and made software that became obsolete and was replaced by a service. Additionally, Interleaf made software that could launch as soon as it was done and was affected by rapid changes in the industry.
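With `include_text=False`, retrieval boils down to extracting keywords from the query, looking up the triplets stored under those subjects, and handing only those short triplet strings to the LLM as context. A stdlib-only sketch of that lookup (hypothetical helper, not the LlamaIndex retriever itself):

```python
def retrieve_triplets(rel_map, query_keywords):
    """Gather triplet strings whose subject matches an extracted query keyword."""
    hits = []
    for kw in query_keywords:
        for rel, obj in rel_map.get(kw, []):
            hits.append(f"{kw} {rel} {obj}")
    return hits


rel_map = {
    "Interleaf": [("made", "software"), ("added", "scripting language")],
    "author": [("worked on", "writing")],
}
context = retrieve_triplets(rel_map, ["Interleaf"])
print(context)
# → ['Interleaf made software', 'Interleaf added scripting language']
```

Because only these terse triplet strings reach the LLM, the answer above reads like a list of disconnected facts.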
For more detailed answers, we can also send the text from which the retrieved triplets were extracted.
query_engine = index.as_query_engine(
    include_text=True, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)
display(Markdown(f"<b>{response}</b>"))
Interleaf was a company that made software for creating documents. They added a scripting language inspired by Emacs, making it a dialect of Lisp. Despite having smart people and building impressive technology, Interleaf ultimately faced challenges due to the rapid advancements in technology, as they were affected by Moore’s Law. The software they created could be launched as soon as it was done, and they made use of software that was considered slick in 1996. Additionally, Interleaf’s experience taught valuable lessons about the importance of being run by product people rather than sales people in technology companies, the risks of editing code by too many people, the significance of office environment on productivity, and the impact of conventional office hours on optimal hacking times.
Query with embeddings
Section titled “Query with embeddings”
# NOTE: can take a while!
db = kuzu.Database("test2")
graph_store = KuzuGraphStore(db)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
new_index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
    include_embeddings=True,
)
# query using top 5 triplets plus keywords (duplicate triplets are removed)
query_engine = new_index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=5,
)
response = query_engine.query(
    "Tell me more about what the author worked on at Interleaf",
)
display(Markdown(f"<b>{response}</b>"))
The author worked on software at Interleaf, a company that made software for creating documents. The software the author worked on was an online store builder, which required a private launch before a public launch to recruit an initial set of users. The author also learned valuable lessons at Interleaf, such as the importance of having technology companies run by product people, the pitfalls of editing code by too many people, the significance of office environment on productivity, and the impact of big bureaucratic customers. Additionally, the author discovered that low-end software tends to outperform high-end software, emphasizing the importance of being the “entry level” option in the market.
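In hybrid mode, triplets retrieved by keyword lookup are merged with the triplets whose embeddings are most similar to the query embedding, and duplicates are dropped. A stdlib-only sketch of that merge, using cosine similarity over toy 2-d vectors (the real index uses the configured embedding model's vectors):

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hybrid_retrieve(query_vec, keyword_hits, embedded_triplets, top_k=5):
    """Union of keyword matches and the top-k embedding-similar triplets."""
    ranked = sorted(
        embedded_triplets, key=lambda t: cosine(query_vec, t[1]), reverse=True
    )
    merged = list(keyword_hits)
    for text, _ in ranked[:top_k]:
        if text not in merged:  # duplicate triplets are removed
            merged.append(text)
    return merged


embedded = [
    ("Interleaf made software", [1.0, 0.0]),
    ("author worked on writing", [0.0, 1.0]),
]
print(hybrid_retrieve([0.9, 0.1], ["Interleaf added scripting language"], embedded, top_k=1))
# → ['Interleaf added scripting language', 'Interleaf made software']
```

The embedding path can surface relevant triplets whose subjects share no literal keyword with the query, which is what `embedding_mode="hybrid"` buys over pure keyword lookup.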
Visualizing the Graph
Section titled “Visualizing the Graph”
# create graph
from pyvis.network import Network

g = index.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.show("kuzugraph_draw.html")
kuzugraph_draw.html
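If pyvis is unavailable, the same triplets can be rendered with any Graphviz-compatible viewer by emitting a DOT string directly. A stdlib-only sketch (a hypothetical helper, offered as an alternative to the pyvis route above):

```python
def triplets_to_dot(triplets):
    """Render (subject, relation, object) triplets as a Graphviz DOT digraph."""
    lines = ["digraph kg {"]
    for subj, rel, obj in triplets:
        # One labeled edge per triplet: subject -> object, labeled by the relation.
        lines.append(f'  "{subj}" -> "{obj}" [label="{rel}"];')
    lines.append("}")
    return "\n".join(lines)


dot = triplets_to_dot([("Interleaf", "made", "software")])
print(dot)
```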
[Optional] Try building the graph and manually add triplets!
Section titled “[Optional] Try building the graph and manually add triplets!”
from llama_index.core.node_parser import SentenceSplitter
node_parser = SentenceSplitter()
nodes = node_parser.get_nodes_from_documents(documents)
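`SentenceSplitter` breaks each document into nodes of roughly `chunk_size` tokens without cutting sentences in half. A rough stdlib approximation of that packing step, using word count as a crude proxy for tokens (the real splitter is tokenizer-aware and supports overlap):

```python
import re


def split_into_chunks(text, chunk_size=512):
    """Pack sentences greedily into chunks of at most chunk_size words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        words = len(sent.split())
        # Start a new chunk rather than splitting a sentence across chunks.
        if current and count + words > chunk_size:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "First sentence here. Second sentence follows. Third one ends it."
print(split_into_chunks(text, chunk_size=6))
# → ['First sentence here. Second sentence follows.', 'Third one ends it.']
```

Each resulting chunk becomes one node, which is why the manual triplets below are attached to `nodes[0]` and `nodes[1]`.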
# initialize an empty database
db = kuzu.Database("test3")
graph_store = KuzuGraphStore(db)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
index = KnowledgeGraphIndex(
    [],
    storage_context=storage_context,
)
# add keyword mappings and nodes manually
# add triplets (subject, relationship, object)
# for node 0
node_0_tups = [
    ("author", "worked on", "writing"),
    ("author", "worked on", "programming"),
]
for tup in node_0_tups:
    index.upsert_triplet_and_node(tup, nodes[0])
# for node 1
node_1_tups = [
    ("Interleaf", "made software for", "creating documents"),
    ("Interleaf", "added", "scripting language"),
    ("software", "generate", "web sites"),
]
for tup in node_1_tups:
    index.upsert_triplet_and_node(tup, nodes[1])
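`upsert_triplet_and_node` does two things: it writes the triplet into the graph store and records which source node it came from, so that `include_text=True` queries can later pull in the underlying chunk text. A toy sketch of that dual bookkeeping (hypothetical `TinyKGIndex` class, not the LlamaIndex internals):

```python
class TinyKGIndex:
    """Toy index: triplets in a rel map, plus a triplet -> source-node mapping."""

    def __init__(self):
        self.rel_map = {}          # subject -> [(relation, object)]
        self.triplet_to_node = {}  # triplet -> source text chunk

    def upsert_triplet_and_node(self, triplet, node_text):
        subj, rel, obj = triplet
        self.rel_map.setdefault(subj, [])
        if (rel, obj) not in self.rel_map[subj]:
            self.rel_map[subj].append((rel, obj))
        # Remember the chunk, so include_text=True can retrieve it later.
        self.triplet_to_node[triplet] = node_text


kg = TinyKGIndex()
kg.upsert_triplet_and_node(("Interleaf", "added", "scripting language"), "chunk 1 text")
print(kg.rel_map["Interleaf"])  # → [('added', 'scripting language')]
```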
query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)
str(response)
'Interleaf was involved in creating documents and also added a scripting language to its software.'