
TiDB Graph Store

%pip install llama-index-llms-openai
%pip install llama-index-graph-stores-tidb
%pip install llama-index-embeddings-openai
%pip install llama-index-llms-azure-openai
# For OpenAI
import os
os.environ["OPENAI_API_KEY"] = "sk-xxxxxxx"
import logging
import sys
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# define LLM
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.llm = llm
Settings.chunk_size = 512
# For Azure OpenAI
import os
import openai
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
import logging
import sys
logging.basicConfig(
    stream=sys.stdout, level=logging.INFO
)  # use logging.DEBUG for more verbose output
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
openai.api_type = "azure"
openai.api_base = "https://<foo-bar>.openai.azure.com"
openai.api_version = "2022-12-01"
os.environ["OPENAI_API_KEY"] = "<your-openai-key>"
openai.api_key = os.getenv("OPENAI_API_KEY")
llm = AzureOpenAI(
    deployment_name="<foo-bar-deployment>",
    temperature=0,
    openai_api_version=openai.api_version,
    model_kwargs={
        "api_key": openai.api_key,
        "api_base": openai.api_base,
        "api_type": openai.api_type,
        "api_version": openai.api_version,
    },
)
# You need to deploy your own embedding model as well as your own chat completion model
embedding_llm = OpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="<foo-bar-deployment>",
    api_key=openai.api_key,
    api_base=openai.api_base,
    api_type=openai.api_type,
    api_version=openai.api_version,
)
Settings.llm = llm
Settings.embed_model = embedding_llm
Settings.chunk_size = 512
Prepare a running TiDB cluster first. You can use either of the following:

  • TiDB Cloud [Recommended], a fully managed TiDB service that frees you from the complexity of database operations.
  • TiUP, use `tiup playground` to create a local TiDB cluster for testing.

TiDBGraphStore uses pymysql as the database driver, so the connection string must use the `mysql+pymysql://` scheme. For example: `mysql+pymysql://user:password@host:4000/dbname`.

If you are using a TiDB Cloud serverless cluster with a public endpoint, a TLS connection is required, so the connection string should look like `mysql+pymysql://user:password@host:4000/dbname?ssl_verify_cert=true&ssl_verify_identity=true`.

Replace `user`, `password`, `host`, and `dbname` with your own values.
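Optionally, you can sanity-check the connection string before building the store. The URL is an SQLAlchemy-style URL, so a minimal sketch with SQLAlchemy (assuming `sqlalchemy` and `pymysql` are installed; the credentials below are placeholders) looks like this:

# Optional: verify the connection string before creating the graph store.
# The credentials are placeholders; append the ssl_verify_* parameters
# shown above if you connect to a TiDB Cloud serverless cluster.
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:password@host:4000/dbname")
with engine.connect() as conn:
    print(conn.execute(text("SELECT VERSION()")).scalar())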

from llama_index.graph_stores.tidb import TiDBGraphStore
graph_store = TiDBGraphStore(
    db_connection_string="mysql+pymysql://user:password@host:4000/dbname"
)
from llama_index.core import (
    KnowledgeGraphIndex,
    SimpleDirectoryReader,
    StorageContext,
)
documents = SimpleDirectoryReader(
    "../../../examples/data/paul_graham/"
).load_data()
storage_context = StorageContext.from_defaults(graph_store=graph_store)
# NOTE: can take a while!
index = KnowledgeGraphIndex.from_documents(
    documents=documents,
    storage_context=storage_context,
    max_triplets_per_chunk=2,
)
query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
WARNING:llama_index.core.indices.knowledge_graph.retrievers:Index was not constructed with embeddings, skipping embedding usage...
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
from IPython.display import Markdown, display
display(Markdown(f"<b>{response}</b>"))

Interleaf was a software company that developed a scripting language and was known for its software products. It was inspired by Emacs and faced challenges due to Moore’s law. Over time, Interleaf’s prominence declined.
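You can also read the stored triplets back directly from the graph store, bypassing the query engine. A minimal sketch: `get()` is part of LlamaIndex's GraphStore interface and returns [relationship, object] pairs for a subject; the exact subject strings depend on what the LLM extracted from your documents.

# Inspect the triplets stored for a subject. The subject name here is
# an assumption; extracted entity names may differ in your run.
for rel, obj in graph_store.get("Interleaf"):
    print(f"Interleaf -[{rel}]-> {obj}")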