Storing
Once you have data loaded and indexed, you will probably want to store it to avoid the time and cost of re-indexing it. By default, your indexed data is stored only in memory.
Persisting to disk
The simplest way to store your indexed data is to use the built-in .persist() method of every Index, which writes all the data to disk at the location specified. This works for any type of index.
index.storage_context.persist(persist_dir="<persist_dir>")
Here is an example of a Composable Graph:
graph.root_index.storage_context.persist(persist_dir="<persist_dir>")
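To bring a persisted graph back, here is a minimal sketch using load_graph_from_storage, assuming you kept the graph's root ID (available as graph.root_id) from when you built it:

from llama_index.core import StorageContext, load_graph_from_storage

# rebuild the storage context from the same directory
storage_context = StorageContext.from_defaults(persist_dir="<persist_dir>")

# root_id identifies the root index of the graph (graph.root_id)
graph = load_graph_from_storage(storage_context, root_id="<root_id>")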
You can then avoid re-loading and re-indexing your data by loading the persisted index like this:
from llama_index.core import StorageContext, load_index_from_storage
# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="<persist_dir>")

# load index
index = load_index_from_storage(storage_context)
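If you store more than one index in the same directory, you can tag each one with an ID and load it back by that ID. A small sketch of that pattern (the ID string here is arbitrary):

# optionally give the index an ID before persisting
index.set_index_id("vector_index")
index.storage_context.persist(persist_dir="<persist_dir>")

# later, load that specific index by its ID
index = load_index_from_storage(storage_context, index_id="vector_index")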
!!! tip
    Important: if you had initialized your index with custom transformations, embed_model, etc., you will need to pass in the same options during load_index_from_storage, or have them set as the global Settings.
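For example, here is a minimal sketch of reloading with the same embedding model; my_embed_model is a placeholder for whatever model you originally indexed with:

from llama_index.core import Settings, StorageContext, load_index_from_storage

# my_embed_model is a placeholder for the embed model used at build time
Settings.embed_model = my_embed_model

storage_context = StorageContext.from_defaults(persist_dir="<persist_dir>")
index = load_index_from_storage(storage_context)

# or pass the option directly instead of setting it globally
index = load_index_from_storage(storage_context, embed_model=my_embed_model)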
Using Vector Stores
As discussed in indexing, one of the most common types of Index is the VectorStoreIndex. The API calls to create the embeddings in a VectorStoreIndex can be expensive in terms of time and money, so you will want to store them to avoid having to constantly re-index things.
LlamaIndex supports a huge number of vector stores which vary in architecture, complexity and cost. In this example we’ll be using Chroma, an open-source vector store.
First, you will need to install Chroma:
pip install chromadb
To use Chroma to store the embeddings from a VectorStoreIndex, you need to:
- initialize the Chroma client
- create a Collection to store your data in Chroma
- assign Chroma as the vector_store in a StorageContext
- initialize your VectorStoreIndex using that StorageContext
Here’s what that looks like, with a sneak peek at actually querying the data:
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

# load some documents
documents = SimpleDirectoryReader("./data").load_data()

# initialize client, setting path to save data
db = chromadb.PersistentClient(path="./chroma_db")

# create collection
chroma_collection = db.get_or_create_collection("quickstart")

# assign chroma as the vector_store to the context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# create your index
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# create a query engine and query
query_engine = index.as_query_engine()
response = query_engine.query("What is the meaning of life?")
print(response)
If you’ve already created and stored your embeddings, you’ll want to load them directly without loading your documents or creating a new VectorStoreIndex:
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

# initialize client
db = chromadb.PersistentClient(path="./chroma_db")

# get collection
chroma_collection = db.get_or_create_collection("quickstart")

# assign chroma as the vector_store to the context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# load your index from stored vectors
index = VectorStoreIndex.from_vector_store(
    vector_store, storage_context=storage_context
)

# create a query engine
query_engine = index.as_query_engine()
response = query_engine.query("What is llama2?")
print(response)
!!! tip
    We have a more thorough example of using Chroma if you want to go deeper on this store.
You’re ready to query!
Now that you have loaded data, indexed it, and stored that index, you're ready to query your data.
Inserting Documents or Nodes
If you've already created an index, you can add new documents to your index using the insert method.
from llama_index.core import VectorStoreIndex
# start with an empty index, then insert documents one at a time
index = VectorStoreIndex([])
for doc in documents:
    index.insert(doc)
See the document management how-to for more details on managing documents and an example notebook.
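The index supports removal as well; for instance, a minimal sketch of deleting a previously inserted document by its reference ID (the placeholder ID is illustrative):

# delete a document (and its nodes) by ref_doc_id
index.delete_ref_doc("<doc_id>", delete_from_docstore=True)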