Deep Lake Vector Store Quickstart
Deep Lake and the LlamaIndex integration can be installed using pip.
%pip install llama-index-vector-stores-deeplake
!pip install llama-index
!pip install deeplake
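To confirm the installs before continuing, a quick import check is a minimal sketch (the __version__ attributes are assumptions about these packages' metadata):

import deeplake
import llama_index.core

# Verify both packages import and report their versions
print("deeplake", deeplake.__version__)
print("llama-index-core", llama_index.core.__version__)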
Next, let’s import the required modules and set the necessary environment variables:
import os
import textwrap
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Document
from llama_index.vector_stores.deeplake import DeepLakeVectorStore
os.environ["OPENAI_API_KEY"] = "sk-********************************"os.environ["ACTIVELOOP_TOKEN"] = "********************************"
We are going to embed and store one of Paul Graham’s essays in a Deep Lake Vector Store stored locally. First, we download the data to a directory called data/paul_graham.
import urllib.request

# The target directory must exist before downloading; urlretrieve will not create it
os.makedirs("data/paul_graham", exist_ok=True)
urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt",
    "data/paul_graham/paul_graham_essay.txt",
)
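As a quick sanity check (optional, assuming the download above succeeded), you can confirm the file landed on disk:

# A zero-byte or missing file would indicate a failed download
print(os.path.getsize("data/paul_graham/paul_graham_essay.txt"), "bytes")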
We can now create documents from the source data file.
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
print(
    "Document ID:",
    documents[0].doc_id,
    "Document Hash:",
    documents[0].hash,
)
Document ID: a98b6686-e666-41a9-a0bc-b79f0d666bde Document Hash: beaa54b3e9cea641e91e6975d2207af4f4200f4b2d629725d688f272372ce5bb
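If you want to peek at what was loaded (an illustrative check, not part of the original quickstart), each Document exposes its raw text:

# Show how many documents were created and the start of the essay text
print(len(documents), "document(s) loaded")
print(documents[0].text[:200])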
Finally, let’s create the Deep Lake Vector Store and populate it with data. We use the default tensor configuration, which creates tensors for text (str), metadata (json), id (str, auto-populated), and embedding (float32). Learn more about tensor customizability here.
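As an illustration of that customizability, Deep Lake's own VectorStore class accepts a tensor_params argument for defining the tensor layout; the sketch below is an assumption based on the Deep Lake 3.x API (the "year" tensor is hypothetical), so check the docs for your installed version:

from deeplake.core.vectorstore import VectorStore

# Sketch only: recreate the default tensors plus a hypothetical "year" tensor
custom_store = VectorStore(
    path="./dataset/custom_example",
    tensor_params=[
        {"name": "text", "htype": "text"},
        {"name": "metadata", "htype": "json"},
        {"name": "id", "htype": "text"},
        {"name": "embedding", "htype": "embedding"},
        {"name": "year", "htype": "text"},  # hypothetical extra tensor
    ],
)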
from llama_index.core import StorageContext
dataset_path = "./dataset/paul_graham"
# Create an index over the documents
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
Uploading data to deeplake dataset.
100%|██████████| 22/22 [00:00<00:00, 684.80it/s]
Dataset(path='./dataset/paul_graham', tensors=['text', 'metadata', 'embedding', 'id'])
  tensor      htype        shape      dtype   compression
 -------     -------      -------    -------    -------
   text       text        (22, 1)      str       None
 metadata     json        (22, 1)      str       None
 embedding  embedding   (22, 1536)   float32     None
    id        text        (22, 1)      str       None
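Because the dataset now lives on disk, a later session can reattach to it without re-embedding; a minimal sketch, reusing the same dataset_path:

# Reconnect to the existing dataset (overwrite=False preserves the stored data)
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=False)
index = VectorStoreIndex.from_vector_store(vector_store)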
Performing Vector Search
Deep Lake offers highly flexible vector search and hybrid search options, discussed in detail in these tutorials. In this Quickstart, we show a simple example using the default options.
query_engine = index.as_query_engine()
response = query_engine.query(
    "What did the author learn?",
)
print(textwrap.fill(str(response), 100))
The author learned that working on things that are not prestigious can be a good thing, as it can lead to discovering something real and avoiding the wrong track. The author also learned that ignorance can be beneficial, as it can lead to discovering something new and unexpected. The author also learned the importance of working hard, even at the parts of the job they don't like, in order to set an example for others. The author also learned the value of unsolicited advice, as it can be beneficial in unexpected ways, such as when Robert Morris suggested that the author should make sure Y Combinator wasn't the last cool thing they did.
response = query_engine.query("What was a hard moment for the author?")
print(textwrap.fill(str(response), 100))
The author experienced a hard moment when one of his programs on the IBM 1401 computer did not terminate. This was a social as well as a technical error, as the data center manager's expression made clear.
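The defaults can also be tuned. For example, the standard LlamaIndex similarity_top_k parameter controls how many chunks are retrieved per query; a short sketch:

# Retrieve the 5 most similar chunks instead of the default 2
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("What was a hard moment for the author?")
for node in nodes:
    print(node.score, node.node.node_id)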
Deleting items from the database
To find the id of a document to delete, you can query the underlying Deep Lake dataset directly:
import deeplake
ds = deeplake.load(dataset_path)
idx = ds.id[0].numpy().tolist()
idx
./dataset/paul_graham loaded successfully.
['42f8220e-673d-4c65-884d-5a48a1a15b03']
index.delete(idx[0])
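To confirm the removal (an optional check, not part of the original quickstart), reload the dataset and compare its length:

# The dataset should hold one fewer row after the delete
ds = deeplake.load(dataset_path, read_only=True)
print(len(ds))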