S3VectorStore Integration
This is a vector store integration for LlamaIndex that uses S3Vectors.
Find out more about S3Vectors.
This notebook will assume that you have already created a S3 vector bucket (and possibly also an index).
Installation
Section titled “Installation”%pip install llama-index-vector-stores-s3 llama-index-embeddings-openaiCreating the vector store object
Section titled “Creating the vector store object”You can create a new vector index in an existing S3 bucket.
If you don’t have S3 credentials configured in your environment, you can provide a boto3 session with credentials.
from llama_index.vector_stores.s3 import S3VectorStoreimport boto3
vector_store = S3VectorStore.create_index_from_bucket(    # S3 bucket name or ARN    bucket_name_or_arn="test-bucket",    # Name for the new index    index_name="my-index",    # Vector dimension (e.g., 1536 for OpenAI embeddings)    dimension=1536,    # Distance metric: "cosine", "euclidean", etc.    distance_metric="cosine",    # Data type for vectors    data_type="float32",    # Batch size for inserting vectors (max 500)    insert_batch_size=500,    # Metadata keys that won't be filterable    non_filterable_metadata_keys=["custom_field"],    # Optional: provide a boto3 session for custom AWS configuration    # sync_session=boto3.Session(...))Or, you can use an existing vector index in an existing S3 bucket.
from llama_index.vector_stores.s3 import S3VectorStoreimport boto3
vector_store = S3VectorStore(    # Index name or ARN    index_name_or_arn="my-index",    # S3 bucket name or ARN    bucket_name_or_arn="test-bucket",    # Data type for vectors (must match index)    data_type="float32",    # Distance metric (must match index)    distance_metric="cosine",    # Batch size for inserting vectors (max 500)    insert_batch_size=500,    # Optional: specify metadata field containing text if you already have a populated index    # text_field="content",    # Optional: provide a boto3 session for custom AWS configuration    # sync_session=boto3.Session(...),)Using the vector store with an index
Section titled “Using the vector store with an index”Once you have a vector store, you can use it with an index:
from llama_index.core import (    VectorStoreIndex,    StorageContext,    SimpleDirectoryReader,    Document,)from llama_index.embeddings.openai import OpenAIEmbedding
# Load some data# documents = SimpleDirectoryReader("data").load_data()
# Or create new documentsdocuments = [    Document(text="Hello, world!", metadata={"key": "1"}),    Document(text="Hello, world! 2", metadata={"key": "2"}),]
# Create a new indexindex = VectorStoreIndex.from_documents(    documents,    storage_context=StorageContext.from_defaults(vector_store=vector_store),    # optional: set the embed model    embed_model=OpenAIEmbedding(model="text-embedding-3-small", api_key="..."),)
# Or reload from an existing index# index = VectorStoreIndex.from_vector_store(#     vector_store=vector_store,#     # optional: set the embed model#     # embed_model=embed_model,# )nodes = index.as_retriever(similarity_top_k=1).retrieve("Hello, world!")print(nodes[0].text)Hello, world!You can also use filters to help you retrieve the correct nodes!
from llama_index.core.vector_stores.types import (    MetadataFilters,    MetadataFilter,    FilterOperator,    FilterCondition,)
nodes = index.as_retriever(    similarity_top_k=2,    filters=MetadataFilters(        filters=[            MetadataFilter(                key="key",                value="2",                operator=FilterOperator.EQ,            ),        ],        condition=FilterCondition.AND,    ),).retrieve("Hello, world!")
print(nodes[0].text)Hello, world! 2Using the vector store directly
Section titled “Using the vector store directly”You can also use the vector store directly:
from llama_index.core.schema import TextNodefrom llama_index.core.vector_stores.types import VectorStoreQueryfrom llama_index.embeddings.openai import OpenAIEmbedding
# embed nodesnodes = [    TextNode(text="Hello, world!"),    TextNode(text="Hello, world! 2"),]
embed_model = OpenAIEmbedding(model="text-embedding-3-small", api_key="...")embeddings = embed_model.get_text_embedding_batch([n.text for n in nodes])for node, embedding in zip(nodes, embeddings):    node.embedding = embedding
# add nodes to the vector storevector_store.add(nodes)
# query the vector storequery = VectorStoreQuery(    query_embedding=embed_model.get_query_embedding("Hello, world!"),    similarity_top_k=2,)results = vector_store.query(query)print(results.nodes[0].text)Hello, world!