Wait until the cluster is ready for use.
cluster.wait_until_ready(timedelta(seconds=5))
### Creating the Search IndexCurrently, the Search index needs to be created from the Couchbase Capella or Server UI or using the REST interface.
Let us define a Search index with the name `vector-index` on the `testing` bucket
For this example, let us use the Import Index feature on the Search Service on the UI.
We are defining an index on the testing bucket’s `_default` scope on the `_default` collection with the vector field set to `embedding` with 1536 dimensions and the text field set to text. We are also indexing and storing all the fields under metadata in the document as a dynamic mapping to account for varying document structures. The similarity metric is set to `dot_product`.
#### How to Import an Index to the Full Text Search service?
- [Couchbase Server](https://docs.couchbase.com/server/current/search/import-search-index.html) - Click on Search -> Add Index -> Import - Copy the following Index definition in the Import screen - Click on Create Index to create the index.
- [Couchbase Capella](https://docs.couchbase.com/cloud/search/import-search-index.html) - Copy the index definition to a new file `index.json` - Import the file in Capella using the instructions in the documentation. - Click on Create Index to create the index.
#### Index Definition
{ “name”: “vector-index”, “type”: “fulltext-index”, “params”: { “doc_config”: { “docid_prefix_delim”: "", “docid_regexp”: "", “mode”: “type_field”, “type_field”: “type” }, “mapping”: { “default_analyzer”: “standard”, “default_datetime_parser”: “dateTimeOptional”, “default_field”: “_all”, “default_mapping”: { “dynamic”: true, “enabled”: true, “properties”: { “metadata”: { “dynamic”: true, “enabled”: true }, “embedding”: { “enabled”: true, “dynamic”: false, “fields”: [ { “dims”: 1536, “index”: true, “name”: “embedding”, “similarity”: “dot_product”, “type”: “vector”, “vector_index_optimized_for”: “recall” } ] }, “text”: { “enabled”: true, “dynamic”: false, “fields”: [ { “index”: true, “name”: “text”, “store”: true, “type”: “text” } ] } } }, “default_type”: “_default”, “docvalues_dynamic”: false, “index_dynamic”: true, “store_dynamic”: true, “type_field”: “_type” }, “store”: { “indexType”: “scorch”, “segmentVersion”: 16 } }, “sourceType”: “gocbcore”, “sourceName”: “testing”, “sourceParams”: {}, “planParams”: { “maxPartitionsPerPIndex”: 103, “indexPartitions”: 10, “numReplicas”: 0 } }
We will now set the bucket, scope, and collection names in the Couchbase cluster that we want to use for Vector Search.
For this example, we are using the default scope & collections.
```pythonBUCKET_NAME = "testing"SCOPE_NAME = "_default"COLLECTION_NAME = "_default"SEARCH_INDEX_NAME = "vector-index"
# Import required packagesfrom llama_index.core import VectorStoreIndex, SimpleDirectoryReaderfrom llama_index.core import StorageContextfrom llama_index.core import Settingsfrom llama_index.vector_stores.couchbase import CouchbaseSearchVectorStore
For this tutorial, we will use OpenAI embeddings
import osimport getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
OpenAI API Key: ········
import loggingimport sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
Download Data
Section titled “Download Data”!mkdir -p 'data/paul_graham/'!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
--2024-04-09 23:31:46-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txtResolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8001::154, 2606:50c0:8003::154, ...Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.HTTP request sent, awaiting response... 200 OKLength: 75042 (73K) [text/plain]Saving to: ‘data/paul_graham/paul_graham_essay.txt’
data/paul_graham/pa 100%[===================>] 73.28K --.-KB/s in 0.008s
2024-04-09 23:31:46 (8.97 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]
Load the documents
Section titled “Load the documents”# load documentsdocuments = SimpleDirectoryReader("./data/paul_graham/").load_data()
vector_store = CouchbaseSearchVectorStore( cluster=cluster, bucket_name=BUCKET_NAME, scope_name=SCOPE_NAME, collection_name=COLLECTION_NAME, index_name=SEARCH_INDEX_NAME,)
storage_context = StorageContext.from_defaults(vector_store=vector_store)index = VectorStoreIndex.from_documents( documents, storage_context=storage_context)
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Basic Example
Section titled “Basic Example”We will ask the query engine a question about the essay we just indexed.
query_engine = index.as_query_engine()response = query_engine.query("What were his investments in Y Combinator?")print(response)
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"His investments in Y Combinator were $6k per founder, totaling $12k in the typical two-founder case, in return for 6% equity.
Metadata Filters
Section titled “Metadata Filters”We will create some example documents with metadata so that we can see how to filter documents based on metadata.
from llama_index.core.schema import TextNode
nodes = [ TextNode( text="The Shawshank Redemption", metadata={ "author": "Stephen King", "theme": "Friendship", }, ), TextNode( text="The Godfather", metadata={ "director": "Francis Ford Coppola", "theme": "Mafia", }, ), TextNode( text="Inception", metadata={ "director": "Christopher Nolan", }, ),]vector_store.add(nodes)
['5abb42cf-7312-46eb-859e-60df4f92842a', 'b90525f4-38bf-453c-a51a-5f0718bccc98', '22f732d0-da17-4bad-b3cd-b54e2102367a']
# Metadata filterfrom llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters
filters = MetadataFilters( filters=[ExactMatchFilter(key="theme", value="Mafia")])
retriever = index.as_retriever(filters=filters)
retriever.retrieve("What is inception about?")
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
[NodeWithScore(node=TextNode(id_='b90525f4-38bf-453c-a51a-5f0718bccc98', embedding=None, metadata={'director': 'Francis Ford Coppola', 'theme': 'Mafia'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='The Godfather', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.3068528194400547)]
Custom Filters and overriding Query
Section titled “Custom Filters and overriding Query”Couchbase supports ExactMatchFilters
only at the moment via LlamaIndex. Couchbase supports a wide range of filters, including range filters, geospatial filters, and more. To use these filters, you can pass them in as a list of dictionaries to the cb_search_options
parameter.
The different search/query possibilities for the search_options can be found here.
def custom_query(query, query_str): print("custom query", query) return query
query_engine = index.as_query_engine( vector_store_kwargs={ "cb_search_options": { "query": {"match": "growing up", "field": "text"} }, "custom_query": custom_query, })response = query_engine.query("what were his investments in Y Combinator?")print(response)
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"His investments in Y Combinator were based on a combination of the deal he did with Julian ($10k for 10%) and what Robert said MIT grad students got for the summer ($6k). He invested $6k per founder, which in the typical two-founder case was $12k, in return for 6%.