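This section assumes a connected `cluster` object from the Couchbase Python SDK. A minimal connection sketch (the connection string and credentials below are placeholders for your own deployment):

```python
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

# Placeholder credentials and connection string; replace with your own
auth = PasswordAuthenticator("Administrator", "password")
cluster = Cluster("couchbase://localhost", ClusterOptions(auth))
```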

Wait until the cluster is ready for use.

```python
cluster.wait_until_ready(timedelta(seconds=5))
```

### Creating the Search Index
Currently, the Search index needs to be created from the Couchbase Capella or Server UI, or via the REST interface.

Let us define a Search index named `vector-index` on the `testing` bucket. For this example, let us use the Import Index feature of the Search service in the UI.

We are defining an index on the `testing` bucket's `_default` scope and `_default` collection, with the vector field set to `embedding` (1536 dimensions) and the text field set to `text`. We are also indexing and storing all the fields under `metadata` in the document as a dynamic mapping to account for varying document structures. The similarity metric is set to `dot_product`.
#### How to Import an Index to the Full Text Search service?

- [Couchbase Server](https://docs.couchbase.com/server/current/search/import-search-index.html)
    - Click on Search -> Add Index -> Import
    - Copy the following index definition into the Import screen
    - Click on Create Index to create the index.
- [Couchbase Capella](https://docs.couchbase.com/cloud/search/import-search-index.html)
    - Copy the index definition to a new file `index.json`
    - Import the file in Capella using the instructions in the documentation.
    - Click on Create Index to create the index.
#### Index Definition

```json
{
  "name": "vector-index",
  "type": "fulltext-index",
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "type_field",
      "type_field": "type"
    },
    "mapping": {
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": true,
        "enabled": true,
        "properties": {
          "metadata": {
            "dynamic": true,
            "enabled": true
          },
          "embedding": {
            "enabled": true,
            "dynamic": false,
            "fields": [
              {
                "dims": 1536,
                "index": true,
                "name": "embedding",
                "similarity": "dot_product",
                "type": "vector",
                "vector_index_optimized_for": "recall"
              }
            ]
          },
          "text": {
            "enabled": true,
            "dynamic": false,
            "fields": [
              {
                "index": true,
                "name": "text",
                "store": true,
                "type": "text"
              }
            ]
          }
        }
      },
      "default_type": "_default",
      "docvalues_dynamic": false,
      "index_dynamic": true,
      "store_dynamic": true,
      "type_field": "_type"
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 16
    }
  },
  "sourceType": "gocbcore",
  "sourceName": "testing",
  "sourceParams": {},
  "planParams": {
    "maxPartitionsPerPIndex": 103,
    "indexPartitions": 10,
    "numReplicas": 0
  }
}
```
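Alternatively, the index can be created programmatically through the Search service's REST interface. A minimal sketch, assuming Couchbase Server runs locally with the Search service on its default port (8094), the definition above is saved as `index.json`, and the credentials are placeholders:

```python
import json

import requests

# Sketch: create the Search index via the Search service REST API.
# Host, port, and credentials below are placeholder assumptions.
with open("index.json") as f:
    index_definition = json.load(f)

response = requests.put(
    "http://localhost:8094/api/index/vector-index",
    auth=("Administrator", "password"),
    json=index_definition,
)
response.raise_for_status()
```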

We will now set the bucket, scope, and collection names in the Couchbase cluster that we want to use for Vector Search. For this example, we are using the default scope and collection.
```python
BUCKET_NAME = "testing"
SCOPE_NAME = "_default"
COLLECTION_NAME = "_default"
SEARCH_INDEX_NAME = "vector-index"
```

```python
# Import required packages
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import StorageContext
from llama_index.core import Settings
from llama_index.vector_stores.couchbase import CouchbaseSearchVectorStore
```
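Optionally, you can sanity-check that the Search index from the previous step exists before indexing any documents. A sketch using the Python SDK's search index manager (assuming the index was created at the cluster level, as in the import steps above):

```python
# Sketch: verify the Search index exists before proceeding
index_names = [idx.name for idx in cluster.search_indexes().get_all_indexes()]
assert SEARCH_INDEX_NAME in index_names, f"Search index {SEARCH_INDEX_NAME!r} not found"
```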

For this tutorial, we will use OpenAI embeddings.

```python
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
```

```
OpenAI API Key: ········
```
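LlamaIndex will pick up the OpenAI key from the environment. Since `Settings` is imported above, you could also pin the embedding model explicitly so that it matches the 1536-dimension vector field in the index definition (a sketch; assumes the `llama-index-embeddings-openai` package is installed):

```python
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# text-embedding-ada-002 returns 1536-dimensional vectors,
# matching the `dims` in the Search index definition above
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
```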
We also set up logging so that the requests made by LlamaIndex are visible in the output:

```python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
```

Next, we download the Paul Graham essay that we will index:

```python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
```
```
--2024-04-09 23:31:46--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8001::154, 2606:50c0:8003::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’

data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.008s

2024-04-09 23:31:46 (8.97 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]
```
```python
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

vector_store = CouchbaseSearchVectorStore(
    cluster=cluster,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    index_name=SEARCH_INDEX_NAME,
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
```
```
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
```

We will ask the query engine a question about the essay we just indexed.

```python
query_engine = index.as_query_engine()
response = query_engine.query("What were his investments in Y Combinator?")
print(response)
```
```
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
His investments in Y Combinator were $6k per founder, totaling $12k in the typical two-founder case, in return for 6% equity.
```
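The query engine accepts the usual LlamaIndex retrieval parameters as well. For example, a sketch that retrieves more candidate chunks per query (the question string here is illustrative):

```python
# Sketch: retrieve the top 5 matching chunks instead of the default
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What did the author work on before college?")
print(response)
```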

### Metadata Filters

We will create some example documents with metadata so that we can see how to filter documents based on metadata.

```python
from llama_index.core.schema import TextNode

nodes = [
    TextNode(
        text="The Shawshank Redemption",
        metadata={
            "author": "Stephen King",
            "theme": "Friendship",
        },
    ),
    TextNode(
        text="The Godfather",
        metadata={
            "director": "Francis Ford Coppola",
            "theme": "Mafia",
        },
    ),
    TextNode(
        text="Inception",
        metadata={
            "director": "Christopher Nolan",
        },
    ),
]

vector_store.add(nodes)
```
```
['5abb42cf-7312-46eb-859e-60df4f92842a',
 'b90525f4-38bf-453c-a51a-5f0718bccc98',
 '22f732d0-da17-4bad-b3cd-b54e2102367a']
```
```python
# Metadata filter
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

filters = MetadataFilters(
    filters=[ExactMatchFilter(key="theme", value="Mafia")]
)

retriever = index.as_retriever(filters=filters)
retriever.retrieve("What is inception about?")
```
```
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
[NodeWithScore(node=TextNode(id_='b90525f4-38bf-453c-a51a-5f0718bccc98', embedding=None, metadata={'director': 'Francis Ford Coppola', 'theme': 'Mafia'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='The Godfather', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.3068528194400547)]
```

### Custom Filters and Overriding Query

At the moment, the LlamaIndex integration supports only `ExactMatchFilter`. Couchbase itself supports a wide range of filters, including range filters, geospatial filters, and more. To use these filters, you can pass them in as a list of dictionaries to the `cb_search_options` parameter. The different search/query possibilities for the search options can be found in the Couchbase documentation.
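For instance, a Couchbase numeric range query could be passed through `cb_search_options` like this sketch (`metadata.year` is a hypothetical field, not one indexed in this tutorial):

```python
# Sketch: a numeric range query via cb_search_options.
# `metadata.year` is a hypothetical field used for illustration only.
retriever = index.as_retriever(
    vector_store_kwargs={
        "cb_search_options": {
            "query": {"min": 2000, "max": 2024, "field": "metadata.year"}
        }
    }
)
```

The example below combines `cb_search_options` (a match query on the `text` field) with a `custom_query` hook that prints the generated search request before it is sent: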

```python
def custom_query(query, query_str):
    print("custom query", query)
    return query

query_engine = index.as_query_engine(
    vector_store_kwargs={
        "cb_search_options": {
            "query": {"match": "growing up", "field": "text"}
        },
        "custom_query": custom_query,
    }
)
response = query_engine.query("what were his investments in Y Combinator?")
print(response)
```
```
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
His investments in Y Combinator were based on a combination of the deal he did with Julian ($10k for 10%) and what Robert said MIT grad students got for the summer ($6k). He invested $6k per founder, which in the typical two-founder case was $12k, in return for 6%.
```