Index V2 Configuration

Configure the deployment-wide vector-store target for LlamaCloud Index V2 (MongoDB Atlas, Turbopuffer, PostgreSQL/pgvector, Azure AI Search, or custom) plus reranking, embedding, and chat-LLM settings via environment variables.

LlamaCloud’s Index V2 resource ties a source directory to a vector store via a sync pipeline. The destination vector store — the index target — is configured deployment-wide via environment variables on the backend, jobs, and Temporal worker pods.

Index target storage is configured separately from the platform’s metadata databases (Postgres and MongoDB under Databases and Queues). The two serve different roles: the platform DB stores users, projects, and jobs, while the index target stores embeddings and chunks. Keeping them separate lets you pick a vector-optimized destination without coupling it to the rest of the deployment.

Supported Vector Store Exports

Target	When to use
MongoDB Atlas	Already using Atlas; want a managed service; need Atlas Search features alongside
Turbopuffer	High-scale vector workloads; per-namespace tenant isolation
PostgreSQL (pgvector)	No extra vector infrastructure — reuse your platform Postgres (or a dedicated one); the default for store-less (minimal profile) deployments
Azure AI Search	Already on Azure; want native hybrid search; need typed filterable metadata
Custom	Not seeing a native integration for your vector store? We encourage users to read the parsed output from a sync directly and write the export logic themselves. See our example repo for samples of multiple custom solutions.

The configured target is used for all indexes in the deployment. You can switch targets at any time by changing environment variables and re-syncing indexes. Index data is not portable across different targets, so a target switch triggers a full re-sync to the new destination.

The default destination for new indexes is set by config.defaultIndex.destination in Helm values (environment variable DEFAULT_DIRECTORY_INDEX_DESTINATION; one of mongodb, turbopuffer, postgres, azureai_search). When it is unset, new indexes default to mongodb, or to postgres on deployments where MongoDB is disabled.
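The defaulting rule above can be sketched as a small function. This is only an illustration of the documented behavior, not the platform's code: the function name, the mongodb_enabled flag, and the choice to raise on an unrecognized value are assumptions made for the example.

```python
from typing import Mapping

# The four destination values listed in this guide.
VALID_DESTINATIONS = ("mongodb", "turbopuffer", "postgres", "azureai_search")


def default_index_destination(env: Mapping[str, str], mongodb_enabled: bool = True) -> str:
    """Return the destination a new index would default to (illustrative only)."""
    value = env.get("DEFAULT_DIRECTORY_INDEX_DESTINATION", "").strip()
    if not value:
        # Unset: mongodb, or postgres when MongoDB is disabled for the deployment.
        return "mongodb" if mongodb_enabled else "postgres"
    if value not in VALID_DESTINATIONS:
        # Assumption: how the platform treats an invalid value is not documented here.
        raise ValueError(f"unsupported destination: {value!r}")
    return value
```

Calling it with an empty mapping and mongodb_enabled=False returns postgres, which corresponds to the store-less (minimal profile) case.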

MongoDB Atlas

Requires a MongoDB Atlas cluster with Atlas Vector Search enabled. The Mongo target uses $vectorSearch aggregation and managed search indexes, both of which are Atlas-only features. Community MongoDB and mongo:7.0 images do not work as Mongo index targets; configure a different target instead, or use a real Atlas cluster.

Connection

Variable	Description
`DEFAULT_INDEX_MONGO_URI`	Full SRV or standard connection URI (e.g. `mongodb+srv://...`)
`DEFAULT_INDEX_MONGO_DB`	Database name
`DEFAULT_INDEX_MONGO_COLLECTION`	Collection name (a single collection holds all index data)

The URI is treated as a secret and stored in the rendered MongoDB connection Secret alongside MONGODB_URL. Avoid pointing the index target at the same database as the platform’s own MongoDB connection: use a separate cluster, or at minimum a separate database.

Database User Privileges

LlamaCloud creates collections, regular indexes, and Atlas Vector Search indexes programmatically on first use. The database user therefore needs two built-in roles scoped to the target database:

readWrite — covers insert, delete, find, aggregate (including $vectorSearch), createCollection, listCollections, and creating regular indexes.
dbAdmin — covers the Atlas Search commands createSearchIndexes and listSearchIndexes, which are not in readWrite.

readWrite alone is insufficient: the sync workflow fails on its first export with a privilege error when it tries to create the vector search index. The platform never drops indexes or collections, so dropSearchIndex and dropCollection are not required — a least-privilege custom Atlas role granting the readWrite actions plus only createSearchIndexes and listSearchIndexes is sufficient if your security review demands tighter scoping.

Turbopuffer

Requires a Turbopuffer account and API key. Each Index gets its own Turbopuffer namespace derived from the Index ID, so a single Turbopuffer account scales to arbitrarily many indexes without per-index infrastructure.

Variable	Description
`DEFAULT_INDEX_TPUF_API_KEY`	Turbopuffer API key
`DEFAULT_INDEX_TPUF_REGION`	Region for new namespaces (default `aws-us-east-1`)
`DEFAULT_INDEX_TPUF_NAMESPACE_PREFIX`	Prefix prepended to every namespace name (default `llamacloud`) — final namespace is `{prefix}-{index_id}`
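To predict which namespace an Index lands in (for example, when auditing a Turbopuffer account), the naming rule from the table can be reproduced as below. The function name is hypothetical, and the sketch assumes the Index ID is used verbatim with no extra normalization.

```python
from typing import Mapping


def turbopuffer_namespace(index_id: str, env: Mapping[str, str]) -> str:
    """Build the namespace name as {prefix}-{index_id} (illustrative only)."""
    # The documented default prefix is "llamacloud".
    prefix = env.get("DEFAULT_INDEX_TPUF_NAMESPACE_PREFIX") or "llamacloud"
    return f"{prefix}-{index_id}"
```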

PostgreSQL (pgvector)

Stores chunks and embeddings in a PostgreSQL database using the pgvector extension — no extra vector infrastructure. This is the default target for store-less (minimal profile) deployments, and the connection may point at the platform’s own Postgres or a dedicated database.

Variable	Description
`DEFAULT_INDEX_PG_HOST`	PostgreSQL host
`DEFAULT_INDEX_PG_PORT`	Port (default `5432`)
`DEFAULT_INDEX_PG_NAME`	Database name
`DEFAULT_INDEX_PG_USER`	Username
`DEFAULT_INDEX_PG_PASSWORD`	Password

In Helm values these render from config.defaultIndex.postgres.{host,port,database,username,password}. Any field left empty inherits the matching inline postgresql.* value, so deployments that pass platform-Postgres credentials inline (including the minimal profile) need no extra configuration. If the platform Postgres is supplied via an existing secret (postgresql.secret), set the config.defaultIndex.postgres.* fields explicitly or provide config.defaultIndex.secret.
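The inheritance rule can be pictured as a per-field fallback. The sketch below is not the chart's template logic. It assumes the inline postgresql.* values use the same five field names, which this page does not confirm, and the function name is made up for the example.

```python
from typing import Dict, Mapping

PG_FIELDS = ("host", "port", "database", "username", "password")


def resolve_index_postgres(
    index_pg: Mapping[str, object], platform_pg: Mapping[str, object]
) -> Dict[str, object]:
    """Fill empty config.defaultIndex.postgres.* fields from inline postgresql.* values."""
    resolved: Dict[str, object] = {}
    for field in PG_FIELDS:
        value = index_pg.get(field)
        # An empty or missing field inherits the matching platform value.
        resolved[field] = value if value not in (None, "") else platform_pg.get(field)
    return resolved
```

For instance, setting only database on the index side keeps the platform host and credentials while writing index data to a different database on the same server.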

Requirements and behavior:

The server must have the pgvector extension available. LlamaCloud runs CREATE EXTENSION IF NOT EXISTS vector on first export. pgvector is not a trusted extension, so creating it normally requires superuser-level rights — on managed services the standard mechanisms apply (Amazon RDS master user / rds_superuser, the azure.extensions allow-list on Azure, Cloud SQL’s supported-extension flow). If the LlamaCloud database user cannot create extensions, pre-create it once as an admin (CREATE EXTENSION vector;) — the app’s CREATE EXTENSION IF NOT EXISTS then no-ops.
One table is created per embedding model, named index_v2_<embedding-model>, with an HNSW index on the embedding column. Chunk text and metadata live alongside the vector in the same row.
Tenant isolation within the table is filter-based (per export config / tenant id), matching the other managed targets.

Azure AI Search

Requires an Azure AI Search service with vector search enabled. LlamaCloud creates one index per (export config, embedding model) on first sync.

Variable	Description
`DEFAULT_INDEX_AZUREAI_SEARCH_ENDPOINT`	Service endpoint, e.g. `https://<service>.search.windows.net`
`DEFAULT_INDEX_AZUREAI_SEARCH_AUTHENTICATION_CONFIGURATION`	JSON auth config; for API-key auth: `{"type": "api_key", "api_key": "<admin-key>"}`. The key must be allowed to create indexes.
`DEFAULT_INDEX_AZUREAI_SEARCH_INDEX_NAME_PREFIX`	Optional prefix for created index names (default `llamacloud`)

Arbitrary chunk metadata is written to a typed metadata complex collection on the index, with one entry per key: val_s (string, lowercase-normalized for case-insensitive equality), val_n (double), val_b (boolean), and val_d (datetime — populated when a string parses as ISO 8601). list[str] values expand to one entry per item. Bag fields are filter-only — Azure does not support $orderby or facet correlation across collection sub-fields — so query via OData any() lambdas:

$filter=metadata/any(m: m/key eq 'department' and m/val_s eq 'legal')
$filter=metadata/any(m: m/key eq 'created_at' and m/val_d ge 2026-01-01T00:00:00Z)

Azure caps complex-collection elements at 3000 per document; entries beyond that count are dropped silently at ingest. Adding a new top-level field to an existing index is seamless (existing documents see null), but changing the type or filterable/sortable/facetable attribute of an existing field requires dropping and recreating the index — a full re-sync.
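If you build these filters in application code, a small helper can pick the typed sub-field from the Python type of the value. This is a sketch rather than a LlamaCloud API: the function name is hypothetical, it lowercases string values on the assumption that queries must match the lowercase-normalized val_s, and it treats naive datetimes as UTC.

```python
from datetime import datetime, timezone
from typing import Union

MetadataValue = Union[str, bool, int, float, datetime]
_OPERATORS = {"eq", "ne", "gt", "ge", "lt", "le"}


def _quote(text: str) -> str:
    # OData string literals escape a single quote by doubling it.
    return "'" + text.replace("'", "''") + "'"


def metadata_filter(key: str, value: MetadataValue, op: str = "eq") -> str:
    """Return an OData any() lambda over the typed metadata collection."""
    if op not in _OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        field, literal = "val_b", "true" if value else "false"
    elif isinstance(value, (int, float)):
        field, literal = "val_n", repr(float(value))
    elif isinstance(value, datetime):
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        literal = value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        field = "val_d"
    elif isinstance(value, str):
        field, literal = "val_s", _quote(value.lower())
    else:
        raise TypeError(f"unsupported metadata value type: {type(value).__name__}")
    return f"metadata/any(m: m/key eq {_quote(key)} and m/{field} {op} {literal})"
```

The result is the value for $filter. Keeping the key test and the value test inside one any() lambda is what ties them to the same metadata entry.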

Reranking

Index retrieval optionally runs a reranker over the top-K vector results. Reranking is enabled by default and uses Cohere rerank-v3.5 if a Cohere key is configured. To disable reranking entirely (e.g. for deployments without Cohere or Bifrost access):

Variable	Description
`RERANK_ENABLED`	Set to `false` to disable reranking globally. Defaults to `true`.

Cohere

Variable	Description
`COHERE_PRIVATE_KEY`	Cohere API key
`COHERE_RERANK_LANGUAGE`	One of `multi`, `en`, `foreign`. Defaults to `multi`.

Gateway (Optional)

If you’ve deployed an LLM gateway such as Bifrost or Portkey, you can route reranking through it instead of calling Cohere directly. Useful for centralized credential management or per-tenant routing.

Variable	Description
`RERANK_GATEWAY_ENABLED`	`true` to route reranking through the gateway
`RERANK_GATEWAY`	Gateway used for attribution headers: `bifrost` (default) or `portkey`
`RERANK_GATEWAY_BASE_URL`	Gateway base URL
`RERANK_GATEWAY_DEFAULT_MODEL`	Provider/model string (default `cohere/rerank-v3.5`)
`RERANK_GATEWAY_DEFAULT_TOP_N`	Default number of results to return (default `10`)
`RERANK_GATEWAY_TIMEOUT_SECONDS`	Request timeout in seconds (default `30`)
`COHERE_PRIVATE_KEY`	Credential sent as `Authorization: Bearer` by the Cohere SDK. Still required to build the client even when the gateway authenticates via a custom header (use a placeholder if unused).

When both a gateway and direct Cohere are configured, the gateway takes precedence.

The legacy BIFROST_RERANK_* variables remain supported as aliases. If both names are set for a field, the corresponding RERANK_GATEWAY* variable takes precedence.
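The routing precedence described in this section can be summarized as a decision function. This is a reading of the rules above, not the platform's implementation: the function name, the accepted truthy spellings, and the "none" result when no Cohere key is present are assumptions, and the legacy BIFROST_RERANK_* aliases are deliberately left out.

```python
from typing import Mapping

_TRUTHY = {"true", "1", "yes"}  # assumption: accepted spellings are not documented here


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name, "").strip().lower()
    return default if raw == "" else raw in _TRUTHY


def rerank_route(env: Mapping[str, str]) -> str:
    """Return 'disabled', 'gateway', 'cohere', or 'none' for a given environment."""
    if not _flag(env, "RERANK_ENABLED", True):
        return "disabled"  # reranking turned off globally
    if _flag(env, "RERANK_GATEWAY_ENABLED", False):
        return "gateway"  # the gateway takes precedence over direct Cohere
    if env.get("COHERE_PRIVATE_KEY"):
        return "cohere"  # direct Cohere, rerank-v3.5 by default
    return "none"  # enabled, but nothing is configured to rerank with
```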

Gateways that authenticate via a custom header (e.g. Portkey)

Some gateways do not accept Authorization: Bearer for gateway auth and instead require a custom header (Portkey uses x-portkey-api-key). Sending only the SDK’s bearer token makes the gateway forward that key upstream, producing a 401 from the provider (e.g. Cohere).

Reranking reuses the shared LLM_HEADERS setting (the same mechanism used for chat/embedding gateway auth). Set the gateway’s auth header there and it is attached to every rerank request alongside the observability headers:

LLM_HEADERS_ENABLED=true
LLM_HEADERS={"x-portkey-api-key": "<portkey-api-key>"}

For example, to rerank a Bedrock-hosted Cohere model through Portkey:

RERANK_GATEWAY_ENABLED=true
RERANK_GATEWAY=portkey
RERANK_GATEWAY_BASE_URL=https://<your-portkey-host>/v1
RERANK_GATEWAY_DEFAULT_MODEL=@bedrock-uswest2/cohere.rerank-v3-5
COHERE_PRIVATE_KEY=<placeholder-or-unused>
LLM_HEADERS_ENABLED=true
LLM_HEADERS={"x-portkey-api-key": "<portkey-api-key>"}
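Because a missing header only shows up as a 401 at query time, it can help to lint the environment before rollout. The preflight below is a hypothetical helper, not something shipped with LlamaCloud. It checks only the variables named in this section and assumes the flags are spelled true.

```python
import json
from typing import List, Mapping


def portkey_rerank_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems with a Portkey rerank setup (empty list if none)."""
    problems: List[str] = []
    if env.get("RERANK_GATEWAY_ENABLED", "").lower() != "true":
        problems.append("RERANK_GATEWAY_ENABLED is not true")
    if env.get("RERANK_GATEWAY") != "portkey":
        problems.append("RERANK_GATEWAY is not portkey")
    if not env.get("RERANK_GATEWAY_BASE_URL"):
        problems.append("RERANK_GATEWAY_BASE_URL is empty")
    if not env.get("COHERE_PRIVATE_KEY"):
        # The Cohere client still needs a value, even if only a placeholder.
        problems.append("COHERE_PRIVATE_KEY is empty")
    if env.get("LLM_HEADERS_ENABLED", "").lower() != "true":
        problems.append("LLM_HEADERS_ENABLED is not true")
    try:
        headers = json.loads(env.get("LLM_HEADERS") or "{}")
    except json.JSONDecodeError:
        problems.append("LLM_HEADERS is not valid JSON")
        headers = {}
    if not isinstance(headers, dict) or not headers.get("x-portkey-api-key"):
        problems.append("LLM_HEADERS lacks x-portkey-api-key")
    return problems
```

Passing dict(os.environ) from inside a pod gives a quick check of what the process actually sees.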

Embeddings and Chat LLM

Index sync and retrieval also need:

An embedding model to convert chunks into vectors at sync time
A chat LLM to answer questions against the index at query time

Both are resolved through LlamaCloud’s centralized LLM configuration. See Centralized Provider Configuration for the full setup. Models become available to Index operations as soon as the corresponding provider is configured under config.llms.* in your Helm values, or registered via config.llms.providerConfigs for fine-grained control.

Embedding Model

The embedding model is hardcoded to openai-text-embedding-3-small. It cannot be overridden per-index and there is no automatic fallback to another model.

If openai-text-embedding-3-small is registered in centralized LLM config, sync works.
If it is not, sync fails at workflow runtime with an EmbeddingConfigNotFoundError.

This means an operator must either:

Configure the OpenAI provider directly (config.llms.openAi.apiKey), which auto-registers both embedding models, or
Register openai-text-embedding-3-small explicitly via providerConfigs — useful when routing through a gateway or a non-OpenAI-hosted endpoint (see Custom Endpoints below).

Chat LLM

The chat agent prefers OpenAI GPT-5-family models, in this order:

openai-gpt-5-4
openai-gpt-5-4-mini
openai-gpt-5-2
openai-gpt-5
openai-gpt-5-mini

The first available model in the list is used. If none of the preferred models is registered, the resolver falls back to any other LLM in the centralized config (matched by priority). If the centralized config holds no chat-capable LLM at all, the chat endpoint returns status=UNAVAILABLE with a message indicating that no chat LLM is registered.

Notes:

The chat agent currently uses the OpenAI raw client, so non-OpenAI models registered through providerConfigs need to be served via an OpenAI-compatible endpoint (see Custom Endpoints).
Index creation does not validate that a chat LLM is available — the failure surfaces only when a chat message is sent.
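The resolution order for the chat LLM can be sketched as follows. This is an illustration only: the function name is hypothetical, the input is simplified to a mapping of chat-capable model IDs to their priority, and it assumes a higher priority number wins in the fallback step, which this page does not state.

```python
from typing import Mapping, Optional

# Preferred models, in the order listed above.
PREFERRED_CHAT_MODELS = (
    "openai-gpt-5-4",
    "openai-gpt-5-4-mini",
    "openai-gpt-5-2",
    "openai-gpt-5",
    "openai-gpt-5-mini",
)


def resolve_chat_model(registered: Mapping[str, int]) -> Optional[str]:
    """Pick a chat model from {model_id: priority}; None means UNAVAILABLE."""
    for model_id in PREFERRED_CHAT_MODELS:
        if model_id in registered:
            return model_id  # first preferred model that is registered
    if registered:
        # Fallback: any other registered LLM, chosen by priority (assumed: highest wins).
        return max(registered, key=lambda model_id: registered[model_id])
    return None
```

A None result corresponds to the status=UNAVAILABLE response from the chat endpoint.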

Custom Endpoints (OpenAI-Compatible Gateways)

Every provider in providerConfigs supports a credentials.base_url field. This makes it possible to point LlamaCloud at LiteLLM, Ollama, Azure AI Foundry proxies, or any other OpenAI-compatible endpoint while continuing to use the OpenAI model identifiers (openai-gpt-5-4, openai-text-embedding-3-small, etc.):

config:
  llms:
    providerConfigs:
      - id: "embedding-via-litellm"
        provider: "openai"
        model_id: "openai-text-embedding-3-small"
        enabled: true
        priority: 200
        credentials:
          api_key: "sk-litellm-..."
          base_url: "https://litellm.example.com/v1"
      - id: "chat-via-litellm"
        provider: "openai"
        model_id: "openai-gpt-5-4"
        enabled: true
        priority: 200
        credentials:
          api_key: "sk-litellm-..."
          base_url: "https://litellm.example.com/v1"

The same base-url override is honored for anthropic and gemini providers (set on their respective credentials blocks). Azure uses a separate credentials.endpoint field — see Centralized Provider Configuration for the full credentials schema by provider.

Verifying Configuration

After applying your Helm values, check that each target is correctly configured by hitting the configz endpoint:

curl -H "Authorization: Bearer $TOKEN" \
     -H "Project-Id: $PROJECT_ID" \
     https://<your-llamacloud-host>/api/v1/indexes/configz

The response lists which vector targets, embedding models, chat LLMs, and rerankers are available. Targets with missing or invalid env vars appear as unconfigured.
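The same check can be scripted with the Python standard library. The sketch below only builds the request shown in the curl command. The function name is made up for the example, and because the response schema is not described on this page, the usage note simply prints the parsed JSON.

```python
import urllib.request


def build_configz_request(host: str, token: str, project_id: str) -> urllib.request.Request:
    """Build the GET request for the configz endpoint (mirrors the curl command)."""
    url = f"https://{host}/api/v1/indexes/configz"
    headers = {
        "Authorization": f"Bearer {token}",
        "Project-Id": project_id,
    }
    return urllib.request.Request(url, headers=headers, method="GET")
```

To send it, pass the request to urllib.request.urlopen inside a with block, parse the body with json.load, and print the result to see which targets are reported as unconfigured.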
