Index V2 Configuration
LlamaCloud’s Index V2 resource ties a source directory to a vector store via a sync pipeline. The destination vector store — the index target — is configured deployment-wide via environment variables on the backend, jobs, and Temporal worker pods.
Index target storage is configured separately from the platform’s metadata databases (Postgres and MongoDB under Databases and Queues). The two serve different roles: the platform DB stores users, projects, and jobs, while the index target stores embeddings and chunks. Keeping them separate lets you pick a vector-optimized destination without coupling it to the rest of the deployment.
Supported Vector Store Exports
| Target | When to use |
|---|---|
| MongoDB Atlas | Already using Atlas; want a managed service; need Atlas Search features alongside |
| Turbopuffer | High-scale vector workloads; per-namespace tenant isolation |
| Custom | Not seeing a native integration for your vector store? We encourage users to read the parsed output from a sync directly and write the export logic themselves. See our example repo for samples of multiple custom solutions. |
The configured target is used for all indexes in the deployment. You can switch targets at any time by changing environment variables and re-syncing indexes. Index data is not portable across different targets, so a target switch triggers a full re-sync to the new destination.
MongoDB Atlas
Requires a MongoDB Atlas cluster with Atlas Vector Search enabled. The Mongo target uses `$vectorSearch` aggregation and managed search indexes, both of which are Atlas-only features. Community MongoDB and `mongo:7.0` images do not work as Mongo index targets; configure a different target instead, or use a real Atlas cluster.
Connection
| Variable | Description |
|---|---|
| `DEFAULT_INDEX_MONGO_URI` | Full SRV or standard connection URI (e.g. `mongodb+srv://...`) |
| `DEFAULT_INDEX_MONGO_DB` | Database name |
| `DEFAULT_INDEX_MONGO_COLLECTION` | Collection name (a single collection holds all index data) |
The URI is treated as a secret and stored in the rendered MongoDB connection Secret alongside MONGODB_URL. Avoid sharing the URI with the platform’s own MongoDB connection. Use a separate cluster or at minimum a separate database.
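As a rough pre-flight check before deploying, the connection settings above could be validated with a small shell function. This is a sketch only: `check_mongo_index_target` is a hypothetical helper and not part of LlamaCloud, and it assumes `MONGODB_URL` holds the platform's own MongoDB connection string.

```shell
# Hypothetical pre-flight check for the Mongo index target variables.
# Not part of LlamaCloud; it only inspects the current shell environment.
check_mongo_index_target() {
  # All three variables from the connection table must be non-empty.
  for name in DEFAULT_INDEX_MONGO_URI DEFAULT_INDEX_MONGO_DB DEFAULT_INDEX_MONGO_COLLECTION; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      return 1
    fi
  done
  # Accept only the two standard MongoDB URI schemes.
  case "$DEFAULT_INDEX_MONGO_URI" in
    mongodb://*|mongodb+srv://*) ;;
    *) echo "invalid URI scheme"; return 1 ;;
  esac
  # Assumption: MONGODB_URL is the platform's own connection string.
  if [ -n "${MONGODB_URL:-}" ] && [ "$DEFAULT_INDEX_MONGO_URI" = "$MONGODB_URL" ]; then
    echo "warning: index target shares the platform MongoDB URI" >&2
  fi
  echo "ok"
}
```

Calling `check_mongo_index_target` in a shell where the three variables are exported prints `ok`, or names the first missing variable. It does not contact the cluster, so it cannot confirm that Atlas Vector Search is enabled.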
Database User Privileges
LlamaCloud creates collections, regular indexes, and Atlas Vector Search indexes programmatically on first use. The database user therefore needs two built-in roles scoped to the target database:
- `readWrite`: covers `insert`, `delete`, `find`, `aggregate` (including `$vectorSearch`), `createCollection`, `listCollections`, and creating regular indexes.
- `dbAdmin`: covers the Atlas Search commands `createSearchIndexes` and `listSearchIndexes`, which are not in `readWrite`.
readWrite alone is insufficient: the sync workflow fails on its first export with a privilege error when it tries to create the vector search index. The platform never drops indexes or collections, so dropSearchIndex and dropCollection are not required — a least-privilege custom Atlas role granting the readWrite actions plus only createSearchIndexes and listSearchIndexes is sufficient if your security review demands tighter scoping.
Turbopuffer
Requires a Turbopuffer account and API key. Each Index gets its own Turbopuffer namespace derived from the Index ID, so a single Turbopuffer account scales to arbitrarily many indexes without per-index infrastructure.
| Variable | Description |
|---|---|
| `DEFAULT_INDEX_TPUF_API_KEY` | Turbopuffer API key |
| `DEFAULT_INDEX_TPUF_REGION` | Region for new namespaces (default `aws-us-east-1`) |
| `DEFAULT_INDEX_TPUF_NAMESPACE_PREFIX` | Prefix prepended to every namespace name (default `llamacloud`); the final namespace is `{prefix}-{index_id}` |
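To make the naming rule concrete, the following sketch builds a namespace name from the `{prefix}-{index_id}` pattern. `tpuf_namespace` is a hypothetical helper, and it assumes the Index ID is used verbatim; the platform's actual derivation may normalize the ID differently.

```shell
# Hypothetical helper: build the Turbopuffer namespace name for one index.
# Assumes the Index ID is appended unchanged after the prefix.
tpuf_namespace() {
  index_id="$1"
  # Fall back to the documented default prefix when the variable is unset.
  prefix="${DEFAULT_INDEX_TPUF_NAMESPACE_PREFIX:-llamacloud}"
  printf '%s-%s\n' "$prefix" "$index_id"
}
```

This can help when locating a specific index's data in the Turbopuffer console, for example by running `tpuf_namespace "$INDEX_ID"` with the same prefix the deployment uses.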
Reranking
Index retrieval optionally runs a reranker over the top-K vector results. Reranking is enabled by default and uses Cohere `rerank-v3.5` if a Cohere key is configured. To disable reranking entirely (e.g. for deployments without Cohere or Bifrost access):
| Variable | Description |
|---|---|
| `RERANK_ENABLED` | Set to `false` to disable reranking globally. Defaults to `true`. |
Cohere
| Variable | Description |
|---|---|
| `COHERE_PRIVATE_KEY` | Cohere API key |
| `COHERE_RERANK_LANGUAGE` | One of `multi`, `en`, `foreign`. Defaults to `multi`. |
Bifrost (Optional)
If you’ve deployed the Bifrost LLM gateway as a subchart, you can route reranking through it instead of calling Cohere directly. This is useful for centralized credential management or per-tenant routing.
| Variable | Description |
|---|---|
| `BIFROST_RERANK_ENABLED` | `true` to route reranking through Bifrost |
| `BIFROST_RERANK_BASE_URL` | Bifrost base URL |
| `BIFROST_RERANK_DEFAULT_MODEL` | Provider/model string (default `cohere/rerank-v3.5`) |
When both Bifrost and Cohere are configured, Bifrost takes precedence.
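The precedence between the reranking variables can be summarized in a short sketch. `rerank_backend` is a hypothetical helper that mirrors the rules stated in this section. The final branch, where reranking is enabled but neither provider is configured, is an assumption, since that case is not described here.

```shell
# Hypothetical helper: report which reranking path the environment selects.
rerank_backend() {
  # RERANK_ENABLED defaults to true; the literal "false" disables reranking.
  if [ "${RERANK_ENABLED:-true}" = "false" ]; then
    echo "disabled"
  # Bifrost takes precedence over a directly configured Cohere key.
  elif [ "${BIFROST_RERANK_ENABLED:-false}" = "true" ]; then
    echo "bifrost"
  elif [ -n "${COHERE_PRIVATE_KEY:-}" ]; then
    echo "cohere"
  else
    # Assumption: with no provider configured, no reranker runs.
    echo "none"
  fi
}
```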
Embeddings and Chat LLM
Index sync and retrieval also need:
- An embedding model to convert chunks into vectors at sync time
- A chat LLM to answer questions against the index at query time
Both are resolved through LlamaCloud’s centralized LLM configuration. See Centralized Provider Configuration for the full setup. Models become available to Index operations as soon as the corresponding provider is configured under config.llms.* in your Helm values, or registered via config.llms.providerConfigs for fine-grained control.
Embedding Model
The embedding model is hardcoded to `openai-text-embedding-3-small`. It cannot be overridden per-index and there is no automatic fallback to another model.
- If `openai-text-embedding-3-small` is registered in centralized LLM config, sync works.
- If it is not, sync fails at workflow runtime with an `EmbeddingConfigNotFoundError`. There is no fallback to any other model.
This means an operator must either:
- Configure the OpenAI provider directly (`config.llms.openAi.apiKey`), which auto-registers both embedding models, or
- Register `openai-text-embedding-3-small` explicitly via `providerConfigs`, which is useful when routing through a gateway or a non-OpenAI-hosted endpoint (see Custom Endpoints below).
Chat LLM
The chat agent prefers OpenAI GPT-5-family models, in this order:
1. `openai-gpt-5-4`
2. `openai-gpt-5-4-mini`
3. `openai-gpt-5-2`
4. `openai-gpt-5`
5. `openai-gpt-5-mini`
The first available model in the list is used. If none of the preferred models is registered, the resolver falls back to any other LLM in the centralized config (matched by priority). If the centralized config holds no chat-capable LLM at all, the chat endpoint returns status=UNAVAILABLE with a message indicating that no chat LLM is registered.
Notes:
- The chat agent currently uses the OpenAI raw client, so non-OpenAI models registered through `providerConfigs` need to be served via an OpenAI-compatible endpoint (see Custom Endpoints).
- Index creation does not validate that a chat LLM is available; the failure surfaces only when a chat message is sent.
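The chat model resolution order described in this section can be sketched as follows. `resolve_chat_model` is a hypothetical helper, not the platform's resolver. It assumes the registered chat-capable model IDs are passed as arguments already sorted by priority, so the fallback simply takes the first one.

```shell
# Preferred chat models, highest preference first.
PREFERRED_CHAT_MODELS="openai-gpt-5-4 openai-gpt-5-4-mini openai-gpt-5-2 openai-gpt-5 openai-gpt-5-mini"

# Hypothetical helper: pick a chat model from the registered model IDs.
# Arguments: registered chat-capable model IDs, assumed sorted by priority.
resolve_chat_model() {
  # First pass: the first preferred model that is also registered wins.
  for preferred in $PREFERRED_CHAT_MODELS; do
    for registered in "$@"; do
      if [ "$registered" = "$preferred" ]; then
        echo "$preferred"
        return 0
      fi
    done
  done
  # Fallback: any other registered LLM; here, the first argument.
  if [ "$#" -gt 0 ]; then
    echo "$1"
    return 0
  fi
  # Nothing registered: mirrors the UNAVAILABLE status of the chat endpoint.
  echo "UNAVAILABLE"
  return 1
}
```

For example, `resolve_chat_model openai-gpt-5-mini openai-gpt-5-2` selects `openai-gpt-5-2`, because preference order takes priority over the order in which the models are registered.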
Custom Endpoints (OpenAI-Compatible Gateways)
Every provider in `providerConfigs` supports a `credentials.base_url` field. This makes it possible to point LlamaCloud at LiteLLM, Ollama, Azure AI Foundry proxies, or any other OpenAI-compatible endpoint while continuing to use the OpenAI model identifiers (`openai-gpt-5-4`, `openai-text-embedding-3-small`, etc.):
```yaml
config:
  llms:
    providerConfigs:
      - id: "embedding-via-litellm"
        provider: "openai"
        model_id: "openai-text-embedding-3-small"
        enabled: true
        priority: 200
        credentials:
          api_key: "sk-litellm-..."
          base_url: "https://litellm.example.com/v1"
      - id: "chat-via-litellm"
        provider: "openai"
        model_id: "openai-gpt-5-4"
        enabled: true
        priority: 200
        credentials:
          api_key: "sk-litellm-..."
          base_url: "https://litellm.example.com/v1"
```

The same base-url override is honored for `anthropic` and `gemini` providers (set on their respective `credentials` blocks). Azure uses a separate `credentials.endpoint` field; see Centralized Provider Configuration for the full credentials schema by provider.
Verifying Configuration
After applying your Helm values, check that each target is correctly configured by hitting the `configz` endpoint:
```shell
curl -H "Authorization: Bearer $TOKEN" \
  -H "Project-Id: $PROJECT_ID" \
  https://<your-llamacloud-host>/api/v1/indexes/configz
```

The response lists which vector targets, embedding models, chat LLMs, and rerankers are available. Targets with missing or invalid env vars appear as unconfigured.