---
title: Index V2 Configuration | Developer Documentation
---

LlamaCloud’s Index V2 resource ties a source directory to a vector store via a sync pipeline. The destination vector store — the *index target* — is configured deployment-wide via environment variables on the backend, jobs, and Temporal worker pods.

Index target storage is configured separately from the platform’s metadata databases (Postgres and MongoDB under [Databases and Queues](./db_and_queues/overview)). The two serve different roles: the platform DB stores users, projects, and jobs; the index target stores embeddings and chunks. Keeping them separate lets you pick a vector-optimized destination without coupling it to the rest of the deployment.

## Supported Vector Store Exports

| Target            | When to use                                                                                                                                                                                                                                                                        |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **MongoDB Atlas** | Already using Atlas; want a managed service; need Atlas Search features alongside                                                                                                                                                                                                  |
| **Turbopuffer**   | High-scale vector workloads; per-namespace tenant isolation                                                                                                                                                                                                                        |
| **Custom**        | Your vector store has no native integration. Read the parsed output from a sync directly and write the export logic yourself; the [example repo](https://github.com/run-llama/index-v2-data-sinks) has samples of several custom solutions. |

The configured target is used for all indexes in the deployment. You can switch targets at any time by changing environment variables and re-syncing indexes. Index data is not portable across different targets, so a target switch triggers a full re-sync to the new destination.

## MongoDB Atlas

Requires a [MongoDB Atlas](https://www.mongodb.com/cloud/atlas) cluster with **Atlas Vector Search** enabled. The Mongo target uses `$vectorSearch` aggregation and managed search indexes, both of which are Atlas-only features. Community MongoDB and `mongo:7.0` images do not work as Mongo index targets; configure a different target instead, or use a real Atlas cluster.

### Connection

| Variable                         | Description                                                    |
| -------------------------------- | -------------------------------------------------------------- |
| `DEFAULT_INDEX_MONGO_URI`        | Full SRV or standard connection URI (e.g. `mongodb+srv://...`) |
| `DEFAULT_INDEX_MONGO_DB`         | Database name                                                  |
| `DEFAULT_INDEX_MONGO_COLLECTION` | Collection name (a single collection holds all index data)     |

The URI is treated as a secret and stored in the rendered MongoDB connection Secret alongside `MONGODB_URL`. Do not point the index target at the same database as the platform’s own MongoDB connection: use a separate cluster or, at minimum, a separate database.
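
A pre-deploy check for the three connection variables can be sketched as follows. This is a minimal illustration, not part of LlamaCloud: the helper name `missing_mongo_vars` is hypothetical, and only the variable names come from the table above.

```python
import os

# Variable names are taken from the Connection table above.
REQUIRED_MONGO_VARS = (
    "DEFAULT_INDEX_MONGO_URI",
    "DEFAULT_INDEX_MONGO_DB",
    "DEFAULT_INDEX_MONGO_COLLECTION",
)


def missing_mongo_vars(env=None):
    """Return the required variables that are unset or blank.

    Hypothetical helper: `env` is any mapping of names to values and
    defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_MONGO_VARS if not env.get(name, "").strip()]
```

Running this in a CI step before `helm upgrade` catches an incomplete Mongo target early instead of at the first sync.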

### Database User Privileges

LlamaCloud creates collections, regular indexes, and Atlas Vector Search indexes programmatically on first use. The database user therefore needs **two built-in roles** scoped to the target database:

- `readWrite` — covers `insert`, `delete`, `find`, `aggregate` (including `$vectorSearch`), `createCollection`, `listCollections`, and creating regular indexes.
- `dbAdmin` — covers the Atlas Search commands `createSearchIndexes` and `listSearchIndexes`, which are not in `readWrite`.

`readWrite` alone is insufficient: the sync workflow fails on its first export with a privilege error when it tries to create the vector search index. The platform never drops indexes or collections, so `dropSearchIndex` and `dropCollection` are not required — a least-privilege [custom Atlas role](https://www.mongodb.com/docs/atlas/security-add-mongodb-roles/#custom-roles) granting the `readWrite` actions plus only `createSearchIndexes` and `listSearchIndexes` is sufficient if your security review demands tighter scoping.

## Turbopuffer

Requires a [Turbopuffer](https://turbopuffer.com) account and API key. Each Index gets its own Turbopuffer namespace derived from the Index ID, so a single Turbopuffer account scales to arbitrarily many indexes without per-index infrastructure.

| Variable                              | Description                                                                                                |
| ------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| `DEFAULT_INDEX_TPUF_API_KEY`          | Turbopuffer API key                                                                                        |
| `DEFAULT_INDEX_TPUF_REGION`           | Region for new namespaces (default `aws-us-east-1`)                                                        |
| `DEFAULT_INDEX_TPUF_NAMESPACE_PREFIX` | Prefix prepended to every namespace name (default `llamacloud`) — final namespace is `{prefix}-{index_id}` |
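
To locate an index in the Turbopuffer console, you can reproduce the `{prefix}-{index_id}` naming rule yourself. The sketch below assumes nothing beyond that rule and the documented default prefix; the function name `index_namespace` and the sample ID are hypothetical.

```python
import os


def index_namespace(index_id, env=None):
    """Build the namespace name as `{prefix}-{index_id}`.

    Hypothetical helper: falls back to the documented default prefix
    `llamacloud` when DEFAULT_INDEX_TPUF_NAMESPACE_PREFIX is unset or empty.
    """
    env = os.environ if env is None else env
    prefix = env.get("DEFAULT_INDEX_TPUF_NAMESPACE_PREFIX") or "llamacloud"
    return f"{prefix}-{index_id}"


# "abc123" is a made-up index ID used only for illustration.
print(index_namespace("abc123", {}))
```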

## Reranking

Index retrieval optionally runs a reranker over the top-K vector results. Reranking is enabled by default and uses Cohere `rerank-v3.5` if a Cohere key is configured. To disable reranking entirely (e.g. for deployments without Cohere or Bifrost access):

| Variable         | Description                                                       |
| ---------------- | ----------------------------------------------------------------- |
| `RERANK_ENABLED` | Set to `false` to disable reranking globally. Defaults to `true`. |

### Cohere

| Variable                 | Description                                           |
| ------------------------ | ----------------------------------------------------- |
| `COHERE_PRIVATE_KEY`     | Cohere API key                                        |
| `COHERE_RERANK_LANGUAGE` | One of `multi`, `en`, `foreign`. Defaults to `multi`. |

### Bifrost (Optional)

If you’ve deployed the [Bifrost LLM gateway](https://github.com/maximhq/bifrost) as a subchart, you can route reranking through it instead of calling Cohere directly. Useful for centralized credential management or per-tenant routing.

| Variable                       | Description                                          |
| ------------------------------ | ---------------------------------------------------- |
| `BIFROST_RERANK_ENABLED`       | `true` to route reranking through Bifrost            |
| `BIFROST_RERANK_BASE_URL`      | Bifrost base URL                                     |
| `BIFROST_RERANK_DEFAULT_MODEL` | Provider/model string (default `cohere/rerank-v3.5`) |

When both Bifrost and Cohere are configured, Bifrost takes precedence.
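
The precedence rules in this section can be summarized in a short sketch. It is an illustration rather than the platform's actual resolver. It rests on three assumptions: booleans are parsed case-insensitively, Bifrost counts as configured only when both `BIFROST_RERANK_ENABLED` is `true` and a base URL is set, and retrieval runs without reranking when no provider is available.

```python
import os


def select_reranker(env=None):
    """Return (provider, model) for the reranker, or None when none applies.

    Hypothetical helper mirroring the documented precedence:
    global switch first, then Bifrost, then Cohere.
    """
    env = os.environ if env is None else env

    # RERANK_ENABLED defaults to true; only an explicit "false" disables it.
    if env.get("RERANK_ENABLED", "true").strip().lower() == "false":
        return None

    # Assumption: Bifrost needs both the enable flag and a base URL.
    bifrost_on = env.get("BIFROST_RERANK_ENABLED", "").strip().lower() == "true"
    if bifrost_on and env.get("BIFROST_RERANK_BASE_URL"):
        model = env.get("BIFROST_RERANK_DEFAULT_MODEL") or "cohere/rerank-v3.5"
        return ("bifrost", model)

    if env.get("COHERE_PRIVATE_KEY"):
        return ("cohere", "rerank-v3.5")

    # Assumption: no provider configured means no reranking step.
    return None
```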

## Embeddings and Chat LLM

Index sync and retrieval also need:

- An **embedding model** to convert chunks into vectors at sync time
- A **chat LLM** to answer questions against the index at query time

Both are resolved through LlamaCloud’s centralized LLM configuration. See [Centralized Provider Configuration](./llm_integrations/centralized-config) for the full setup. Models become available to Index operations as soon as the corresponding provider is configured under `config.llms.*` in your Helm values, or registered via `config.llms.providerConfigs` for fine-grained control.

### Embedding Model

The embedding model is **hardcoded** to `openai-text-embedding-3-small`. It cannot be overridden per-index and there is no automatic fallback to another model.

- If `openai-text-embedding-3-small` is registered in centralized LLM config, sync works.
- If it is not, sync fails at workflow runtime with an `EmbeddingConfigNotFoundError`.

This means an operator must either:

1. Configure the OpenAI provider directly (`config.llms.openAi.apiKey`), which auto-registers both embedding models, **or**
2. Register `openai-text-embedding-3-small` explicitly via `providerConfigs` — useful when routing through a gateway or a non-OpenAI-hosted endpoint (see Custom Endpoints below).

### Chat LLM

The chat agent prefers OpenAI GPT-5-family models, in this order:

1. `openai-gpt-5-4`
2. `openai-gpt-5-4-mini`
3. `openai-gpt-5-2`
4. `openai-gpt-5`
5. `openai-gpt-5-mini`

The first available model in the list is used. If **none** of the preferred models is registered, the resolver falls back to any other LLM in the centralized config (matched by priority). If the centralized config holds no chat-capable LLM at all, the chat endpoint returns `status=UNAVAILABLE` with a message indicating that no chat LLM is registered.

Notes:

- The chat agent currently uses the OpenAI raw client, so non-OpenAI models registered through `providerConfigs` need to be served via an OpenAI-compatible endpoint (see Custom Endpoints).
- Index creation does **not** validate that a chat LLM is available — the failure surfaces only when a chat message is sent.
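
The resolution order for the chat LLM can be sketched as below. This is an approximation for reasoning about which model will be picked, not the actual resolver. It assumes the registered chat-capable models are available as a `model_id -> priority` mapping and that a larger priority number wins in the fallback step. Neither assumption is confirmed by this page.

```python
# Preference order as listed above.
PREFERRED_CHAT_MODELS = [
    "openai-gpt-5-4",
    "openai-gpt-5-4-mini",
    "openai-gpt-5-2",
    "openai-gpt-5",
    "openai-gpt-5-mini",
]


def resolve_chat_model(registered):
    """Pick a chat model from `registered` (model_id -> priority, assumed shape).

    Returns None when nothing is registered, which corresponds to the
    chat endpoint reporting status=UNAVAILABLE.
    """
    # Step 1: first preferred model that is registered wins.
    for model_id in PREFERRED_CHAT_MODELS:
        if model_id in registered:
            return model_id

    # Step 2: fall back to any other model.
    # Assumption: a larger priority number means higher precedence.
    if registered:
        return max(registered, key=registered.get)

    return None
```

Note that the preferred list is checked before priority: a registered `openai-gpt-5-mini` is chosen over a non-preferred model even if the latter has a higher priority value.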

### Custom Endpoints (OpenAI-Compatible Gateways)

Every provider in `providerConfigs` supports a `credentials.base_url` field. This makes it possible to point LlamaCloud at LiteLLM, Ollama, Azure AI Foundry proxies, or any other OpenAI-compatible endpoint while continuing to use the OpenAI model identifiers (`openai-gpt-5-4`, `openai-text-embedding-3-small`, etc.):

```yaml
config:
  llms:
    providerConfigs:
      - id: "embedding-via-litellm"
        provider: "openai"
        model_id: "openai-text-embedding-3-small"
        enabled: true
        priority: 200
        credentials:
          api_key: "sk-litellm-..."
          base_url: "https://litellm.example.com/v1"
      - id: "chat-via-litellm"
        provider: "openai"
        model_id: "openai-gpt-5-4"
        enabled: true
        priority: 200
        credentials:
          api_key: "sk-litellm-..."
          base_url: "https://litellm.example.com/v1"
```

The same base-url override is honored for `anthropic` and `gemini` providers (set on their respective `credentials` blocks). Azure uses a separate `credentials.endpoint` field — see [Centralized Provider Configuration](./llm_integrations/centralized-config) for the full credentials schema by provider.

## Verifying Configuration

After applying your Helm values, check that each target is correctly configured by hitting the `configz` endpoint:

```bash
curl -H "Authorization: Bearer $TOKEN" \
     -H "Project-Id: $PROJECT_ID" \
     https://<your-llamacloud-host>/api/v1/indexes/configz
```

The response lists which vector targets, embedding models, chat LLMs, and rerankers are available. Targets with missing or invalid env vars appear as unconfigured.
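
If you would rather run this check from a script, the same request can be made with Python's standard library. The sketch below uses only the URL path and headers from the `curl` command; the function names are hypothetical. The response schema is not documented on this page, so the JSON is returned as-is rather than parsed into specific fields.

```python
import json
import urllib.request


def build_configz_request(host, token, project_id):
    """Build the GET request for the configz endpoint (hypothetical helper)."""
    url = f"https://{host}/api/v1/indexes/configz"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Project-Id": project_id,
        },
        method="GET",
    )


def fetch_configz(host, token, project_id, timeout=10):
    """Send the request and return the decoded JSON body."""
    request = build_configz_request(host, token, project_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A typical use is `print(json.dumps(fetch_configz(host, token, project_id), indent=2))` in a post-deploy smoke test, followed by a manual or scripted check that the expected target is not reported as unconfigured.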