Centralized Provider Configuration

Configure self-hosted LlamaCloud LLM and embedding providers through config.llms.providerConfigs, supporting API gateways, custom credentials, and per-tier LlamaParse model ordering.

Centralized LLM provider configuration is the deploy-time registry of model endpoints your self-hosted LlamaParse deployment can call.

The simpler provider settings under config.llms.* register the default endpoints for each provider. config.llms.providerConfigs adds explicit entries when you need multiple credentials, custom endpoints, custom headers, provider-native model names, or an API gateway such as Bifrost, Portkey, or LiteLLM.

Each entry maps a LlamaParse model_id to one provider endpoint:

model_id is the LlamaParse model identifier that platform features request.
provider selects the client implementation used to call the endpoint.
provider_model_name is the model or deployment name sent to the upstream provider or gateway. If omitted, LlamaParse uses its built-in provider-native name for that model_id.
priority and tags are routing hints. They affect only the features that read them, such as parsing tier ordering or product-specific model preference.

A recognized model_id is not an upstream availability guarantee. The provider account, region, Azure deployment, Vertex location, or gateway must still be able to serve the configured provider-native model.

Use Cases

Custom API Gateways: Route LLM requests through gateways like Bifrost, Portkey, or LiteLLM
Custom Endpoints: Use custom base URLs for proxies or regional endpoints
Custom Headers: Add custom HTTP headers per provider instance (e.g., for gateway authentication)
Multiple Credentials: Configure multiple provider instances with different API keys
Managed Embeddings: Route managed Index embeddings through centrally configured OpenAI-compatible embedding providers
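
As a minimal sketch of the "Multiple Credentials" and "Managed Embeddings" use cases, the fragment below registers two OpenAI entries that use different API keys, one for a chat model and one for an embedding model. The id values are hypothetical names chosen for illustration, and the keys are placeholders; only the field names and model_id values come from this guide.

```yaml
config:
  llms:
    providerConfigs:
      # Entry 1: chat/parsing model, using its own API key
      - id: "openai-parsing"                 # hypothetical identifier
        provider: "openai"
        model_id: "openai-gpt-4o"
        enabled: true
        credentials:
          api_key: "sk-..."                  # placeholder: key for parsing traffic

      # Entry 2: embedding model, using a separate API key
      - id: "openai-embeddings"              # hypothetical identifier
        provider: "openai"
        model_id: "openai-text-embedding-3-small"
        enabled: true
        credentials:
          api_key: "sk-..."                  # placeholder: a different key for embeddings
```

Giving each entry its own id keeps the two credentials distinguishable in the configuration, which can help when usage is tracked per key.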

Configuration Structure

Add provider configurations to your Helm values under config.llms.providerConfigs:

config:
  llms:
    providerConfigs:
      - id: "my-config-name"               # User-defined identifier (can be anything)
        provider: "openai"                 # Provider type accepted by this config entry
        model_id: "openai-gpt-4o"          # Recognized LlamaParse model identifier (fixed values, see below)
        provider_model_name: "gpt-4o"      # Optional: Provider-specific model name override, usually only needed if using an LLM gateway that requires a specific model name
        enabled: true                      # Enable/disable this configuration
        tags:                              # Optional: see "Controlling Parsing Model Order"
          - "llamaparse-tier:agentic"
        priority: 100                      # Optional: higher = preferred within a tier (default 100)
        credentials:                       # Provider-specific credentials
          api_key: "sk-..."
          base_url: "https://custom.api.endpoint"  # Optional custom endpoint
        headers:                           # Custom HTTP headers (optional)
          X-Custom-Header: "value"

Most entries do not need tags. Use tags when a feature documents a specific tag convention, such as parsing tier tags.

Supported Providers

The sections below list each provider value covered by this guide, with an example entry and the model_id values recognized for that provider.

OpenAI

- id: "openai-primary"
  provider: "openai"
  model_id: "openai-gpt-4o"
  credentials:
    api_key: "sk-..."                    # Required
    org_id: "org-..."                    # Optional
    base_url: "https://api.openai.com/v1"  # Optional
  headers:                               # Optional
    X-Custom-Header: "value"

Recognized model_id values:

openai-gpt-4o
openai-gpt-4o-mini
openai-gpt-4o-mini-text-only
openai-gpt-4o-mini-multimodal
openai-gpt-4-1
openai-gpt-4-1-mini
openai-gpt-4-1-nano
openai-gpt-5
openai-gpt-5-mini
openai-gpt-5-nano
openai-gpt-5-2
openai-gpt-5-4
openai-gpt-5-4-mini
openai-gpt-5-4-nano
openai-text-embedding-3-small
openai-text-embedding-3-large

Both openai-gpt-4o-mini-text-only and openai-gpt-4o-mini-multimodal route to the gpt-4o-mini model. They select the text-only or the screenshot-based cost_effective parsing path, respectively. The bare openai-gpt-4o-mini id remains the legacy text-only binding.

Anthropic

- id: "anthropic-primary"
  provider: "anthropic"
  model_id: "anthropic-sonnet-4.5"
  credentials:
    api_key: "sk-ant-..."               # Required
    base_url: "https://api.anthropic.com"  # Optional
  headers:                               # Optional
    X-Custom-Header: "value"

Recognized model_id values:

anthropic-sonnet-4.6
anthropic-sonnet-4.5
anthropic-sonnet-4.0
anthropic-sonnet-3.7
anthropic-sonnet-3.5
anthropic-sonnet-3.5-v2
anthropic-haiku-4.5
anthropic-haiku-3.5
anthropic-opus-4.6
anthropic-opus-4.5

Azure OpenAI

- id: "azure-sweden"
  provider: "azure"
  model_id: "openai-gpt-4o"
  credentials:
    api_key: "..."                      # Required
    endpoint: "https://your-resource.openai.azure.com"  # Required
    deployment_id: "gpt-4o"             # Optional
    api_version: "2024-08-06"           # Optional
  headers:                               # Optional
    X-Custom-Header: "value"

Recognized model_id values:

openai-gpt-4o
openai-gpt-4o-mini
openai-gpt-4o-mini-text-only
openai-gpt-4o-mini-multimodal
openai-gpt-4-1
openai-gpt-4-1-mini
openai-gpt-4-1-nano
openai-gpt-5
openai-gpt-5-mini
openai-gpt-5-nano
openai-gpt-5-2
openai-gpt-5-4
openai-gpt-5-4-mini
openai-gpt-5-4-nano
openai-text-embedding-3-small
openai-text-embedding-3-large

Both openai-gpt-4o-mini-text-only and openai-gpt-4o-mini-multimodal route to the gpt-4o-mini model (default deployment name gpt-4o-mini). They select the text-only or the screenshot-based cost_effective parsing path, respectively. The bare openai-gpt-4o-mini id remains the legacy text-only binding.

For Azure-hosted embedding models, set model_id to the LlamaParse model identifier and credentials.deployment_id to your Azure deployment name.
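
A sketch of such an entry might look like the following. The id and the deployment name are hypothetical; substitute the deployment name you created in your own Azure resource.

```yaml
- id: "azure-embeddings"                           # hypothetical identifier
  provider: "azure"
  model_id: "openai-text-embedding-3-small"        # LlamaParse model identifier
  enabled: true
  credentials:
    api_key: "..."                                 # Required
    endpoint: "https://your-resource.openai.azure.com"  # Required
    deployment_id: "my-embedding-deployment"       # hypothetical: your Azure deployment name
```

The model_id stays a recognized LlamaParse identifier even when the Azure deployment is named differently; deployment_id carries the Azure-side name.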

Google Gemini

- id: "gemini-primary"
  provider: "gemini"
  model_id: "gemini-2.5-flash"
  credentials:
    api_key: "AIza..."                   # Required
    base_url: "https://generativelanguage.googleapis.com"  # Optional
  headers:                               # Optional
    X-Custom-Header: "value"

Recognized model_id values:

gemini-3.1-pro
gemini-3.1-flash-lite
gemini-3.0-pro
gemini-3.0-flash
gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite
gemini-2.0-flash
gemini-2.0-flash-lite

Google Vertex AI

Vertex AI serves both Gemini and Anthropic Claude models. By default, Vertex configs use service-account authentication, which requires project_id, location, and credentials (the JSON-serialized service account key). Set credentials_type: "proxy" when a configured gateway handles Vertex authentication.

- id: "vertex-primary"
  provider: "vertexai"
  model_id: "gemini-2.5-flash"
  credentials:
    project_id: "your-gcp-project-id"    # Required
    location: "us-central1"              # Required
    credentials: |-                      # Required (service account key JSON)
      {
        "type": "service_account",
        "project_id": "your-gcp-project-id",
        "private_key_id": "...",
        "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "client_email": "...@your-gcp-project-id.iam.gserviceaccount.com",
        "client_id": "..."
      }
    base_url: "https://us-central1-aiplatform.googleapis.com"  # Optional
  headers:                               # Optional
    X-Custom-Header: "value"

Recognized model_id values (Gemini on Vertex):

gemini-3.1-pro
gemini-3.1-flash-lite
gemini-3.0-pro
gemini-3.0-flash
gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite
gemini-2.0-flash
gemini-2.0-flash-lite

Recognized model_id values (Anthropic on Vertex):

anthropic-sonnet-4.6
anthropic-sonnet-4.5
anthropic-sonnet-4.0
anthropic-sonnet-3.7
anthropic-haiku-4.5
anthropic-opus-4.6
anthropic-opus-4.5
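
For an Anthropic model served through Vertex AI, the entry keeps provider set to "vertexai" and uses one of the Anthropic model_id values above. The following is a sketch only: the id is a hypothetical name, and the location is reused from the Gemini example, so confirm that your project and region can actually serve the chosen Claude model.

```yaml
- id: "vertex-claude"                      # hypothetical identifier
  provider: "vertexai"
  model_id: "anthropic-sonnet-4.5"
  enabled: true
  credentials:
    project_id: "your-gcp-project-id"      # Required
    location: "us-central1"                # Required; assumed region, verify model availability
    credentials: |-                        # Required (service account key JSON)
      {
        "type": "service_account",
        "project_id": "your-gcp-project-id",
        "private_key_id": "...",
        "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "client_email": "...@your-gcp-project-id.iam.gserviceaccount.com",
        "client_id": "..."
      }
```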

Controlling Parsing Model Order (per tier)

Parsing runs each tier against an ordered list of models: it tries the first model and falls back to the next whenever a model is unavailable or fails. On a self-hosted / BYOC deployment you control which models a tier uses, and in what order, by tagging provider config entries and setting their priority.

This section only affects parsing tier membership and ordering. It does not control embedding, chat, or extraction model selection.

Tiers you can configure

| Tag | Tier | Used for |
| --- | --- | --- |
| `llamaparse-tier:cost_effective` | Cost Effective | The low-cost parsing tier |
| `llamaparse-tier:agentic` | Agentic | The default agentic transcription |
| `llamaparse-tier:agentic_plus` | Agentic Plus | Agentic transcription with the self-critique judges |
| `llamaparse-tier:specialized_chart_parsing` | Chart parsing | The dedicated chart-to-table model |

How ordering works

Add llamaparse-tier:<tier> to the tags of every config entry that should be part of that tier’s list. Only entries carrying the tag join the list.
Within a tier, models are tried in descending priority order — the highest priority value is the primary model, the next is the first fallback, and so on. priority defaults to 100, so give each entry in a tier a distinct value to make the order explicit.
A model still only runs when its credentials validate at startup and the provider is reachable; an unavailable model is skipped and the next one is tried.
If a tier has no tagged entries, the deployment falls back to the built-in default model list for that tier in the deployed application version, so you only need to tag the tiers you want to override.
The same entry can carry multiple tier tags (e.g. a model used by both agentic and agentic_plus).
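
To illustrate these rules on a tier other than agentic, the sketch below overrides only the chart parsing tier with two Gemini entries. The id values are hypothetical and the model choice is purely illustrative, not a recommendation from this guide; every other tier carries no tagged entries here and would therefore keep its built-in default list.

```yaml
config:
  llms:
    providerConfigs:
      - id: "chart-primary"                # hypothetical identifier
        provider: "gemini"
        model_id: "gemini-2.5-pro"         # illustrative choice only
        enabled: true
        tags:
          - "llamaparse-tier:specialized_chart_parsing"
        priority: 20                       # higher value, tried first
        credentials:
          api_key: "AIza..."

      - id: "chart-fallback"               # hypothetical identifier
        provider: "gemini"
        model_id: "gemini-2.5-flash"       # illustrative choice only
        enabled: true
        tags:
          - "llamaparse-tier:specialized_chart_parsing"
        priority: 10                       # lower value, tried if the primary is unavailable or fails
        credentials:
          api_key: "AIza..."
```

The two priorities are distinct on purpose: leaving both at the default of 100 would not make the intended order explicit.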

Example: ordering the BYOC `agentic` tier

This makes Sonnet 4.5 the primary agentic model, GPT-4.1 the first fallback, and Haiku 4.5 the last resort:

config:
  llms:
    providerConfigs:
      - id: "agentic-primary-sonnet"
        provider: "anthropic"
        model_id: "anthropic-sonnet-4.5"
        enabled: true
        tags:
          - "llamaparse-tier:agentic"
          - "llamaparse-tier:agentic_plus"
        priority: 30                       # highest → tried first (primary)
        credentials:
          api_key: "sk-ant-..."

      - id: "agentic-fallback-gpt"
        provider: "openai"
        model_id: "openai-gpt-4-1"
        enabled: true
        tags:
          - "llamaparse-tier:agentic"
        priority: 20                       # middle → first fallback
        credentials:
          api_key: "sk-..."

      - id: "agentic-fallback-haiku"
        provider: "anthropic"
        model_id: "anthropic-haiku-4.5"
        enabled: true
        tags:
          - "llamaparse-tier:agentic"
        priority: 10                       # lowest → last fallback
        credentials:
          api_key: "sk-ant-..."

Resulting agentic order: anthropic-sonnet-4.5 → openai-gpt-4-1 → anthropic-haiku-4.5. The agentic_plus tier here resolves to just anthropic-sonnet-4.5 (the only entry tagged for it); add more llamaparse-tier:agentic_plus entries with their own priorities to extend it.

Example: vision-capable GPT-4o-mini for the `cost_effective` tier

The GPT-4o-mini runner-binding variants let a deployment run the cost_effective tier with the screenshot-based transcription path as primary and the text-only path as last resort — useful when documents contain rasterized tables that are invisible to text-only transcription:

config:
  llms:
    providerConfigs:
      - id: "ce-gpt-4o-mini-vision"
        provider: "openai"            # or "azure"
        model_id: "openai-gpt-4o-mini-multimodal"
        enabled: true
        tags:
          - "llamaparse-tier:cost_effective"
        priority: 20                       # highest → tried first (primary)
        credentials:
          api_key: "sk-..."

      - id: "ce-gpt-4o-mini-text"
        provider: "openai"
        model_id: "openai-gpt-4o-mini-text-only"
        enabled: true
        tags:
          - "llamaparse-tier:cost_effective"
        priority: 10                       # lowest → last fallback
        credentials:
          api_key: "sk-..."

Resulting cost_effective order: openai-gpt-4o-mini-multimodal → openai-gpt-4o-mini-text-only. With only these two entries tagged, the tier’s built-in default list is fully replaced — other default models are not attempted.

Common Use Cases

Custom API Gateway

Use a custom API gateway or proxy (e.g., Bifrost, Portkey, LiteLLM):

config:
  llms:
    providerConfigs:
      - id: "portkey-openai"
        provider: "openai"
        model_id: "openai-gpt-4o-mini"
        provider_model_name: "@openai/gpt-4o-mini"
        enabled: true
        credentials:
          api_key: "your-portkey-api-key"
          base_url: "https://api.portkey.ai/v1"
        headers:
          x-portkey-api-key: "your-portkey-api-key"

      - id: "portkey-anthropic"
        provider: "anthropic"
        model_id: "anthropic-sonnet-4.5"
        provider_model_name: "@anthropic/claude-sonnet-4-5"
        enabled: true
        credentials:
          api_key: "your-portkey-api-key"
          base_url: "https://api.portkey.ai"
        headers:
          x-portkey-api-key: "your-portkey-api-key"
          x-portkey-strict-open-ai-compliance: "false"

      - id: "portkey-vertex-gemini"
        provider: "vertexai"
        model_id: "gemini-2.5-flash"
        provider_model_name: "gemini-2.5-flash"
        enabled: true
        credentials:
          credentials_type: "proxy"
          project_id: "your-gcp-project-id"
          location: "us-central1"
          base_url: "https://api.portkey.ai/v1"
        headers:
          x-portkey-api-key: "your-portkey-api-key"
          x-portkey-provider: "@your-portkey-vertex-provider-id"
          x-portkey-strict-open-ai-compliance: "false"
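
The same pattern should carry over to other OpenAI-compatible gateways. As a rough sketch for a self-hosted LiteLLM proxy, the entry below assumes a hypothetical in-cluster hostname, LiteLLM's commonly used port 4000, and a model alias named gpt-4o-mini defined in your own LiteLLM configuration; none of these values come from this guide, so adjust them to your deployment.

```yaml
config:
  llms:
    providerConfigs:
      - id: "litellm-openai"                           # hypothetical identifier
        provider: "openai"
        model_id: "openai-gpt-4o-mini"
        provider_model_name: "gpt-4o-mini"             # assumed alias defined in the LiteLLM config
        enabled: true
        credentials:
          api_key: "your-litellm-key"                  # placeholder: key issued by your gateway
          base_url: "http://litellm.internal:4000/v1"  # hypothetical in-cluster address
```

Here provider_model_name is whatever name the gateway expects, while model_id remains the recognized LlamaParse identifier that platform features request.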

Verification

After configuration, verify your setup:

Verify in Admin UI: Check the LlamaCloud admin interface for available models
Test parsing: Upload a document to LlamaParse to confirm the configured providers are working
