
Using Saved Configurations

Save and reuse parse and extract configurations for consistent, repeatable extraction workflows.

Saved configurations let you define your parse and extract settings once — either in the LlamaCloud UI or via the API — and then reference them by ID when creating extraction jobs. This is useful when you want to:

  • Standardize extraction across your team with a shared configuration
  • Simplify job creation by replacing inline config with a single ID
  • Decouple parse settings from extract settings so you can mix and match
  • Iterate on configuration in the UI playground, then use the same settings programmatically

There are two types of saved configurations relevant to extraction:

| Configuration Type | Product Type | What It Controls |
| --- | --- | --- |
| Parse configuration | `parse_v2` | How documents are parsed (tier, options) before extraction |
| Extract configuration | `extract_v2` | Full extraction settings: schema, tier, extraction target, and optionally a reference to a parse configuration |

Both are managed through the Product Configurations API (/api/v1/beta/configurations).

A parse configuration saves your LlamaParse settings so they can be reused across multiple extraction jobs.

```python
import os

from llama_cloud import LlamaCloud

client = LlamaCloud(api_key=os.environ["LLAMA_CLOUD_API_KEY"])

# Create a saved parse configuration.
# Note: the configurations API is in beta and not yet available as a typed SDK
# resource, so use the raw HTTP method on the client.
parse_config = client.post(
    "/api/v1/beta/configurations",
    body={
        "name": "High Quality Parse",
        "parameters": {
            "product_type": "parse_v2",
            "version": "latest",
            "tier": "agentic",
        },
    },
    cast_to=dict,
)
print(f"Parse config ID: {parse_config['id']}")
# e.g. "cfg-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```

An extract configuration saves your schema, extraction tier, and other settings. You can optionally reference a saved parse configuration inside it.

```python
from typing import Optional

from pydantic import BaseModel, Field

# Define your extraction schema
class InvoiceData(BaseModel):
    vendor_name: str = Field(description="Name of the vendor or supplier")
    invoice_number: str = Field(description="Unique invoice identifier")
    total_amount: float = Field(description="Total amount due")
    currency: str = Field(description="Currency code (e.g. USD, EUR)")
    due_date: Optional[str] = Field(None, description="Payment due date")

# Create a saved extract configuration that references the parse config
extract_config = client.post(
    "/api/v1/beta/configurations",
    body={
        "name": "Invoice Extraction",
        "parameters": {
            "product_type": "extract_v2",
            "parse_config_id": parse_config["id"],  # Reference the parse config
            "data_schema": InvoiceData.model_json_schema(),
            "extraction_target": "per_doc",
            "tier": "agentic",
            "cite_sources": True,
        },
    },
    cast_to=dict,
)
print(f"Extract config ID: {extract_config['id']}")
```

Running Extraction with a Saved Configuration


Once you have a saved extract configuration, you can create extraction jobs by passing just the configuration_id — no inline config needed.

```python
import time

# Upload a file
file_obj = client.files.create(file="./invoices/invoice_001.pdf", purpose="extract")

# Extract using the saved configuration — no inline config needed
job = client.extract.create(
    file_input=file_obj.id,
    configuration_id=extract_config["id"],
)

# Poll for completion
while job.status not in ("COMPLETED", "FAILED", "CANCELLED"):
    time.sleep(2)
    job = client.extract.get(job.id)

if job.status == "COMPLETED":
    invoice = InvoiceData.model_validate(job.extract_result)
    print(f"Vendor: {invoice.vendor_name}")
    print(f"Total: {invoice.currency} {invoice.total_amount}")
```

You don’t need a saved extract configuration to use a saved parse configuration. You can reference a parse_config_id directly inside an inline configuration block:

```python
# Use a saved parse config with an inline extract config
job = client.extract.create(
    file_input=file_obj.id,
    configuration={
        "parse_config_id": parse_config["id"],
        "data_schema": InvoiceData.model_json_schema(),
        "extraction_target": "per_doc",
        "tier": "agentic",
    },
)
```

This is useful when you want consistent parsing across jobs but need different extraction schemas for different use cases.
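As a sketch of that pattern, the block below builds an inline configuration for a second, hypothetical document type. The `ReceiptData` model and its fields are invented for illustration, and `parse_config_id` is a placeholder for the ID saved earlier; only `data_schema` changes between use cases, while the parse settings stay shared.

```python
from pydantic import BaseModel, Field

# Hypothetical second schema for a different document type (illustrative only)
class ReceiptData(BaseModel):
    merchant_name: str = Field(description="Name of the merchant")
    total: float = Field(description="Receipt total")

# Placeholder for the saved parse config ID created earlier in this guide
parse_config_id = "cfg-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Same parse settings, different schema: only data_schema varies per use case
receipt_configuration = {
    "parse_config_id": parse_config_id,
    "data_schema": ReceiptData.model_json_schema(),
    "extraction_target": "per_doc",
    "tier": "agentic",
}

# This dict would then be passed as `configuration=` to client.extract.create(...)
```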

Batch Processing with Saved Configurations


Saved configurations simplify batch workflows — just pass the same configuration_id for every file:

```python
import asyncio
import os
from pathlib import Path

from llama_cloud import AsyncLlamaCloud

async_client = AsyncLlamaCloud(api_key=os.environ["LLAMA_CLOUD_API_KEY"])

EXTRACT_CONFIG_ID = extract_config["id"]  # Your saved config ID

async def process_file(file_path: Path) -> dict:
    file_obj = await async_client.files.create(
        file=str(file_path), purpose="extract"
    )
    job = await async_client.extract.create(
        file_input=file_obj.id,
        configuration_id=EXTRACT_CONFIG_ID,
    )
    while job.status not in ("COMPLETED", "FAILED", "CANCELLED"):
        await asyncio.sleep(2)
        job = await async_client.extract.get(job.id)
    if job.status == "COMPLETED":
        return {"file": file_path.name, "data": job.extract_result}
    return {"file": file_path.name, "error": job.error_message}

async def main():
    files = list(Path("./invoices").glob("*.pdf"))
    semaphore = asyncio.Semaphore(10)  # Limit concurrent jobs

    async def bounded(path):
        async with semaphore:
            return await process_file(path)

    results = await asyncio.gather(*[bounded(f) for f in files])
    for r in results:
        if "data" in r:
            print(f"  {r['file']}: {r['data']}")
        else:
            print(f"  {r['file']}: ERROR - {r['error']}")

asyncio.run(main())
```

You can list your saved configurations filtered by product type:

```python
# List all extract configurations
extract_configs = client.get(
    "/api/v1/beta/configurations",
    cast_to=dict,
    options={"params": {"product_type": "extract_v2"}},
)
for cfg in extract_configs.get("data", []):
    print(f"  {cfg['name']} ({cfg['id']})")

# List all parse configurations
parse_configs = client.get(
    "/api/v1/beta/configurations",
    cast_to=dict,
    options={"params": {"product_type": "parse_v2"}},
)
for cfg in parse_configs.get("data", []):
    print(f"  {cfg['name']} ({cfg['id']})")
```
| Scenario | Approach |
| --- | --- |
| Quick prototyping, one-off jobs | Inline configuration — fastest to get started |
| Consistent settings across many jobs | Saved `configuration_id` — define once, use everywhere |
| Same parse settings, different extract schemas | Saved `parse_config_id` in inline config |
| Team sharing a standard pipeline | Saved `configuration_id` — everyone uses the same config |
| UI-to-code workflow | Configure in UI playground → save → use config ID in SDK |