Getting Started with Batches

Run Parse V2 or Extract V2 over every file in a directory, poll for completion, and inspect per-file results.

Batches let you run the same product job over every file in a directory. A batch references:

  1. A source directory containing the files to process.
  2. A product configuration ID for the work to run on each file.

Use parse_v2 with a built-in Parse preset or saved Parse configuration to parse every file. Use extract_v2 with a saved Extract configuration to extract from every file.

Batch creation is limited to 1,000 source files.
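
Because a batch accepts at most 1,000 source files, a larger collection has to be split into groups before uploading, one source directory and one batch per group. The following is a minimal sketch of that splitting step only. The names chunk_paths and MAX_BATCH_FILES are illustrative and are not part of the SDK.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List

# Limit stated in this guide; the constant name is illustrative, not an SDK symbol.
MAX_BATCH_FILES = 1000


def chunk_paths(paths: Iterable[Path], size: int = MAX_BATCH_FILES) -> Iterator[List[Path]]:
    """Yield successive lists containing at most `size` paths each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(paths)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Each list yielded by chunk_paths(sorted(Path("./invoices").glob("*.pdf"))) can then be uploaded to its own directory and submitted as its own batch.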

To follow this guide, you need:

  • A LlamaCloud account with a Pro or Enterprise plan
  • An API key (how to create one)
  • A configuration ID for the job to run:
    • For parse_v2, use a built-in preset such as cfg-PARSE_AGENTIC, or a saved Parse configuration ID.
    • For extract_v2, use a saved Extract configuration ID.

For one-off uploads, create an ephemeral directory so the source directory is automatically eligible for cleanup. Then upload files into that directory and create the batch.

import asyncio
from datetime import datetime, timedelta, timezone
from pathlib import Path

from llama_cloud import AsyncLlamaCloud

client = AsyncLlamaCloud(api_key="<your-api-key>")
configuration_id = "cfg-PARSE_AGENTIC"

# Ephemeral directories become eligible for cleanup after expires_at.
expires_at = (datetime.now(timezone.utc) + timedelta(days=2)).isoformat()
directory = await client.beta.directories.create(
    name="invoice-batch",
    type="ephemeral",
    expires_at=expires_at,
)

# Upload every PDF in the local folder into the source directory.
for path in Path("./invoices").glob("*.pdf"):
    await client.beta.directories.files.upload(
        directory.id,
        upload_file=path,
        display_name=path.name,
    )

# Create the batch: one parse_v2 job per file in the directory.
batch = await client.batches.create(
    source_directory_id=directory.id,
    config={
        "job": {
            "type": "parse_v2",
            "configuration_id": configuration_id,
        },
    },
)
print(batch.id, batch.status)

To run extraction instead, use a saved extract_v2 configuration ID and set type to "extract_v2".
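
Since only the type and configuration_id values differ between the two job kinds, the config payload can be built by a small helper. This is a sketch that mirrors the dictionary shape used in the example above. build_batch_config and SUPPORTED_JOB_TYPES are hypothetical names introduced here, not SDK functions.

```python
from typing import Any, Dict

# Job types named in this guide; the set name is illustrative.
SUPPORTED_JOB_TYPES = {"parse_v2", "extract_v2"}


def build_batch_config(job_type: str, configuration_id: str) -> Dict[str, Any]:
    """Return a dict in the shape passed as `config` when creating a batch."""
    if job_type not in SUPPORTED_JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type!r}")
    if not configuration_id:
        raise ValueError("configuration_id must not be empty")
    return {"job": {"type": job_type, "configuration_id": configuration_id}}
```

For an extraction batch, pass config=build_batch_config("extract_v2", "<your-extract-configuration-id>"), where the placeholder stands for your saved Extract configuration ID.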

Batch processing is asynchronous. Poll the batch status until it reaches a terminal state.

terminal_statuses = {"COMPLETED", "FAILED", "CANCELLED"}

while batch.status not in terminal_statuses:
    await asyncio.sleep(10)
    batch = await client.batches.get(batch.id)
    print(batch.status)
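
The loop above polls indefinitely. A variant with a deadline might look like the sketch below. It is written against a caller-supplied coroutine rather than the SDK directly, so wait_for_terminal_status, fetch_status, and the default interval and timeout values are all assumptions made for illustration.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Terminal states listed in this guide.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


async def wait_for_terminal_status(
    fetch_status: Callable[[], Awaitable[str]],
    interval: float = 10.0,
    timeout: float = 3600.0,
) -> str:
    """Call `fetch_status` repeatedly until it returns a terminal status.

    Raises TimeoutError if the deadline passes while the status is still
    non-terminal.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = await fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch still {status!r} when the timeout expired")
        await asyncio.sleep(interval)
```

With the client from the earlier example, fetch_status could be a small coroutine that awaits client.batches.get(batch.id) and returns its status attribute.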

Pass expand=results to the get endpoint to include the mapping from each source file to its product job. results may be null while the batch is still running. When available, results contains one entry per source file in the batch. Per-file failures are returned in results[*].error_message; successful files include a job_reference for the underlying Parse or Extract job.

A trimmed response can include successful and failed file-level entries in the same batch:

{
  "id": "bat-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "status": "COMPLETED",
  "results": [
    {
      "source_directory_file_id": "dfl-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
      "job_reference": {
        "type": "parse_v2",
        "id": "pjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
      }
    },
    {
      "source_directory_file_id": "dfl-bbbbbbbb-cccc-dddd-eeee-ffffffffffff",
      "error_message": "Unable to process source file."
    }
  ]
}

batch = await client.batches.get(batch.id, expand=["results"])

if batch.status == "FAILED":
    raise RuntimeError("Batch orchestration failed")

for result in batch.results or []:
    if result.error_message:
        print(result.source_directory_file_id, "failed:", result.error_message)
        continue
    if result.job_reference is None:
        print(result.source_directory_file_id, "has no job yet")
        continue
    print(
        result.source_directory_file_id,
        result.job_reference.type,
        result.job_reference.id,
    )

results contains references to the underlying Parse or Extract jobs. Fetch those jobs through their product endpoints to inspect job status and outputs.

for result in batch.results or []:
    ref = result.job_reference
    if ref is None:
        continue
    # Fetch the underlying job through its own product endpoint.
    if ref.type == "parse_v2":
        job = await client.parsing.get(ref.id)
    else:
        job = await client.extract.get(ref.id)
    print(ref.id, job.status)

You can list batches for the current project and filter by status or source directory.

async for item in client.batches.list(
    status="RUNNING",
    source_directory_id=directory.id,
):
    print(item.id, item.status)

Batch-level FAILED means the orchestration failed and the batch cannot provide a reliable per-file result set.

Per-file failures are represented in results[*].error_message when a source file could not be processed or mapped to a product job. If a result has a job_reference, use the referenced Parse or Extract job endpoint for the underlying job status and output.
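
The two failure levels described here can be separated in code. The sketch below works on a plain dictionary shaped like the trimmed JSON response shown earlier (for example, a parsed HTTP response body). SDK result objects expose the same fields as attributes instead. triage_batch and its bucket names are hypothetical.

```python
from typing import Any, Dict, List


def triage_batch(batch: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Group per-file results into mapped, failed, and unmapped buckets."""
    if batch.get("status") == "FAILED":
        # Batch-level failure: the per-file result set is not reliable.
        raise RuntimeError(f"batch {batch.get('id')} failed during orchestration")

    buckets: Dict[str, List[Any]] = {"mapped": [], "failed": [], "unmapped": []}
    # `results` may be null while the batch is still running.
    for result in batch.get("results") or []:
        file_id = result["source_directory_file_id"]
        error = result.get("error_message")
        ref = result.get("job_reference")
        if error:
            buckets["failed"].append((file_id, error))
        elif ref is not None:
            # Mapped to a product job; the job's own status still has to be
            # checked through the Parse or Extract endpoint.
            buckets["mapped"].append((file_id, ref["type"], ref["id"]))
        else:
            buckets["unmapped"].append(file_id)
    return buckets
```

A file in the mapped bucket has a product job, not necessarily a successful one, so the job status lookup shown earlier is still needed for those entries.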

Operation       Endpoint
Create batch    POST /api/v2/batches
List batches    GET /api/v2/batches
Get batch       GET /api/v2/batches/{batch_id}
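
If you call these endpoints without the SDK, the paths above combine with your API base URL as sketched below. The helper only builds URLs and sends nothing. batch_url is a hypothetical name, the base URL is a parameter you must supply, and passing expand as a single query parameter follows the expand=results form mentioned earlier in this guide.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Collection path from the endpoint table above.
BATCHES_PATH = "/api/v2/batches"


def batch_url(base_url: str, batch_id: Optional[str] = None, expand: Optional[str] = None) -> str:
    """Build the URL for the batch collection or for a single batch."""
    url = base_url.rstrip("/") + BATCHES_PATH
    if batch_id is not None:
        # Percent-encode the ID so it is always a single path segment.
        url += "/" + quote(batch_id, safe="")
    if expand:
        url += "?" + urlencode({"expand": expand})
    return url
```

Calling batch_url(base, "<batch-id>", expand="results") gives the Get batch URL with results included, while batch_url(base) gives the URL shared by Create batch (POST) and List batches (GET).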