Getting Started with Batches
Run Parse V2 or Extract V2 over every file in a directory, poll for completion, and inspect per-file results.
Overview
Batches let you run the same product job over every file in a directory. A batch references:
- A source directory containing the files to process.
- A product configuration ID for the work to run on each file.
Use parse_v2 with a built-in Parse preset or saved Parse configuration to parse every file. Use extract_v2 with a saved Extract configuration to extract from every file.
Batch creation is limited to 1,000 source files.
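Because of this limit, a larger file set has to be split across several batches. The following is a minimal sketch of that split in plain Python, with no SDK calls. `chunk_paths` and `MAX_BATCH_FILES` are hypothetical names introduced here for illustration, and the value 1,000 is taken from the limit stated above.

```python
from pathlib import Path
from typing import Iterable, Iterator, List

# Assumption: mirrors the documented limit of 1,000 source files per batch.
MAX_BATCH_FILES = 1000


def chunk_paths(paths: Iterable[Path], size: int = MAX_BATCH_FILES) -> Iterator[List[Path]]:
    """Yield consecutive groups of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    group: List[Path] = []
    for path in paths:
        group.append(path)
        if len(group) == size:
            yield group
            group = []
    if group:  # emit the final, possibly shorter, group
        yield group


# Example with 2,500 made-up file names.
groups = list(chunk_paths(Path(f"invoice-{i}.pdf") for i in range(2500)))
print([len(g) for g in groups])
```

Each group could then be uploaded to its own source directory and submitted as a separate batch.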
Prerequisites
- A LlamaCloud account with a Pro or Enterprise plan
- An API key (how to create one)
- A configuration ID for the job to run:
  - For parse_v2, use a built-in preset such as cfg-PARSE_AGENTIC, or a saved Parse configuration ID.
  - For extract_v2, use a saved Extract configuration ID.
Create a Batch
For one-off uploads, create an ephemeral directory so the source directory is automatically eligible for cleanup. Then upload files into that directory and create the batch.
Python:

    import asyncio
    from datetime import datetime, timedelta, timezone
    from pathlib import Path

    from llama_cloud import AsyncLlamaCloud

    # The snippets in this guide use top-level await, so run them in an async
    # context (a notebook, or inside an `async def` started with asyncio.run).
    client = AsyncLlamaCloud(api_key="<your-api-key>")

    configuration_id = "cfg-PARSE_AGENTIC"
    # The ephemeral directory expires two days from now.
    expires_at = (datetime.now(timezone.utc) + timedelta(days=2)).isoformat()

    directory = await client.beta.directories.create(
        name="invoice-batch",
        type="ephemeral",
        expires_at=expires_at,
    )

    # Upload every PDF in ./invoices into the source directory.
    for path in Path("./invoices").glob("*.pdf"):
        await client.beta.directories.files.upload(
            directory.id,
            upload_file=path,
            display_name=path.name,
        )

    batch = await client.batches.create(
        source_directory_id=directory.id,
        config={
            "job": {
                "type": "parse_v2",
                "configuration_id": configuration_id,
            },
        },
    )

    print(batch.id, batch.status)

TypeScript:

    import fs from "fs";
    import path from "path";
    import LlamaCloud from "@llamaindex/llama-cloud";

    const client = new LlamaCloud({
      apiKey: "<your-api-key>",
    });

    const configurationId = "cfg-PARSE_AGENTIC";
    // The ephemeral directory expires two days from now.
    const expiresAt = new Date(Date.now() + 2 * 24 * 60 * 60 * 1000).toISOString();

    const directory = await client.beta.directories.create({
      name: "invoice-batch",
      type: "ephemeral",
      expires_at: expiresAt,
    });

    // Upload every PDF in ./invoices into the source directory.
    for (const fileName of fs.readdirSync("./invoices")) {
      if (!fileName.endsWith(".pdf")) continue;

      await client.beta.directories.files.upload(directory.id, {
        upload_file: fs.createReadStream(path.join("./invoices", fileName)),
        display_name: fileName,
      });
    }

    let batch = await client.batches.create({
      source_directory_id: directory.id,
      config: {
        job: {
          type: "parse_v2",
          configuration_id: configurationId,
        },
      },
    });

    console.log(batch.id, batch.status);

To run extraction instead, use a saved extract_v2 configuration ID and set type to "extract_v2".
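As an illustration of that switch, the sketch below builds the `config` payload for either job type. `build_batch_config` is a hypothetical helper, not part of the SDK, and the configuration ID is a placeholder to replace with the ID of a saved Extract configuration.

```python
from typing import Any, Dict

# The two job types named in this guide.
SUPPORTED_JOB_TYPES = {"parse_v2", "extract_v2"}


def build_batch_config(job_type: str, configuration_id: str) -> Dict[str, Any]:
    """Return a `config` payload in the shape used when creating a batch."""
    if job_type not in SUPPORTED_JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type!r}")
    if not configuration_id:
        raise ValueError("configuration_id must not be empty")
    return {"job": {"type": job_type, "configuration_id": configuration_id}}


# Placeholder ID: substitute the ID of a saved Extract configuration.
extract_config = build_batch_config("extract_v2", "<your-extract-configuration-id>")
print(extract_config)
```

The resulting dictionary can be passed as the `config` argument when creating the batch.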
Poll for Completion
Batch processing is asynchronous. Poll the batch status until it reaches a terminal state.
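Unbounded polling can hang a script if a batch stalls, so it can help to cap the total wait. The following is a minimal sketch of a polling helper with a timeout, written independently of the SDK. `wait_for_terminal_status` and `fetch_status` are hypothetical names, and the default interval and timeout are arbitrary choices rather than recommended values.

```python
import asyncio
import time
from typing import Awaitable, Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


async def wait_for_terminal_status(
    fetch_status: Callable[[], Awaitable[str]],
    interval: float = 10.0,
    timeout: float = 3600.0,
) -> str:
    """Poll `fetch_status` until it returns a terminal status or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = await fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Give up if the next poll would fall past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout} seconds")
        await asyncio.sleep(interval)


# Demonstration with a fake status source instead of a real API call.
async def _demo() -> str:
    statuses = iter(["RUNNING", "RUNNING", "COMPLETED"])

    async def fake_fetch() -> str:
        return next(statuses)

    return await wait_for_terminal_status(fake_fetch, interval=0.01, timeout=1.0)


final_status = asyncio.run(_demo())
print(final_status)
```

With the SDK, `fetch_status` could be a small coroutine that fetches the batch by ID and returns its `status`.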
Python:

    terminal_statuses = {"COMPLETED", "FAILED", "CANCELLED"}

    while batch.status not in terminal_statuses:
        await asyncio.sleep(10)
        batch = await client.batches.get(batch.id)

    print(batch.status)

TypeScript:

    const terminalStatuses = new Set(["COMPLETED", "FAILED", "CANCELLED"]);

    while (!terminalStatuses.has(batch.status)) {
      await new Promise((resolve) => setTimeout(resolve, 10_000));
      batch = await client.batches.get(batch.id);
    }

    console.log(batch.status);

Inspect Per-File Results
Use expand=results on the get endpoint to include the source-file to product-job mappings. Results may be null while the batch is still running. When available, results contains one entry per source file in the batch. Per-file failures are returned in results[*].error_message; successful files include a job_reference for the underlying Parse or Extract job.
A trimmed response can include successful and failed file-level entries in the same batch:
    {
      "id": "bat-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
      "status": "COMPLETED",
      "results": [
        {
          "source_directory_file_id": "dfl-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
          "job_reference": {
            "type": "parse_v2",
            "id": "pjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
          }
        },
        {
          "source_directory_file_id": "dfl-bbbbbbbb-cccc-dddd-eeee-ffffffffffff",
          "error_message": "Unable to process source file."
        }
      ]
    }

Python:

    batch = await client.batches.get(batch.id, expand=["results"])

    if batch.status == "FAILED":
        raise RuntimeError("Batch orchestration failed")

    for result in batch.results or []:
        if result.error_message:
            print(result.source_directory_file_id, "failed:", result.error_message)
            continue

        if result.job_reference is None:
            print(result.source_directory_file_id, "has no job yet")
            continue

        print(
            result.source_directory_file_id,
            result.job_reference.type,
            result.job_reference.id,
        )

TypeScript:

    const detail = await client.batches.get(batch.id, {
      expand: ["results"],
    });

    if (detail.status === "FAILED") {
      throw new Error("Batch orchestration failed");
    }

    for (const result of detail.results ?? []) {
      if (result.error_message) {
        console.log(result.source_directory_file_id, "failed:", result.error_message);
        continue;
      }

      if (!result.job_reference) {
        console.log(result.source_directory_file_id, "has no job yet");
        continue;
      }

      console.log(
        result.source_directory_file_id,
        result.job_reference.type,
        result.job_reference.id,
      );
    }

results contains references to the underlying Parse or Extract jobs. Fetch those jobs through their product endpoints to inspect job status and outputs.
Python:

    for result in batch.results or []:
        ref = result.job_reference
        if ref is None:
            continue

        if ref.type == "parse_v2":
            job = await client.parsing.get(ref.id)
        else:
            job = await client.extract.get(ref.id)

        print(ref.id, job.status)

TypeScript:

    for (const result of detail.results ?? []) {
      const ref = result.job_reference;
      if (!ref) continue;

      const job =
        ref.type === "parse_v2"
          ? await client.parsing.get(ref.id)
          : await client.extract.get(ref.id);

      console.log(ref.id, job.status);
    }

List Batches
You can list batches for the current project and filter by status or source directory.
Python:

    async for item in client.batches.list(
        status="RUNNING",
        source_directory_id=directory.id,
    ):
        print(item.id, item.status)

TypeScript:

    for await (const item of client.batches.list({
      status: "RUNNING",
      source_directory_id: directory.id,
    })) {
      console.log(item.id, item.status);
    }

Failure Semantics
Batch-level FAILED means the orchestration failed and the batch cannot provide a reliable per-file result set.
Per-file failures are represented in results[*].error_message when a source file could not be processed or mapped to a product job. If a result has a job_reference, use the referenced Parse or Extract job endpoint for the underlying job status and output.
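These rules can be turned into a small triage step over a results list. The sketch below works on plain dictionaries shaped like the JSON response shown earlier in this guide. `summarize_results` and its three bucket names are hypothetical, chosen here for illustration.

```python
from typing import Any, Dict, Iterable, List


def summarize_results(results: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group source file IDs by outcome: failed, submitted, or pending."""
    summary: Dict[str, List[str]] = {"failed": [], "submitted": [], "pending": []}
    for result in results:
        file_id = result["source_directory_file_id"]
        if result.get("error_message"):
            summary["failed"].append(file_id)
        elif result.get("job_reference"):
            # A job reference only says a job exists; the job's own status
            # still has to be read from the Parse or Extract endpoint.
            summary["submitted"].append(file_id)
        else:
            summary["pending"].append(file_id)
    return summary


sample_results = [
    {
        "source_directory_file_id": "dfl-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
        "job_reference": {"type": "parse_v2", "id": "pjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"},
    },
    {
        "source_directory_file_id": "dfl-bbbbbbbb-cccc-dddd-eeee-ffffffffffff",
        "error_message": "Unable to process source file.",
    },
]

summary = summarize_results(sample_results)
print(summary)
```

Files in the failed bucket can be retried or reported, while files in the submitted bucket still need their referenced job checked for its final status and output.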
REST Endpoints
| Operation | Endpoint |
|---|---|
| Create batch | POST /api/v2/batches |
| List batches | GET /api/v2/batches |
| Get batch | GET /api/v2/batches/{batch_id} |
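For callers that do not use an SDK, the sketch below builds (but does not send) a request for the Get batch endpoint with expand=results. The base URL and the bearer-token Authorization header are assumptions not stated in this guide, so check them against the API reference for your account. `build_get_batch_request` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: the API base URL is not given in this guide.
BASE_URL = "https://api.cloud.llamaindex.ai"


def build_get_batch_request(batch_id: str, api_key: str, expand_results: bool = False) -> Request:
    """Build a GET request for /api/v2/batches/{batch_id} without sending it."""
    url = f"{BASE_URL}/api/v2/batches/{quote(batch_id, safe='')}"
    if expand_results:
        url += "?" + urlencode({"expand": "results"})
    # Assumption: the API key is sent as a bearer token.
    return Request(url, method="GET", headers={"Authorization": f"Bearer {api_key}"})


request = build_get_batch_request(
    "bat-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "<your-api-key>",
    expand_results=True,
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call.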