Getting Started with Batches

Run Parse V2 or Extract V2 over every file in a directory, poll for completion, and inspect per-file results.

Overview

Batches let you run the same product job over every file in a directory. A batch references:

A source directory containing the files to process.
A product configuration ID for the work to run on each file.

Use parse_v2 with a built-in Parse preset or a saved Parse configuration to parse every file, or extract_v2 with a saved Extract configuration to extract from every file.

Batch creation is limited to 10,000 source files.
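If a local folder holds more files than that, one option is to split the files across several directories and create one batch per directory. A minimal sketch of the splitting step, assuming you already have the list of local paths; `chunk_paths` is a hypothetical helper, not part of the SDK:

```python
from pathlib import Path
from typing import Iterator, List, Sequence

MAX_FILES_PER_BATCH = 10_000  # batch creation limit stated above


def chunk_paths(paths: Sequence[Path], size: int = MAX_FILES_PER_BATCH) -> Iterator[List[Path]]:
    """Yield consecutive slices of at most `size` paths (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(paths), size):
        yield list(paths[start : start + size])


if __name__ == "__main__":
    files = [Path(f"doc-{i}.pdf") for i in range(25_001)]
    print([len(chunk) for chunk in chunk_paths(files)])  # [10000, 10000, 5001]
```

Each chunk can then be uploaded into its own ephemeral directory and submitted as a separate batch using the calls in Create a Batch.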

Prerequisites

A LlamaCloud account with a Pro or Enterprise plan
An API key (how to create one)
A configuration ID for the job to run:
- For parse_v2, use a built-in preset such as cfg-PARSE_AGENTIC, or a saved Parse configuration ID.
- For extract_v2, use a saved Extract configuration ID.

Create a Batch

For one-off uploads, create an ephemeral directory so that it is automatically eligible for cleanup, then upload your files into it and create the batch. Files in ephemeral directories are also exempt from storage billing and from per-project storage limits.

Python, then TypeScript:

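# The Python snippets on this page use top-level await. Run them where that is
# allowed (for example a Jupyter notebook or the `python -m asyncio` REPL), or
# move them into an `async def main()` and call `asyncio.run(main())`.
import asyncio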
from pathlib import Path

from llama_cloud import AsyncLlamaCloud

client = AsyncLlamaCloud(api_key="<your-api-key>")

configuration_id = "cfg-PARSE_AGENTIC"

directory = await client.beta.directories.create(
    name="invoice-batch",
    type="ephemeral",
)

for path in Path("./invoices").glob("*.pdf"):
    await client.beta.directories.files.upload(
        directory.id,
        upload_file=path,
        display_name=path.name,
    )

batch = await client.batches.create(
    source_directory_id=directory.id,
    config={
        "job": {
            "type": "parse_v2",
            "configuration_id": configuration_id,
        },
    },
)

print(batch.id, batch.status)

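// Top-level await requires running this file as an ES module
// (for example a .mjs/.mts file, or "type": "module" in package.json).
import fs from "fs";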
import path from "path";
import LlamaCloud from "@llamaindex/llama-cloud";

const client = new LlamaCloud({
  apiKey: "<your-api-key>",
});

const configurationId = "cfg-PARSE_AGENTIC";

const directory = await client.beta.directories.create({
  name: "invoice-batch",
  type: "ephemeral",
});

for (const fileName of fs.readdirSync("./invoices")) {
  if (!fileName.endsWith(".pdf")) continue;

  await client.beta.directories.files.upload(directory.id, {
    upload_file: fs.createReadStream(path.join("./invoices", fileName)),
    display_name: fileName,
  });
}

let batch = await client.batches.create({
  source_directory_id: directory.id,
  config: {
    job: {
      type: "parse_v2",
      configuration_id: configurationId,
    },
  },
});

console.log(batch.id, batch.status);

To run extraction instead, set the job type to "extract_v2" and pass a saved Extract configuration ID as configuration_id.
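Because only the job type and configuration ID differ between the two products, you may want to build the config payload in one place. A minimal sketch, assuming the payload shape used in the examples above; `build_batch_config` is a hypothetical helper, and the extract configuration ID below is a placeholder:

```python
from typing import Any, Dict

# Job types described on this page; the server remains the authority on what is valid.
SUPPORTED_JOB_TYPES = {"parse_v2", "extract_v2"}


def build_batch_config(job_type: str, configuration_id: str) -> Dict[str, Any]:
    """Return the `config` argument for client.batches.create (hypothetical helper)."""
    if job_type not in SUPPORTED_JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type!r}")
    if not configuration_id:
        raise ValueError("configuration_id is required")
    return {"job": {"type": job_type, "configuration_id": configuration_id}}


# Parse with a built-in preset, or extract with a saved Extract configuration.
parse_config = build_batch_config("parse_v2", "cfg-PARSE_AGENTIC")
extract_config = build_batch_config("extract_v2", "<your-extract-configuration-id>")
```

Either dictionary can be passed as `config=` to `client.batches.create`. The local check only catches typos early; it does not replace server-side validation.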

Poll for Completion

Batch processing is asynchronous. Poll the batch status until it reaches a terminal state.

Python, then TypeScript:

terminal_statuses = {"COMPLETED", "FAILED", "CANCELLED"}

while batch.status not in terminal_statuses:
    await asyncio.sleep(10)
    batch = await client.batches.get(batch.id)

print(batch.status)

const terminalStatuses = new Set(["COMPLETED", "FAILED", "CANCELLED"]);

while (!terminalStatuses.has(batch.status)) {
  await new Promise((resolve) => setTimeout(resolve, 10_000));
  batch = await client.batches.get(batch.id);
}

console.log(batch.status);
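The loops above poll indefinitely at a fixed 10-second interval. If you want an upper bound on waiting time, a variant with a deadline and capped exponential backoff is sketched below. It takes the status lookup as a callable so it does not depend on any particular SDK method; `wait_for_terminal` is a hypothetical helper, and the default delays and one-hour timeout are arbitrary example values:

```python
import asyncio
import time
from typing import Awaitable, Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


async def wait_for_terminal(
    fetch_status: Callable[[], Awaitable[str]],
    *,
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    timeout: float = 3600.0,
) -> str:
    """Poll fetch_status until it returns a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status = await fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"batch still {status} after {timeout} seconds")
        # Never sleep past the deadline; double the delay up to max_delay.
        await asyncio.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

With the client above, `fetch_status` could be an async function that returns `(await client.batches.get(batch.id)).status`.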

Inspect Per-File Results

Pass expand=results to the get endpoint to include the mapping from each source file to its product job. results may be null while the batch is still running; once available, it contains one entry per source file in the batch. Per-file failures appear in results[*].error_message, and successful files include a job_reference pointing to the underlying Parse or Extract job.

A trimmed response can include successful and failed file-level entries in the same batch:

{
  "id": "bat-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "status": "COMPLETED",
  "results": [
    {
      "source_directory_file_id": "dfl-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
      "job_reference": {
        "type": "parse_v2",
        "id": "pjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
      }
    },
    {
      "source_directory_file_id": "dfl-bbbbbbbb-cccc-dddd-eeee-ffffffffffff",
      "error_message": "Unable to process source file."
    }
  ]
}
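If you call the REST endpoint directly instead of using an SDK, you work with this JSON as plain dictionaries. A minimal sketch that checks for a batch-level failure and then splits the results into successes and failures, using only the field names shown in the sample; `partition_results` is a hypothetical helper:

```python
import json
from typing import Any, Dict, Tuple

# The trimmed sample response shown above.
RAW_RESPONSE = """
{
  "id": "bat-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "status": "COMPLETED",
  "results": [
    {
      "source_directory_file_id": "dfl-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
      "job_reference": {"type": "parse_v2", "id": "pjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"}
    },
    {
      "source_directory_file_id": "dfl-bbbbbbbb-cccc-dddd-eeee-ffffffffffff",
      "error_message": "Unable to process source file."
    }
  ]
}
"""


def partition_results(batch: Dict[str, Any]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return ({file_id: job_id}, {file_id: error_message}) for a batch response."""
    if batch.get("status") == "FAILED":
        # Batch-level failure: the per-file result set is not reliable.
        raise RuntimeError(f"batch {batch.get('id')} failed")
    succeeded: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    # results may be null while the batch is still running.
    for result in batch.get("results") or []:
        file_id = result["source_directory_file_id"]
        if result.get("error_message"):
            failed[file_id] = result["error_message"]
        elif result.get("job_reference"):
            succeeded[file_id] = result["job_reference"]["id"]
        # Entries with neither field have no job yet and are skipped.
    return succeeded, failed


succeeded, failed = partition_results(json.loads(RAW_RESPONSE))
print(succeeded)
print(failed)
```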

Python, then TypeScript:

batch = await client.batches.get(batch.id, expand=["results"])

if batch.status == "FAILED":
    raise RuntimeError("Batch orchestration failed")

for result in batch.results or []:
    if result.error_message:
        print(result.source_directory_file_id, "failed:", result.error_message)
        continue

    if result.job_reference is None:
        print(result.source_directory_file_id, "has no job yet")
        continue

    print(
        result.source_directory_file_id,
        result.job_reference.type,
        result.job_reference.id,
    )

const detail = await client.batches.get(batch.id, {
  expand: ["results"],
});

if (detail.status === "FAILED") {
  throw new Error("Batch orchestration failed");
}

for (const result of detail.results ?? []) {
  if (result.error_message) {
    console.log(result.source_directory_file_id, "failed:", result.error_message);
    continue;
  }

  if (!result.job_reference) {
    console.log(result.source_directory_file_id, "has no job yet");
    continue;
  }

  console.log(
    result.source_directory_file_id,
    result.job_reference.type,
    result.job_reference.id,
  );
}

Each job_reference points to an underlying Parse or Extract job. Fetch those jobs through their product endpoints to check their status and retrieve their outputs.

Python, then TypeScript:

for result in batch.results or []:
    ref = result.job_reference
    if ref is None:
        continue

    if ref.type == "parse_v2":
        job = await client.parsing.get(ref.id)
    else:
        job = await client.extract.get(ref.id)

    print(ref.id, job.status)

for (const result of detail.results ?? []) {
  const ref = result.job_reference;
  if (!ref) continue;

  const job =
    ref.type === "parse_v2"
      ? await client.parsing.get(ref.id)
      : await client.extract.get(ref.id);

  console.log(ref.id, job.status);
}

List Batches

You can list batches for the current project and filter by status or source directory.

Python, then TypeScript:

async for item in client.batches.list(
    status="RUNNING",
    source_directory_id=directory.id,
):
    print(item.id, item.status)

for await (const item of client.batches.list({
  status: "RUNNING",
  source_directory_id: directory.id,
})) {
  console.log(item.id, item.status);
}

Failure Semantics

Batch-level FAILED means the orchestration failed and the batch cannot provide a reliable per-file result set.

Per-file failures are represented in results[*].error_message when a source file could not be processed or mapped to a product job. If a result has a job_reference, use the referenced Parse or Extract job endpoint for the underlying job status and output.

REST Endpoints

Operation	Endpoint
Create batch	`POST /api/v2/batches`
List batches	`GET /api/v2/batches`
Get batch	`GET /api/v2/batches/{batch_id}`
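If you are not using an SDK, the same operations can be issued over HTTP. A minimal standard-library sketch that only builds the requests; the base URL `https://api.cloud.llamaindex.ai`, bearer-token authentication, and the body and query parameter names (mirroring the SDK arguments used above) are assumptions to verify against the API reference:

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional

# Assumptions: public API host and bearer-token auth. Adjust for your region/account.
BASE_URL = "https://api.cloud.llamaindex.ai"
API_KEY = "<your-api-key>"


def _auth_headers() -> Dict[str, str]:
    return {"Authorization": f"Bearer {API_KEY}"}


def create_batch_request(source_directory_id: str, configuration_id: str) -> urllib.request.Request:
    """Build (but do not send) POST /api/v2/batches for a parse_v2 batch."""
    body = {
        "source_directory_id": source_directory_id,
        "config": {"job": {"type": "parse_v2", "configuration_id": configuration_id}},
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/v2/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={**_auth_headers(), "Content-Type": "application/json"},
        method="POST",
    )


def list_batches_request(
    status: Optional[str] = None, source_directory_id: Optional[str] = None
) -> urllib.request.Request:
    """Build GET /api/v2/batches with optional filters (parameter names assumed)."""
    filters = {"status": status, "source_directory_id": source_directory_id}
    params = {key: value for key, value in filters.items() if value is not None}
    url = f"{BASE_URL}/api/v2/batches"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers=_auth_headers())


def get_batch_request(batch_id: str, expand_results: bool = False) -> urllib.request.Request:
    """Build GET /api/v2/batches/{batch_id}, optionally with expand=results."""
    url = f"{BASE_URL}/api/v2/batches/{urllib.parse.quote(batch_id)}"
    if expand_results:
        url += "?" + urllib.parse.urlencode({"expand": "results"})
    return urllib.request.Request(url, headers=_auth_headers())
```

To send a request, pass it to `urllib.request.urlopen` and decode the response body with `json.load`.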
