List Extract Jobs

GET /api/v2/extract

List extraction jobs with optional filtering and pagination.

Filter by configuration_id, status, file_input, or creation date range. Results are returned newest-first. Use expand=configuration to include the full configuration used, and expand=extract_metadata for per-field metadata.
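As a rough illustration of how these filters combine into one request, the sketch below builds (but does not send) the GET request using only the Python standard library. The base URL and Bearer header are taken from the curl example on this page; the helper name `build_list_request` is hypothetical, and encoding array parameters such as expand as repeated query keys is an assumption, not something this page confirms.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL as shown in the curl example on this page.
BASE_URL = "https://api.cloud.llamaindex.ai/api/v2/extract"


def build_list_request(api_key, **filters):
    """Build (but do not send) a GET request for the list endpoint.

    Hypothetical helper: keyword arguments become query parameters.
    """
    # Drop unset filters so they are not sent at all.
    params = {key: value for key, value in filters.items() if value is not None}
    # doseq=True repeats the key for list values (expand=a&expand=b).
    # Assumption: the API accepts repeated keys for array parameters.
    query = urlencode(params, doseq=True)
    url = f"{BASE_URL}?{query}" if query else BASE_URL
    return Request(url, headers={"Authorization": f"Bearer {api_key}"}, method="GET")


req = build_list_request(
    "YOUR_API_KEY",
    status="COMPLETED",
    page_size=20,
    expand=["configuration", "extract_metadata"],
)
print(req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the call; keeping construction separate makes the query string easy to inspect first.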

Query Parameters

configuration_id: optional string

Filter by configuration ID

created_at_on_or_after: optional string

Include items created at or after this timestamp (inclusive)

format: date-time

created_at_on_or_before: optional string

Include items created at or before this timestamp (inclusive)

format: date-time
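Both bounds are inclusive date-time strings. A minimal sketch of producing them from Python datetimes follows; the helper names are hypothetical, and the millisecond UTC format with a trailing Z is an assumption based on the timestamps in the example response on this page.

```python
from datetime import datetime, timedelta, timezone


def to_date_time(dt):
    """Format an aware datetime as a UTC string like 2019-12-27T18:11:19.117Z."""
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    utc = dt.astimezone(timezone.utc)
    # isoformat() ends with +00:00 for UTC; swap it for the Z suffix.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def last_n_days(n, now=None):
    """Return both date-range filters covering the last n days (inclusive)."""
    now = now or datetime.now(timezone.utc)
    return {
        "created_at_on_or_after": to_date_time(now - timedelta(days=n)),
        "created_at_on_or_before": to_date_time(now),
    }


fixed = datetime(2019, 12, 27, 18, 11, 19, 117000, tzinfo=timezone.utc)
print(last_n_days(7, now=fixed))
```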

document_input_type: optional string

Filter by document input type (file_id or parse_job_id)

document_input_value: optional string (deprecated)

Deprecated: use file_input instead

expand: optional array of string

Additional fields to include: configuration, extract_metadata

file_input: optional string

Filter by file input value

job_ids: optional array of string

Filter by specific job IDs

organization_id: optional string

page_size: optional number

Number of items per page

page_token: optional string

Token for pagination

project_id: optional string

status: optional "PENDING" or "THROTTLED" or "RUNNING" or 3 more

Filter by status

One of the following:

"PENDING"

"THROTTLED"

"RUNNING"

"COMPLETED"

"FAILED"

"CANCELLED"

Cookie Parameters

session: optional string

Returns

ExtractV2JobQueryResponse = object { items, next_page_token, total_size }

Paginated list of extraction jobs.

items: array of ExtractV2Job { id, created_at, file_input, 9 more }

The list of items.

id: string

Unique job identifier (job_id)

created_at: string

Creation timestamp

format: date-time

file_input: string

File ID or parse job ID that was extracted

project_id: string

Project this job belongs to

status: string

Current job status.

PENDING — queued, not yet started
RUNNING — actively processing
COMPLETED — finished successfully
FAILED — terminated with an error
CANCELLED — cancelled by user
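When polling or reporting on a page of jobs, it can help to separate finished jobs from those still in flight. The sketch below treats COMPLETED, FAILED, and CANCELLED as terminal, following the descriptions above. Treating THROTTLED (listed among the status filter values) as non-terminal is an assumption, and the helper names are hypothetical.

```python
from collections import Counter

# Statuses described above as finished, failed, or cancelled.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def is_terminal(status):
    """Return True when a job with this status will not change further."""
    # Assumption: PENDING, THROTTLED, and RUNNING can still progress.
    return status in TERMINAL_STATUSES


def summarize(jobs):
    """Count jobs per status and list the IDs of jobs still in flight."""
    counts = Counter(job["status"] for job in jobs)
    in_flight = [job["id"] for job in jobs if not is_terminal(job["status"])]
    return counts, in_flight


# Illustrative job records with made-up IDs.
jobs = [
    {"id": "ext-1", "status": "COMPLETED"},
    {"id": "ext-2", "status": "RUNNING"},
    {"id": "ext-3", "status": "FAILED"},
]
counts, in_flight = summarize(jobs)
print(dict(counts), in_flight)
```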

updated_at: string

Last update timestamp

format: date-time

configuration: optional ExtractConfiguration { data_schema, cite_sources, confidence_scores, 8 more }

Extract configuration combining parse and extract settings.

data_schema: map[map[unknown] or array of unknown or string or 2 more]

JSON Schema defining the fields to extract. Validate with the /schema/validate endpoint first.

One of the following:

map[unknown]

array of unknown

string

number

boolean

cite_sources: optional boolean

Include citations in results

confidence_scores: optional boolean

Include confidence scores in results

extraction_target: optional "per_doc" or "per_page" or "per_table_row"

Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row

One of the following:

"per_doc"

"per_page"

"per_table_row"

max_pages: optional number

Maximum number of pages to process. Omit for no limit.

minimum: 1

parse_config_id: optional string

Saved parse configuration ID to control how the document is parsed before extraction

parse_tier: optional string

Parse tier to use before extraction. Defaults to the extract tier if not specified.

system_prompt: optional string

Custom system prompt to guide extraction behavior

target_pages: optional string

Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.
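To make the target_pages format concrete, here is a small client-side parser for strings such as "1,3,5-7". It is only a sketch: the function name is hypothetical, and treating ranges as inclusive on both ends is an assumption that the text above does not state explicitly.

```python
def parse_target_pages(spec):
    """Expand a spec like "1,3,5-7" into a sorted list of 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Pages are 1-based, and a range must not run backwards.
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Assumption: both ends of a range are included.
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_target_pages("1,3,5-7"))  # [1, 3, 5, 6, 7]
```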

tier: optional "cost_effective" or "agentic" or "agentic_plus"

Extract tier: cost_effective (5 credits/page), agentic (15 credits/page), or agentic_plus (50 credits/page)

One of the following:

"cost_effective"

"agentic"

"agentic_plus"

version: optional string

Use 'latest' for the latest release of the selected tier, or a date string (YYYY-MM-DD) to pin to the nearest release at or before that date. Job responses always report the concrete resolved version the job runs, which is fixed at job creation; saved configurations keep the value as provided.
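The pinning rule (nearest release at or before the given date) can be illustrated with a client-side sketch. The service performs the actual resolution; the function name and the release dates below are hypothetical and serve only to show the rule.

```python
from bisect import bisect_right
from datetime import date


def resolve_version(requested, releases):
    """Pick a release date from a sorted list, following the rule above."""
    if not releases:
        raise ValueError("no releases available")
    if requested == "latest":
        return releases[-1]
    pinned = date.fromisoformat(requested)  # expects YYYY-MM-DD
    # Count the releases dated on or before the pinned date.
    index = bisect_right(releases, pinned)
    if index == 0:
        raise ValueError(f"no release at or before {requested}")
    return releases[index - 1]


# Hypothetical release dates, sorted ascending.
releases = [date(2025, 1, 15), date(2025, 3, 1), date(2025, 6, 10)]
print(resolve_version("2025-04-20", releases))  # 2025-03-01
print(resolve_version("latest", releases))      # 2025-06-10
```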

configuration_id: optional string

Saved extract configuration ID used for this job, if any

error_message: optional string

Error details when status is FAILED

extract_metadata: optional ExtractJobMetadata { field_metadata, parse_job_id, parse_tier }

Extraction metadata.

field_metadata: optional ExtractedFieldMetadata { document_metadata, page_metadata, row_metadata }

Metadata for extracted fields including document, page, and row level info.

document_metadata: optional map[map[unknown] or array of unknown or string or 2 more]

Per-field metadata keyed by field name from your schema. Scalar fields (e.g. vendor) map to a FieldMetadataEntry with citation and confidence. Array fields (e.g. items) map to a list where each element contains per-sub-field FieldMetadataEntry objects, indexed by array position. Nested objects contain sub-field entries recursively.

One of the following:

map[unknown]

array of unknown

string

number

boolean
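Because document_metadata mirrors the shape of your schema (scalars, arrays, nested objects), reading confidence values usually means walking it recursively. The sketch below assumes that any object containing a "confidence" key is a FieldMetadataEntry, so a schema field literally named confidence would confuse it. The function name is hypothetical, and the vendor values in the sample are made up.

```python
def iter_confidences(node, path=""):
    """Yield (field_path, confidence) pairs from a document_metadata tree."""
    if isinstance(node, dict):
        # Assumption: a dict with a "confidence" key is a metadata entry.
        if "confidence" in node:
            yield path, node["confidence"]
            return
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            yield from iter_confidences(value, child)
    elif isinstance(node, list):
        # Array fields are indexed by array position.
        for index, item in enumerate(node):
            yield from iter_confidences(item, f"{path}[{index}]")


# Sample shaped like the example response; vendor values are illustrative.
document_metadata = {
    "vendor": {"citation": [{"matching_text": "Acme", "page": 1}], "confidence": 0.95},
    "items": [
        {
            "amount": {"citation": [{"matching_text": "$10.00", "page": 1}], "confidence": 1},
            "description": {"citation": [{"matching_text": "$10/month", "page": 1}], "confidence": 0.998},
        }
    ],
}

for field_path, confidence in iter_confidences(document_metadata):
    print(field_path, confidence)
```

A caller could filter the pairs by a threshold to flag fields for manual review.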

page_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]

Per-page metadata when extraction_target is per_page

One of the following:

map[unknown]

array of unknown

string

number

boolean

row_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]

Per-row metadata when extraction_target is per_table_row

One of the following:

map[unknown]

array of unknown

string

number

boolean

parse_job_id: optional string

Reference to the ParseJob ID used for parsing

parse_tier: optional string

Parse tier used for parsing the document

extract_result: optional map[map[unknown] or array of unknown or string or 2 more] or array of map[map[unknown] or array of unknown or string or 2 more]

Extracted data conforming to the data_schema. Returns a single object for per_doc, or an array for per_page / per_table_row.

One of the following:

map[map[unknown] or array of unknown or string or 2 more]

One of the following:

map[unknown]

array of unknown

string

number

boolean

array of map[map[unknown] or array of unknown or string or 2 more]

One of the following:

map[unknown]

array of unknown

string

number

boolean
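Since extract_result is either a single object (per_doc) or an array (per_page, per_table_row), client code often normalizes it to a list before processing. A minimal sketch, with a hypothetical helper name:

```python
def result_records(job):
    """Return extract_result as a list of objects, whatever its original shape."""
    result = job.get("extract_result")
    if result is None:
        return []  # field is optional, e.g. for jobs that have not completed
    if isinstance(result, list):
        return result  # per_page or per_table_row
    return [result]  # per_doc: wrap the single object


# Illustrative jobs with made-up field values.
per_doc_job = {"extract_result": {"vendor": "Acme"}}
per_page_job = {"extract_result": [{"vendor": "Acme"}, {"vendor": "Globex"}]}
print(len(result_records(per_doc_job)), len(result_records(per_page_job)))
```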

metadata: optional object { usage }

Job-level metadata.

usage: optional ExtractJobUsage { num_pages_billed, num_pages_extracted }

Extraction usage metrics.

num_pages_billed: optional number

Number of effective pages billed

num_pages_extracted: optional number

Number of pages extracted
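The per-page credit rates listed for tier can be combined with the usage metrics for a rough cost estimate. This sketch assumes that credits equal the tier rate multiplied by num_pages_billed, which this page does not state, so treat the result as an approximation. It also needs expand=configuration so that the tier is present; the function name is hypothetical.

```python
# Rates as documented for the tier field (credits per page).
CREDITS_PER_PAGE = {"cost_effective": 5, "agentic": 15, "agentic_plus": 50}


def estimate_job_credits(job):
    """Estimate credits for one job, or return None when data is missing."""
    tier = (job.get("configuration") or {}).get("tier")
    usage = (job.get("metadata") or {}).get("usage") or {}
    pages = usage.get("num_pages_billed")
    if tier not in CREDITS_PER_PAGE or pages is None:
        return None
    # Assumption: billing is simply rate times billed pages.
    return CREDITS_PER_PAGE[tier] * pages


# Illustrative job record with made-up usage numbers.
job = {
    "configuration": {"tier": "agentic"},
    "metadata": {"usage": {"num_pages_billed": 12, "num_pages_extracted": 12}},
}
print(estimate_job_credits(job))  # 180
```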

next_page_token: optional string

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.

total_size: optional number

The total number of items available. This is only populated when specifically requested. The value may be an estimate and can be used for display purposes only.
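Pagination follows the usual token pattern: send next_page_token back as page_token until the response omits it. The sketch below keeps the HTTP call behind a caller-supplied `fetch_page` function (hypothetical, taking a dict of query parameters and returning the parsed response), so the loop is shown here against an in-memory stand-in rather than the live API.

```python
def iter_jobs(fetch_page, page_size=50):
    """Yield every job across pages by following next_page_token."""
    page_token = None
    while True:
        params = {"page_size": page_size}
        if page_token:
            params["page_token"] = page_token
        page = fetch_page(params)
        yield from page.get("items", [])
        page_token = page.get("next_page_token")
        if not page_token:
            break  # token omitted: no subsequent pages


# In-memory stand-in for the API, with made-up job IDs and tokens.
FAKE_PAGES = {
    None: {"items": [{"id": "ext-1"}, {"id": "ext-2"}], "next_page_token": "t2"},
    "t2": {"items": [{"id": "ext-3"}]},
}


def fake_fetch(params):
    return FAKE_PAGES[params.get("page_token")]


print([job["id"] for job in iter_jobs(fake_fetch, page_size=2)])
# ['ext-1', 'ext-2', 'ext-3']
```

A real `fetch_page` would issue the GET request with these parameters and return the decoded JSON body.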

List Extract Jobs

curl https://api.cloud.llamaindex.ai/api/v2/extract \
    -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY"

Returns Examples

{
  "items": [
    {
      "id": "ext-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
      "created_at": "2019-12-27T18:11:19.117Z",
      "file_input": "dfl-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
      "project_id": "prj-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
      "status": "COMPLETED",
      "updated_at": "2019-12-27T18:11:19.117Z",
      "configuration": {
        "data_schema": {
          "foo": {
            "foo": "bar"
          }
        },
        "cite_sources": true,
        "confidence_scores": true,
        "extraction_target": "per_doc",
        "max_pages": 10,
        "parse_config_id": "cfg-11111111-2222-3333-4444-555555555555",
        "parse_tier": "fast",
        "system_prompt": "Extract all monetary values in USD. If a currency is not specified, assume USD.",
        "target_pages": "1,3,5-7",
        "tier": "cost_effective",
        "version": "latest"
      },
      "configuration_id": "cfg-11111111-2222-3333-4444-555555555555",
      "error_message": "error_message",
      "extract_metadata": {
        "field_metadata": {
          "document_metadata": {
            "items": [
              {
                "amount": {
                  "citation": [
                    {
                      "matching_text": "$10.00",
                      "page": 1
                    }
                  ],
                  "confidence": 1
                },
                "description": {
                  "citation": [
                    {
                      "matching_text": "$10/month",
                      "page": 1
                    }
                  ],
                  "confidence": 0.998
                }
              }
            ],
            "total": {
              "citation": "bar",
              "confidence": "bar"
            },
            "vendor": {
              "citation": "bar",
              "confidence": "bar",
              "extraction_confidence": "bar",
              "parsing_confidence": "bar"
            }
          },
          "page_metadata": [
            {
              "foo": {
                "foo": "bar"
              }
            }
          ],
          "row_metadata": [
            {
              "foo": {
                "foo": "bar"
              }
            }
          ]
        },
        "parse_job_id": "parse_job_id",
        "parse_tier": "parse_tier"
      },
      "extract_result": {
        "foo": {
          "foo": "bar"
        }
      },
      "metadata": {
        "usage": {
          "num_pages_billed": 0,
          "num_pages_extracted": 0
        }
      }
    }
  ],
  "next_page_token": "next_page_token",
  "total_size": 0
}
