List Extract Jobs

GET /api/v2/extract

List extraction jobs with optional filtering and pagination.

Query Parameters
configuration_id: optional string

Filter by configuration ID

document_input_type: optional string

Filter by document input type (file_id or parse_job_id)

document_input_value: optional string

Filter by document input value

organization_id: optional string
page_size: optional number

Number of items per page

page_token: optional string

Token for pagination

project_id: optional string
status: optional "PENDING" or "THROTTLED" or "RUNNING" or 3 more

Filter by status

Accepts one of the following:
"PENDING"
"THROTTLED"
"RUNNING"
"COMPLETED"
"FAILED"
"CANCELLED"
Cookie Parameters
session: optional string
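
As a rough illustration of how the optional query parameters combine, the Python sketch below builds the request URL and sends it with a bearer token. The helper names `build_list_url` and `list_extract_jobs` are hypothetical and not part of any official SDK. The sketch assumes that unset filters should be left out of the query string, and it checks `status` locally against the six documented values.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as documented for this operation.
BASE_URL = "https://api.cloud.llamaindex.ai/api/v2/extract"

# The six status values listed in the query parameter documentation.
VALID_STATUSES = {"PENDING", "THROTTLED", "RUNNING", "COMPLETED", "FAILED", "CANCELLED"}


def build_list_url(**filters):
    """Return the list URL containing only the filters that were set (hypothetical helper)."""
    status = filters.get("status")
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    # Drop unset filters so they are not sent as the literal string "None".
    query = {name: value for name, value in filters.items() if value is not None}
    if not query:
        return BASE_URL
    return BASE_URL + "?" + urllib.parse.urlencode(query)


def list_extract_jobs(api_key, **filters):
    """Send the GET request and return the decoded JSON body (hypothetical helper)."""
    request = urllib.request.Request(
        build_list_url(**filters),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `build_list_url(status="FAILED", page_size=20)` yields the base URL followed by `?status=FAILED&page_size=20`.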
Returns
ExtractV2JobQueryResponse = object { items, next_page_token, total_size }

Paginated list of extraction jobs.

items: array of ExtractV2Job { id, created_at, parameters, 9 more }

The list of items.

id: string

Unique job identifier (job_id)

created_at: string

Creation timestamp

format: date-time
parameters: map[map[unknown] or array of unknown or string or 2 more]

Job configuration parameters (includes parse_config_id, extract_options)

Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
project_id: string

Project this job belongs to

status: "PENDING" or "THROTTLED" or "RUNNING" or 3 more

Current status of the job

Accepts one of the following:
"PENDING"
"THROTTLED"
"RUNNING"
"COMPLETED"
"FAILED"
"CANCELLED"
type: "url" or "file_id" or "parse_job_id"

Type of document input.

Accepts one of the following:
"url"
"file_id"
"parse_job_id"
updated_at: string

Last update timestamp

format: date-time
value: string

Document identifier (URL, file ID, or parse job ID).

configuration_id: optional string

Extract configuration ID (ProductConfiguration) used for this job (if any)

error_message: optional string

Error message if failed

extract_metadata: optional ExtractJobMetadata { field_metadata, parse_job_id, parse_tier, usage }

Extraction metadata.

field_metadata: optional ExtractedFieldMetadata { document_metadata, page_metadata, row_metadata }

Metadata for extracted fields including document, page, and row level info.

document_metadata: optional map[map[unknown] or array of unknown or string or 2 more]
Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
page_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]
Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
row_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]
Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
parse_job_id: optional string

Reference to the ParseJob ID used for parsing

parse_tier: optional string

Parse tier used for parsing the document

usage: optional ExtractJobUsage { num_document_tokens, num_output_tokens, num_pages_extracted }

Extraction usage metrics.

num_document_tokens: optional number

Number of document tokens

num_output_tokens: optional number

Number of output tokens

num_pages_extracted: optional number

Number of pages extracted

extract_result: optional map[map[unknown] or array of unknown or string or 2 more] or array of map[map[unknown] or array of unknown or string or 2 more]

Extracted data (object or array depending on extraction_target)

Accepts one of the following:
UnionMember0 = map[map[unknown] or array of unknown or string or 2 more]
Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
UnionMember1 = array of map[map[unknown] or array of unknown or string or 2 more]
Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
next_page_token: optional string

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.

total_size: optional number

The total number of items available. This is only populated when specifically requested. The value may be an estimate and can be used for display purposes only.
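
The pagination contract described above (send `next_page_token` back as `page_token` until the field is omitted) can be sketched as a Python generator. Here `fetch_page` is a hypothetical callable supplied by the caller; it is assumed to accept query parameters as keyword arguments and to return the decoded response as a dict. It is not part of the API itself.

```python
def iter_extract_jobs(fetch_page, page_size=None):
    """Yield every job by following next_page_token until it is absent.

    fetch_page is a hypothetical callable: it takes query parameters as
    keyword arguments and returns the decoded response body as a dict.
    """
    page_token = None
    while True:
        params = {}
        if page_size is not None:
            params["page_size"] = page_size
        if page_token is not None:
            params["page_token"] = page_token
        page = fetch_page(**params)
        yield from page.get("items", [])
        # An omitted (or empty) token means there are no subsequent pages.
        page_token = page.get("next_page_token")
        if not page_token:
            break
```

Because `total_size` is optional and may be an estimate, the loop relies only on `next_page_token` to decide when to stop.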

List Extract Jobs

curl https://api.cloud.llamaindex.ai/api/v2/extract \
    -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY"
Returns Examples
{
  "items": [
    {
      "id": "id",
      "created_at": "2019-12-27T18:11:19.117Z",
      "parameters": {
        "foo": {
          "foo": "bar"
        }
      },
      "project_id": "project_id",
      "status": "PENDING",
      "type": "url",
      "updated_at": "2019-12-27T18:11:19.117Z",
      "value": "value",
      "configuration_id": "configuration_id",
      "error_message": "error_message",
      "extract_metadata": {
        "field_metadata": {
          "document_metadata": {
            "foo": {
              "foo": "bar"
            }
          },
          "page_metadata": [
            {
              "foo": {
                "foo": "bar"
              }
            }
          ],
          "row_metadata": [
            {
              "foo": {
                "foo": "bar"
              }
            }
          ]
        },
        "parse_job_id": "parse_job_id",
        "parse_tier": "parse_tier",
        "usage": {
          "num_document_tokens": 0,
          "num_output_tokens": 0,
          "num_pages_extracted": 0
        }
      },
      "extract_result": {
        "foo": {
          "foo": "bar"
        }
      }
    }
  ],
  "next_page_token": "next_page_token",
  "total_size": 0
}
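
Several fields in the response need some care on the client side: timestamps arrive as date-time strings, `extract_result` may be an object or an array, and the usage counters are optional. The Python sketch below shows one way to handle these. The function names are hypothetical, and the code assumes that timestamps follow the form shown in the example (`2019-12-27T18:11:19.117Z`) and that a missing counter can be treated as 0.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse a date-time string such as "2019-12-27T18:11:19.117Z"."""
    # Python versions before 3.11 reject a trailing "Z" in fromisoformat,
    # so map it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def result_rows(job):
    """Return extract_result as a list, whether it is an object or an array."""
    result = job.get("extract_result")
    if result is None:
        return []
    return result if isinstance(result, list) else [result]


def total_usage(jobs):
    """Sum the optional usage counters across jobs, treating missing values as 0."""
    totals = {"num_document_tokens": 0, "num_output_tokens": 0, "num_pages_extracted": 0}
    for job in jobs:
        usage = (job.get("extract_metadata") or {}).get("usage") or {}
        for key in totals:
            totals[key] += usage.get(key) or 0
    return totals
```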