Jobs
List Jobs
Run Job
Get Job
Run Job On File
Get Job Result
Models
ExtractJob { id, extraction_agent, status, 3 more }
Schema for an extraction job.
id: string
The id of the extraction job
extraction_agent: object
The agent that the job was run on.
id: string
The id of the extraction agent.
The configuration parameters for the extraction agent.
chunk_mode?: "PAGE" | "SECTION"
The mode to use for chunking the document.
citation_bbox?: boolean (Deprecated)
Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.
cite_sources?: boolean
Whether to cite sources for the extraction.
confidence_scores?: boolean
Whether to fetch confidence scores for the extraction.
extract_model?: "openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more | (string & {}) | null
The extract model to use for data extraction. If not provided, uses the default for the extraction mode.
extraction_mode?: "FAST" | "BALANCED" | "PREMIUM" | "MULTIMODAL"
The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).
extraction_target?: "PER_DOC" | "PER_PAGE" | "PER_TABLE_ROW"
The extraction target specified.
high_resolution_mode?: boolean
Whether to use high resolution mode for the extraction.
invalidate_cache?: boolean
Whether to invalidate the cache for the extraction.
multimodal_fast_mode?: boolean
DEPRECATED: Whether to use fast mode for multimodal extraction.
num_pages_context?: number | null
Number of pages to pass as context on long document extraction.
page_range?: string | null
Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').
parse_model?: "openai-gpt-4o" | "openai-gpt-4o-mini" | "openai-gpt-4-1" | 23 more | null
The parse model to use, given as one of the public model names.
priority?: "low" | "medium" | "high" | "critical" | null
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
system_prompt?: string | null
The system prompt to use for the extraction.
use_reasoning?: boolean
Whether to use reasoning for the extraction.
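The page_range format described above (comma-separated, 1-based page numbers or ranges) can be checked on the client before a job is submitted. The following is a minimal sketch, not part of the SDK: `parsePageRange` is a hypothetical helper, and its handling of stray commas and duplicate pages is an assumption rather than documented service behavior.

```typescript
// Hypothetical helper (not part of the SDK): expands a page_range string such
// as "1,3,5-7,9" into a sorted list of unique 1-based page numbers.
function parsePageRange(pageRange: string): number[] {
  const pages = new Set<number>();
  for (const rawPart of pageRange.split(",")) {
    const part = rawPart.trim();
    if (part === "") continue; // assumption: tolerate stray commas
    // A segment is either a single page ("3") or an inclusive range ("5-7").
    const match = /^(\d+)(?:-(\d+))?$/.exec(part);
    if (match === null) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    // Pages are 1-based, and a range must not run backwards.
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    for (let page = start; page <= end; page++) {
      pages.add(page);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

For example, `parsePageRange("1-3,8-10")` yields pages 1, 2, 3, 8, 9, and 10, while a reversed range such as `"7-5"` throws.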
data_schema: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>
The schema of the data.
name: string
The name of the extraction agent.
project_id: string
The ID of the project that the extraction agent belongs to.
created_at?: string | null
The creation time of the extraction agent.
custom_configuration?: "default" | null
Custom configuration type for the extraction agent. Currently supports 'default'.
updated_at?: string | null
The last update time of the extraction agent.
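Because citation_bbox is deprecated and described above as synonymous with cite_sources, a caller holding older configuration objects may want to migrate them before sending a request. The sketch below illustrates one way to do that. `ExtractConfigSketch` and `migrateCitationFlag` are hypothetical names, the interface covers only a subset of the fields listed above, and letting an explicit cite_sources take precedence is an assumption.

```typescript
// Hypothetical type: a subset of the configuration fields listed above.
interface ExtractConfigSketch {
  citation_bbox?: boolean; // deprecated, synonymous with cite_sources
  cite_sources?: boolean;
  extraction_mode?: "FAST" | "BALANCED" | "PREMIUM" | "MULTIMODAL";
  extraction_target?: "PER_DOC" | "PER_PAGE" | "PER_TABLE_ROW";
  page_range?: string | null;
}

// Hypothetical helper: drops the deprecated citation_bbox flag and carries
// its value over to cite_sources when cite_sources is not already set.
function migrateCitationFlag(config: ExtractConfigSketch): ExtractConfigSketch {
  const { citation_bbox, ...rest } = config;
  if (citation_bbox === undefined) {
    return rest; // nothing to migrate
  }
  // Assumption: an explicit cite_sources wins over the deprecated flag.
  return { ...rest, cite_sources: rest.cite_sources ?? citation_bbox };
}

const migrated = migrateCitationFlag({
  citation_bbox: true,
  extraction_mode: "PREMIUM",
  extraction_target: "PER_DOC",
});
```

Here `migrated` no longer carries citation_bbox and has cite_sources set to true; the other fields pass through unchanged.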
status: "PENDING" | "SUCCESS" | "ERROR" | 2 more
The status of the extraction job
error?: string | null
The error that occurred during extraction
Schema for a file.
id: string
Unique identifier
project_id: string
The ID of the project that the file belongs to
created_at?: string | null
Creation datetime
data_source_id?: string | null
The ID of the data source that the file belongs to
expires_at?: string | null
The expiration date for the file. Files past this date can be deleted.
external_file_id?: string | null
The ID of the file in the external system
file_size?: number | null
Size of the file in bytes
file_type?: string | null
File type (e.g. pdf, docx, etc.)
last_modified_at?: string | null
The last modified time of the file
permission_info?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null
Permission information for the file
purpose?: string | null
The intended purpose of the file (e.g., 'user_data', 'parse', 'extract', 'split', 'classify')
resource_info?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null
Resource information for the file
updated_at?: string | null
Update datetime
file_id?: string | null
The id of the file that the data was extracted from
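A consumer of ExtractJob will typically branch on status and read error when a job fails. The sketch below shows one way to do this under stated assumptions: `ExtractJobSketch` and `summarizeJob` are hypothetical names, only a subset of the job fields is modeled, and because the listing above elides two status values, the type stays open to other strings and the code falls back to a generic message for them.

```typescript
// Only the three status values spelled out above are named here; the
// reference elides two more, so the type also accepts other strings.
type KnownStatus = "PENDING" | "SUCCESS" | "ERROR";

// Hypothetical type: a subset of the ExtractJob fields listed above.
interface ExtractJobSketch {
  id: string;
  status: KnownStatus | (string & {});
  error?: string | null;
  file_id?: string | null;
}

// Hypothetical helper: turns a job into a one-line, human-readable summary.
function summarizeJob(job: ExtractJobSketch): string {
  switch (job.status) {
    case "SUCCESS":
      return `Job ${job.id} succeeded`;
    case "ERROR":
      return `Job ${job.id} failed: ${job.error ?? "no error message"}`;
    case "PENDING":
      return `Job ${job.id} is pending`;
    default:
      // Covers the status values not spelled out in the listing above.
      return `Job ${job.id} has status ${job.status}`;
  }
}
```

Keeping the `default` branch avoids silently mishandling status values that this page does not enumerate.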
WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url }
Allows the user to configure webhook options for notifications and callbacks.
webhook_events?: Array<"extract.pending" | "extract.success" | "extract.error" | 14 more> | null
List of event names to subscribe to
webhook_headers?: Record<string, string> | null
Custom HTTP headers to include with webhook requests.
webhook_output_format?: string | null
The output format to use for the webhook. Defaults to "string" if none is supplied. Currently supported values: "string", "json".
webhook_url?: string | null
The URL to send webhook notifications to.
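As an illustration of the WebhookConfiguration fields above, the following sketch builds a configuration object and resolves the effective output format, applying the documented default of string. `WebhookConfigurationSketch` and `resolveOutputFormat` are hypothetical names, the URL and header values are placeholders, and rejecting unsupported formats on the client is an assumption rather than documented behavior.

```typescript
// Hypothetical type mirroring the four WebhookConfiguration fields above.
// Event names are typed as plain strings because the listing elides most of them.
interface WebhookConfigurationSketch {
  webhook_events?: string[] | null;
  webhook_headers?: Record<string, string> | null;
  webhook_output_format?: string | null;
  webhook_url?: string | null;
}

// The two output formats the reference lists as currently supported.
const SUPPORTED_OUTPUT_FORMATS = ["string", "json"];

// Hypothetical helper: returns the effective output format, falling back to
// the documented default ("string") when none is supplied.
function resolveOutputFormat(config: WebhookConfigurationSketch): string {
  const format = config.webhook_output_format ?? "string";
  if (SUPPORTED_OUTPUT_FORMATS.indexOf(format) === -1) {
    throw new Error(`Unsupported webhook_output_format: "${format}"`);
  }
  return format;
}

// Placeholder values for illustration only.
const exampleWebhook: WebhookConfigurationSketch = {
  webhook_events: ["extract.success", "extract.error"],
  webhook_headers: { "X-Example-Token": "placeholder" },
  webhook_output_format: "json",
  webhook_url: "https://example.com/hooks/extract",
};
```

With this sketch, `resolveOutputFormat(exampleWebhook)` returns "json", and a configuration that omits webhook_output_format resolves to "string".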