Beta

Beta: Agent Data

Get Agent Data
GET /api/v1/beta/agent-data/{item_id}
Update Agent Data
PUT /api/v1/beta/agent-data/{item_id}
Delete Agent Data
DELETE /api/v1/beta/agent-data/{item_id}
Create Agent Data
POST /api/v1/beta/agent-data
Search Agent Data
POST /api/v1/beta/agent-data/:search
Aggregate Agent Data
POST /api/v1/beta/agent-data/:aggregate
Delete Agent Data By Query
POST /api/v1/beta/agent-data/:delete
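
These endpoints follow a plain JSON-over-HTTP pattern. A minimal sketch in Python, assuming bearer-token auth and that the create body mirrors the AgentData model documented below; the host, key, and body values are placeholders, not part of this reference:

    import requests

    BASE_URL = "https://api.example.com"             # placeholder host
    HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumed auth scheme

    # Create an agent data item (body assumed to mirror the AgentData model).
    created = requests.post(
        f"{BASE_URL}/api/v1/beta/agent-data",
        headers=HEADERS,
        json={
            "deployment_name": "my-agent",
            "collection": "conversations",
            "data": {"user": "alice", "score": 0.92},
        },
    ).json()

    # Fetch the same item back by its id.
    item = requests.get(
        f"{BASE_URL}/api/v1/beta/agent-data/{created['id']}",
        headers=HEADERS,
    ).json()
    print(item["deployment_name"], item["data"])
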
Models
AgentData = object { data, deployment_name, id, 4 more }

API Result for a single agent data item

data: map[unknown]
deployment_name: string
id: optional string
collection: optional string
created_at: optional string
project_id: optional string
updated_at: optional string
AgentDataDeleteResponse = map[string]
AgentDataAggregateResponse = object { group_key, count, first_item }

API Result for a single group in the aggregate response

group_key: map[unknown]
count: optional number
first_item: optional map[unknown]
AgentDataDeleteByQueryResponse = object { deleted_count }

API response for bulk delete operation

deleted_count: number
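
A sketch of consuming the aggregate and bulk-delete responses above. The request bodies for :aggregate and :delete are not documented in this reference, so the filter payload below is illustrative only, and the aggregate envelope is assumed to be a bare list of groups:

    import requests

    BASE_URL = "https://api.example.com"             # placeholder host
    HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumed auth scheme

    # Aggregate: each group carries group_key, count, and first_item.
    groups = requests.post(
        f"{BASE_URL}/api/v1/beta/agent-data/:aggregate",
        headers=HEADERS,
        json={"filter": {"deployment_name": "my-agent"}},  # illustrative body
    ).json()
    for group in groups:  # assumes a list of AgentDataAggregateResponse objects
        print(group["group_key"], group.get("count"))

    # Delete by query: the response reports how many items were removed.
    deleted = requests.post(
        f"{BASE_URL}/api/v1/beta/agent-data/:delete",
        headers=HEADERS,
        json={"filter": {"deployment_name": "my-agent"}},  # illustrative body
    ).json()
    print(deleted["deleted_count"])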

Beta: Sheets

Create Spreadsheet Job
POST /api/v1/beta/sheets/jobs
List Spreadsheet Jobs
GET /api/v1/beta/sheets/jobs
Get Spreadsheet Job
GET /api/v1/beta/sheets/jobs/{spreadsheet_job_id}
Get Result Region
GET /api/v1/beta/sheets/jobs/{spreadsheet_job_id}/regions/{region_id}/result/{region_type}
Delete Spreadsheet Job
DELETE /api/v1/beta/sheets/jobs/{spreadsheet_job_id}
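
These endpoints describe an asynchronous job lifecycle: create a job for an uploaded file, poll until its status is terminal, then fetch each extracted region. A sketch assuming bearer-token auth and that the create body takes a file_id plus a SheetsParsingConfig (documented under Models below); host, key, and body values are placeholders:

    import time
    import requests

    BASE_URL = "https://api.example.com"             # placeholder host
    HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumed auth scheme

    # Create a parsing job for an already-uploaded file (body fields assumed).
    job = requests.post(
        f"{BASE_URL}/api/v1/beta/sheets/jobs",
        headers=HEADERS,
        json={
            "file_id": "<FILE_ID>",
            "config": {
                "sheet_names": ["Q1", "Q2"],
                "generate_additional_metadata": True,
                "table_merge_sensitivity": "weak",
            },
        },
    ).json()

    # Poll until the job leaves the PENDING state.
    while job["status"] == "PENDING":
        time.sleep(5)
        job = requests.get(
            f"{BASE_URL}/api/v1/beta/sheets/jobs/{job['id']}",
            headers=HEADERS,
        ).json()

    # Fetch the result for each extracted region.
    for region in job.get("regions") or []:
        result = requests.get(
            f"{BASE_URL}/api/v1/beta/sheets/jobs/{job['id']}"
            f"/regions/{region['region_id']}/result/{region['region_type']}",
            headers=HEADERS,
        )
        print(region["sheet_name"], region["location"], result.status_code)
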
Models
SheetsJob = object { id, config, created_at, 10 more }

A spreadsheet parsing job

id: string

The ID of the job

config: SheetsParsingConfig { extraction_range, flatten_hierarchical_tables, generate_additional_metadata, 5 more }

Configuration for the parsing job

extraction_range: optional string

A1 notation of the range to extract a single region from. If None, the entire sheet is used.

flatten_hierarchical_tables: optional boolean

Return a flattened dataframe when a detected table is recognized as hierarchical.

generate_additional_metadata: optional boolean

Whether to generate additional metadata (title, description) for each extracted region.

include_hidden_cells: optional boolean

Whether to include hidden cells when extracting regions from the spreadsheet.

sheet_names: optional array of string

The names of the sheets to extract regions from. If empty, all sheets will be processed.

specialization: optional string

Optional specialization mode for domain-specific extraction. Supported values: ‘financial-standard’, ‘financial-enhanced’, ‘financial-precise’. Default None uses the general-purpose pipeline.

table_merge_sensitivity: optional "strong" or "weak"

Influences how aggressively similar-looking regions are merged into a single table. Useful for spreadsheets that either have sparse tables (strong merging) or many distinct tables close together (weak merging).

One of the following:
"strong"
"weak"
use_experimental_processing: optional boolean

Enables experimental processing. Accuracy may be impacted.

created_at: string

When the job was created

file_id: string

The ID of the input file

formatuuid
project_id: string

The ID of the project

formatuuid
status: StatusEnum

The status of the parsing job

One of the following:
"PENDING"
"SUCCESS"
"ERROR"
"PARTIAL_SUCCESS"
"CANCELLED"
updated_at: string

When the job was last updated

user_id: string

The ID of the user

errors: optional array of string

Any errors encountered

file (deprecated): optional File { id, name, project_id, 11 more }

Schema for a file.

id: string

Unique identifier

formatuuid
name: string
project_id: string

The ID of the project that the file belongs to

formatuuid
created_at: optional string

Creation datetime

formatdate-time
data_source_id: optional string

The ID of the data source that the file belongs to

formatuuid
expires_at: optional string

The expiration date for the file. Files past this date can be deleted.

formatdate-time
external_file_id: optional string

The ID of the file in the external system

file_size: optional number

Size of the file in bytes

minimum0
file_type: optional string

File type (e.g. pdf, docx, etc.)

maxLength3000
minLength1
last_modified_at: optional string

The last modified time of the file

formatdate-time
permission_info: optional map[map[unknown] or array of unknown or string or 2 more]

Permission information for the file

One of the following:
map[unknown]
array of unknown
string
number
boolean
purpose: optional string

The intended purpose of the file (e.g., ‘user_data’, ‘parse’, ‘extract’, ‘split’, ‘classify’)

resource_info: optional map[map[unknown] or array of unknown or string or 2 more]

Resource information for the file

One of the following:
map[unknown]
array of unknown
string
number
boolean
updated_at: optional string

Update datetime

formatdate-time
regions: optional array of object { location, region_type, sheet_name, 3 more }

All extracted regions (populated when job is complete)

location: string

Location of the region in the spreadsheet

region_type: string

Type of the extracted region

sheet_name: string

Worksheet name where region was found

description: optional string

Generated description for the region

region_id: optional string

Unique identifier for this region within the file

title: optional string

Generated title for the region

success: optional boolean

Whether the job completed successfully

worksheet_metadata: optional array of object { sheet_name, description, title }

Metadata for each processed worksheet (populated when job is complete)

sheet_name: string

Name of the worksheet

description: optional string

Generated description of the worksheet

title: optional string

Generated title for the worksheet

SheetsParsingConfig = object { extraction_range, flatten_hierarchical_tables, generate_additional_metadata, 5 more }

Configuration for spreadsheet parsing and region extraction

extraction_range: optional string

A1 notation of the range to extract a single region from. If None, the entire sheet is used.

flatten_hierarchical_tables: optional boolean

Return a flattened dataframe when a detected table is recognized as hierarchical.

generate_additional_metadata: optional boolean

Whether to generate additional metadata (title, description) for each extracted region.

include_hidden_cells: optional boolean

Whether to include hidden cells when extracting regions from the spreadsheet.

sheet_names: optional array of string

The names of the sheets to extract regions from. If empty, all sheets will be processed.

specialization: optional string

Optional specialization mode for domain-specific extraction. Supported values: ‘financial-standard’, ‘financial-enhanced’, ‘financial-precise’. Default None uses the general-purpose pipeline.

table_merge_sensitivity: optional "strong" or "weak"

Influences how aggressively similar-looking regions are merged into a single table. Useful for spreadsheets that either have sparse tables (strong merging) or many distinct tables close together (weak merging).

One of the following:
"strong"
"weak"
use_experimental_processing: optional boolean

Enables experimental processing. Accuracy may be impacted.

SheetDeleteJobResponse = unknown

Beta: Directories

Create Directory
POST /api/v1/beta/directories
List Directories
GET /api/v1/beta/directories
Get Directory
GET /api/v1/beta/directories/{directory_id}
Update Directory
PATCH /api/v1/beta/directories/{directory_id}
Delete Directory
DELETE /api/v1/beta/directories/{directory_id}
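
A minimal sketch of the directory CRUD flow, assuming bearer-token auth, that the create and update bodies mirror the response schema below, and that the list endpoint returns a bare array; host and values are placeholders:

    import requests

    BASE_URL = "https://api.example.com"             # placeholder host
    HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumed auth scheme

    # Create a directory (body assumed to mirror the response schema).
    directory = requests.post(
        f"{BASE_URL}/api/v1/beta/directories",
        headers=HEADERS,
        json={"name": "contracts", "description": "Signed customer contracts"},
    ).json()

    # Rename it, then list all directories in the project.
    requests.patch(
        f"{BASE_URL}/api/v1/beta/directories/{directory['id']}",
        headers=HEADERS,
        json={"name": "contracts-2024"},
    )
    for d in requests.get(f"{BASE_URL}/api/v1/beta/directories", headers=HEADERS).json():
        print(d["id"], d["name"])
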
Models
DirectoryCreateResponse = object { id, name, project_id, 5 more }

API response schema for a directory.

id: string

Unique identifier for the directory.

name: string

Human-readable name for the directory.

minLength1
project_id: string

Project the directory belongs to.

created_at: optional string

Creation datetime

formatdate-time
data_source_id: optional string

Optional data source ID the directory syncs from. Null if the directory only contains manual uploads.

deleted_at: optional string

Optional timestamp of when the directory was deleted. Null if not deleted.

formatdate-time
description: optional string

Optional description shown to users.

updated_at: optional string

Update datetime

formatdate-time
DirectoryListResponse = object { id, name, project_id, 5 more }

API response schema for a directory.

id: string

Unique identifier for the directory.

name: string

Human-readable name for the directory.

minLength1
project_id: string

Project the directory belongs to.

created_at: optional string

Creation datetime

formatdate-time
data_source_id: optional string

Optional data source ID the directory syncs from. Null if the directory only contains manual uploads.

deleted_at: optional string

Optional timestamp of when the directory was deleted. Null if not deleted.

formatdate-time
description: optional string

Optional description shown to users.

updated_at: optional string

Update datetime

formatdate-time
DirectoryGetResponse = object { id, name, project_id, 5 more }

API response schema for a directory.

id: string

Unique identifier for the directory.

name: string

Human-readable name for the directory.

minLength1
project_id: string

Project the directory belongs to.

created_at: optional string

Creation datetime

formatdate-time
data_source_id: optional string

Optional data source ID the directory syncs from. Null if the directory only contains manual uploads.

deleted_at: optional string

Optional timestamp of when the directory was deleted. Null if not deleted.

formatdate-time
description: optional string

Optional description shown to users.

updated_at: optional string

Update datetime

formatdate-time
DirectoryUpdateResponse = object { id, name, project_id, 5 more }

API response schema for a directory.

id: string

Unique identifier for the directory.

name: string

Human-readable name for the directory.

minLength1
project_id: string

Project the directory belongs to.

created_at: optional string

Creation datetime

formatdate-time
data_source_id: optional string

Optional data source ID the directory syncs from. Null if the directory only contains manual uploads.

deleted_at: optional string

Optional timestamp of when the directory was deleted. Null if not deleted.

formatdate-time
description: optional string

Optional description shown to users.

updated_at: optional string

Update datetime

formatdate-time

Beta: Directories Files

Add Directory File
POST /api/v1/beta/directories/{directory_id}/files
List Directory Files
GET /api/v1/beta/directories/{directory_id}/files
Get Directory File
GET /api/v1/beta/directories/{directory_id}/files/{directory_file_id}
Update Directory File
PATCH /api/v1/beta/directories/{directory_id}/files/{directory_file_id}
Delete Directory File
DELETE /api/v1/beta/directories/{directory_id}/files/{directory_file_id}
Upload File To Directory
POST /api/v1/beta/directories/{directory_id}/files/upload
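
A sketch of pushing a local file into a directory and tagging it with metadata. The multipart field name for the upload endpoint and the PATCH body shape are assumptions, as are the host and key:

    import requests

    BASE_URL = "https://api.example.com"             # placeholder host
    HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumed auth scheme
    directory_id = "<DIRECTORY_ID>"

    # Upload a local file into the directory (multipart field name assumed).
    with open("report.xlsx", "rb") as fh:
        uploaded = requests.post(
            f"{BASE_URL}/api/v1/beta/directories/{directory_id}/files/upload",
            headers=HEADERS,
            files={"file": fh},
        ).json()

    # Attach scalar metadata (strings, numbers, booleans per the schema below).
    requests.patch(
        f"{BASE_URL}/api/v1/beta/directories/{directory_id}/files/{uploaded['id']}",
        headers=HEADERS,
        json={"metadata": {"quarter": "Q1", "reviewed": False}},
    )
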
Models
FileAddResponse = object { id, directory_id, display_name, 8 more }

API response schema for a directory file.

id: string

Unique identifier for the directory file.

directory_id: string

Directory the file belongs to.

display_name: string

Display name for the file.

minLength1
project_id: string

Project the directory file belongs to.

unique_id: string

Unique identifier for the file in the directory

minLength1
created_at: optional string

Creation datetime

formatdate-time
data_source_id: optional string

Optional data source credential associated with the file.

deleted_at: optional string

Soft delete marker when the file is removed upstream or by user action.

formatdate-time
file_id: optional string

File ID for the storage location.

metadata: optional map[string or number or boolean]

Merged metadata from all sources. Higher-priority sources override lower.

One of the following:
string
number
boolean
updated_at: optional string

Update datetime

formatdate-time
FileListResponse = object { id, directory_id, display_name, 8 more }

API response schema for a directory file.

id: string

Unique identifier for the directory file.

directory_id: string

Directory the file belongs to.

display_name: string

Display name for the file.

minLength1
project_id: string

Project the directory file belongs to.

unique_id: string

Unique identifier for the file in the directory

minLength1
created_at: optional string

Creation datetime

formatdate-time
data_source_id: optional string

Optional data source credential associated with the file.

deleted_at: optional string

Soft delete marker when the file is removed upstream or by user action.

formatdate-time
file_id: optional string

File ID for the storage location.

metadata: optional map[string or number or boolean]

Merged metadata from all sources. Higher-priority sources override lower.

One of the following:
string
number
boolean
updated_at: optional string

Update datetime

formatdate-time
FileGetResponse = object { id, directory_id, display_name, 8 more }

API response schema for a directory file.

id: string

Unique identifier for the directory file.

directory_id: string

Directory the file belongs to.

display_name: string

Display name for the file.

minLength1
project_id: string

Project the directory file belongs to.

unique_id: string

Unique identifier for the file in the directory

minLength1
created_at: optional string

Creation datetime

formatdate-time
data_source_id: optional string

Optional data source credential associated with the file.

deleted_at: optional string

Soft delete marker when the file is removed upstream or by user action.

formatdate-time
file_id: optional string

File ID for the storage location.

metadata: optional map[string or number or boolean]

Merged metadata from all sources. Higher-priority sources override lower.

One of the following:
string
number
boolean
updated_at: optional string

Update datetime

formatdate-time
FileUpdateResponse = object { id, directory_id, display_name, 8 more }

API response schema for a directory file.

id: string

Unique identifier for the directory file.

directory_id: string

Directory the file belongs to.

display_name: string

Display name for the file.

minLength1
project_id: string

Project the directory file belongs to.

unique_id: string

Unique identifier for the file in the directory

minLength1
created_at: optional string

Creation datetime

formatdate-time
data_source_id: optional string

Optional data source credential associated with the file.

deleted_at: optional string

Soft delete marker when the file is removed upstream or by user action.

formatdate-time
file_id: optional string

File ID for the storage location.

metadata: optional map[string or number or boolean]

Merged metadata from all sources. Higher-priority sources override lower.

One of the following:
string
number
boolean
updated_at: optional string

Update datetime

formatdate-time
FileUploadResponse = object { id, directory_id, display_name, 8 more }

API response schema for a directory file.

id: string

Unique identifier for the directory file.

directory_id: string

Directory the file belongs to.

display_name: string

Display name for the file.

minLength1
project_id: string

Project the directory file belongs to.

unique_id: string

Unique identifier for the file in the directory

minLength1
created_at: optional string

Creation datetime

formatdate-time
data_source_id: optional string

Optional data source credential associated with the file.

deleted_at: optional string

Soft delete marker when the file is removed upstream or by user action.

formatdate-time
file_id: optional string

File ID for the storage location.

metadata: optional map[string or number or boolean]

Merged metadata from all sources. Higher-priority sources override lower.

One of the following:
string
number
boolean
updated_at: optional string

Update datetime

formatdate-time

Beta: Batch

Create Batch Job
POST /api/v1/beta/batch-processing
List Batch Jobs
GET /api/v1/beta/batch-processing
Get Batch Job Status
GET /api/v1/beta/batch-processing/{job_id}
Cancel Batch Job
POST /api/v1/beta/batch-processing/{job_id}/cancel
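
Batch jobs are created once and then polled. A sketch assuming bearer-token auth and that the create body takes the job_type and directory_id fields exposed by the response model below; host, key, and IDs are placeholders:

    import time
    import requests

    BASE_URL = "https://api.example.com"             # placeholder host
    HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumed auth scheme

    # Create a batch parse job over a directory (body fields assumed).
    job = requests.post(
        f"{BASE_URL}/api/v1/beta/batch-processing",
        headers=HEADERS,
        json={"job_type": "parse", "directory_id": "<DIRECTORY_ID>"},
    ).json()

    # Poll the status endpoint until the job reaches a terminal state.
    while True:
        detail = requests.get(
            f"{BASE_URL}/api/v1/beta/batch-processing/{job['id']}",
            headers=HEADERS,
        ).json()
        print(f"{detail['progress_percentage']:.0f}% done")
        if detail["job"]["status"] in ("completed", "failed", "cancelled"):
            break
        time.sleep(10)
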
Models
BatchCreateResponse = object { id, job_type, project_id, 14 more }

Response schema for a batch processing job.

id: string

Unique identifier for the batch job

job_type: "parse" or "extract" or "classify"

Type of processing operation (parse, extract, or classify)

One of the following:
"parse"
"extract"
"classify"
project_id: string

Project this job belongs to

status: "pending" or "running" or "dispatched" or 3 more

Current job status

One of the following:
"pending"
"running"
"dispatched"
"completed"
"failed"
"cancelled"
total_items: number

Total number of items in the job

completed_at: optional string

Timestamp when job completed

formatdate-time
created_at: optional string

Creation datetime

formatdate-time
directory_id: optional string

Directory being processed

effective_at: optional string
error_message: optional string

Error message for the latest job attempt, if any.

failed_items: optional number

Number of items that failed processing

job_record_id: optional string

The job record ID associated with this status, if any.

processed_items: optional number

Number of items processed so far

skipped_items: optional number

Number of items skipped (already processed or size limit)

started_at: optional string

Timestamp when job processing started

formatdate-time
updated_at: optional string

Update datetime

formatdate-time
workflow_id: optional string

Async job tracking ID

BatchListResponse = object { id, job_type, project_id, 14 more }

Response schema for a batch processing job.

id: string

Unique identifier for the batch job

job_type: "parse" or "extract" or "classify"

Type of processing operation (parse, extract, or classify)

One of the following:
"parse"
"extract"
"classify"
project_id: string

Project this job belongs to

status: "pending" or "running" or "dispatched" or 3 more

Current job status

One of the following:
"pending"
"running"
"dispatched"
"completed"
"failed"
"cancelled"
total_items: number

Total number of items in the job

completed_at: optional string

Timestamp when job completed

formatdate-time
created_at: optional string

Creation datetime

formatdate-time
directory_id: optional string

Directory being processed

effective_at: optional string
error_message: optional string

Error message for the latest job attempt, if any.

failed_items: optional number

Number of items that failed processing

job_record_id: optional string

The job record ID associated with this status, if any.

processed_items: optional number

Number of items processed so far

skipped_items: optional number

Number of items skipped (already processed or size limit)

started_at: optional string

Timestamp when job processing started

formatdate-time
updated_at: optional string

Update datetime

formatdate-time
workflow_id: optional string

Async job tracking ID

BatchGetStatusResponse = object { job, progress_percentage }

Detailed status response for a batch processing job.

job: object { id, job_type, project_id, 14 more }

Response schema for a batch processing job.

id: string

Unique identifier for the batch job

job_type: "parse" or "extract" or "classify"

Type of processing operation (parse, extract, or classify)

One of the following:
"parse"
"extract"
"classify"
project_id: string

Project this job belongs to

status: "pending" or "running" or "dispatched" or 3 more

Current job status

One of the following:
"pending"
"running"
"dispatched"
"completed"
"failed"
"cancelled"
total_items: number

Total number of items in the job

completed_at: optional string

Timestamp when job completed

formatdate-time
created_at: optional string

Creation datetime

formatdate-time
directory_id: optional string

Directory being processed

effective_at: optional string
error_message: optional string

Error message for the latest job attempt, if any.

failed_items: optional number

Number of items that failed processing

job_record_id: optional string

The job record ID associated with this status, if any.

processed_items: optional number

Number of items processed so far

skipped_items: optional number

Number of items skipped (already processed or size limit)

started_at: optional string

Timestamp when job processing started

formatdate-time
updated_at: optional string

Update datetime

formatdate-time
workflow_id: optional string

Async job tracking ID

progress_percentage: number

Percentage of items processed (0-100)

maximum100
minimum0
BatchCancelResponse = object { job_id, message, processed_items, status }

Response after cancelling a batch job.

job_id: string

ID of the cancelled job

message: string

Confirmation message

processed_items: number

Number of items processed before cancellation

status: "pending" or "running" or "dispatched" or 3 more

New status (should be ‘cancelled’)

One of the following:
"pending"
"running"
"dispatched"
"completed"
"failed"
"cancelled"

Beta: Batch Job Items

List Batch Job Items
GET /api/v1/beta/batch-processing/{job_id}/items
Get Item Processing Results
GET /api/v1/beta/batch-processing/items/{item_id}/processing-results
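
A sketch for drilling into a batch job's items: list them, report failures, and pull the processing results for completed items. Host and auth are placeholders, and the list endpoint is assumed to return items directly rather than a paginated envelope:

    import requests

    BASE_URL = "https://api.example.com"             # placeholder host
    HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumed auth scheme
    job_id = "<JOB_ID>"

    items = requests.get(
        f"{BASE_URL}/api/v1/beta/batch-processing/{job_id}/items",
        headers=HEADERS,
    ).json()  # assumed to be a bare list of JobItemListResponse objects
    for item in items:
        if item["status"] == "failed":
            print(item["item_name"], item.get("error_message"))
        elif item["status"] == "completed":
            results = requests.get(
                f"{BASE_URL}/api/v1/beta/batch-processing"
                f"/items/{item['item_id']}/processing-results",
                headers=HEADERS,
            ).json()
            for r in results.get("processing_results") or []:
                print(item["item_name"], r["job_type"], r["output_s3_path"])
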
Models
JobItemListResponse = object { item_id, item_name, status, 7 more }

Detailed information about an item in a batch job.

item_id: string

ID of the item

item_name: string

Name of the item

status: "pending" or "processing" or "completed" or 3 more

Processing status of this item

One of the following:
"pending"
"processing"
"completed"
"failed"
"skipped"
"cancelled"
completed_at: optional string

When processing completed for this item

formatdate-time
effective_at: optional string
error_message: optional string

Error message for the latest job attempt, if any.

job_id: optional string

Job ID for the underlying processing job (links to parse/extract job results)

job_record_id: optional string

The job record ID associated with this status, if any.

skip_reason: optional string

Reason item was skipped (e.g., ‘already_processed’, ‘size_limit_exceeded’)

started_at: optional string

When processing started for this item

formatdate-time
JobItemGetProcessingResultsResponse = object { item_id, item_name, processing_results }

Response containing all processing results for an item.

item_id: string

ID of the source item

item_name: string

Name of the source item

processing_results: optional array of object { item_id, job_config, job_type, 5 more }

List of all processing operations performed on this item

item_id: string

Source item that was processed

job_config: object { correlation_id, job_name, parameters, 6 more } or ClassifyJob { id, project_id, rules, 9 more }

Job configuration used for processing

One of the following:
BatchParseJobRecordCreate = object { correlation_id, job_name, parameters, 6 more }

Batch-specific parse job record for batch processing.

This model contains the metadata and configuration for a batch parse job, but excludes file-specific information. It’s used as input to the batch parent workflow and combined with DirectoryFile data to create full ParseJobRecordCreate instances for each file.

Attributes:
job_name: Must be PARSE_RAW_FILE
partitions: Partitions for job output location
parameters: Generic parse configuration (BatchParseJobConfig)
session_id: Upstream request ID for tracking
correlation_id: Correlation ID for cross-service tracking
parent_job_execution_id: Parent job execution ID if nested
user_id: User who created the job
project_id: Project this job belongs to
webhook_url: Optional webhook URL for job completion notifications

correlation_id: optional string

The correlation ID for this job. Used for tracking the job across services.

formatuuid
job_name: optional "parse_raw_file_job"
parameters: optional object { adaptive_long_table, aggressive_table_extraction, annotate_links, 122 more }

Generic parse job configuration for batch processing.

This model contains the parsing configuration that applies to all files in a batch, but excludes file-specific fields like file_name, file_id, etc. Those file-specific fields are populated from DirectoryFile data when creating individual ParseJobRecordCreate instances for each file.

The fields in this model should be generic settings that apply uniformly to all files being processed in the batch.

adaptive_long_table: optional boolean
aggressive_table_extraction: optional boolean
auto_mode: optional boolean
auto_mode_configuration_json: optional string
auto_mode_trigger_on_image_in_page: optional boolean
auto_mode_trigger_on_regexp_in_page: optional string
auto_mode_trigger_on_table_in_page: optional boolean
auto_mode_trigger_on_text_in_page: optional string
azure_openai_api_version: optional string
azure_openai_deployment_name: optional string
azure_openai_endpoint: optional string
azure_openai_key: optional string
bbox_bottom: optional number
bbox_left: optional number
bbox_right: optional number
bbox_top: optional number
bounding_box: optional string
compact_markdown_table: optional boolean
complemental_formatting_instruction: optional string
content_guideline_instruction: optional string
continuous_mode: optional boolean
custom_metadata: optional map[unknown]

The custom metadata to attach to the documents.

disable_image_extraction: optional boolean
disable_ocr: optional boolean
disable_reconstruction: optional boolean
do_not_cache: optional boolean
do_not_unroll_columns: optional boolean
enable_cost_optimizer: optional boolean
extract_charts: optional boolean
extract_layout: optional boolean
extract_printed_page_number: optional boolean
fast_mode: optional boolean
formatting_instruction: optional string
gpt4o_api_key: optional string
gpt4o_mode: optional boolean
guess_xlsx_sheet_name: optional boolean
hide_footers: optional boolean
hide_headers: optional boolean
high_res_ocr: optional boolean
html_make_all_elements_visible: optional boolean
html_remove_fixed_elements: optional boolean
html_remove_navigation_elements: optional boolean
http_proxy: optional string
ignore_document_elements_for_layout_detection: optional boolean
images_to_save: optional array of "screenshot" or "embedded" or "layout"
One of the following:
"screenshot"
"embedded"
"layout"
inline_images_in_markdown: optional boolean
input_s3_path: optional string
input_s3_region: optional string

The region for the input S3 bucket.

input_url: optional string
internal_is_screenshot_job: optional boolean
invalidate_cache: optional boolean
is_formatting_instruction: optional boolean
job_timeout_extra_time_per_page_in_seconds: optional number
job_timeout_in_seconds: optional number
keep_page_separator_when_merging_tables: optional boolean
lang: optional string

The language.

languages: optional array of ParsingLanguages
One of the following:
"af"
"az"
"bs"
"cs"
"cy"
"da"
"de"
"en"
"es"
"et"
"fr"
"ga"
"hr"
"hu"
"id"
"is"
"it"
"ku"
"la"
"lt"
"lv"
"mi"
"ms"
"mt"
"nl"
"no"
"oc"
"pi"
"pl"
"pt"
"ro"
"rs_latin"
"sk"
"sl"
"sq"
"sv"
"sw"
"tl"
"tr"
"uz"
"vi"
"ar"
"fa"
"ug"
"ur"
"bn"
"as"
"mni"
"ru"
"rs_cyrillic"
"be"
"bg"
"uk"
"mn"
"abq"
"ady"
"kbd"
"ava"
"dar"
"inh"
"che"
"lbe"
"lez"
"tab"
"tjk"
"hi"
"mr"
"ne"
"bh"
"mai"
"ang"
"bho"
"mah"
"sck"
"new"
"gom"
"sa"
"bgc"
"th"
"ch_sim"
"ch_tra"
"ja"
"ko"
"ta"
"te"
"kn"
layout_aware: optional boolean
line_level_bounding_box: optional boolean
markdown_table_multiline_header_separator: optional string
max_pages: optional number
max_pages_enforced: optional number
merge_tables_across_pages_in_markdown: optional boolean
model: optional string
outlined_table_extraction: optional boolean
output_pdf_of_document: optional boolean
output_s3_path_prefix: optional string

If specified, llamaParse will save the output to the specified path. All output files will use this prefix, which should be a valid s3:// URL.

output_s3_region: optional string

The region for the output S3 bucket.

output_tables_as_HTML: optional boolean
outputBucket: optional string

The output bucket.

page_error_tolerance: optional number
page_header_prefix: optional string
page_header_suffix: optional string
page_prefix: optional string
page_separator: optional string
page_suffix: optional string
parse_mode: optional ParsingMode

Enum for representing the mode of parsing to be used.

One of the following:
"parse_page_without_llm"
"parse_page_with_llm"
"parse_page_with_lvm"
"parse_page_with_agent"
"parse_page_with_layout_agent"
"parse_document_with_llm"
"parse_document_with_lvm"
"parse_document_with_agent"
parsing_instruction: optional string
pipeline_id: optional string

The pipeline ID.

precise_bounding_box: optional boolean
premium_mode: optional boolean
presentation_out_of_bounds_content: optional boolean
presentation_skip_embedded_data: optional boolean
preserve_layout_alignment_across_pages: optional boolean
preserve_very_small_text: optional boolean
preset: optional string
priority: optional "low" or "medium" or "high" or "critical"

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

One of the following:
"low"
"medium"
"high"
"critical"
project_id: optional string
remove_hidden_text: optional boolean
replace_failed_page_mode: optional FailPageMode

Enum for representing the different available page error handling modes.

One of the following:
"raw_text"
"blank_page"
"error_message"
replace_failed_page_with_error_message_prefix: optional string
replace_failed_page_with_error_message_suffix: optional string
resource_info: optional map[unknown]

The resource info about the file

save_images: optional boolean
skip_diagonal_text: optional boolean
specialized_chart_parsing_agentic: optional boolean
specialized_chart_parsing_efficient: optional boolean
specialized_chart_parsing_plus: optional boolean
specialized_image_parsing: optional boolean
spreadsheet_extract_sub_tables: optional boolean
spreadsheet_force_formula_computation: optional boolean
spreadsheet_include_hidden_sheets: optional boolean
strict_mode_buggy_font: optional boolean
strict_mode_image_extraction: optional boolean
strict_mode_image_ocr: optional boolean
strict_mode_reconstruction: optional boolean
structured_output: optional boolean
structured_output_json_schema: optional string
structured_output_json_schema_name: optional string
system_prompt: optional string
system_prompt_append: optional string
take_screenshot: optional boolean
target_pages: optional string
tier: optional string
type: optional "parse"
use_vendor_multimodal_model: optional boolean
user_prompt: optional string
vendor_multimodal_api_key: optional string
vendor_multimodal_model_name: optional string
version: optional string
webhook_configurations: optional array of object { webhook_events, webhook_headers, webhook_output_format, webhook_url }

Outbound webhook endpoints to notify on job status changes

webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 14 more

Events to subscribe to (e.g. ‘parse.success’, ‘extract.error’). If null, all events are delivered.

One of the following:
"extract.pending"
"extract.success"
"extract.error"
"extract.partial_success"
"extract.cancelled"
"parse.pending"
"parse.running"
"parse.success"
"parse.error"
"parse.partial_success"
"parse.cancelled"
"classify.pending"
"classify.success"
"classify.error"
"classify.partial_success"
"classify.cancelled"
"unmapped_event"
webhook_headers: optional map[string]

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

webhook_output_format: optional string

Response format sent to the webhook: ‘string’ (default) or ‘json’

webhook_url: optional string

URL to receive webhook POST notifications

webhook_url: optional string
parent_job_execution_id: optional string

The ID of the parent job execution.

formatuuid
partitions: optional map[string]

The partitions for this execution. Used for determining where to save job output.

project_id: optional string

The ID of the project this job belongs to.

formatuuid
session_id: optional string

The upstream request ID that created this job. Used for tracking the job across services.

formatuuid
user_id: optional string

The ID of the user that created this job

webhook_url: optional string

The URL that needs to be called at the end of the parsing job.

ClassifyJob = object { id, project_id, rules, 9 more }

A classify job.

id: string

Unique identifier

formatuuid
project_id: string

The ID of the project

formatuuid
rules: array of ClassifierRule { description, type }

The rules to classify the files

description: string

Natural language description of what to classify. Be specific about the content characteristics that identify this document type.

maxLength500
minLength10
type: string

The document type to assign when this rule matches (e.g., ‘invoice’, ‘receipt’, ‘contract’)

maxLength50
minLength1
status: StatusEnum

The status of the classify job

One of the following:
"PENDING"
"SUCCESS"
"ERROR"
"PARTIAL_SUCCESS"
"CANCELLED"
user_id: string

The ID of the user

created_at: optional string

Creation datetime

formatdate-time
effective_at: optional string
error_message: optional string

Error message for the latest job attempt, if any.

job_record_id: optional string

The job record ID associated with this status, if any.

mode: optional "FAST" or "MULTIMODAL"

The classification mode to use

One of the following:
"FAST"
"MULTIMODAL"
parsing_configuration: optional ClassifyParsingConfiguration { lang, max_pages, target_pages }

The configuration for the parsing job

lang: optional ParsingLanguages

The language to parse the files in

One of the following:
"af"
"az"
"bs"
"cs"
"cy"
"da"
"de"
"en"
"es"
"et"
"fr"
"ga"
"hr"
"hu"
"id"
"is"
"it"
"ku"
"la"
"lt"
"lv"
"mi"
"ms"
"mt"
"nl"
"no"
"oc"
"pi"
"pl"
"pt"
"ro"
"rs_latin"
"sk"
"sl"
"sq"
"sv"
"sw"
"tl"
"tr"
"uz"
"vi"
"ar"
"fa"
"ug"
"ur"
"bn"
"as"
"mni"
"ru"
"rs_cyrillic"
"be"
"bg"
"uk"
"mn"
"abq"
"ady"
"kbd"
"ava"
"dar"
"inh"
"che"
"lbe"
"lez"
"tab"
"tjk"
"hi"
"mr"
"ne"
"bh"
"mai"
"ang"
"bho"
"mah"
"sck"
"new"
"gom"
"sa"
"bgc"
"th"
"ch_sim"
"ch_tra"
"ja"
"ko"
"ta"
"te"
"kn"
max_pages: optional number

The maximum number of pages to parse

target_pages: optional array of number

The pages to target for parsing (0-indexed, so first page is at 0)

updated_at: optional string

Update datetime

formatdate-time
job_type: "parse" or "extract" or "classify"

Type of processing performed

One of the following:
"parse"
"extract"
"classify"
output_s3_path: string

Location of the processing output

parameters_hash: string

Content hash of the job configuration for dedup

processed_at: string

When this processing occurred

formatdate-time
result_id: string

Unique identifier for this result

output_metadata: optional unknown

Metadata about processing output.

Currently empty - will be populated with job-type-specific metadata fields in the future.

Beta: Split

Create Split Job
POST /api/v1/beta/split/jobs
List Split Jobs
GET /api/v1/beta/split/jobs
Get Split Job
GET /api/v1/beta/split/jobs/{split_job_id}
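
A sketch of the split flow: submit a document by file_id with a set of categories, poll until the job finishes, then read the resulting segments. The create body is assumed to mirror the response schema below; host, key, and IDs are placeholders:

    import time
    import requests

    BASE_URL = "https://api.example.com"             # placeholder host
    HEADERS = {"Authorization": "Bearer <API_KEY>"}  # assumed auth scheme

    # Create a split job (body assumed to mirror the response schema).
    job = requests.post(
        f"{BASE_URL}/api/v1/beta/split/jobs",
        headers=HEADERS,
        json={
            "document_input": {"type": "file_id", "value": "<FILE_ID>"},
            "categories": [
                {"name": "invoice", "description": "Billing documents"},
                {"name": "contract", "description": "Signed agreements"},
            ],
        },
    ).json()

    # Poll until the job reaches a terminal status.
    while job["status"] in ("pending", "processing"):
        time.sleep(5)
        job = requests.get(
            f"{BASE_URL}/api/v1/beta/split/jobs/{job['id']}",
            headers=HEADERS,
        ).json()

    # Read the segments from the completed result.
    for segment in (job.get("result") or {}).get("segments", []):
        print(segment["category"], segment["confidence_category"], segment["pages"])
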
Models
SplitCategory = object { name, description }

Category definition for document splitting.

name: string

Name of the category.

maxLength200
minLength1
description: optional string

Optional description of what content belongs in this category.

maxLength2000
minLength1
SplitDocumentInput = object { type, value }

Document input specification for beta API.

type: string

Type of document input. Valid values are: file_id

value: string

Document identifier.

SplitResultResponse = object { segments }

Result of a completed split job.

segments: array of SplitSegmentResponse { category, confidence_category, pages }

List of document segments.

category: string

Category name this split belongs to.

confidence_category: string

Categorical confidence level. Valid values are: high, medium, low.

pages: array of number

1-indexed page numbers in this split.

SplitSegmentResponse = object { category, confidence_category, pages }

A segment of the split document.

category: string

Category name this split belongs to.

confidence_category: string

Categorical confidence level. Valid values are: high, medium, low.

pages: array of number

1-indexed page numbers in this split.

SplitCreateResponse = object { id, categories, document_input, 8 more }

Beta response — uses nested document_input object.

id: string

Unique identifier for the split job.

categories: array of SplitCategory { name, description }

Categories used for splitting.

name: string

Name of the category.

maxLength200
minLength1
description: optional string

Optional description of what content belongs in this category.

maxLength2000
minLength1
document_input: SplitDocumentInput { type, value }

Document that was split.

type: string

Type of document input. Valid values are: file_id

value: string

Document identifier.

project_id: string

Project ID this job belongs to.

status: string

Current status of the job. Valid values are: pending, processing, completed, failed, cancelled.

user_id: string

User ID who created this job.

configuration_id: optional string

Split configuration ID used for this job.

created_at: optional string

Creation datetime

formatdate-time
error_message: optional string

Error message if the job failed.

result: optional SplitResultResponse { segments }

Result of a completed split job.

segments: array of SplitSegmentResponse { category, confidence_category, pages }

List of document segments.

category: string

Category name this split belongs to.

confidence_category: string

Categorical confidence level. Valid values are: high, medium, low.

pages: array of number

1-indexed page numbers in this split.

updated_at: optional string

Update datetime

formatdate-time
SplitListResponse = object { id, categories, document_input, 8 more }

Beta response — uses nested document_input object.

id: string

Unique identifier for the split job.

categories: array of SplitCategory { name, description }

Categories used for splitting.

name: string

Name of the category.

maxLength200
minLength1
description: optional string

Optional description of what content belongs in this category.

maxLength2000
minLength1
document_input: SplitDocumentInput { type, value }

Document that was split.

type: string

Type of document input. Valid values are: file_id

value: string

Document identifier.

project_id: string

Project ID this job belongs to.

status: string

Current status of the job. Valid values are: pending, processing, completed, failed, cancelled.

user_id: string

User ID who created this job.

configuration_id: optional string

Split configuration ID used for this job.

created_at: optional string

Creation datetime

formatdate-time
error_message: optional string

Error message if the job failed.

result: optional SplitResultResponse { segments }

Result of a completed split job.

segments: array of SplitSegmentResponse { category, confidence_category, pages }

List of document segments.

category: string

Category name this split belongs to.

confidence_category: string

Categorical confidence level. Valid values are: high, medium, low.

pages: array of number

1-indexed page numbers in this split.

updated_at: optional string

Update datetime

formatdate-time
SplitGetResponse = object { id, categories, document_input, 8 more }

Beta response — uses nested document_input object.

id: string

Unique identifier for the split job.

categories: array of SplitCategory { name, description }

Categories used for splitting.

name: string

Name of the category.

maxLength200
minLength1
description: optional string

Optional description of what content belongs in this category.

maxLength2000
minLength1
document_input: SplitDocumentInput { type, value }

Document that was split.

type: string

Type of document input. Valid values are: file_id

value: string

Document identifier.

project_id: string

Project ID this job belongs to.

status: string

Current status of the job. Valid values are: pending, processing, completed, failed, cancelled.

user_id: string

User ID who created this job.

configuration_id: optional string

Split configuration ID used for this job.

created_at: optional string

Creation datetime

formatdate-time
error_message: optional string

Error message if the job failed.

result: optional SplitResultResponse { segments }

Result of a completed split job.

segments: array of SplitSegmentResponse { category, confidence_category, pages }

List of document segments.

category: string

Category name this split belongs to.

confidence_category: string

Categorical confidence level. Valid values are: high, medium, low.

pages: array of number

1-indexed page numbers in this split.

updated_at: optional string

Update datetime

formatdate-time