Sheets

Create Spreadsheet Job

beta.sheets.create() -> SheetsJob

POST /api/v1/beta/sheets/jobs

List Spreadsheet Jobs

beta.sheets.list() -> SyncPaginatedCursor[SheetsJob]

GET /api/v1/beta/sheets/jobs

Get Spreadsheet Job

beta.sheets.get(, ) -> SheetsJob

GET /api/v1/beta/sheets/jobs/{spreadsheet_job_id}

Get Result Region

beta.sheets.get_result_table(, ) -> PresignedURL

GET /api/v1/beta/sheets/jobs/{spreadsheet_job_id}/regions/{region_id}/result/{region_type}

Delete Spreadsheet Job

beta.sheets.delete_job(, ) -> object

DELETE /api/v1/beta/sheets/jobs/{spreadsheet_job_id}
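
The routes above share one base path, so the per-job and per-region paths can be assembled with a small helper. The sketch below is not part of the SDK: the function names are hypothetical, and only the path templates are taken from the listing. Percent-encoding each segment is a defensive assumption.

```python
from urllib.parse import quote

# Base path taken from the routes listed above.
JOBS_PATH = "/api/v1/beta/sheets/jobs"


def job_path(spreadsheet_job_id: str) -> str:
    """Path for getting or deleting one spreadsheet job (hypothetical helper)."""
    return f"{JOBS_PATH}/{quote(spreadsheet_job_id, safe='')}"


def result_region_path(spreadsheet_job_id: str, region_id: str, region_type: str) -> str:
    """Path for fetching the result of one region (hypothetical helper)."""
    return (
        f"{job_path(spreadsheet_job_id)}"
        f"/regions/{quote(region_id, safe='')}"
        f"/result/{quote(region_type, safe='')}"
    )
```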

Models

class SheetsJob: …

A spreadsheet parsing job.

id: str

The ID of the job

configuration: SheetsParsingConfig

Configuration applied to the parsing job (inline or resolved from a saved preset).

extraction_range: Optional[str]

A1 notation of the range to extract a single region from. If None, the entire sheet is used.

flatten_hierarchical_tables: Optional[bool]

Return a flattened dataframe when a detected table is recognized as hierarchical.

generate_additional_metadata: Optional[bool]

Whether to generate additional metadata (title, description) for each extracted region.

include_hidden_cells: Optional[bool]

Whether to include hidden cells when extracting regions from the spreadsheet.

sheet_names: Optional[List[str]]

The names of the sheets to extract regions from. If empty, all sheets will be processed.

specialization: Optional[str]

Optional specialization mode for domain-specific extraction. Supported values: ‘financial-standard’, ‘financial-enhanced’, ‘financial-precise’. The default, None, uses the general-purpose pipeline.

table_merge_sensitivity: Optional[Literal["strong", "weak"]]

Influences how likely similar-looking regions are merged into a single table. Useful for spreadsheets that either have sparse tables (strong merging) or many distinct tables close together (weak merging).

One of the following:

"strong"

"weak"

use_experimental_processing: Optional[bool]

Enables experimental processing. Accuracy may be impacted.

created_at: str

When the job was created

file_id: Optional[str]

The ID of the input file

format: uuid

project_id: str

The ID of the project

format: uuid

status: Literal["PENDING", "SUCCESS", "ERROR", "PARTIAL_SUCCESS", "CANCELLED"]

The status of the parsing job

One of the following:

"PENDING"

"SUCCESS"

"ERROR"

"PARTIAL_SUCCESS"

"CANCELLED"

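Because a job starts as "PENDING" and ends in one of the other states, callers usually poll until the status changes. The following is a minimal sketch, not SDK code: `wait_for_job` and its parameters are hypothetical, and treating every non-PENDING status as final is an assumption drawn from the list above.

```python
import time
from typing import Callable

# Statuses after which the job is assumed not to change again.
TERMINAL_STATUSES = {"SUCCESS", "ERROR", "PARTIAL_SUCCESS", "CANCELLED"}


def wait_for_job(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

In practice `fetch_status` would wrap a call to `beta.sheets.get` and return the `status` field of the resulting SheetsJob; the exact arguments of that call are not shown in this reference.
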
updated_at: str

When the job was last updated

user_id: str

The ID of the user

config: Optional[SheetsParsingConfig] (deprecated)

Configuration for spreadsheet parsing and region extraction

extraction_range: Optional[str]

A1 notation of the range to extract a single region from. If None, the entire sheet is used.

flatten_hierarchical_tables: Optional[bool]

Return a flattened dataframe when a detected table is recognized as hierarchical.

generate_additional_metadata: Optional[bool]

Whether to generate additional metadata (title, description) for each extracted region.

include_hidden_cells: Optional[bool]

Whether to include hidden cells when extracting regions from the spreadsheet.

sheet_names: Optional[List[str]]

The names of the sheets to extract regions from. If empty, all sheets will be processed.

specialization: Optional[str]

table_merge_sensitivity: Optional[Literal["strong", "weak"]]

One of the following:

"strong"

"weak"

use_experimental_processing: Optional[bool]

Enables experimental processing. Accuracy may be impacted.

configuration_id: Optional[str]

The saved product configuration ID used at create time, if any.

errors: Optional[List[str]]

Any errors encountered

file: Optional[File] (deprecated)

Schema for a file.

id: str

Unique identifier

format: uuid

name: str

project_id: str

The ID of the project that the file belongs to

format: uuid

created_at: Optional[datetime]

Creation datetime

format: date-time

data_source_id: Optional[str]

The ID of the data source that the file belongs to

format: uuid

expires_at: Optional[datetime]

The expiration date for the file. Files past this date can be deleted.

format: date-time

external_file_id: Optional[str]

The ID of the file in the external system

file_size: Optional[int]

Size of the file in bytes

minimum: 0

file_type: Optional[str]

File type (e.g. pdf, docx, etc.)

maxLength: 3000

minLength: 1

last_modified_at: Optional[datetime]

The last modified time of the file

format: date-time

permission_info: Optional[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]

Permission information for the file

One of the following:

Dict[str, object]

List[object]

str

float

bool

purpose: Optional[str]

The intended purpose of the file (e.g., ‘user_data’, ‘parse’, ‘extract’, ‘split’, ‘classify’)

resource_info: Optional[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]

Resource information for the file

One of the following:

Dict[str, object]

List[object]

str

float

bool

updated_at: Optional[datetime]

Update datetime

format: date-time

metadata_state_transitions: Optional[Dict[str, object]]

Per-status entry timestamps. Returned only when requested via ?expand=metadata_state_transitions.

parameters: Optional[Parameters]

Job-time parameters such as webhook configurations.

webhook_configurations: Optional[List[ParametersWebhookConfiguration]]

Webhook configurations for job status notifications.

webhook_events: Optional[List[Literal["extract.pending", "extract.success", "extract.error", 20 more]]]

Events to subscribe to (e.g. ‘parse.success’, ‘extract.error’). If null, all events are delivered.

One of the following:

"extract.pending"

"extract.success"

"extract.error"

"extract.partial_success"

"extract.cancelled"

"parse.pending"

"parse.running"

"parse.success"

"parse.error"

"parse.partial_success"

"parse.cancelled"

"classify.pending"

"classify.running"

"classify.success"

"classify.error"

"classify.partial_success"

"classify.cancelled"

"sheets.pending"

"sheets.success"

"sheets.error"

"sheets.partial_success"

"sheets.cancelled"

"unmapped_event"

webhook_headers: Optional[Dict[str, str]]

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

webhook_output_format: Optional[str]

Response format sent to the webhook: ‘string’ (default) or ‘json’

webhook_url: Optional[str]

URL to receive webhook POST notifications
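
The webhook fields above can be assembled into a single configuration entry. This is an illustrative sketch only: `sheets_webhook` is a hypothetical helper that returns a plain dict, and restricting the events to the `sheets.*` names is a choice made for the example, not a rule stated in this reference.

```python
from typing import Dict, List, Optional

# Sheets-related event names copied from the list above.
SHEETS_EVENTS = [
    "sheets.pending",
    "sheets.success",
    "sheets.error",
    "sheets.partial_success",
    "sheets.cancelled",
]


def sheets_webhook(
    url: str,
    events: Optional[List[str]] = None,
    headers: Optional[Dict[str, str]] = None,
    output_format: str = "string",
) -> dict:
    """Build one webhook configuration entry as a plain dict (illustrative)."""
    chosen = SHEETS_EVENTS if events is None else events
    unknown = [e for e in chosen if e not in SHEETS_EVENTS]
    if unknown:
        raise ValueError(f"not a sheets event: {unknown}")
    if output_format not in ("string", "json"):
        raise ValueError("output_format must be 'string' or 'json'")
    return {
        "webhook_url": url,
        "webhook_events": list(chosen),
        "webhook_headers": dict(headers or {}),
        "webhook_output_format": output_format,
    }
```

How such an entry is passed when creating a job is not shown in this reference, so the dict shape here simply mirrors the documented field names.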

regions: Optional[List[Region]]

All extracted regions (populated when job is complete)

location: str

Location of the region in the spreadsheet

region_type: str

Type of the extracted region

sheet_name: str

Worksheet name where region was found

description: Optional[str]

Generated description for the region

region_id: Optional[str]

Unique identifier for this region within the file

title: Optional[str]

Generated title for the region
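
Once a job is complete, its regions can be grouped by worksheet to see what was extracted from each sheet. The sketch below works on plain mappings that carry the documented `sheet_name` and `location` keys; the helper name is hypothetical, and real Region objects may expose these as attributes rather than dict keys.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def regions_by_sheet(regions: Iterable[Mapping[str, object]]) -> Dict[str, List[str]]:
    """Group region locations by worksheet name, preserving input order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for region in regions:
        grouped[str(region["sheet_name"])].append(str(region["location"]))
    return dict(grouped)
```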

success: Optional[bool]

Whether the job completed successfully

worksheet_metadata: Optional[List[WorksheetMetadata]]

Metadata for each processed worksheet (populated when job is complete)

sheet_name: str

Name of the worksheet

description: Optional[str]

Generated description of the worksheet

title: Optional[str]

Generated title for the worksheet

class SheetsParsingConfig: …

Configuration for spreadsheet parsing and region extraction

extraction_range: Optional[str]

A1 notation of the range to extract a single region from. If None, the entire sheet is used.

flatten_hierarchical_tables: Optional[bool]

Return a flattened dataframe when a detected table is recognized as hierarchical.

generate_additional_metadata: Optional[bool]

Whether to generate additional metadata (title, description) for each extracted region.

include_hidden_cells: Optional[bool]

Whether to include hidden cells when extracting regions from the spreadsheet.

sheet_names: Optional[List[str]]

The names of the sheets to extract regions from. If empty, all sheets will be processed.

specialization: Optional[str]

table_merge_sensitivity: Optional[Literal["strong", "weak"]]

One of the following:

"strong"

"weak"

use_experimental_processing: Optional[bool]

Enables experimental processing. Accuracy may be impacted.
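
To make the field list concrete, the sketch below mirrors the documented fields in a local dataclass and emits only the ones that were set. `SheetsParsingConfigSketch` is a stand-in written for this example, not the SDK class, and omitting unset fields from the payload is an assumption about what the API accepts.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class SheetsParsingConfigSketch:
    """Local stand-in mirroring the documented fields; not the SDK class."""

    extraction_range: Optional[str] = None
    flatten_hierarchical_tables: Optional[bool] = None
    generate_additional_metadata: Optional[bool] = None
    include_hidden_cells: Optional[bool] = None
    sheet_names: Optional[List[str]] = None
    specialization: Optional[str] = None
    table_merge_sensitivity: Optional[str] = None
    use_experimental_processing: Optional[bool] = None

    def to_payload(self) -> dict:
        """Return only the fields that were set, after a basic enum check."""
        if self.table_merge_sensitivity not in (None, "strong", "weak"):
            raise ValueError("table_merge_sensitivity must be 'strong' or 'weak'")
        return {k: v for k, v in asdict(self).items() if v is not None}
```

For example, setting only `extraction_range`, `sheet_names`, and `table_merge_sensitivity` yields a payload with exactly those three keys.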