Create Spreadsheet Job

Deprecated

$ llamacloud-prod beta:sheets create

POST /api/v1/beta/sheets/jobs

Create a spreadsheet parsing job.

Provide at most one of configuration (an inline parsing configuration) or configuration_id (a saved configuration preset). If neither is provided, a default configuration is used. Optionally include webhook_configurations to receive sheets.* status notifications.
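
The rule above can be sketched as a small request-body builder. This is a minimal illustration, not the client library's implementation: the helper name build_job_body is hypothetical, and it assumes the JSON body keys mirror the names used on this page (file_id, configuration, configuration_id, webhook_configurations).

```python
from typing import Any, Optional


def build_job_body(
    file_id: str,
    configuration: Optional[dict[str, Any]] = None,
    configuration_id: Optional[str] = None,
    webhook_configurations: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Assemble a JSON body for POST /api/v1/beta/sheets/jobs (hypothetical helper)."""
    # The endpoint accepts at most one of the two configuration sources.
    if configuration is not None and configuration_id is not None:
        raise ValueError("pass configuration or configuration_id, not both")

    body: dict[str, Any] = {"file_id": file_id}
    if configuration is not None:
        body["configuration"] = configuration
    if configuration_id is not None:
        body["configuration_id"] = configuration_id
    # Omitting both leaves the service to apply its default configuration.
    if webhook_configurations:
        body["webhook_configurations"] = webhook_configurations
    return body
```

Calling build_job_body with only a file ID yields a body containing just file_id, which corresponds to the default-configuration case.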

Parameters

--file-id: string

Body param: The ID of the file to parse

--organization-id: optional string

Query param

--project-id: optional string

Query param

--config: optional object { extraction_range, flatten_hierarchical_tables, generate_additional_metadata, 6 more }

Body param: Configuration for spreadsheet parsing and region extraction

--configuration: optional object { extraction_range, flatten_hierarchical_tables, generate_additional_metadata, 6 more }

Body param: Configuration for spreadsheet parsing and region extraction

--configuration-id: optional string

Body param: Saved configuration ID

--webhook-configuration: optional array of object { webhook_events, webhook_headers, webhook_output_format, 2 more }

Body param: Outbound webhook endpoints to notify on job status changes

Returns

sheets_job: object { id, configuration, created_at, 14 more }

A spreadsheet parsing job.

id: string

The ID of the job

configuration: object { extraction_range, flatten_hierarchical_tables, generate_additional_metadata, 6 more }

Configuration applied to the parsing job (inline or resolved from a saved preset).

extraction_range: optional string

A1 notation of the range to extract a single region from. If None, the entire sheet is used.

flatten_hierarchical_tables: optional boolean

Return a flattened dataframe when a detected table is recognized as hierarchical.

generate_additional_metadata: optional boolean

Deprecated: controlled by tier. Whether to generate additional metadata (title, description) for each extracted region. Honored only on agentic.

include_hidden_cells: optional boolean

Whether to include hidden cells when extracting regions from the spreadsheet.

sheet_names: optional array of string

The names of the sheets to extract regions from. If empty, all sheets will be processed.

specialization: optional string

Deprecated: controlled by tier. Optional specialization mode for domain-specific extraction. Supported values: ‘financial-standard’, ‘financial-enhanced’, ‘financial-precise’. Default None uses the general-purpose pipeline. Honored only on agentic.

table_merge_sensitivity: optional "strong" or "weak"

Deprecated: controlled by tier. Influences how likely similar-looking regions are merged into a single table. Honored only on agentic.

"strong"

"weak"

tier: optional "cost_effective" or "agentic"

Spreadsheet extraction tier. cost_effective uses the rule-based/ML-only pipeline; agentic uses the full pipeline.

"cost_effective"

"agentic"

use_experimental_processing: optional boolean

Deprecated: controlled by tier. Enables experimental processing. Honored only on agentic.
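
Several configuration fields above are described as honored only on the agentic tier. The sketch below, under that reading, lists such fields when a configuration explicitly selects cost_effective, so a caller can spot settings that would have no effect. The helper name ignored_fields is hypothetical. The behavior when tier is omitted is not stated on this page, so the sketch reports nothing in that case.

```python
# Fields this page describes as "Honored only on agentic".
AGENTIC_ONLY_FIELDS = (
    "generate_additional_metadata",
    "specialization",
    "table_merge_sensitivity",
    "use_experimental_processing",
)


def ignored_fields(configuration: dict) -> list[str]:
    """Return agentic-only fields that are set while tier is cost_effective."""
    # Only flag fields when the tier is explicitly cost_effective; the
    # default tier is not documented here, so a missing tier is left alone.
    if configuration.get("tier") != "cost_effective":
        return []
    return [
        name
        for name in AGENTIC_ONLY_FIELDS
        if configuration.get(name) is not None
    ]
```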

created_at: string

When the job was created

file_id: string

The ID of the input file

project_id: string

The ID of the project

status: "PENDING" or "SUCCESS" or "ERROR" or 2 more

The status of the parsing job

"PENDING"

"SUCCESS"

"ERROR"

"PARTIAL_SUCCESS"

"CANCELLED"

updated_at: string

When the job was last updated

user_id: string

The ID of the user

config (deprecated): optional object { extraction_range, flatten_hierarchical_tables, generate_additional_metadata, 6 more }

Configuration for spreadsheet parsing and region extraction

extraction_range: optional string

A1 notation of the range to extract a single region from. If None, the entire sheet is used.

flatten_hierarchical_tables: optional boolean

Return a flattened dataframe when a detected table is recognized as hierarchical.

generate_additional_metadata: optional boolean

Deprecated: controlled by tier. Whether to generate additional metadata (title, description) for each extracted region. Honored only on agentic.

include_hidden_cells: optional boolean

Whether to include hidden cells when extracting regions from the spreadsheet.

sheet_names: optional array of string

The names of the sheets to extract regions from. If empty, all sheets will be processed.

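specialization: optional string

Deprecated: controlled by tier. Optional specialization mode for domain-specific extraction. Supported values: ‘financial-standard’, ‘financial-enhanced’, ‘financial-precise’. Default None uses the general-purpose pipeline. Honored only on agentic.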

table_merge_sensitivity: optional "strong" or "weak"

Deprecated: controlled by tier. Influences how likely similar-looking regions are merged into a single table. Honored only on agentic.

"strong"

"weak"

tier: optional "cost_effective" or "agentic"

Spreadsheet extraction tier. cost_effective uses the rule-based/ML-only pipeline; agentic uses the full pipeline.

"cost_effective"

"agentic"

use_experimental_processing: optional boolean

Deprecated: controlled by tier. Enables experimental processing. Honored only on agentic.

configuration_id: optional string

The saved product configuration ID used at create time, if any.

errors: optional array of string

Any errors encountered

file (deprecated): optional object { id, name, project_id, 11 more }

Schema for a file.

id: string

Unique identifier

name: string

project_id: string

The ID of the project that the file belongs to

created_at: optional string

Creation datetime

data_source_id: optional string

The ID of the data source that the file belongs to

expires_at: optional string

The expiration date for the file. Files past this date can be deleted.

external_file_id: optional string

The ID of the file in the external system

file_size: optional number

Size of the file in bytes

file_type: optional string

File type (e.g. pdf, docx, etc.)

last_modified_at: optional string

The last modified time of the file

permission_info: optional map[map[unknown] or array of unknown or string or 2 more]

Permission information for the file

union_member_0: map[unknown]

union_member_1: array of unknown

union_member_2: string

union_member_3: number

union_member_4: boolean

purpose: optional string

The intended purpose of the file (e.g., ‘user_data’, ‘parse’, ‘extract’, ‘split’, ‘classify’)

resource_info: optional map[map[unknown] or array of unknown or string or 2 more]

Resource information for the file

union_member_0: map[unknown]

union_member_1: array of unknown

union_member_2: string

union_member_3: number

union_member_4: boolean

updated_at: optional string

Update datetime

metadata_state_transitions: optional map[unknown]

Per-status entry timestamps. Returned only when requested via ?expand=metadata_state_transitions.

parameters: optional object { webhook_configurations }

Job-time parameters such as webhook configurations.

webhook_configurations: optional array of object { webhook_events, webhook_headers, webhook_output_format, 2 more }

Webhook configurations for job status notifications.

webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 25 more

Events to subscribe to (e.g. ‘parse.success’, ‘extract.error’). If null, all events are delivered.

"extract.pending"

"extract.success"

"extract.error"

"extract.partial_success"

"extract.cancelled"

"parse.pending"

"parse.running"

"parse.success"

"parse.error"

"parse.partial_success"

"parse.cancelled"

"classify.pending"

"classify.running"

"classify.success"

"classify.error"

"classify.partial_success"

"classify.cancelled"

"sheets.pending"

"sheets.success"

"sheets.error"

"sheets.partial_success"

"sheets.cancelled"

"split.pending"

"split.processing"

"split.success"

"split.error"

"split.cancelled"

"unmapped_event"

webhook_headers: optional map[string]

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

webhook_output_format: optional string

Response format sent to the webhook: ‘string’ (default) or ‘json’

webhook_signing_secret: optional string

Shared signing secret used to sign webhook deliveries. When set, each request includes an HMAC-SHA256 signature of the request body in the ‘LC-Signature’ header (value ‘sha256=’). Recompute the HMAC over the raw request body with this secret to verify the delivery is authentic.

webhook_url: optional string

URL to receive webhook POST notifications
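
The signature check described for webhook_signing_secret can be sketched on the receiving side as follows. This is an illustration under assumptions, not the service's reference code: the function name verify_signature is hypothetical, and the text after the ‘sha256=’ prefix is assumed to be the hex-encoded digest, which this page does not spell out.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, header_value: str, signing_secret: str) -> bool:
    """Check an LC-Signature header value against the raw webhook request body."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    # Assumption: the remainder of the header is a hex digest.
    received = header_value[len(prefix):]
    expected = hmac.new(
        signing_secret.encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8"))
```

The HMAC must be computed over the exact bytes received, before any JSON parsing or re-serialization, since even a whitespace change alters the digest.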

regions: optional array of object { location, region_type, sheet_name, 3 more }

All extracted regions (populated when job is complete)

location: string

Location of the region in the spreadsheet

region_type: string

Type of the extracted region

sheet_name: string

Worksheet name where region was found

description: optional string

Generated description for the region

region_id: optional string

Unique identifier for this region within the file

title: optional string

Generated title for the region

success: optional boolean

Whether the job completed successfully

worksheet_metadata: optional array of object { sheet_name, description, title }

Metadata for each processed worksheet (populated when job is complete)

sheet_name: string

Name of the worksheet

description: optional string

Generated description of the worksheet

title: optional string

Generated title for the worksheet
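
As an illustration of consuming the returned object, the sketch below groups the regions array by sheet_name. It assumes the job has already been decoded from JSON into a Python dict; the helper name regions_by_sheet is hypothetical. Because regions is optional and populated only once the job is complete, a missing or null value is treated as an empty list.

```python
from collections import defaultdict


def regions_by_sheet(job: dict) -> dict[str, list[dict]]:
    """Group a job's extracted regions by the worksheet they were found in."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    # regions may be absent or null while the job is still pending.
    for region in job.get("regions") or []:
        grouped[region["sheet_name"]].append(region)
    # Return a plain dict so lookups of unknown sheets raise KeyError.
    return dict(grouped)
```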

Create Spreadsheet Job

llamacloud-prod beta:sheets create \
  --api-key 'My API Key' \
  --file-id 182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e
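
For callers not using the CLI, the same request can be sketched directly against the HTTP endpoint with the Python standard library. Several details here are assumptions rather than facts from this page: the base URL, bearer-token authentication, and the JSON body key file_id. The function name build_create_request is hypothetical, and it only builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request

# Assumption: API host; replace with the host for your deployment.
BASE_URL = "https://api.cloud.llamaindex.ai"


def build_create_request(api_key, file_id, project_id=None, organization_id=None):
    """Build (but do not send) a POST request for /api/v1/beta/sheets/jobs."""
    # project_id and organization_id are query parameters, not body fields.
    query = {
        key: value
        for key, value in (
            ("project_id", project_id),
            ("organization_id", organization_id),
        )
        if value is not None
    }
    url = BASE_URL + "/api/v1/beta/sheets/jobs"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(
        url,
        data=json.dumps({"file_id": file_id}).encode("utf-8"),
        method="POST",
        headers={
            # Assumption: the API key is sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it; the response body would then be the job object described under Returns.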

Returns Examples

{
  "id": "id",
  "configuration": {
    "extraction_range": "extraction_range",
    "flatten_hierarchical_tables": true,
    "generate_additional_metadata": true,
    "include_hidden_cells": true,
    "sheet_names": [
      "string"
    ],
    "specialization": "specialization",
    "table_merge_sensitivity": "strong",
    "tier": "cost_effective",
    "use_experimental_processing": true
  },
  "created_at": "created_at",
  "file_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
  "project_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
  "status": "PENDING",
  "updated_at": "updated_at",
  "user_id": "user_id",
  "config": {
    "extraction_range": "extraction_range",
    "flatten_hierarchical_tables": true,
    "generate_additional_metadata": true,
    "include_hidden_cells": true,
    "sheet_names": [
      "string"
    ],
    "specialization": "specialization",
    "table_merge_sensitivity": "strong",
    "tier": "cost_effective",
    "use_experimental_processing": true
  },
  "configuration_id": "configuration_id",
  "errors": [
    "string"
  ],
  "file": {
    "id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "name": "x",
    "project_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "created_at": "2019-12-27T18:11:19.117Z",
    "data_source_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "expires_at": "2019-12-27T18:11:19.117Z",
    "external_file_id": "external_file_id",
    "file_size": 0,
    "file_type": "x",
    "last_modified_at": "2019-12-27T18:11:19.117Z",
    "permission_info": {
      "foo": {
        "foo": "bar"
      }
    },
    "purpose": "purpose",
    "resource_info": {
      "foo": {
        "foo": "bar"
      }
    },
    "updated_at": "2019-12-27T18:11:19.117Z"
  },
  "metadata_state_transitions": {
    "foo": "bar"
  },
  "parameters": {
    "webhook_configurations": [
      {
        "webhook_events": [
          "parse.success",
          "parse.error"
        ],
        "webhook_headers": {
          "Authorization": "Bearer sk-..."
        },
        "webhook_output_format": "json",
        "webhook_signing_secret": "whsec_...",
        "webhook_url": "https://example.com/webhooks/llamacloud"
      }
    ]
  },
  "regions": [
    {
      "location": "location",
      "region_type": "region_type",
      "sheet_name": "sheet_name",
      "description": "description",
      "region_id": "region_id",
      "title": "title"
    }
  ],
  "success": true,
  "worksheet_metadata": [
    {
      "sheet_name": "sheet_name",
      "description": "description",
      "title": "title"
    }
  ]
}
