Generate Extraction Schema

POST /api/v2/extract/schema/generate

Generate a JSON schema and return a product configuration request.

Query Parameters
organization_id: optional string
project_id: optional string
Cookie Parameters
session: optional string
Body Parameters (JSON)
data_schema: optional map[map[unknown] or array of unknown or string or 2 more] or string

Optional schema to validate, refine, or extend

Accepts one of the following:
UnionMember0 = map[map[unknown] or array of unknown or string or 2 more]
Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
UnionMember1 = string
file_id: optional string

Optional file ID to analyze for schema generation

name: optional string

Name for the generated configuration (auto-generated if omitted)

maxLength: 255
prompt: optional string

Natural language description of the data structure to extract
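
A minimal sketch of assembling a request body from the parameters above, using only the standard-library `json` and `urllib.request` modules. The helper names `build_body` and `build_request` are hypothetical, the sample prompt and name are made up, and the base URL and `Authorization` header follow the curl example on this page. Nothing is sent over the network here; the request would only go out if it were passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Base URL taken from the curl example on this page.
BASE_URL = "https://api.cloud.llamaindex.ai"
ENDPOINT = "/api/v2/extract/schema/generate"


def build_body(prompt=None, name=None, file_id=None, data_schema=None):
    """Return a JSON-serializable body containing only the fields that were set."""
    if name is not None and len(name) > 255:
        raise ValueError("name must be at most 255 characters")
    fields = {
        "prompt": prompt,
        "name": name,
        "file_id": file_id,
        "data_schema": data_schema,
    }
    # Every body parameter is optional, so unset ones are simply left out.
    return {key: value for key, value in fields.items() if value is not None}


def build_request(api_key, body):
    """Build (but do not send) the POST request for this endpoint."""
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
body = build_body(
    prompt="Invoice number, issue date, and a list of line items",
    name="invoice-schema",
)
request = build_request("YOUR_API_KEY", body)
```

Dropping unset fields keeps the payload minimal and leaves choices such as the auto-generated `name` to the server.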

Returns
name: string

Human-readable name for this configuration.

maxLength: 255
minLength: 1
parameters: object { categories, product_type, splitting_strategy } or object { extract_options, product_type, parse_config_id, parse_tier } or object { product_type, rules, mode } or object { product_type }

Product-specific configuration parameters.

Accepts one of the following:
SplitV1 = object { categories, product_type, splitting_strategy }

Typed parameters for a split v1 product configuration.

categories: array of SplitCategory { name, description }

Categories to split documents into.

name: string

Name of the category.

maxLength: 200
minLength: 1
description: optional string

Optional description of what content belongs in this category.

maxLength: 2000
minLength: 1
product_type: "split_v1"

Product type.

splitting_strategy: optional object { allow_uncategorized }

Strategy for splitting documents.

allow_uncategorized: optional "include" or "forbid" or "omit"

Controls handling of pages that don't match any category. 'include': pages can be grouped as 'uncategorized' and included in results. 'forbid': all pages must be assigned to a defined category. 'omit': pages can be classified as 'uncategorized' but are excluded from results.
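
The three values can be illustrated with a small client-side sketch. This is not the service's implementation: the function `apply_uncategorized_policy` and its inputs are hypothetical, and a page "matches" here simply when its label is one of the defined category names. Raising an error under `forbid` is this sketch's own way of signalling that every page is expected to carry a defined category.

```python
def apply_uncategorized_policy(page_labels, categories, policy):
    """Return {page_number: label} after applying the given policy.

    page_labels maps page numbers to a category name, or None when the
    page matched no category (hypothetical input shape).
    """
    if policy not in ("include", "forbid", "omit"):
        raise ValueError("unknown policy: {!r}".format(policy))
    known = set(categories)
    kept = {}
    for page, label in sorted(page_labels.items()):
        if label in known:
            kept[page] = label
        elif policy == "include":
            # Unmatched pages are grouped as 'uncategorized' and kept.
            kept[page] = "uncategorized"
        elif policy == "omit":
            # Unmatched pages are classified but left out of the results.
            continue
        else:
            # 'forbid': every page must belong to a defined category.
            raise ValueError("page {} has no defined category".format(page))
    return kept


labels = {1: "invoice", 2: None, 3: "receipt"}
categories = ["invoice", "receipt"]
included = apply_uncategorized_policy(labels, categories, "include")
omitted = apply_uncategorized_policy(labels, categories, "omit")
```

With these inputs, `included` keeps page 2 under `"uncategorized"`, while `omitted` contains only pages 1 and 3.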

Accepts one of the following:
"include"
"forbid"
"omit"
ExtractV2 = object { extract_options, product_type, parse_config_id, parse_tier }

Typed parameters for an extract v2 product configuration.

extract_options: ExtractOptions { data_schema, cite_sources, confidence_scores, 4 more }

Extract-specific configuration options including the data schema

data_schema: map[map[unknown] or array of unknown or string or 2 more]

JSON schema used for extraction

Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
cite_sources: optional boolean

Include citations in results

confidence_scores: optional boolean

Include confidence scores in results

extract_version: optional string

Extraction algorithm version to use (e.g., '2026-01-08', 'latest')

extraction_target: optional "per_doc" or "per_page" or "per_table_row"

Extraction scope: per_doc, per_page, or per_table_row

Accepts one of the following:
"per_doc"
"per_page"
"per_table_row"
system_prompt: optional string

Custom system prompt for extraction

tier: optional "cost_effective" or "agentic"

Extraction tier: cost_effective (10 credits) or agentic (20 credits)

Accepts one of the following:
"cost_effective"
"agentic"
product_type: "extract_v2"

Product type.

parse_config_id: optional string

Parse config ID used for extraction

parse_tier: optional string

Parse tier to use for extraction (e.g. fast, cost_effective, agentic).
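
As a rough sketch of how the ExtractV2 fields fit together, the helper below assembles a parameters object and rejects values outside the enumerations listed above. The function name `build_extract_v2` and the sample schema are hypothetical; the credit figures are copied from the `tier` description and are not otherwise verified.

```python
EXTRACTION_TARGETS = {"per_doc", "per_page", "per_table_row"}
# Credit costs as stated in the tier description above.
TIER_CREDITS = {"cost_effective": 10, "agentic": 20}


def build_extract_v2(data_schema, tier=None, extraction_target=None,
                     cite_sources=None, confidence_scores=None):
    """Return an ExtractV2-shaped parameters dict with basic enum checks."""
    if tier is not None and tier not in TIER_CREDITS:
        raise ValueError("tier must be one of: " + ", ".join(sorted(TIER_CREDITS)))
    if extraction_target is not None and extraction_target not in EXTRACTION_TARGETS:
        raise ValueError(
            "extraction_target must be one of: " + ", ".join(sorted(EXTRACTION_TARGETS))
        )
    # data_schema is the only required field inside extract_options.
    options = {"data_schema": data_schema}
    optional = {
        "tier": tier,
        "extraction_target": extraction_target,
        "cite_sources": cite_sources,
        "confidence_scores": confidence_scores,
    }
    options.update({key: value for key, value in optional.items() if value is not None})
    return {"product_type": "extract_v2", "extract_options": options}


# Hypothetical schema, for illustration only.
schema = {"type": "object", "properties": {"invoice_number": {"type": "string"}}}
params = build_extract_v2(schema, tier="agentic", extraction_target="per_doc",
                          cite_sources=True)
credits = TIER_CREDITS[params["extract_options"]["tier"]]
```

Optional fields that are not set are omitted rather than sent as `null`, so the server-side defaults (which this page does not state) stay in effect.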

ClassifyV2 = object { product_type, rules, mode }

Typed parameters for a classify v2 product configuration.

product_type: "classify_v2"

Product type.

rules: array of object { description, type }

Classification rules to apply (at least one required)

description: string

Natural language description of what to classify

maxLength: 500
minLength: 10
type: string

Document type to assign when rule matches

maxLength: 50
minLength: 1
mode: optional "FAST"

Classification execution mode
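
The length limits on classification rules can be checked before a configuration is used. The sketch below is a hypothetical client-side validator (`validate_classify_rules` is not part of any SDK) that applies the constraints listed above: at least one rule, `description` of 10 to 500 characters, and `type` of 1 to 50 characters.

```python
def validate_classify_rules(rules):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not rules:
        problems.append("at least one rule is required")
    for index, rule in enumerate(rules):
        description = rule.get("description", "")
        doc_type = rule.get("type", "")
        if not 10 <= len(description) <= 500:
            problems.append(
                "rules[{}].description must be 10-500 characters".format(index)
            )
        if not 1 <= len(doc_type) <= 50:
            problems.append("rules[{}].type must be 1-50 characters".format(index))
    return problems


# Hypothetical rules, for illustration only.
good_rules = [
    {"description": "Documents that list billed items and totals", "type": "invoice"},
]
bad_rules = [{"description": "too short", "type": ""}]

good_problems = validate_classify_rules(good_rules)
bad_problems = validate_classify_rules(bad_rules)
```

Returning a list of problems instead of raising on the first one lets a caller report every violation at once.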

ParseV2 = object { product_type }

Typed parameters for a parse v2 product configuration.

Parse configs have a flexible parameter set (tier, version, plus various parsing options), so extra fields are permitted.

product_type: "parse_v2"

Product type.

Generate Extraction Schema

curl https://api.cloud.llamaindex.ai/api/v2/extract/schema/generate \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
    -d '{}'
Returns Examples
{
  "name": "x",
  "parameters": {
    "categories": [
      {
        "name": "x",
        "description": "x"
      }
    ],
    "product_type": "split_v1",
    "splitting_strategy": {
      "allow_uncategorized": "include"
    }
  }
}
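
Because `parameters` is a union discriminated by `product_type`, a client would typically branch on that field before reading anything else. The following sketch parses the example response above and summarizes it. The function `describe` is hypothetical, and the `extract_v2` branch assumes `data_schema` is a JSON object.

```python
import json

# Example response body copied from this page.
response_text = """
{
  "name": "x",
  "parameters": {
    "categories": [{"name": "x", "description": "x"}],
    "product_type": "split_v1",
    "splitting_strategy": {"allow_uncategorized": "include"}
  }
}
"""


def describe(config):
    """Return a short summary chosen by parameters.product_type."""
    params = config["parameters"]
    product_type = params["product_type"]
    if product_type == "split_v1":
        names = [category["name"] for category in params["categories"]]
        return "split_v1 with categories: " + ", ".join(names)
    if product_type == "extract_v2":
        keys = sorted(params["extract_options"]["data_schema"])
        return "extract_v2 with schema keys: " + ", ".join(keys)
    if product_type == "classify_v2":
        return "classify_v2 with {} rule(s)".format(len(params["rules"]))
    if product_type == "parse_v2":
        return "parse_v2"
    raise ValueError("unexpected product_type: {!r}".format(product_type))


config = json.loads(response_text)
summary = describe(config)
```

Raising on an unrecognized `product_type` makes it obvious when the API adds a variant the client does not yet handle.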