Generate Extraction Schema

extract.generate_schema(**kwargs: ExtractGenerateSchemaParams) -> ExtractGenerateSchemaResponse
POST /api/v2/extract/schema/generate

Generate a JSON schema and return a product configuration request.
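
For readers calling the endpoint without the SDK, the following is a minimal sketch of how the POST request could be assembled with the standard library. The host name is a placeholder, and the bearer-token header and the choice to send `prompt` in the JSON body are assumptions rather than details confirmed by this page. The helper `build_generate_schema_request` is hypothetical, and the request is built but never sent.

```python
import json
import os
import urllib.request

# Assumption: placeholder host; substitute the API host of your deployment.
BASE_URL = "https://api.example.invalid"
ENDPOINT = "/api/v2/extract/schema/generate"


def build_generate_schema_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for schema generation."""
    return urllib.request.Request(
        url=BASE_URL + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token authentication.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_generate_schema_request(
    {"prompt": "Invoice number, issue date, and line items"},
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY", "<your-api-key>"),
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access.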

Parameters
organization_id: Optional[str]
project_id: Optional[str]
data_schema: Optional[Union[Dict[str, Union[Dict[str, object], Iterable[object], str, 3 more]], str, null]]

Optional schema to validate, refine, or extend

Accepts one of the following:
Dict[str, Union[Dict[str, object], Iterable[object], str, 3 more]]
Accepts one of the following:
Dict[str, object]
Iterable[object]
str
float
bool
str
file_id: Optional[str]

Optional file ID to analyze for schema generation

name: Optional[str]

Name for the generated configuration (auto-generated if omitted)

maxLength: 255
prompt: Optional[str]

Natural language description of the data structure to extract
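
Since every parameter is optional, it can help to assemble only the arguments you actually have before calling the method. The sketch below is one way to do that. `build_generate_schema_kwargs` is a hypothetical helper, not part of the SDK, and the sample schema is illustrative. It checks the documented 255-character limit on `name` locally and drops unset values.

```python
from typing import Any, Dict, Optional

MAX_NAME_LENGTH = 255  # documented maxLength for `name`


def build_generate_schema_kwargs(
    prompt: Optional[str] = None,
    data_schema: Optional[Dict[str, Any]] = None,
    file_id: Optional[str] = None,
    name: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect only the arguments that were actually provided."""
    if name is not None and len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"name must be at most {MAX_NAME_LENGTH} characters")
    candidates = {
        "prompt": prompt,
        "data_schema": data_schema,
        "file_id": file_id,
        "name": name,
    }
    return {key: value for key, value in candidates.items() if value is not None}


kwargs = build_generate_schema_kwargs(
    prompt="Extract the invoice number, issue date, and total amount.",
    data_schema={
        "type": "object",
        "properties": {"invoice_number": {"type": "string"}},
    },
    name="invoice-extraction",
)
print(sorted(kwargs))
# With a configured client: response = client.extract.generate_schema(**kwargs)
```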

Returns
class ExtractGenerateSchemaResponse:

Request body for creating a product configuration.

name: str

Human-readable name for this configuration.

maxLength: 255
minLength: 1
parameters: Parameters

Product-specific configuration parameters.

Accepts one of the following:
class ParametersSplitV1Parameters:

Typed parameters for a split v1 product configuration.

categories: List[SplitCategory]

Categories to split documents into.

name: str

Name of the category.

maxLength: 200
minLength: 1
description: Optional[str]

Optional description of what content belongs in this category.

maxLength: 2000
minLength: 1
product_type: Literal["split_v1"]

Product type.

splitting_strategy: Optional[ParametersSplitV1ParametersSplittingStrategy]

Strategy for splitting documents.

allow_uncategorized: Optional[Literal["include", "forbid", "omit"]]

Controls handling of pages that don't match any category. 'include': pages can be grouped as 'uncategorized' and included in results. 'forbid': all pages must be assigned to a defined category. 'omit': pages can be classified as 'uncategorized' but are excluded from results.

Accepts one of the following:
"include"
"forbid"
"omit"
class ParametersExtractV2Parameters:

Typed parameters for an extract v2 product configuration.

extract_options: ExtractOptions

Extract-specific configuration options including the data schema

data_schema: Dict[str, Union[Dict[str, object], List[object], str, 3 more]]

JSON schema used for extraction

Accepts one of the following:
Dict[str, object]
List[object]
str
float
bool
cite_sources: Optional[bool]

Include citations in results

confidence_scores: Optional[bool]

Include confidence scores in results

extract_version: Optional[str]

Extraction algorithm version to use (e.g., '2026-01-08', 'latest')

extraction_target: Optional[Literal["per_doc", "per_page", "per_table_row"]]

Extraction scope: per_doc, per_page, or per_table_row

Accepts one of the following:
"per_doc"
"per_page"
"per_table_row"
system_prompt: Optional[str]

Custom system prompt for extraction

tier: Optional[Literal["cost_effective", "agentic"]]

Extraction tier: cost_effective (10 credits) or agentic (20 credits)

Accepts one of the following:
"cost_effective"
"agentic"
product_type: Literal["extract_v2"]

Product type.

parse_config_id: Optional[str]

Parse config ID used for extraction

parse_tier: Optional[str]

Parse tier to use for extraction (e.g. fast, cost_effective, agentic).
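
As a rough illustration of the extract v2 shape, the sketch below builds a parameters dictionary and looks up the credit cost stated for each tier. Nesting `tier`, `extraction_target`, `cite_sources`, and `confidence_scores` under `extract_options` is an assumption inferred from the field order on this page. `build_extract_v2_parameters` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, Optional

# Credit costs as stated in the `tier` field description.
TIER_CREDITS = {"cost_effective": 10, "agentic": 20}
EXTRACTION_TARGETS = ("per_doc", "per_page", "per_table_row")


def build_extract_v2_parameters(
    data_schema: Dict[str, Any],
    tier: Optional[str] = None,
    extraction_target: Optional[str] = None,
    cite_sources: Optional[bool] = None,
    confidence_scores: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build an extract_v2 parameters dict, rejecting undocumented enum values."""
    if tier is not None and tier not in TIER_CREDITS:
        raise ValueError(f"tier must be one of {sorted(TIER_CREDITS)}")
    if extraction_target is not None and extraction_target not in EXTRACTION_TARGETS:
        raise ValueError(f"extraction_target must be one of {EXTRACTION_TARGETS}")
    optional = {
        "tier": tier,
        "extraction_target": extraction_target,
        "cite_sources": cite_sources,
        "confidence_scores": confidence_scores,
    }
    # Assumption: these options sit inside `extract_options` next to `data_schema`.
    extract_options: Dict[str, Any] = {"data_schema": data_schema}
    extract_options.update({k: v for k, v in optional.items() if v is not None})
    return {"product_type": "extract_v2", "extract_options": extract_options}


params = build_extract_v2_parameters(
    data_schema={"type": "object", "properties": {"total": {"type": "number"}}},
    tier="agentic",
    extraction_target="per_doc",
)
print(params["extract_options"]["tier"], TIER_CREDITS["agentic"])
```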

class ParametersClassifyV2Parameters:

Typed parameters for a classify v2 product configuration.

product_type: Literal["classify_v2"]

Product type.

rules: List[ParametersClassifyV2ParametersRule]

Classification rules to apply (at least one required)

description: str

Natural language description of what to classify

maxLength: 500
minLength: 10
type: str

Document type to assign when rule matches

maxLength: 50
minLength: 1
mode: Optional[Literal["FAST"]]

Classification execution mode
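
The length limits on classification rules can be checked before a configuration is submitted. The following sketch applies the documented bounds (description 10 to 500 characters, type 1 to 50 characters, at least one rule) to plain dictionaries. `validate_classify_rules` is a hypothetical helper, and the server may enforce further rules not captured here.

```python
from typing import Dict, List


def validate_classify_rules(rules: List[Dict[str, str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the rules pass."""
    errors: List[str] = []
    if not rules:
        errors.append("at least one rule is required")
    for index, rule in enumerate(rules):
        description = rule.get("description", "")
        doc_type = rule.get("type", "")
        if not 10 <= len(description) <= 500:
            errors.append(f"rules[{index}].description must be 10-500 characters")
        if not 1 <= len(doc_type) <= 50:
            errors.append(f"rules[{index}].type must be 1-50 characters")
    return errors


good_rules = [
    {"description": "Documents that bill a customer for goods", "type": "invoice"},
]
bad_rules = [{"description": "too short", "type": ""}]

print(validate_classify_rules(good_rules))
print(validate_classify_rules(bad_rules))
```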

class ParametersParseV2Parameters:

Typed parameters for a parse v2 product configuration.

Parse configs have a flexible parameter set (tier, version, plus various parsing options), so extra fields are permitted.

product_type: Literal["parse_v2"]

Product type.
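
Because `parameters` is a union distinguished by `product_type`, consumers usually branch on that field. The sketch below does so on the plain JSON form of a response rather than on the typed SDK object. `summarize_parameters` is a hypothetical helper, and the `extract_options.data_schema` path assumes the nesting described for extract v2.

```python
from typing import Any, Dict


def summarize_parameters(parameters: Dict[str, Any]) -> str:
    """Describe a parameters payload based on its `product_type` discriminator."""
    product_type = parameters.get("product_type")
    if product_type == "split_v1":
        names = [category["name"] for category in parameters["categories"]]
        return "split_v1 with categories: " + ", ".join(names)
    if product_type == "extract_v2":
        schema = parameters["extract_options"]["data_schema"]
        return "extract_v2 with schema keys: " + ", ".join(sorted(schema))
    if product_type == "classify_v2":
        return f"classify_v2 with {len(parameters['rules'])} rule(s)"
    if product_type == "parse_v2":
        # Parse configs permit extra fields, so only the type is reported.
        return "parse_v2"
    raise ValueError(f"unknown product_type: {product_type!r}")


response_json = {
    "name": "x",
    "parameters": {
        "product_type": "extract_v2",
        "extract_options": {"data_schema": {"type": "object", "properties": {}}},
    },
}
print(summarize_parameters(response_json["parameters"]))
```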

Generate Extraction Schema

import os
from llama_cloud import LlamaCloud

client = LlamaCloud(
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),  # This is the default and can be omitted
)
response = client.extract.generate_schema()
print(response.name)
Returns Examples
{
  "name": "x",
  "parameters": {
    "categories": [
      {
        "name": "x",
        "description": "x"
      }
    ],
    "product_type": "split_v1",
    "splitting_strategy": {
      "allow_uncategorized": "include"
    }
  }
}
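
The sample response is a split v1 configuration. As a sketch of how the documented constraints fit together, the hypothetical helper below (`build_split_v1_parameters`, not part of the SDK) validates category name and description lengths and the `allow_uncategorized` values, then reproduces the `parameters` object from that sample.

```python
from typing import Any, Dict, List, Optional

ALLOWED_UNCATEGORIZED = ("include", "forbid", "omit")


def build_split_v1_parameters(
    categories: List[Dict[str, str]],
    allow_uncategorized: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a split_v1 parameters dict after checking the documented limits."""
    for category in categories:
        if not 1 <= len(category.get("name", "")) <= 200:
            raise ValueError("category name must be 1-200 characters")
        description = category.get("description")
        if description is not None and not 1 <= len(description) <= 2000:
            raise ValueError("category description must be 1-2000 characters")
    parameters: Dict[str, Any] = {
        "categories": categories,
        "product_type": "split_v1",
    }
    if allow_uncategorized is not None:
        if allow_uncategorized not in ALLOWED_UNCATEGORIZED:
            raise ValueError(f"allow_uncategorized must be one of {ALLOWED_UNCATEGORIZED}")
        parameters["splitting_strategy"] = {"allow_uncategorized": allow_uncategorized}
    return parameters


sample = build_split_v1_parameters(
    [{"name": "x", "description": "x"}],
    allow_uncategorized="include",
)
print(sample)
```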