Generate Extraction Schema

client.extract.generateSchema(ExtractGenerateSchemaParams { organization_id, project_id, data_schema, 3 more } params, RequestOptions options?): ExtractGenerateSchemaResponse { name, parameters }
POST /api/v2/extract/schema/generate

Generate a JSON schema and return a product configuration request.

Parameters
params: ExtractGenerateSchemaParams { organization_id, project_id, data_schema, 3 more }
organization_id?: string | null

Query param

format: uuid
project_id?: string | null

Query param

format: uuid
data_schema?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | string | null

Body param: Optional schema to validate, refine, or extend

Accepts one of the following:
Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>
Record<string, unknown>
Array<unknown>
string
number
boolean
string
file_id?: string | null

Body param: Optional file ID to analyze for schema generation

name?: string | null

Body param: Name for the generated configuration (auto-generated if omitted)

maxLength: 255
prompt?: string | null

Body param: Natural language description of the data structure to extract
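
The "Query param" and "Body param" labels above indicate where each field travels in the HTTP request. The sketch below illustrates that split, assuming a hypothetical helper `splitGenerateSchemaParams` and a locally declared parameter type that mirrors the fields listed above. Neither is part of the SDK, which handles this routing itself. Dropping null fields is a choice made here, not documented behaviour.

```typescript
// Local mirror of the documented parameters; not imported from the SDK.
type JsonValue = Record<string, unknown> | Array<unknown> | string | number | boolean | null;

interface GenerateSchemaParams {
  organization_id?: string | null;
  project_id?: string | null;
  data_schema?: Record<string, JsonValue> | string | null;
  file_id?: string | null;
  name?: string | null;
  prompt?: string | null;
}

interface SplitRequest {
  query: Record<string, string>;
  body: Record<string, unknown>;
}

// Hypothetical helper: routes each field to the query string or the JSON body
// as labelled in the parameter list, and enforces the documented name length.
function splitGenerateSchemaParams(params: GenerateSchemaParams): SplitRequest {
  if (params.name != null && params.name.length > 255) {
    throw new RangeError('name must be at most 255 characters');
  }

  const query: Record<string, string> = {};
  if (params.organization_id != null) query.organization_id = params.organization_id;
  if (params.project_id != null) query.project_id = params.project_id;

  const body: Record<string, unknown> = {};
  for (const key of ['data_schema', 'file_id', 'name', 'prompt'] as const) {
    const value = params[key];
    if (value != null) body[key] = value; // null and undefined are left out (assumption)
  }
  return { query, body };
}
```

For example, `splitGenerateSchemaParams({ organization_id: 'org-1', prompt: 'Invoice fields' })` places `organization_id` in `query` and `prompt` in `body`.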

Returns
ExtractGenerateSchemaResponse { name, parameters }

Request body for creating a product configuration.

name: string

Human-readable name for this configuration.

maxLength: 255
minLength: 1
parameters: SplitV1Parameters { categories, product_type, splitting_strategy } | ExtractV2Parameters { extract_options, product_type, parse_config_id, parse_tier } | ClassifyV2Parameters { product_type, rules, mode } | ParseV2Parameters { product_type }

Product-specific configuration parameters.

Accepts one of the following:
SplitV1Parameters { categories, product_type, splitting_strategy }

Typed parameters for a split v1 product configuration.

categories: Array<SplitCategory { name, description }>

Categories to split documents into.

name: string

Name of the category.

maxLength: 200
minLength: 1
description?: string | null

Optional description of what content belongs in this category.

maxLength: 2000
minLength: 1
product_type: "split_v1"

Product type.

splitting_strategy?: SplittingStrategy { allow_uncategorized }

Strategy for splitting documents.

allow_uncategorized?: "include" | "forbid" | "omit"

Controls handling of pages that don't match any category. 'include': pages can be grouped as 'uncategorized' and included in results. 'forbid': all pages must be assigned to a defined category. 'omit': pages can be classified as 'uncategorized' but are excluded from results.
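
To make the three modes concrete, here is a small client-side model of the description above. It is only an illustration: `applyUncategorizedPolicy`, `PageAssignment`, and the literal category name `'uncategorized'` are assumptions for this sketch, and the service's actual behaviour (in particular how 'forbid' is enforced) is not documented here.

```typescript
type AllowUncategorized = 'include' | 'forbid' | 'omit';

interface PageAssignment {
  page: number;
  category: string; // a defined category name, or 'uncategorized'
}

// Hypothetical model of the documented semantics, applied to pages that
// have already been assigned a category. Not the service's implementation.
function applyUncategorizedPolicy(
  assignments: PageAssignment[],
  mode: AllowUncategorized,
): PageAssignment[] {
  const isUncategorized = (a: PageAssignment) => a.category === 'uncategorized';
  switch (mode) {
    case 'include':
      // Uncategorized pages stay in the results.
      return assignments;
    case 'omit':
      // Uncategorized pages are allowed but excluded from the results.
      return assignments.filter((a) => !isUncategorized(a));
    case 'forbid':
      // Every page must belong to a defined category.
      if (assignments.some(isUncategorized)) {
        throw new Error("'forbid' requires every page to have a defined category");
      }
      return assignments;
  }
}
```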

Accepts one of the following:
"include"
"forbid"
"omit"
ExtractV2Parameters { extract_options, product_type, parse_config_id, parse_tier }

Typed parameters for an extract v2 product configuration.

extract_options: ExtractOptions { data_schema, cite_sources, confidence_scores, 4 more }

Extract-specific configuration options including the data schema

data_schema: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>

JSON schema used for extraction

Accepts one of the following:
Record<string, unknown>
Array<unknown>
string
number
boolean
cite_sources?: boolean

Include citations in results

confidence_scores?: boolean

Include confidence scores in results

extract_version?: string

Extraction algorithm version to use (e.g., '2026-01-08', 'latest')

extraction_target?: "per_doc" | "per_page" | "per_table_row"

Extraction scope: per_doc, per_page, or per_table_row

Accepts one of the following:
"per_doc"
"per_page"
"per_table_row"
system_prompt?: string | null

Custom system prompt for extraction

tier?: "cost_effective" | "agentic"

Extraction tier: cost_effective (10 credits) or agentic (20 credits)

Accepts one of the following:
"cost_effective"
"agentic"
product_type: "extract_v2"

Product type.

parse_config_id?: string | null

Parse config ID used for extraction

parse_tier?: string | null

Parse tier to use for extraction (e.g. fast, cost_effective, agentic).

ClassifyV2Parameters { product_type, rules, mode }

Typed parameters for a classify v2 product configuration.

product_type: "classify_v2"

Product type.

rules: Array<Rule>

Classification rules to apply (at least one required)

description: string

Natural language description of what to classify

maxLength: 500
minLength: 10
type: string

Document type to assign when rule matches

maxLength: 50
minLength: 1
mode?: "FAST"

Classification execution mode
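
The length limits on classification rules can be checked before a configuration is sent. A minimal sketch, assuming a hypothetical `validateClassifyRules` helper (not part of the SDK) and assuming limits are counted in JavaScript string length, which may differ from how the server counts characters:

```typescript
interface Rule {
  description: string; // documented limits: 10 to 500 characters
  type: string;        // documented limits: 1 to 50 characters
}

// Hypothetical pre-flight check; returns a list of problems (empty when valid).
function validateClassifyRules(rules: Rule[]): string[] {
  const problems: string[] = [];
  if (rules.length === 0) {
    problems.push('at least one rule is required');
  }
  rules.forEach((rule, i) => {
    const d = rule.description.length;
    const t = rule.type.length;
    if (d < 10 || d > 500) {
      problems.push(`rules[${i}].description length ${d} is outside 10..500`);
    }
    if (t < 1 || t > 50) {
      problems.push(`rules[${i}].type length ${t} is outside 1..50`);
    }
  });
  return problems;
}
```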

ParseV2Parameters { product_type }

Typed parameters for a parse v2 product configuration.

Parse configs have a flexible parameter set (tier, version, plus various parsing options), so extra fields are permitted.

product_type: "parse_v2"

Product type.
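
Because every variant of `parameters` carries a distinct `product_type` literal, the union can be narrowed with a plain `switch`. The sketch below declares trimmed local copies of the four shapes (only the fields it uses) instead of importing SDK types; `ConfigParameters` and `summarize` are illustrative names, not SDK exports.

```typescript
// Trimmed local mirrors of the documented shapes.
interface SplitV1Parameters {
  product_type: 'split_v1';
  categories: Array<{ name: string; description?: string | null }>;
  splitting_strategy?: { allow_uncategorized?: 'include' | 'forbid' | 'omit' };
}

interface ExtractV2Parameters {
  product_type: 'extract_v2';
  extract_options: { data_schema: Record<string, unknown> };
  parse_config_id?: string | null;
  parse_tier?: string | null;
}

interface ClassifyV2Parameters {
  product_type: 'classify_v2';
  rules: Array<{ description: string; type: string }>;
  mode?: 'FAST';
}

interface ParseV2Parameters {
  product_type: 'parse_v2'; // extra fields are permitted by the API; omitted here
}

type ConfigParameters =
  | SplitV1Parameters
  | ExtractV2Parameters
  | ClassifyV2Parameters
  | ParseV2Parameters;

// Narrow on the product_type discriminant before touching variant-specific fields.
function summarize(parameters: ConfigParameters): string {
  switch (parameters.product_type) {
    case 'split_v1':
      return `split into ${parameters.categories.length} categories`;
    case 'extract_v2':
      return `extract with ${Object.keys(parameters.extract_options.data_schema).length} top-level schema keys`;
    case 'classify_v2':
      return `classify with ${parameters.rules.length} rules`;
    case 'parse_v2':
      return 'parse';
  }
}
```

Inside each `case`, the compiler knows which variant is in play, so `parameters.categories` is only accessible under `'split_v1'` and `parameters.rules` only under `'classify_v2'`.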

Generate Extraction Schema

import LlamaCloud from '@llamaindex/llama-cloud';

const client = new LlamaCloud({
  apiKey: process.env['LLAMA_CLOUD_API_KEY'], // This is the default and can be omitted
});

const response = await client.extract.generateSchema();

console.log(response.name);
Returns Examples
{
  "name": "x",
  "parameters": {
    "categories": [
      {
        "name": "x",
        "description": "x"
      }
    ],
    "product_type": "split_v1",
    "splitting_strategy": {
      "allow_uncategorized": "include"
    }
  }
}