---
title: LlamaExtract Core Concepts | LlamaIndex OSS Documentation
description: Overview of the core concepts in LlamaExtract, including data schemas, extraction targets, extraction jobs, and extraction runs.
---

LlamaExtract is designed to be a flexible and scalable extraction platform. At the core of the platform are the following concepts:

- **Extraction Configurations**: Reusable settings including schema, tier, and extraction options.
- **Data Schema**: Structured definition of the data you want to extract, written as a JSON Schema or a Pydantic model. See detailed explanation below.
- **Extraction Target**: Defines the scope of extraction and how your schema is applied to documents. See detailed explanation below.
- **Extraction Jobs**: Asynchronous tasks that extract structured data from documents using a configuration.
- **Extraction Runs**: The results of an extraction job, including the extracted data and associated metadata.
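The relationships between these concepts can be pictured with a small, purely illustrative model. Every class, field, and status value below is hypothetical and does not mirror the LlamaExtract SDK. The "extraction" is faked so the sketch runs offline. It shows one reusable configuration driving several asynchronous jobs, with each job producing a run.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExtractionConfig:
    """Reusable settings: a data schema plus extraction options (hypothetical model)."""

    data_schema: dict[str, Any]
    extraction_target: str = "per_doc"  # per_doc, per_page, or per_table_row
    tier: str = "<tier>"  # placeholder: real tier names are not covered here


@dataclass
class ExtractionRun:
    """The outcome of a job: extracted data plus metadata (hypothetical model)."""

    data: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class ExtractionJob:
    """An asynchronous task that applies one config to one document (hypothetical model)."""

    config: ExtractionConfig
    document: str
    status: str = "pending"  # hypothetical status values

    async def run(self) -> ExtractionRun:
        self.status = "running"
        await asyncio.sleep(0)  # stands in for waiting on the remote extraction
        # Fake "extraction": every top-level schema field comes back empty.
        fields = self.config.data_schema.get("properties", {})
        self.status = "done"
        return ExtractionRun(
            data={name: None for name in fields},
            metadata={"document": self.document, "target": self.config.extraction_target},
        )


async def main() -> list[ExtractionRun]:
    # One configuration is reused across several documents, one job per document.
    config = ExtractionConfig(
        data_schema={"type": "object", "properties": {"invoice_number": {"type": "string"}}}
    )
    jobs = [ExtractionJob(config, doc) for doc in ("invoice_a.pdf", "invoice_b.pdf")]
    return list(await asyncio.gather(*(job.run() for job in jobs)))


runs = asyncio.run(main())
for run in runs:
    print(run.metadata["document"], run.data)
```

Separating the configuration from the job is what makes the configuration reusable: the same schema and options can be applied to any number of documents without being redefined.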

## Data Schema

The **Data Schema** defines the structure of the data you want to extract from your documents. It is a JSON Schema that specifies the fields, types, and descriptions for the information you need.

While the schema is fundamentally a JSON Schema (supporting a subset of the full JSON Schema specification), our Python SDK allows you to use Pydantic models for a more Pythonic experience with type validation and IDE support.
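As a sketch of how the two forms relate, the Pydantic model below declares three fields with descriptions, and Pydantic generates the matching JSON Schema. It assumes Pydantic v2 (`model_json_schema()`; Pydantic v1 used `.schema()`). The `Invoice` model and its fields are illustrative, and passing the model to the LlamaExtract SDK is not shown here.

```python
from pydantic import BaseModel, Field


class Invoice(BaseModel):
    """Fields to extract from an invoice (an illustrative schema)."""

    invoice_number: str = Field(
        description="Unique invoice identifier, usually near the top of the first page"
    )
    vendor_name: str = Field(description="Legal name of the company issuing the invoice")
    total_amount: float = Field(
        description="Final amount due, including tax, as a number without currency symbols"
    )


# Pydantic v2 generates the equivalent JSON Schema: field names become property keys,
# Python types become JSON types, and each description is carried over unchanged.
schema = Invoice.model_json_schema()
print(schema["properties"]["total_amount"])
print(schema["required"])
```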

### How to define your schema

A schema is made of **fields**. Each field has a **name** (the key in the output) and a **type** (string, number, boolean, array, object, etc.). You can also give each field a **description**.

- **Field names** — Use clear, stable names that match how you’ll use the data (e.g. `invoice_number`, `vendor_name`). These become the keys in the extracted JSON.
- **Field descriptions** — Descriptions are **additional context for the underlying LLM**. They are not only for documentation: the model uses them to decide what to extract. Use descriptions to guide the model on what the value for the field should be—for example, what the field means, where it usually appears in the document, acceptable formats, or examples. Better descriptions typically lead to more accurate and consistent extraction.
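A minimal sketch, assuming the schema is written as a plain JSON Schema dict. The descriptions below say what each value means, where it usually appears, and what format to use. `invoice_number` and `vendor_name` come from the text above. `issue_date`, `line_items`, and the `fields_missing_description` helper are hypothetical and are not part of LlamaExtract. The helper is a simple pre-flight check that lists fields which carry no description and therefore give the model no extra context.

```python
from typing import Any

# An illustrative JSON Schema with descriptive, stable field names.
INVOICE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "Invoice identifier printed near the top of the first page, e.g. 'INV-2024-001'",
        },
        "vendor_name": {
            "type": "string",
            "description": "Legal name of the company issuing the invoice, not the customer",
        },
        "issue_date": {
            "type": "string",
            "description": "Date the invoice was issued, formatted as YYYY-MM-DD",
        },
        "line_items": {
            "type": "array",
            "description": "Every billed product or service, in document order",
            "items": {
                "type": "object",
                "properties": {
                    "item_name": {"type": "string", "description": "Product or service name"},
                    "amount": {"type": "number"},  # no description: flagged by the check below
                },
            },
        },
    },
}


def fields_missing_description(schema: dict[str, Any], path: str = "") -> list[str]:
    """Return dotted paths of properties without a 'description' (hypothetical helper)."""
    missing: list[str] = []
    for name, sub in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if not sub.get("description"):
            missing.append(field_path)
        # Recurse into nested objects and into the item schema of arrays.
        missing.extend(fields_missing_description(sub, field_path))
        if sub.get("type") == "array" and isinstance(sub.get("items"), dict):
            missing.extend(fields_missing_description(sub["items"], f"{field_path}[]"))
    return missing


print(fields_missing_description(INVOICE_SCHEMA))
```

Running it prints `['line_items[].amount']`, flagging the one nested field left undescribed.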

**Learn more:** [Schema Design and Restrictions](../schema_design) — Best practices, patterns, and limitations for creating effective schemas.

## Extraction Target

The **Extraction Target** determines how your schema is applied to the document and what granularity of results you receive. Choose it carefully: it changes both which part of the document each extraction sees and the shape of the result (a single object or an array of objects).

![Extraction Target Visualization](/_astro/extraction_target.CtgbuO79_Z2g7H9w.png)

|                       | `per_doc` (default) | `per_page` | `per_table_row` |
| --------------------- | ------------------- | ---------- | --------------- |
| **When to Use**       | Default mode for extracting data from the full document based on your JSON schema | Each page independently contains information about a different entity (e.g., each page contains financial information about a different portfolio company) | The document contains an ordered list of entities (in tables, bulleted/numbered lists, or separated by headers) and you want to extract the same information for each entity |
| **How It Works**      | Schema is applied to the entire document as a single unit | Schema is applied independently to each page of the document | Schema is applied to each identified entity in the document. LlamaExtract automatically detects formatting patterns that distinguish entities (table rows, list items, section headers, etc.) |
| **Returns**           | A single JSON object matching your schema | An array of JSON objects, one per page, each matching your schema | An array of JSON objects, one per entity/row, each matching your schema |
| **Example Use Cases** | Extracting summary information from a contract, annual report, or research paper | Multi-page forms where each page represents a different entity, or a document with one record per page | Invoice line items (each row is a product/service); employee lists or directories; purchase orders with multiple items; any document with repeating structured entities |
| **Important Notes**   | None | Your schema should describe a single entity/page, not a list. Don’t use `extracted_result: list[template]`; instead, provide the template directly so it is applied at the page level | Your schema should describe a single entity, not a list. Don’t use `extracted_result: list[template]`; instead, provide the template directly so it is applied at the entity level. The document must have some formatting or structure that distinguishes the entities (table formatting, bullets, numbering, headers, etc.). Entities should appear in order in the document |
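The table can be made concrete with a small, self-contained simulation. It does not call LlamaExtract: `apply_target`, `fake_extract`, and `split_entities` are hypothetical names, the "extraction" only reports which fields and which text each schema was applied to, and the bullet-based splitter is a toy stand-in for LlamaExtract's automatic entity detection. It also contrasts a single-entity schema with the list-wrapped pattern the table warns against.

```python
from typing import Any, Callable, Union

Record = dict[str, Any]
Schema = dict[str, Any]

# Right for per_page and per_table_row: the schema describes ONE entity.
LINE_ITEM_SCHEMA: Schema = {
    "type": "object",
    "properties": {
        "item_name": {"type": "string", "description": "Product or service billed on this row"},
        "amount": {"type": "number", "description": "Row total as a plain number"},
    },
}

# Wrong for per_page and per_table_row: the `extracted_result: list[template]` pattern.
WRAPPED_SCHEMA: Schema = {
    "type": "object",
    "properties": {"extracted_result": {"type": "array", "items": LINE_ITEM_SCHEMA}},
}


def split_entities(page: str) -> list[str]:
    """Toy entity detector: one entity per bulleted line (stand-in for automatic detection)."""
    return [line[2:] for line in page.splitlines() if line.startswith("- ")]


def fake_extract(text: str, schema: Schema) -> Record:
    """Stand-in for the LLM call: reports the top-level fields and the text it was applied to."""
    return {"fields": sorted(schema["properties"]), "applied_to": text}


def apply_target(
    target: str,
    pages: list[str],
    schema: Schema,
    extract: Callable[[str, Schema], Record] = fake_extract,
) -> Union[Record, list[Record]]:
    """Conceptual model of how each extraction target shapes the result."""
    if target == "per_doc":
        return extract("\n".join(pages), schema)  # one object for the whole document
    if target == "per_page":
        return [extract(page, schema) for page in pages]  # one object per page
    if target == "per_table_row":
        entities = [entity for page in pages for entity in split_entities(page)]
        return [extract(entity, schema) for entity in entities]  # one object per entity, in order
    raise ValueError(f"unknown extraction target: {target!r}")


pages = ["Invoice 1\n- Widget: 10\n- Gadget: 5", "Terms\n- Support: 20"]
rows = apply_target("per_table_row", pages, LINE_ITEM_SCHEMA)
print(len(rows), rows[0])  # 3 {'fields': ['amount', 'item_name'], 'applied_to': 'Widget: 10'}
wrapped = apply_target("per_table_row", pages, WRAPPED_SCHEMA)
# With the wrapped schema, every row carries its own nested list field.
print(wrapped[0]["fields"])  # ['extracted_result']
```

The target already supplies the "one result per page or entity" structure, so wrapping the template in a list only nests a second, redundant list inside each result.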
