Overview of Extract

LlamaExtract provides a simple API for extracting structured data from unstructured documents like PDFs, text files, and images.

LlamaExtract is available as a web UI, Python SDK and REST API.

LlamaExtract is a good fit when you need:

  • Well-typed data for downstream tasks: You want to extract data from documents and feed it into tasks such as training a model, building a dashboard, or populating a database. LlamaExtract either returns data that complies with the provided schema or gives a helpful error message when it cannot.
  • Accurate data extraction: LlamaExtract uses best-in-class LLMs to extract data from your documents.
  • Iterative schema development: You want to quickly iterate on your schema and get feedback on how well it works on your sample documents. Do you need to provide more examples to extract a certain field? Do you need to make a certain field optional?
  • Support for multiple file types: LlamaExtract supports a wide range of file types, including PDFs, text files, and images. Let us know if you need support for another file type!
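To make the schema-driven workflow above concrete, here is a minimal sketch of an extraction schema written in JSON Schema form as a plain Python dict, with one optional field. The invoice fields and the `missing_required` helper are hypothetical illustrations, not part of LlamaExtract; the helper only mimics locally the kind of check that schema compliance implies.

```python
# Hypothetical invoice schema in JSON Schema form; all field names are illustrative.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Identifier printed on the invoice"},
        "total": {"type": "number", "description": "Grand total including tax"},
        # Optional: left out of "required" because many documents lack it.
        "purchase_order": {"type": "string", "description": "PO reference, if present"},
    },
    "required": ["invoice_number", "total"],
}


def missing_required(record: dict, schema: dict) -> list[str]:
    """Return the required field names that are absent from record."""
    return [name for name in schema.get("required", []) if name not in record]


if __name__ == "__main__":
    extracted = {"invoice_number": "INV-001", "total": 129.5}
    print(missing_required(extracted, INVOICE_SCHEMA))  # prints []
```

Moving a field in or out of `required` is one way to act on the "should this field be optional?" question while iterating on sample documents.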

The simplest way to try out LlamaExtract is to use the web UI.

Define your Extraction Configuration (schema and settings), drag and drop any supported document into LlamaParse, and run the extraction.

Extraction Results

Once you’re ready to start coding, get an API key to use LlamaExtract with the Python SDK.

SDKs are available for Python and TypeScript. Using an SDK is the recommended way to run extraction jobs at scale. Check out the SDK quick start to get started.

If you are using another language, you can call the REST API directly.

LlamaExtract offers three primary tiers in the UI:

  • Cost Effective – best when you want lower cost and higher throughput for simpler extraction tasks.
  • Agentic – recommended default tier that balances quality, speed, and cost for most real‑world documents.
  • Agentic Plus (coming soon) – high‑fidelity tier for very complex or high‑stakes extractions.

LlamaExtract now runs on the v2 APIs by default. If you need to use the legacy Extract v1 experience, see Using Extract v1 below.

When you create v2 extract jobs or saved extract configurations, set configuration.version to control the extract algorithm version.

For production workflows, pin version to the date you create or update the configuration in YYYY-MM-DD format, for example 2026-03-31. A date pin resolves to the most recent available extract version for the selected tier at or before that date, so later releases do not silently change existing behavior. Use latest when you want jobs to automatically pick up the newest extract version.
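The resolution rule for date pins can be sketched as follows. This is only a local illustration of the "most recent version at or before the pinned date" behavior described above, not the service's implementation; the release dates in `AVAILABLE_VERSIONS` and the `resolve_version` helper are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical release dates for one tier; actual release dates are not listed here.
AVAILABLE_VERSIONS = [date(2026, 1, 15), date(2026, 3, 10), date(2026, 4, 20)]


def resolve_version(pin: str, available: list[date]) -> date:
    """Resolve 'latest' or a YYYY-MM-DD pin to one of the available release dates."""
    releases = sorted(available)
    if not releases:
        raise ValueError("no versions available")
    if pin == "latest":
        return releases[-1]
    pinned = date.fromisoformat(pin)  # raises ValueError on a malformed date
    # Count the releases dated at or before the pin; the last of them wins.
    index = bisect_right(releases, pinned)
    if index == 0:
        raise ValueError(f"no version available at or before {pin}")
    return releases[index - 1]


if __name__ == "__main__":
    print(resolve_version("2026-03-31", AVAILABLE_VERSIONS))  # prints 2026-03-10
    print(resolve_version("latest", AVAILABLE_VERSIONS))      # prints 2026-04-20
```

With these assumed dates, a configuration pinned to 2026-03-31 keeps resolving to the 2026-03-10 release even after the 2026-04-20 release ships, while `latest` moves forward.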

When using the SDK or REST API directly, V2 decouples parse and extract tiers. Here is how V2 configurations map to V1 equivalents:

V2 extract tier | V2 parse tier | V1 equivalent (extraction_mode)
cost_effective  | fast          | FAST
agentic         | agentic       | MULTIMODAL
agentic         | agentic_plus  | PREMIUM
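For migration scripts, the mapping above can be expressed as a small lookup. This is a sketch only: the tier names and `extraction_mode` values come from the mapping above, while the `V2_TO_V1` dictionary and `v1_equivalent` helper are hypothetical names that are not part of any SDK.

```python
from typing import Optional

# (v2 extract tier, v2 parse tier) -> v1 extraction_mode, taken from the mapping above.
V2_TO_V1 = {
    ("cost_effective", "fast"): "FAST",
    ("agentic", "agentic"): "MULTIMODAL",
    ("agentic", "agentic_plus"): "PREMIUM",
}


def v1_equivalent(extract_tier: str, parse_tier: str) -> Optional[str]:
    """Return the v1 extraction_mode for a v2 tier pair, or None if none is listed."""
    return V2_TO_V1.get((extract_tier, parse_tier))


if __name__ == "__main__":
    print(v1_equivalent("agentic", "agentic_plus"))  # prints PREMIUM
```

Combinations that do not appear in the mapping return `None` here, since no v1 equivalent is documented for them.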

LlamaExtract v2 is the default and recommended experience. If you need to use the legacy Extract v1:

  • Web UI: Open the main LlamaCloud UI, go to Settings → General, and enable the Extract v1 toggle for your workspace.
  • Python SDK: Use the llama-cloud-services package (shown as the “Python (legacy)” tab in our SDK examples). See the SDK page for details.
  • REST API: The v1 endpoints are documented on the REST API (v1 Legacy) page.
  • Migration help: Use the Extract v1 → v2 migration guide for API/SDK/UI mapping.

Extract v1 is legacy and may be deprecated in the future. We recommend migrating to v2 for new projects.
