# LlamaParse Platform Quickstart

Install the SDK, get an API key, and run your first call against Parse, Extract, Classify, Split, Sheets, or Index — all from one platform. Build document agents powered by agentic OCR.

LlamaParse is the enterprise platform for turning documents into production AI pipelines. One API key, one SDK, and six composable products: Parse (agentic OCR), Extract (structured data), Classify, Split, Sheets, and Index.
## Install

Python:

```bash
pip install "llama-cloud>=2.1"
```

TypeScript:

```bash
npm install @llamaindex/llama-cloud
```

Set your API key:

```bash
export LLAMA_CLOUD_API_KEY=llx-...
```

Get an API key from the LlamaCloud dashboard.
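If a call fails with an authentication error, first check that the key is actually visible to your process. A minimal sketch; `check_api_key` is a hypothetical helper (not part of the SDK), and the `llx-` prefix check is an assumption based on the key format shown above:

```python
import os

def check_api_key(env=os.environ) -> bool:
    """Return True if a plausibly formatted LlamaCloud key is present."""
    return env.get("LLAMA_CLOUD_API_KEY", "").startswith("llx-")

print(check_api_key({"LLAMA_CLOUD_API_KEY": "llx-abc123"}))  # True
print(check_api_key({}))                                     # False
```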
## Which product do I want?

Map what you’re trying to do to the right product:

| I want to… | Use |
|---|---|
| Turn PDFs, scans, or images into clean LLM-ready text | Parse |
| Pull structured JSON out of documents that matches my schema | Extract |
| Route documents into categories with natural-language rules | Classify |
| Split concatenated documents into their logical parts | Split |
| Work with spreadsheet-like data and reason over rows | Sheets |
| Build a hosted vector search pipeline for RAG | Index |

New here? Start with Parse—it’s the foundation most pipelines build on. Or scroll down for a runnable snippet for every product below.
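The products also compose: a common pattern is to Classify first, then route each document to the product that fits it. A hypothetical routing sketch; the type names mirror the Classify example below, but the mapping itself is illustrative and not part of the SDK:

```python
# Illustrative mapping from a Classify result type to the next product.
ROUTES = {
    "invoice": "extract",   # pull line items into structured JSON
    "receipt": "extract",
    "contract": "parse",    # full-text parse for downstream review
}

def route(doc_type: str, default: str = "parse") -> str:
    """Pick the next product for a classified document."""
    return ROUTES.get(doc_type, default)

print(route("invoice"))  # extract
print(route("memo"))     # parse (fallback for unknown types)
```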
## Quick Start

### Parse

Agentic OCR and parsing for 130+ formats. Turn PDFs and scans into LLM-ready text—the foundation for document agents.
Python:

```python
from llama_cloud import LlamaCloud

client = LlamaCloud()  # Uses LLAMA_CLOUD_API_KEY env var

# Upload and parse a document
file = client.files.create(file="document.pdf", purpose="parse")
result = client.parsing.parse(
    file_id=file.id,
    tier="agentic",
    version="latest",
    expand=["markdown"],
)

# Get markdown output
print(result.markdown.pages[0].markdown)
```

TypeScript:

```typescript
import LlamaCloud from '@llamaindex/llama-cloud';
import fs from 'fs';

const client = new LlamaCloud(); // Uses LLAMA_CLOUD_API_KEY env var

// Upload and parse a document
const file = await client.files.create({
  file: fs.createReadStream('document.pdf'),
  purpose: 'parse',
});
const result = await client.parsing.parse({
  file_id: file.id,
  tier: 'agentic',
  version: 'latest',
  expand: ['markdown'],
});

// Get markdown output
console.log(result.markdown.pages[0].markdown);
```

### Extract

Structured data from documents with custom schemas. Feed agents with clean entities, tables, and fields.
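The `data_schema` field accepts plain JSON Schema, so you can also pass a hand-written dict instead of a Pydantic or Zod model. A sketch of roughly what the `Resume` model in this section serializes to; the exact `model_json_schema()` output may differ in details such as `$defs` or key ordering:

```python
# Roughly the JSON Schema that Resume.model_json_schema() produces.
resume_schema = {
    "type": "object",
    "title": "Resume",
    "properties": {
        "name": {"type": "string", "description": "Full name of candidate"},
        "email": {"type": "string", "description": "Email address"},
        "skills": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Technical skills",
        },
    },
    "required": ["name", "email", "skills"],
}

print(sorted(resume_schema["properties"]))  # ['email', 'name', 'skills']
```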
Python:

```python
from pydantic import BaseModel, Field
from llama_cloud import LlamaCloud

# Define your schema
class Resume(BaseModel):
    name: str = Field(description="Full name of candidate")
    email: str = Field(description="Email address")
    skills: list[str] = Field(description="Technical skills")

client = LlamaCloud()

# Upload and extract
file = client.files.create(file="resume.pdf", purpose="extract")
job = client.extract.create(
    document_input_value=file.id,
    config={
        "extract_options": {
            "data_schema": Resume.model_json_schema(),
            "tier": "agentic",
        },
    },
)
print(job.extract_result)
```

TypeScript:

```typescript
import LlamaCloud from '@llamaindex/llama-cloud';
import { z } from 'zod';
import fs from 'fs';

// Define your schema with Zod
const ResumeSchema = z.object({
  name: z.string().describe('Full name of candidate'),
  email: z.string().describe('Email address'),
  skills: z.array(z.string()).describe('Technical skills'),
});

const client = new LlamaCloud();

// Upload and extract
const file = await client.files.create({
  file: fs.createReadStream('resume.pdf'),
  purpose: 'extract',
});
const job = await client.extract.create({
  document_input_value: file.id,
  config: {
    extract_options: {
      data_schema: ResumeSchema,
      tier: 'agentic',
    },
  },
});
console.log(job.extract_result);
```

### Classify

Categorize documents with natural-language rules. Pre-processing for extraction, parsing, or indexing.
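Classification results in this section carry a confidence score per document, so before routing on the predicted type it can pay to triage low-confidence results for manual review. A hypothetical sketch; the 0–1 confidence scale and the 0.8 threshold are assumptions, not documented behavior:

```python
def partition_by_confidence(items, threshold=0.8):
    """Split (doc_type, confidence) pairs into confident vs. needs-review."""
    confident, review = [], []
    for doc_type, confidence in items:
        (confident if confidence >= threshold else review).append(doc_type)
    return confident, review

ok, needs_review = partition_by_confidence(
    [("invoice", 0.95), ("receipt", 0.52), ("contract", 0.88)]
)
print(ok)            # ['invoice', 'contract']
print(needs_review)  # ['receipt']
```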
Python:

```python
from llama_cloud import LlamaCloud

client = LlamaCloud()

# Upload a document
file = client.files.create(file="document.pdf", purpose="classify")

# Classify with natural-language rules
result = client.classifier.classify(
    file_ids=[file.id],
    rules=[
        {"type": "invoice", "description": "Documents with invoice numbers, line items, and totals"},
        {"type": "receipt", "description": "Short POS receipts with merchant and total"},
        {"type": "contract", "description": "Legal agreements with terms and signatures"},
    ],
    mode="FAST",  # or "MULTIMODAL" for visual docs
)

for item in result.items:
    print(f"Type: {item.result.type}, Confidence: {item.result.confidence}")
```

TypeScript:

```typescript
import LlamaCloud from '@llamaindex/llama-cloud';
import fs from 'fs';

const client = new LlamaCloud();

// Upload a document
const file = await client.files.create({
  file: fs.createReadStream('document.pdf'),
  purpose: 'classify',
});

// Classify with natural-language rules
const result = await client.classifier.classify({
  file_ids: [file.id],
  rules: [
    { type: 'invoice', description: 'Documents with invoice numbers, line items, and totals' },
    { type: 'receipt', description: 'Short POS receipts with merchant and total' },
    { type: 'contract', description: 'Legal agreements with terms and signatures' },
  ],
  mode: 'FAST', // or 'MULTIMODAL' for visual docs
});

for (const item of result.items) {
  if (item.result) {
    console.log(`Type: ${item.result.type}, Confidence: ${item.result.confidence}`);
  }
}
```

### Split

Segment concatenated PDFs into logical sections. AI-powered classification to split combined documents.
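Once segments come back, a typical next step is to group pages by category so each part can be sent to the right downstream pipeline (e.g. invoices to Extract). A hypothetical sketch using plain dicts shaped like the segments printed in this section:

```python
from collections import defaultdict

def group_segments(segments):
    """Collect page numbers per category across all segments."""
    by_category = defaultdict(list)
    for seg in segments:
        by_category[seg["category"]].extend(seg["pages"])
    return dict(by_category)

segments = [
    {"pages": [1, 2], "category": "invoice"},
    {"pages": [3, 4, 5], "category": "contract"},
    {"pages": [6], "category": "invoice"},
]
print(group_segments(segments))  # {'invoice': [1, 2, 6], 'contract': [3, 4, 5]}
```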
Python:

```python
from llama_cloud import LlamaCloud

client = LlamaCloud()

# Upload a combined PDF
file = client.files.create(file="combined.pdf", purpose="split")

# Split into logical sections
result = client.beta.split.split(
    categories=[
        {"name": "invoice", "description": "Commercial document with line items and totals"},
        {"name": "contract", "description": "Legal agreement with terms and signatures"},
    ],
    document_input={"type": "file_id", "value": file.id},
)

for segment in result.result.segments:
    print(f"Pages {segment.pages}: {segment.category} ({segment.confidence_category})")
```

TypeScript:

```typescript
import LlamaCloud from '@llamaindex/llama-cloud';
import fs from 'fs';

const client = new LlamaCloud();

// Upload a combined PDF
const file = await client.files.create({
  file: fs.createReadStream('combined.pdf'),
  purpose: 'split',
});

// Split into logical sections
const result = await client.beta.split.split({
  categories: [
    { name: 'invoice', description: 'Commercial document with line items and totals' },
    { name: 'contract', description: 'Legal agreement with terms and signatures' },
  ],
  document_input: { type: 'file_id', value: file.id },
});

for (const segment of result.result.segments) {
  console.log(`Pages ${segment.pages}: ${segment.category} (${segment.confidence_category})`);
}
```

### Sheets

Extract tables and metadata from messy spreadsheets. Output as Parquet files with rich cell metadata.
Python:

```python
from llama_cloud import LlamaCloud

client = LlamaCloud()

# Upload a spreadsheet
file = client.files.create(file="spreadsheet.xlsx", purpose="parse")

# Extract tables and regions
result = client.beta.sheets.parse(
    file_id=file.id,
    config={"generate_additional_metadata": True},
)

# Print extracted regions
print(f"Found {len(result.regions)} regions")
for region in result.regions:
    print(f" - {region.region_id}: {region.title} ({region.location})")
```

TypeScript:

```typescript
import LlamaCloud from '@llamaindex/llama-cloud';
import fs from 'fs';

const client = new LlamaCloud();

// Upload a spreadsheet
const file = await client.files.create({
  file: fs.createReadStream('spreadsheet.xlsx'),
  purpose: 'parse',
});

// Extract tables and regions
const result = await client.beta.sheets.parse({
  file_id: file.id,
  config: { generate_additional_metadata: true },
});

// Print extracted regions
console.log(`Found ${result.regions?.length || 0} regions`);
for (const region of result.regions || []) {
  console.log(` - ${region.region_id}: ${region.title} (${region.location})`);
}
```

### Index

Ingest, chunk, and embed into searchable indexes. Power RAG and retrieval for document agents. Index is designed for UI-first setup with SDK integration. Start in the LlamaCloud dashboard to create your index, then integrate:
Python:

```python
from llama_cloud import LlamaCloud

client = LlamaCloud()  # Uses LLAMA_CLOUD_API_KEY env var

# Retrieve relevant nodes from the index
results = client.pipelines.retrieve(
    pipeline_id="your-pipeline-id",
    query="Your query here",
    # -- Customize search behavior --
    # dense_similarity_top_k=20,
    # sparse_similarity_top_k=20,
    # alpha=0.5,
    # -- Control reranking behavior --
    # enable_reranking=True,
    # rerank_top_n=5,
)

for n in results.retrieval_nodes:
    print(f"Score: {n.score}, Text: {n.node.text}")
```

TypeScript:

```typescript
import LlamaCloud from '@llamaindex/llama-cloud';

const client = new LlamaCloud(); // Uses LLAMA_CLOUD_API_KEY env var

// Retrieve relevant nodes from the index
const results = await client.pipelines.retrieve('your-pipeline-id', {
  query: 'Your query here',
  // -- Customize search behavior --
  // dense_similarity_top_k: 20,
  // sparse_similarity_top_k: 20,
  // alpha: 0.5,
  // -- Control reranking behavior --
  // enable_reranking: true,
  // rerank_top_n: 5,
});

for (const node of results.retrieval_nodes || []) {
  console.log(`Score: ${node.score}, Text: ${node.node?.text}`);
}
```

## LlamaParse Agent Skills
### Available Skills

- `llamaparse`: Advanced parsing for PDFs, docs, presentations, and images (charts, tables, embedded visuals). Requires `LLAMA_CLOUD_API_KEY` and Node 18+.
- `liteparse`: Local-first, fast parsing for text-dense PDFs and docs. No API key needed; requires `@llamaindex/liteparse` installed globally and Node 18+.
### Installation

You can install LlamaParse Agent Skills using the skills CLI:

```bash
npx skills add run-llama/llamaparse-agent-skills
```

Or, if you wish to download only one skill:

```bash
npx skills add run-llama/llamaparse-agent-skills --skill llamaparse  # or the name of another skill
```

You can also download the skills folder in .zip format from GitHub Releases.