Recipes
Short, copy-pasteable Parse snippets for common workflows — retries, S3 uploads, webhooks, multi-language OCR, page ranges, pandas tables, multimodal screenshots.
Short, copy-pasteable snippets for the patterns Parse users hit most often. Each recipe is ~10-20 lines and answers a specific “how do I do X?” question. Drop them into your project and adapt as needed.
For full walk-throughs, see the Parse Examples tutorials. For the full API reference, see the REST API Guide.
All recipes assume:
    from llama_cloud import LlamaCloud

    client = LlamaCloud(api_key="llx-...")  # or set LLAMA_CLOUD_API_KEY in your env

Parse a document end-to-end
The bare minimum: upload a file, parse it, print the markdown.

    file = client.files.create(file="doc.pdf", purpose="parse")
    result = client.parsing.parse(
        file_id=file.id,
        tier="agentic",
        version="latest",
        expand=["markdown"],
    )
    print(result.markdown.pages[0].markdown)  # markdown of the first page

Pin a version for production
Use a dated version so model updates can’t change your output without your knowledge.

    result = client.parsing.parse(
        file_id=file.id,
        tier="agentic",
        version="2026-04-06",  # pin a specific date
        expand=["markdown"],
    )

See Tiers → Versioning and reproducibility for the latest available dates.
Parse only specific pages
Skip irrelevant pages and save credits.

    result = client.parsing.parse(
        file_id=file.id,
        tier="agentic",
        version="latest",
        page_ranges={"target_pages": "1,3,5-10"},  # 1-indexed
        expand=["markdown"],
    )

You can also cap the total number of pages with {"max_pages": 10}.
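
When the page numbers come from code rather than from a hand-written string, the target_pages value can be built programmatically. The helper below is a hypothetical sketch, not part of the SDK; it assumes the comma-separated, 1-indexed, inclusive-range syntax shown above ("1,3,5-10") is what the API expects.

```python
def to_target_pages(pages):
    """Collapse 1-indexed page numbers into a string like "1,3,5-10"."""
    nums = sorted(set(pages))  # de-duplicate and order the pages
    if any(n < 1 for n in nums):
        raise ValueError("page numbers are 1-indexed and must be >= 1")
    parts = []
    i = 0
    while i < len(nums):
        start = end = nums[i]
        # Extend the current run while the next page is consecutive
        while i + 1 < len(nums) and nums[i + 1] == end + 1:
            i += 1
            end = nums[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)
```

You would then pass the result as page_ranges={"target_pages": to_target_pages(my_pages)}, where my_pages is your own list of page numbers.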
Crop headers and footers from every page
Strip a fixed margin off the top and bottom of every page before parsing.

    result = client.parsing.parse(
        file_id=file.id,
        tier="agentic",
        version="latest",
        crop_box={"top": 0.08, "bottom": 0.08, "left": 0, "right": 0},
        expand=["markdown"],
    )

Values are fractions of the page height (top, bottom) or page width (left, right), from 0.0 to 1.0.
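
If you know your margins in absolute units (for example, PDF points) rather than as ratios, a small conversion helps. The function below is a hypothetical helper, not part of the SDK. It assumes top and bottom are measured against the page height and left and right against the page width, which is how the note above reads; confirm against the API reference before relying on it.

```python
def crop_box_from_margins(page_width, page_height, top=0.0, bottom=0.0, left=0.0, right=0.0):
    """Convert absolute margins (same unit as the page size) into 0.0-1.0 ratios."""
    box = {
        "top": top / page_height,
        "bottom": bottom / page_height,
        "left": left / page_width,
        "right": right / page_width,
    }
    for side, ratio in box.items():
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"{side} margin is outside the page")
    # Round to keep the request payload readable
    return {side: round(ratio, 4) for side, ratio in box.items()}
```

For a US Letter page (612 x 792 points) with a one-inch (72 pt) header, crop_box_from_margins(612, 792, top=72) yields a top ratio of about 0.09.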
Multi-language OCR
Hint the OCR engine for non-English documents.

    result = client.parsing.parse(
        file_id=file.id,
        tier="agentic",
        version="latest",
        processing_options={
            "ocr_parameters": {"languages": ["en", "fr", "de"]},
        },
        expand=["markdown"],
    )

Save money on long mixed-complexity documents
Enable Cost Optimizer to route simple pages to the cost_effective tier automatically.

    result = client.parsing.parse(
        file_id=file.id,
        tier="agentic_plus",
        version="latest",
        processing_options={
            "cost_optimizer": {"enable": True},
        },
        expand=["markdown", "metadata"],
    )

    # See which pages got the cheaper tier
    for page in result.metadata.pages:
        flag = "cost-optimized" if page.cost_optimized else "premium"
        print(f"page {page.page_number}: {flag}")

See Cost Optimizer.
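
For long documents, a per-page printout gets noisy, and a tally is often more useful. The sketch below uses a hypothetical helper that relies only on the cost_optimized attribute shown in the loop above; it is not part of the SDK.

```python
from collections import Counter


def summarize_routing(pages):
    """Count how many pages were cost-optimized versus parsed on the premium tier."""
    counts = Counter(
        "cost-optimized" if page.cost_optimized else "premium" for page in pages
    )
    # Always return both keys so callers can index without checking
    return {"cost-optimized": counts["cost-optimized"], "premium": counts["premium"]}
```

Calling summarize_routing(result.metadata.pages) returns a dict with one count per outcome.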
Steer the parser with a custom prompt
Tell the agentic model what to focus on.

    result = client.parsing.parse(
        file_id=file.id,
        tier="agentic",
        version="latest",
        agentic_options={
            "custom_prompt": "This is a financial 10-K. Preserve currency symbols on every number and keep the original section hierarchy.",
        },
        expand=["markdown"],
    )

See Custom Prompt for prompt-engineering tips.
Extract a table into pandas
Request the structured items, find a table, and load it into a DataFrame.

    import io

    import pandas as pd

    result = client.parsing.parse(
        file_id=file.id,
        tier="agentic",
        version="latest",
        expand=["items"],
    )

    # Find the first table on page 3 and load it
    page = result.items.pages[2]  # 0-indexed in the items tree
    table = next(item for item in page.items if getattr(item, "type", None) == "table")

    df = pd.read_csv(io.StringIO(table.csv))
    print(df.head())

For an end-to-end version with charts, see the Parse Charts in PDFs and Analyze with Pandas tutorial.
Get per-page screenshots for a multimodal pipeline
Save full-page screenshots and download them via presigned URLs.

    import requests

    result = client.parsing.parse(
        file_id=file.id,
        tier="agentic",
        version="latest",
        output_options={"images_to_save": ["screenshot"]},
        expand=["markdown", "images_content_metadata"],
    )

    # Markdown for the LLM
    markdown_blob = "\n\n".join(p.markdown for p in result.markdown.pages)

    # Download every screenshot
    for image in result.images_content_metadata.images:
        resp = requests.get(image.presigned_url, timeout=30)
        resp.raise_for_status()  # fail loudly on an expired or invalid URL
        with open(image.filename, "wb") as f:
            f.write(resp.content)

Push results to a webhook instead of polling
For long-running jobs, let Parse call you back when the job finishes.

    result = client.parsing.parse(
        file_id=file.id,
        tier="agentic_plus",
        version="latest",
        webhook_configurations=[
            {
                "webhook_url": "https://your-app.com/parse-callback",
                "webhook_headers": {"X-My-Auth": "secret"},
            }
        ],
    )
    print(f"Job started: {result.job.id}")

When the job finishes, Parse POSTs an event notification to your URL. See Webhook Configurations.
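
On the receiving side, you will usually want to reject callbacks that do not carry your secret. The sketch below assumes the webhook_headers you configure are sent with the callback request (check Webhook Configurations to confirm) and makes no assumption about the payload shape. is_authorized is a hypothetical helper that works with any framework exposing request headers as a mapping.

```python
import hmac


def is_authorized(headers, expected_secret, header_name="X-My-Auth"):
    """Return True if the request carries the expected shared-secret header."""
    # HTTP header names are case-insensitive, so normalize before lookup
    normalized = {key.lower(): value for key, value in headers.items()}
    received = normalized.get(header_name.lower(), "")
    # compare_digest avoids leaking the secret through timing differences
    return hmac.compare_digest(received.encode(), expected_secret.encode())
```

In a handler, call is_authorized(request.headers, "secret") first and return a 401 response when it is False.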
Retrieve results later (without re-parsing)
Run the job once, then fetch additional fields later by job_id.

    # Step 1: parse with a minimal expand
    result = client.parsing.parse(
        file_id=file.id,
        tier="agentic",
        version="latest",
        expand=["markdown"],
    )
    job_id = result.job.id

    # Step 2: later, in another script, get the items tree for the same job
    items_result = client.parsing.get(job_id=job_id, expand=["items"])
    for page in items_result.items.pages:
        print(f"page {page.page_number}: {len(page.items)} items")

See Retrieving Results for every legal expand value.
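
Because step 2 runs in a different script, the job ID has to be stored somewhere in between. One minimal option, sketched here with hypothetical helper names and a local JSON file, is to keep a small name-to-job-ID map on disk; a database or queue would serve the same purpose in production.

```python
import json
from pathlib import Path


def save_job_id(path, name, job_id):
    """Record job_id under name in a small JSON file, keeping earlier entries."""
    store = Path(path)
    jobs = json.loads(store.read_text()) if store.exists() else {}
    jobs[name] = job_id
    store.write_text(json.dumps(jobs, indent=2))


def load_job_id(path, name):
    """Look up a previously saved job ID; raises KeyError if name is unknown."""
    return json.loads(Path(path).read_text())[name]
```

The first script would call save_job_id("jobs.json", "doc.pdf", result.job.id), and the later one would pass load_job_id("jobs.json", "doc.pdf") as job_id.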
Retry a failing job on a smaller tier
If a parse job fails on agentic_plus (e.g. on an unusual layout), retry on agentic and then cost_effective, printing each failure so you can inspect it.

    def parse_with_fallback(file_id):
        # Try the largest tier first, then fall back to smaller ones
        for tier in ["agentic_plus", "agentic", "cost_effective"]:
            try:
                return client.parsing.parse(
                    file_id=file_id,
                    tier=tier,
                    version="latest",
                    expand=["markdown"],
                )
            except Exception as e:
                print(f"tier {tier} failed: {e}")
        raise RuntimeError("all tiers failed")

    result = parse_with_fallback(file.id)

Disable cache for a fresh parse
Skip cached results when you need a guaranteed fresh re-parse (e.g. after a tier version change).

    result = client.parsing.parse(
        file_id=file.id,
        tier="agentic",
        version="2026-04-06",
        disable_cache=True,
        expand=["markdown"],
    )

See Cache Control.
See also
- Parse Examples: full walk-throughs of these patterns
- Configuration Model: where every option lives in the request shape
- Parse File API reference: full field-by-field listing of every option
- Retrieving Results: every legal expand value