
Retrieving Results

Use the expand parameter to control what comes back from a parse job — text, markdown, items, metadata, full files, or presigned download URLs.

expand controls what comes back from a parse job. By default, the API returns only job metadata (status, ID, error messages) — no parsed content. You add expand values to opt in to the data you actually want.

This page explains how expand works, lists every legal value, and shows the common patterns most callers use.

The first group of expand values returns parsed data directly in the response body — no extra download step.

| Value | What you get | Tier limits |
|---|---|---|
| markdown | Markdown per page (preserves headings, tables, structure) | Not on fast |
| markdown_full | Full markdown for the whole document as a single string | Not on fast |
| text | Plain text per page | All tiers |
| text_full | Full plain text for the whole document as a single string | All tiers |
| items | Structured items tree per page (tables, headings, figures, etc.) | Not on fast |
| metadata | Per-page metadata (confidence, speaker notes, cost-optimizer flags) | All tiers |
| job_metadata | Job-level processing details (timing, configuration echo) | All tiers |

The second group of expand values returns presigned S3 download URLs instead of the content itself. Use these when the result is large and you want to defer or stream the download.

| Value | What you get | Tier limits |
|---|---|---|
| markdown_content_metadata | Download URL for .md | Not on fast |
| markdown_full_content_metadata | Download URL for the full markdown file | Not on fast |
| text_content_metadata | Download URL for .txt | All tiers |
| text_full_content_metadata | Download URL for the full plain text file | All tiers |
| items_content_metadata | Download URL for items .json | Not on fast |
| metadata_content_metadata | Download URL for metadata .json | All tiers |
| images_content_metadata | List of all extracted images with per-image download URLs | Requires images_to_save to be set |
| xlsx_content_metadata | Download URL for the tables-as-spreadsheet .xlsx | Requires tables_as_spreadsheet.enable: true |
| output_pdf_content_metadata | Download URL for the rendered output PDF | All tiers |

The difference between these two groups is explained in detail below in Content fields vs. metadata fields.

Requesting markdown on its own is the simplest pattern, and most LLM pipelines need nothing more.

result = client.parsing.parse(
    file_id=file.id,
    tier="agentic",
    version="latest",
    expand=["markdown"],
)

for page in result.markdown.pages:
    print(page.markdown)

For pipelines that need programmatic access to tables, headings, and figures. Items already include an md field per element, so you can build LLM input from items alone — but requesting markdown alongside gives you a ready-made per-page markdown string without having to reassemble it from items.

result = client.parsing.parse(
    file_id=file.id,
    tier="agentic",
    version="latest",
    expand=["markdown", "items"],
)

# Per-page markdown for the LLM (or use markdown_full for a single string)
llm_input = "\n\n".join(p.markdown for p in result.markdown.pages)

# Walk the items tree for tables
for page in result.items.pages:
    for item in page.items:
        if hasattr(item, "type") and item.type == "table":
            print(f"Table on page {page.page_number}: {len(item.rows)} rows")

For multimodal pipelines that want one big markdown blob and the page screenshots as separate downloadable images.

result = client.parsing.parse(
    file_id=file.id,
    tier="agentic_plus",
    version="latest",
    output_options={"images_to_save": ["screenshot"]},
    expand=["markdown_full", "images_content_metadata"],
)

# One big markdown blob for the LLM
print(result.markdown_full)

# Each page screenshot as a downloadable URL
for image in result.images_content_metadata.images:
    print(f"{image.filename}: {image.presigned_url}")

You don’t have to ask for everything you might ever need at parse time. You can run a parse job with one set of expand values and then call client.parsing.get(job_id, expand=...) later with a different set — without re-running the job.

This is the right pattern when:

  • You don’t know yet what you need. Run with expand=["markdown"], decide later you also want the items tree, fetch with expand=["items"] against the same job ID. Cheaper than re-parsing.
  • You’re working with large files. Use expand=["markdown_content_metadata"] first to check size_bytes and decide whether to download.
  • You’re resuming work on a previously-parsed job. Job results are persisted; pass the old job_id and an expand list to retrieve any field that was generated at parse time.
from llama_cloud import LlamaCloud

client = LlamaCloud(api_key="llx-...")

# Step 1: parse with a minimal expand
result = client.parsing.parse(
    upload_file="doc.pdf",
    tier="cost_effective",
    version="latest",
    expand=["markdown_full"],
)
print(result.job.status)
print(result.markdown_full)

# Step 2: later, retrieve the text representation of the same job
text_result = client.parsing.get(
    job_id=result.job.id,
    expand=["text_full"],
)
print(text_result.text_full)

Every expand value falls into one of two flavors. Knowing the difference saves you from accidentally downloading the same megabyte of markdown twice.

Content fields return the actual parsed data in the API response. Use these when you need immediate access to the parsed content and the document is reasonably sized.

| Field | Description | Best For |
|---|---|---|
| text | Plain text extraction from the document, per page | Simple text analysis, search indexing |
| markdown | Markdown-formatted content with structure, per page | Display, documentation, LLM prompts |
| items | Structured JSON with typed elements (tables, headings, etc.), per page | Programmatic processing, data extraction |
| metadata | Page-level metadata (confidence scores, speaker notes, etc.) | Quality assessment, presentation data |
| job_metadata | Job-level usage and processing details | Tracking credit usage, debugging job-level behavior |
| markdown_full | Full raw markdown file content (output.md) | Complete markdown output without pagination |
| text_full | Full plain text file content | Complete text output without pagination |

Example response with content:

{
  "job": {
    "id": "pjb-123",
    "status": "COMPLETED"
  },
  "text": {
    "pages": [
      {
        "page_number": 1,
        "text": "This is the extracted text from page 1..."
      }
    ]
  },
  "markdown": {
    "pages": [
      {
        "page_number": 1,
        "markdown": "# Heading\n\nThis is markdown content..."
      }
    ]
  }
}

Metadata fields return presigned download URLs instead of the actual content. The result file is stored in S3 and you fetch it with a follow-up HTTP request. Use these when:

  • The result is large and you want to defer the download
  • You want to stream the file directly to your own storage
  • You only need to know whether the file exists and how big it is, not its contents
| Field | Description | Download Format |
|---|---|---|
| text_content_metadata | Download URL for plain text file | .txt file |
| markdown_content_metadata | Download URL for markdown file | .md file |
| items_content_metadata | Download URL for structured JSON | .json file |
| metadata_content_metadata | Download URL for metadata file | .json file |
| xlsx_content_metadata | Download URL for spreadsheet export | .xlsx file |
| output_pdf_content_metadata | Download URL for output PDF | .pdf file |
| images_content_metadata | List of download URLs for extracted images | .png, .jpg files |
| markdown_full_content_metadata | Download URL for full raw markdown | .md file |
| text_full_content_metadata | Download URL for full plain text | .txt file |

Example response with metadata:

{
  "job": {
    "id": "pjb-123",
    "status": "COMPLETED"
  },
  "result_content_metadata": {
    "md": {
      "size_bytes": 45678,
      "exists": true,
      "presigned_url": "https://s3.amazonaws.com/bucket/path/output.md?signature=..."
    },
    "fullText": {
      "size_bytes": 23456,
      "exists": true,
      "presigned_url": "https://s3.amazonaws.com/bucket/path/output.txt?signature=..."
    }
  }
}

The images_content_metadata field is special — it returns a structured list of all extracted images with individual download URLs:

{
  "job": { ... },
  "images_content_metadata": {
    "total_count": 3,
    "images": [
      {
        "index": 0,
        "filename": "image_0.png",
        "content_type": "image/png",
        "size_bytes": 12345,
        "presigned_url": "https://s3.amazonaws.com/..."
      },
      {
        "index": 1,
        "filename": "image_1.jpg",
        "content_type": "image/jpeg",
        "size_bytes": 23456,
        "presigned_url": "https://s3.amazonaws.com/..."
      }
    ]
  }
}

You can filter to specific images using the image_filenames parameter:

result = client.parsing.get(
    job_id="job-123",
    expand=["images_content_metadata"],
    image_filenames=["image_0.png", "image_5.jpg"],
)

All available expand values (detailed reference)


text

Returns the plain text extraction from each page.

Structure:

{
  "text": {
    "pages": [
      {
        "page_number": 1,
        "text": "Extracted plain text content..."
      }
    ]
  }
}
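A small helper (an illustrative sketch, not part of the SDK) can reassemble the per-page text into a single string, which is roughly what text_full gives you in one field:

```python
def join_pages(pages: list[dict], sep: str = "\n\n") -> str:
    """Concatenate per-page text (the "pages" list returned by
    expand=["text"]) into one string."""
    return sep.join(p["text"] for p in pages)
```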

markdown

Returns the markdown-formatted content from each page with preserved structure.

Important: Not available for the fast tier.

Structure:

{
  "markdown": {
    "pages": [
      {
        "page_number": 1,
        "success": true,
        "markdown": "# Heading\n\n## Subheading\n\nContent with **formatting**...",
        "header": "Page header",
        "footer": "LlamaIndex 2026"
      }
    ]
  }
}

items

Returns structured JSON with typed elements (tables, headings, lists, code blocks, etc.).

Important: Not available for the fast tier.

Structure:

{
  "items": {
    "pages": [
      {
        "page_number": 1,
        "page_width": 612.0,
        "page_height": 792.0,
        "items": [
          {
            "type": "heading",
            "level": 1,
            "value": "Document Title",
            "md": "# Document Title"
          },
          {
            "type": "table",
            "rows": [["Header1", "Header2"], ["Row1", "Data1"]],
            "html": "<table>...</table>",
            "csv": "Header1,Header2\nRow1,Data1",
            "md": "| Header1 | Header2 |\n|---------|---------|..."
          }
        ],
        "success": true
      }
    ]
  }
}
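Given that structure, collecting every table in the document is a short walk over the tree. A minimal sketch (the helper name is illustrative; it assumes the raw JSON shape above rather than SDK objects):

```python
def iter_tables(items_pages: list[dict]):
    """Yield (page_number, table_item) for every table element in the
    items tree (the "pages" list returned by expand=["items"])."""
    for page in items_pages:
        for item in page["items"]:
            if item.get("type") == "table":
                yield page["page_number"], item
```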

metadata

Returns page-level metadata including confidence scores, speaker notes, and document-specific information.

Structure:

{
  "metadata": {
    "pages": [
      {
        "page_number": 1,
        "confidence": 0.95,
        "speaker_notes": "Notes from presentation slide",
        "slide_section_name": "Introduction",
        "printed_page_number": "i",
        "original_orientation_angle": 0,
        "cost_optimized": false,
        "triggered_auto_mode": false
      }
    ],
    "document": {
      "XRBIData": "XBRL metadata for financial documents"
    }
  }
}
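One common use of the per-page confidence score is routing low-confidence pages to review. A sketch (the helper name and 0.8 threshold are illustrative assumptions, not SDK features):

```python
def low_confidence_pages(pages: list[dict], threshold: float = 0.8) -> list[int]:
    """Return page numbers whose confidence score falls below the
    threshold (pages as returned by expand=["metadata"])."""
    return [p["page_number"] for p in pages if p.get("confidence", 1.0) < threshold]
```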

job_metadata

Returns job-level metadata: usage and processing details for the parse job as a whole, separate from per-page metadata. Useful for tracking credit cost, job timing, and the configuration the job actually ran with.

markdown_full

Returns the complete raw markdown output as a single string (the full output.md file).

Structure:

{
  "markdown_full": "# Complete Document\n\n## Chapter 1\n\nContent...\n\n---\n\n## Chapter 2..."
}

text_full

Returns the complete raw plain text output as a single string (the full output.txt file).
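By analogy with markdown_full above, the response presumably has this shape (an illustrative sketch, not an exhaustive schema):

```json
{
  "text_full": "Complete Document\n\nChapter 1\n\nContent...\n\nChapter 2..."
}
```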

All metadata fields return information about result files stored in S3, including presigned download URLs:

{
  "result_content_metadata": {
    "md": {
      "size_bytes": 45678,
      "exists": true,
      "presigned_url": "https://s3.amazonaws.com/..."
    },
    "fullText": {
      "size_bytes": 23456,
      "exists": true,
      "presigned_url": "https://s3.amazonaws.com/..."
    },
    "json": {
      "size_bytes": 67890,
      "exists": true,
      "presigned_url": "https://s3.amazonaws.com/..."
    }
  }
}

This includes:

  • text_content_metadata
  • markdown_content_metadata
  • items_content_metadata
  • metadata_content_metadata
  • xlsx_content_metadata
  • output_pdf_content_metadata
  • images_content_metadata (see above)
  • markdown_full_content_metadata
  • text_full_content_metadata

The fast tier does not support markdown output. Requesting markdown, items, markdown_content_metadata, or items_content_metadata with a fast-tier job returns a validation error:

{
  "detail": "Markdown expansion is not available for FAST tier jobs."
}

If you need any of those, use cost_effective or higher. See Tiers for the full list of fast-tier limitations.
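One way to guard against this at call time is to pick the tier from the expand list. A sketch, assuming the gating described in the tables above (the set and function name are illustrative, not part of the SDK):

```python
# expand values that require markdown output, per the tables above
MARKDOWN_GATED = {
    "markdown", "markdown_full", "items",
    "markdown_content_metadata", "markdown_full_content_metadata",
    "items_content_metadata",
}


def minimum_tier(expand: list[str]) -> str:
    """Return "fast" only when no requested expand value needs markdown
    output; otherwise fall back to the cheapest tier that supports it."""
    return "cost_effective" if MARKDOWN_GATED & set(expand) else "fast"
```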

Presigned URLs expire after a limited time. Download files promptly after retrieving them, or call client.parsing.get() again with the same expand to get fresh URLs.
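If you hold results for a while, you can wrap the re-fetch in a small cache so URLs are refreshed only when they may have expired. A sketch — the actual expiry window is service-defined, so treat max_age_s as an assumption, and cached_getter is an illustrative helper, not an SDK function:

```python
import time


def cached_getter(get_job, job_id, expand, max_age_s=600):
    """Return a zero-argument function that re-fetches the job (and thus
    fresh presigned URLs) only when the cached response is older than
    max_age_s seconds. `get_job` is any callable with the keyword
    signature of client.parsing.get."""
    cache = {}

    def get():
        now = time.time()
        if "resp" not in cache or now - cache["at"] > max_age_s:
            cache["resp"] = get_job(job_id=job_id, expand=expand)
            cache["at"] = now
        return cache["resp"]

    return get
```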

  • Tiers — expand values are gated by which tier you parsed with
  • Configuration Model — expand is a top-level key, not nested in output_options
  • Output Options — what you can ask Parse to generate (which then becomes available via expand), including image assets, tables-as-spreadsheet, and printed page numbers