
Migration Guide: Parse Upload Endpoint v1 to v2

This guide explains how to migrate from the v1 Parse upload endpoint to the new v2 endpoints, which introduce a structured configuration approach and a clearer organization of parsing options.

The v2 endpoints replace individual form parameters with a single structured JSON configuration, providing:

  • Better organization: Related options are grouped into logical sections
  • Type safety: Structured validation with clear schemas
  • Extensibility: Easier to add new features without endpoint bloat
  • Validation: Better error messages and configuration validation

v1 endpoint:

POST /api/v1/parsing/upload
Content-Type: multipart/form-data
- 70+ individual form parameters
- Flat parameter structure
- All parameters available regardless of parse mode

v2 endpoints:

POST /api/v2/parse (file by ID or URL)
POST /api/v2/parse/upload (multipart file upload)
- Two endpoints: JSON for file ID/URL, multipart for file uploads
- Single 'configuration' JSON parameter per endpoint
- Hierarchical, structured configuration
- Always-enabled optimizations
- Strict validation with clear error messages

1. Update the Endpoint URL

Before (v1):

POST https://api.cloud.llamaindex.ai/api/v1/parsing/upload

After (v2): Choose the appropriate endpoint based on your input method:

# For parsing existing files by ID or from URLs (recommended)
POST https://api.cloud.llamaindex.ai/api/v2/parse
# For multipart file uploads
POST https://api.cloud.llamaindex.ai/api/v2/parse/upload

2. Choose the Appropriate Endpoint and Configuration


v2 provides two endpoints for different input methods. Choose the one that matches how you’re providing the document:

For parsing an already uploaded file by ID or a document from a URL, use /parse with either file_id or source_url in the request body. Exactly one must be provided.

For traditional file uploads, use /parse/upload with multipart form data and a configuration parameter.
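A minimal sketch of a /parse call in Python, using only the standard library. It builds the request without sending it, enforces the "exactly one of file_id or source_url" rule, and reuses the Bearer header from the curl examples in this guide. The helper name build_parse_request is hypothetical, and the configuration fields are placed at the top level of the JSON body, following the structure shown for /parse.

```python
import json
import urllib.request

# Base URL from this guide; the API key is a placeholder you supply.
API_BASE = "https://api.cloud.llamaindex.ai/api/v2"


def build_parse_request(api_key, configuration, file_id=None, source_url=None, http_proxy=None):
    """Build (but do not send) a POST /parse request with a JSON body."""
    # Exactly one of file_id or source_url must be provided.
    if (file_id is None) == (source_url is None):
        raise ValueError("provide exactly one of file_id or source_url")
    # http_proxy is only meaningful together with source_url.
    if http_proxy is not None and source_url is None:
        raise ValueError("http_proxy can only be used with source_url")

    body = dict(configuration)  # tier, version, processing_options, ...
    if file_id is not None:
        body["file_id"] = file_id
    else:
        body["source_url"] = source_url
        if http_proxy is not None:
            body["http_proxy"] = http_proxy

    return urllib.request.Request(
        f"{API_BASE}/parse",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_parse_request(
    "YOUR_API_KEY",
    {"tier": "fast", "version": "latest"},
    source_url="https://example.com/document.pdf",
)
print(req.full_url)                  # https://api.cloud.llamaindex.ai/api/v2/parse
print(json.loads(req.data)["tier"])  # fast
```

To actually send the request, pass it to urllib.request.urlopen().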

3. Replace Form Parameters with Configuration JSON


The configuration approach depends on your chosen endpoint:

  • /parse: Uses a JSON request body containing either file_id or source_url (plus an optional http_proxy) alongside the configuration fields
  • /parse/upload: Uses multipart form data with a file field and a configuration field
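For /parse/upload, the request is multipart/form-data with a file field and a configuration field holding the JSON-encoded configuration. The sketch below encodes that body by hand with the standard library so it has no third-party dependencies; build_upload_request is a hypothetical helper, and it assumes the filename contains no quote characters.

```python
import json
import urllib.request
import uuid

API_BASE = "https://api.cloud.llamaindex.ai/api/v2"


def build_upload_request(api_key, configuration, filename, file_bytes,
                         file_content_type="application/octet-stream"):
    """Build (but do not send) a multipart POST /parse/upload request."""
    boundary = uuid.uuid4().hex
    # Part 1: the configuration, serialized as a JSON string.
    config_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="configuration"\r\n'
        "\r\n"
        f"{json.dumps(configuration)}\r\n"
    ).encode("utf-8")
    # Part 2: the file itself (assumes filename has no quote characters).
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {file_content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = config_part + file_header + file_bytes + closing

    return urllib.request.Request(
        f"{API_BASE}/parse/upload",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

An HTTP client library with multipart support performs the same encoding; the hand-rolled version just makes the two form fields explicit.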

Before migrating, review this checklist:

  • Choose the right endpoint: Select /parse for file ID or URL, /parse/upload for multipart uploads
  • Update request format: Change from form parameters to endpoint-specific configuration
  • Replace parse modes with tiers: Use tier instead of parse_mode (fast, cost_effective, agentic, agentic_plus)
  • Remove model selection: Models are now automatically selected based on tier
  • Move custom prompts: Custom prompts are now under agentic_options.custom_prompt (only for the cost_effective, agentic, and agentic_plus tiers)
  • Remove external provider configs: Azure OpenAI and external API keys are no longer supported
  • Check for always-enabled parameters: high_res_ocr is always enabled in v2
  • Check for table heuristic parameters: adaptive_long_table and outlined_table_extraction are enabled by default in v2. They can be disabled with processing_options.disable_heuristics
  • Move specialized chart parsing parameters: specialized_chart_parsing_agentic, specialized_chart_parsing_plus, specialized_chart_parsing_efficient are now under processing_options.specialized_chart_parsing
  • Update page indexing: Change target_pages from 0-based to 1-based indexing
  • Move language parameter: Move language to processing_options.ocr_parameters.languages
  • Update cache parameters: Replace invalidate_cache + do_not_cache with single disable_cache
  • Convert webhooks: Change from single webhook_url to webhook_configurations array
  • Remove header/footer customization: Header/footer handling is now automatic
  • Use correct input field: Use either file_id or source_url in /parse endpoint (exactly one required)
  • Test thoroughly: The alpha API may have additional breaking changes
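Several checklist items can be checked mechanically before sending a request. This is a minimal, hypothetical pre-flight check in Python: it flags a missing or unknown tier, a custom_prompt on the fast tier, a few v1-only keys left at the top level, and a webhook_configurations value that is not an array. It is not the server's validator, only a sketch of the rules listed above.

```python
VALID_TIERS = {"fast", "cost_effective", "agentic", "agentic_plus"}
# Tiers that accept agentic_options.custom_prompt, per this guide.
CUSTOM_PROMPT_TIERS = {"cost_effective", "agentic", "agentic_plus"}
# A few v1 top-level keys that should be moved or removed during migration.
V1_ONLY_KEYS = {
    "parse_mode", "model", "high_res_ocr", "invalidate_cache", "do_not_cache",
    "webhook_url", "target_pages", "max_pages", "language",
}


def check_v2_configuration(config):
    """Return a list of human-readable problems found in a v2 configuration dict."""
    problems = []
    tier = config.get("tier")
    if tier not in VALID_TIERS:
        problems.append(f"tier must be one of {sorted(VALID_TIERS)}, got {tier!r}")
    prompt = config.get("agentic_options", {}).get("custom_prompt")
    if prompt and tier not in CUSTOM_PROMPT_TIERS:
        problems.append(f"agentic_options.custom_prompt is not available for tier {tier!r}")
    for key in sorted(config.keys() & V1_ONLY_KEYS):
        problems.append(f"v1 parameter {key!r} found at top level; move or remove it")
    webhooks = config.get("webhook_configurations")
    if webhooks is not None and not isinstance(webhooks, list):
        problems.append("webhook_configurations must be an array")
    return problems
```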

The v2 configuration structure varies by endpoint:

/parse (JSON request body). Use either file_id or source_url (exactly one is required):

{
  "file_id": "existing-file-id",
  // OR use source_url instead:
  "source_url": "https://example.com/document.pdf",
  "http_proxy": "https://proxy.example.com", // optional, only with source_url
  "tier": "fast|cost_effective|agentic|agentic_plus",
  "version": "latest|2026-01-08|2025-12-31|2025-12-18|2025-12-11",
  "processing_options": {
    "ignore": {...},
    "ocr_parameters": {...},
    "aggressive_table_extraction": false,
    "auto_mode_configuration": [...]
  },
  "agentic_options": {
    "custom_prompt": "..." // Only for cost_effective, agentic, agentic_plus tiers
  },
  "webhook_configurations": [...],
  "input_options": {...},
  "crop_box": {...},
  "page_ranges": {...},
  "disable_cache": false,
  "output_options": {...},
  "processing_control": {...}
}

/parse/upload (multipart/form-data). Send a file field and a configuration field; the configuration field contains this JSON:

{
  "tier": "fast|cost_effective|agentic|agentic_plus",
  "version": "latest|2026-01-08|2025-12-31|2025-12-18|2025-12-11",
  "processing_options": {...},
  "agentic_options": {
    "custom_prompt": "..." // Only for cost_effective, agentic, agentic_plus tiers
  },
  "webhook_configurations": [...],
  "input_options": {...},
  "crop_box": {...},
  "page_ranges": {...},
  "disable_cache": false,
  "output_options": {...},
  "processing_control": {...}
}

v1 Parameter | v2 Location | Notes
input_url | source_url | Renamed
http_proxy | http_proxy | Same functionality
max_pages | page_ranges.max_pages | Same functionality
target_pages | page_ranges.target_pages | Breaking change: now uses 1-based indexing (enter “1,2,3” instead of “0,1,2”)
invalidate_cache and do_not_cache | disable_cache | Breaking change: a single boolean replaces both v1 parameters
language | processing_options.ocr_parameters.languages | Same functionality

Important: In v1, target_pages used 0-based indexing (e.g., “0,1,2” for pages 1, 2, 3). In v2, it uses 1-based indexing (e.g., “1,2,3” for the same pages), consistent with the rest of the platform.
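The index shift and the cache merge are the two breaking changes in this table that need actual logic. A sketch in Python, assuming target_pages is a comma-separated string whose entries are single pages or start-end ranges, and assuming disable_cache should be true when either v1 cache flag was set. Both helper names are hypothetical.

```python
def convert_target_pages(v1_target_pages):
    """Shift a v1 0-based page list like "0,1,2" to the v2 1-based "1,2,3".

    Assumption: entries are single page numbers or "start-end" ranges.
    """
    converted = []
    for part in v1_target_pages.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) + 1 for x in part.split("-", 1))
            converted.append(f"{start}-{end}")
        else:
            converted.append(str(int(part) + 1))
    return ",".join(converted)


def convert_page_and_cache_params(v1):
    """Map the v1 input/page/cache parameters onto their v2 locations."""
    v2 = {}
    if "input_url" in v1:
        v2["source_url"] = v1["input_url"]
    if "http_proxy" in v1:
        v2["http_proxy"] = v1["http_proxy"]
    page_ranges = {}
    if "max_pages" in v1:
        page_ranges["max_pages"] = v1["max_pages"]
    if "target_pages" in v1:
        page_ranges["target_pages"] = convert_target_pages(v1["target_pages"])
    if page_ranges:
        v2["page_ranges"] = page_ranges
    # Assumption: caching is disabled if either v1 cache flag was set.
    if v1.get("invalidate_cache") or v1.get("do_not_cache"):
        v2["disable_cache"] = True
    return v2
```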

The following parameters are always enabled in v2 and cannot be disabled (precise_bounding_box only for the tiers listed). We’re doing this to simplify calling LlamaParse and because these options give better results:

v1 Parameter | v2 Behavior | Breaking Change
high_res_ocr | Always true | Breaking: cannot be disabled in v2
precise_bounding_box | Always true for cost_effective, agentic, and agentic_plus tiers | Breaking: cannot be disabled in v2

Configurable in v2 (Different v1 Defaults)


The following parameters are now configurable but have different defaults:

v1 Parameter | v2 Equivalent | v2 Default | Notes
guess_xlsx_sheet_name | output_options.tables_as_spreadsheet.guess_sheet_name | true | Always true when tables_as_spreadsheet.enable is set

Parse mode, model, and prompt parameters:

v1 Parameter | v2 Location | Notes
parse_mode | tier | Breaking: now uses a tier-based system
model | Automatic selection | Breaking: the model is selected automatically based on tier
parsing_instruction | agentic_options.custom_prompt | Breaking: only available for cost_effective, agentic, and agentic_plus tiers
formatting_instruction | Removed | Breaking: use agentic_options.custom_prompt instead
system_prompt | agentic_options.custom_prompt | Breaking: consolidated into a single custom prompt
system_prompt_append | agentic_options.custom_prompt | Breaking: consolidated into a single custom prompt
user_prompt | Removed | Breaking: use agentic_options.custom_prompt instead
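The guide does not map each parse_mode value to a specific tier, so choose the tier manually. The prompt fields can be merged programmatically: this hypothetical helper concatenates whichever v1 prompts are present into one custom_prompt and refuses to do so for the fast tier. The concatenation order and blank-line separator are assumptions, so review the merged prompt by hand.

```python
# v1 prompt parameters that this guide folds into agentic_options.custom_prompt.
# The order here is an assumption, not documented behavior.
V1_PROMPT_KEYS = [
    "system_prompt", "system_prompt_append", "parsing_instruction",
    "formatting_instruction", "user_prompt",
]
CUSTOM_PROMPT_TIERS = {"cost_effective", "agentic", "agentic_plus"}


def consolidate_prompts(v1, tier):
    """Merge v1 prompt fields into a single v2 custom_prompt."""
    pieces = []
    for key in V1_PROMPT_KEYS:
        text = (v1.get(key) or "").strip()
        if text:
            pieces.append(text)
    if not pieces:
        return {}
    if tier not in CUSTOM_PROMPT_TIERS:
        raise ValueError(f"custom_prompt is not available for tier {tier!r}")
    # Assumption: blank lines keep each original instruction readable.
    return {"agentic_options": {"custom_prompt": "\n\n".join(pieces)}}
```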

The following v1 parameters are not supported in v2:

v1 Parameter | v2 Status | Migration Path
use_vendor_multimodal_model | Removed (was deprecated) | Use the appropriate tier instead
gpt4o_mode | Removed | Use tier: "cost_effective" for GPT-4o-mini or tier: "agentic_plus" for premium
gpt4o_api_key | Removed | External provider support removed for simplification
premium_mode | Removed | Use tier: "agentic_plus" for highest quality
fast_mode | Removed | Use tier: "fast" for fastest processing
continuous_mode | Removed | No direct equivalent
vendor_multimodal_api_key | Removed | Breaking: external providers removed for simplification
azure_openai_* | Removed | Breaking: external providers removed for simplification
bounding_box | Renamed | Use the crop_box object instead
disable_image_extraction | Removed | Breaking: image extraction is now always optimized automatically
hide_headers | Removed | Breaking: header handling is now automatic
hide_footers | Removed | Breaking: footer handling is now automatic
page_header_prefix | Removed | Breaking: header formatting removed for simplification
page_footer_prefix | Removed | Breaking: footer formatting removed for simplification
page_prefix | Removed | Breaking: page prefix formatting removed for simplification
page_separator | Removed | Breaking: custom page separators removed for simplification
keep_page_separator_when_merging_tables | Removed | Breaking: table merging behavior is now optimized automatically
input_s3_path and input_s3_region | Removed | Not supported in v2
output_s3_path_prefix and output_s3_region | Removed | Not supported in v2
output_pdf_of_document | Removed | A PDF of the document is always generated in v2
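To find out early which v1 parameters have no v2 equivalent, you can scan an existing v1 parameter set against this table. A minimal sketch in Python; the set is copied from the table (bounding_box is left out because it maps to crop_box), azure_openai_* is matched as a prefix, and the function name is hypothetical.

```python
# v1 parameters this guide lists as removed (no v2 equivalent).
REMOVED_V1_PARAMS = {
    "use_vendor_multimodal_model", "gpt4o_mode", "gpt4o_api_key", "premium_mode",
    "fast_mode", "continuous_mode", "vendor_multimodal_api_key",
    "disable_image_extraction", "hide_headers", "hide_footers",
    "page_header_prefix", "page_footer_prefix", "page_prefix", "page_separator",
    "keep_page_separator_when_merging_tables", "input_s3_path", "input_s3_region",
    "output_s3_path_prefix", "output_s3_region", "output_pdf_of_document",
}


def find_unsupported_v1_params(v1_params):
    """Return the sorted names of v1 parameters that have no v2 equivalent."""
    return sorted(
        key for key in v1_params
        if key in REMOVED_V1_PARAMS or key.startswith("azure_openai_")
    )
```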

Webhook parameters:

v1 Parameter | v2 Location | Notes
webhook_url | webhook_configurations[0].webhook_url | Breaking: now an array, but only the first entry is currently used
webhook_configurations (string) | webhook_configurations (array) | Breaking: format changed from a JSON string to a structured array
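Both webhook changes amount to producing an array. A sketch in Python that accepts the v1 webhook_url and/or the v1 JSON-string webhook_configurations; the helper name is hypothetical, and placing an explicit webhook_url first (so it is the entry currently used) is an assumption, not documented behavior.

```python
import json


def convert_webhooks(v1):
    """Build v2 webhook_configurations from v1 webhook_url / webhook_configurations."""
    configs = []
    raw = v1.get("webhook_configurations")
    if raw:
        # v1 sent this as a JSON string; v2 expects a structured array.
        parsed = json.loads(raw) if isinstance(raw, str) else raw
        configs.extend(parsed if isinstance(parsed, list) else [parsed])
    if v1.get("webhook_url"):
        # Assumption: an explicit webhook_url should be the first (used) entry.
        configs.insert(0, {"webhook_url": v1["webhook_url"]})
    return {"webhook_configurations": configs} if configs else {}
```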

The following options exist in the v2 schema but are not yet implemented:

  • input_options.pdf.password (placeholder for future implementation)

The following features are new in v2 and have no direct v1 equivalent:

v2 Parameter | Description
output_options.images_to_save | Array specifying which image categories to save: "screenshot" (full page), "embedded" (images in document), "layout" (cropped from layout detection)
processing_options.auto_mode_configuration | Advanced feature for conditional parsing with triggers and per-condition configurations
input_options.spreadsheet.force_formula_computation_in_sheets | Force re-computation of spreadsheet cells containing formulas

Bounding box parameters:

v1 Parameter | v2 Location
bbox_top | crop_box.top
bbox_bottom | crop_box.bottom
bbox_left | crop_box.left
bbox_right | crop_box.right
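The bbox_* parameters collapse into a single crop_box object. A minimal Python sketch (convert_bbox is a hypothetical name) that copies the values unchanged, assuming crop_box uses the same units as the v1 bbox_* parameters:

```python
# v1 parameter -> key inside the v2 crop_box object, per the table above.
BBOX_TO_CROP_BOX = {
    "bbox_top": "top",
    "bbox_bottom": "bottom",
    "bbox_left": "left",
    "bbox_right": "right",
}


def convert_bbox(v1):
    """Move v1 bbox_* values into a v2 crop_box object (values copied as-is)."""
    crop_box = {new: v1[old] for old, new in BBOX_TO_CROP_BOX.items() if old in v1}
    return {"crop_box": crop_box} if crop_box else {}
```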

Input options:

v1 Parameter | v2 Location
html_make_all_elements_visible | input_options.html.make_all_elements_visible
html_remove_fixed_elements | input_options.html.remove_fixed_elements
html_remove_navigation_elements | input_options.html.remove_navigation_elements
spreadsheet_extract_sub_tables | input_options.spreadsheet.detect_sub_tables_in_sheets
spreadsheet_force_formula_computation | input_options.spreadsheet.force_formula_computation_in_sheets
presentation_out_of_bounds_content | input_options.presentation.out_of_bounds_content
presentation_skip_embedded_data | input_options.presentation.skip_embedded_data

Processing options:

v1 Parameter | v2 Location
aggressive_table_extraction | processing_options.aggressive_table_extraction
outlined_table_extraction + adaptive_long_table | processing_options.disable_heuristics (inverted)
specialized_chart_parsing_agentic | processing_options.specialized_chart_parsing: "agentic_plus"
specialized_chart_parsing_plus | processing_options.specialized_chart_parsing: "agentic"
specialized_chart_parsing_efficient | processing_options.specialized_chart_parsing: "efficient"
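Two rows in this table are not plain renames: disable_heuristics is inverted, and the three chart-parsing flags become one value. A Python sketch using the value mapping exactly as listed above; the function name is hypothetical, and turning on disable_heuristics only when the v1 request explicitly set both outlined_table_extraction and adaptive_long_table to false is an assumption about how to preserve intent.

```python
# v1 flag -> v2 processing_options.specialized_chart_parsing value (table above).
CHART_PARSING = {
    "specialized_chart_parsing_agentic": "agentic_plus",
    "specialized_chart_parsing_plus": "agentic",
    "specialized_chart_parsing_efficient": "efficient",
}


def convert_processing_options(v1):
    """Build v2 processing_options from the v1 table-heuristic and chart flags."""
    opts = {}
    if "aggressive_table_extraction" in v1:
        opts["aggressive_table_extraction"] = v1["aggressive_table_extraction"]
    # Inverted: v2 enables both heuristics by default. Assumption: only
    # disable them when the v1 request explicitly turned BOTH off.
    if (v1.get("outlined_table_extraction") is False
            and v1.get("adaptive_long_table") is False):
        opts["disable_heuristics"] = True
    enabled = [value for flag, value in CHART_PARSING.items() if v1.get(flag)]
    if len(enabled) > 1:
        raise ValueError("more than one specialized_chart_parsing_* flag is set")
    if enabled:
        opts["specialized_chart_parsing"] = enabled[0]
    return {"processing_options": opts} if opts else {}
```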

Ignore options:

v1 Parameter | v2 Location
skip_diagonal_text | processing_options.ignore.ignore_diagonal_text
disable_ocr | processing_options.ignore.ignore_text_in_image
remove_hidden_text | processing_options.ignore.ignore_hidden_text

Output options:

v1 Parameter | v2 Location
annotate_links | output_options.markdown.annotate_links
page_suffix | Removed
hide_headers | Removed
hide_footers | Removed
compact_markdown_table | output_options.markdown.tables.compact_markdown_tables
output_tables_as_HTML | output_options.markdown.tables.output_tables_as_markdown (inverted)
markdown_table_multiline_header_separator | output_options.markdown.tables.markdown_table_multiline_separator
merge_tables_across_pages_in_markdown | output_options.markdown.tables.merge_continued_tables
guess_xlsx_sheet_name | output_options.tables_as_spreadsheet.guess_sheet_name
extract_layout | Removed
save_images | output_options.images_to_save
take_screenshot | output_options.images_to_save: ["screenshot"]
extract_printed_page_number | output_options.extract_printed_page_number

Spatial text options:

v1 Parameter | v2 Location
preserve_layout_alignment_across_pages | output_options.spatial_text.preserve_layout_alignment_across_pages
preserve_very_small_text | output_options.spatial_text.preserve_very_small_text
do_not_unroll_columns | output_options.spatial_text.do_not_unroll_columns

Processing control:

v1 Parameter | v2 Location
job_timeout_in_seconds | processing_control.timeouts.base_in_seconds
job_timeout_extra_time_per_page_in_seconds | processing_control.timeouts.extra_time_per_page_in_seconds
page_error_tolerance | processing_control.job_failure_conditions.allowed_page_failure_ratio
strict_mode_image_extraction | processing_control.job_failure_conditions.fail_on_image_extraction_error
strict_mode_image_ocr | processing_control.job_failure_conditions.fail_on_image_ocr_error
strict_mode_reconstruction | processing_control.job_failure_conditions.fail_on_markdown_reconstruction_error
strict_mode_buggy_font | processing_control.job_failure_conditions.fail_on_buggy_font
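Most rows in the input, ignore, output, spatial text, and processing control tables are one-to-one renames into nested objects, which a small dotted-path helper can apply. A Python sketch with the mapping copied from those tables; output_tables_as_HTML is handled separately because it is inverted, while save_images, take_screenshot, and the removed parameters are intentionally left out. The helper names are hypothetical.

```python
# One-to-one renames copied from the tables above (v1 name -> v2 dotted path).
RENAMES = {
    "html_make_all_elements_visible": "input_options.html.make_all_elements_visible",
    "html_remove_fixed_elements": "input_options.html.remove_fixed_elements",
    "html_remove_navigation_elements": "input_options.html.remove_navigation_elements",
    "spreadsheet_extract_sub_tables": "input_options.spreadsheet.detect_sub_tables_in_sheets",
    "spreadsheet_force_formula_computation": "input_options.spreadsheet.force_formula_computation_in_sheets",
    "presentation_out_of_bounds_content": "input_options.presentation.out_of_bounds_content",
    "presentation_skip_embedded_data": "input_options.presentation.skip_embedded_data",
    "skip_diagonal_text": "processing_options.ignore.ignore_diagonal_text",
    "disable_ocr": "processing_options.ignore.ignore_text_in_image",
    "remove_hidden_text": "processing_options.ignore.ignore_hidden_text",
    "annotate_links": "output_options.markdown.annotate_links",
    "compact_markdown_table": "output_options.markdown.tables.compact_markdown_tables",
    "markdown_table_multiline_header_separator": "output_options.markdown.tables.markdown_table_multiline_separator",
    "merge_tables_across_pages_in_markdown": "output_options.markdown.tables.merge_continued_tables",
    "guess_xlsx_sheet_name": "output_options.tables_as_spreadsheet.guess_sheet_name",
    "extract_printed_page_number": "output_options.extract_printed_page_number",
    "preserve_layout_alignment_across_pages": "output_options.spatial_text.preserve_layout_alignment_across_pages",
    "preserve_very_small_text": "output_options.spatial_text.preserve_very_small_text",
    "do_not_unroll_columns": "output_options.spatial_text.do_not_unroll_columns",
    "job_timeout_in_seconds": "processing_control.timeouts.base_in_seconds",
    "job_timeout_extra_time_per_page_in_seconds": "processing_control.timeouts.extra_time_per_page_in_seconds",
    "page_error_tolerance": "processing_control.job_failure_conditions.allowed_page_failure_ratio",
    "strict_mode_image_extraction": "processing_control.job_failure_conditions.fail_on_image_extraction_error",
    "strict_mode_image_ocr": "processing_control.job_failure_conditions.fail_on_image_ocr_error",
    "strict_mode_reconstruction": "processing_control.job_failure_conditions.fail_on_markdown_reconstruction_error",
    "strict_mode_buggy_font": "processing_control.job_failure_conditions.fail_on_buggy_font",
}


def set_path(target, dotted_path, value):
    """Set target["a"]["b"]["c"] = value for "a.b.c", creating nested dicts as needed."""
    *parents, leaf = dotted_path.split(".")
    for key in parents:
        target = target.setdefault(key, {})
    target[leaf] = value


def apply_renames(v1, v2=None):
    """Copy every renamed v1 parameter into its nested v2 location."""
    v2 = {} if v2 is None else v2
    for old, new_path in RENAMES.items():
        if old in v1:
            set_path(v2, new_path, v1[old])
    # Inverted flag: v1 output_tables_as_HTML=True means markdown tables are off.
    if "output_tables_as_HTML" in v1:
        set_path(v2, "output_options.markdown.tables.output_tables_as_markdown",
                 not v1["output_tables_as_HTML"])
    return v2
```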

API v2 provides more detailed error messages:

400: Invalid parameter combination
{
  "detail": [
    {
      "type": "value_error",
      "loc": ["tier"],
      "msg": "Unsupported tier: invalid_tier. Must be one of: fast, cost_effective, agentic, agentic_plus",
      "input": {...}
    }
  ]
}
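When a request is rejected, the detail array can be turned into short, readable messages. A minimal Python sketch (summarize_validation_errors is a hypothetical name) that assumes the 400 body has the shape shown above; it also tolerates detail being a plain string, which is an assumption about other error responses.

```python
import json


def summarize_validation_errors(response_body):
    """Turn a v2 error response body into 'field: message' lines."""
    payload = json.loads(response_body)
    detail = payload.get("detail", [])
    if isinstance(detail, str):  # assumption: some errors carry a plain string
        return [detail]
    lines = []
    for err in detail:
        # loc is a path such as ["tier"]; join nested paths with dots.
        location = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{location}: {err.get('msg', '')}")
    return lines
```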

v2 introduces a new images_to_save parameter that provides fine-grained control over which images are saved during parsing. This replaces the v1 save_images and take_screenshot boolean flags.

Category | Description
screenshot | Full page screenshots (replaces v1 take_screenshot)
embedded | Images embedded within the document
layout | Cropped images from layout detection

Save only screenshots:

{
  "output_options": {
    "images_to_save": ["screenshot"]
  }
}

Save screenshots, embedded images and layout crops:

{
  "output_options": {
    "images_to_save": ["screenshot", "embedded", "layout"]
  }
}

Note: If images_to_save is not specified, images are not saved by default in v2 (unlike in v1 where save_images defaulted to true).
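Because a misspelled category (for example "screenshots" instead of "screenshot") is easy to miss, it can help to validate images_to_save before sending it. A minimal Python sketch; images_output_options is a hypothetical helper and the allowed values come from the category table above.

```python
# Allowed images_to_save categories, per the table above.
IMAGE_CATEGORIES = {"screenshot", "embedded", "layout"}


def images_output_options(categories):
    """Build output_options.images_to_save, rejecting unknown categories."""
    unknown = sorted(set(categories) - IMAGE_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown image categories: {unknown}")
    return {"output_options": {"images_to_save": list(categories)}}
```

Since images are not saved by default in v2, list the categories explicitly if your v1 integration relied on the save_images default.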

v2 introduces a new way to retrieve binary outputs (XLSX, PDF, images) using presigned S3 URLs. Instead of streaming the file content directly through the API, v2 returns metadata with a temporary presigned URL for direct download.

To retrieve the XLSX output, use the xlsx_content_metadata expand parameter:

curl -X 'GET' \
'https://api.cloud.llamaindex.ai/api/v2/parse/{job_id}?expand=xlsx_content_metadata' \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY"

Response:

{
  "job": { ... },
  "result_content_metadata": {
    "xlsx": {
      "size_bytes": 15234,
      "exists": true,
      "presigned_url": "https://s3.amazonaws.com/..."
    }
  }
}
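The same GET call can be built in Python, followed by pulling the presigned URL out of result_content_metadata. A sketch using only the standard library; the helper names are hypothetical, and checking the exists flag before using presigned_url is a defensive choice rather than documented behavior.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.cloud.llamaindex.ai/api/v2"


def build_result_request(api_key, job_id, expand):
    """Build (but do not send) GET /parse/{job_id}?expand=<expand>."""
    query = urllib.parse.urlencode({"expand": expand})
    url = f"{API_BASE}/parse/{urllib.parse.quote(job_id)}?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def presigned_url(response_json, output_key):
    """Return the presigned URL for an output such as "xlsx", or None if absent."""
    entry = response_json.get("result_content_metadata", {}).get(output_key)
    if not entry or not entry.get("exists"):
        return None
    return entry.get("presigned_url")
```

After decoding the JSON response (for example with json.load on the result of urllib.request.urlopen), download from the returned URL promptly, since presigned URLs expire.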

To retrieve the output PDF, use the output_pdf_content_metadata expand parameter:

curl -X 'GET' \
'https://api.cloud.llamaindex.ai/api/v2/parse/{job_id}?expand=output_pdf_content_metadata' \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY"

Response:

{
  "job": { ... },
  "result_content_metadata": {
    "outputPDF": {
      "size_bytes": 102400,
      "exists": true,
      "presigned_url": "https://s3.amazonaws.com/..."
    }
  }
}

To retrieve extracted images, use the images_content_metadata expand parameter:

curl -X 'GET' \
'https://api.cloud.llamaindex.ai/api/v2/parse/{job_id}?expand=images_content_metadata' \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY"

Response:

{
  "job": { ... },
  "images_content_metadata": {
    "total_count": 3,
    "images": [
      {
        "index": 0,
        "filename": "image_0.png",
        "content_type": "image/png",
        "size_bytes": 12345,
        "presigned_url": "https://s3.amazonaws.com/..."
      },
      {
        "index": 1,
        "filename": "image_1.jpg",
        "content_type": "image/jpeg",
        "size_bytes": 23456,
        "presigned_url": "https://s3.amazonaws.com/..."
      }
    ]
  }
}

Each image entry contains:

  • index: Index of the image in extraction order
  • filename: Image filename (e.g., “image_0.png”)
  • content_type: MIME type of the image
  • size_bytes: Size of the image file in bytes
  • presigned_url: Temporary URL to download the image
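A sketch of downloading every extracted image from an images_content_metadata response, using only the standard library. It sorts entries by index, strips any directory components from the server-supplied filename, and sends no Authorization header, since a presigned URL carries its own signature. The function names are hypothetical.

```python
import os
import urllib.request


def image_downloads(response_json):
    """List (filename, presigned_url) pairs from images_content_metadata, in index order."""
    meta = response_json.get("images_content_metadata") or {}
    images = sorted(meta.get("images", []), key=lambda img: img["index"])
    return [(img["filename"], img["presigned_url"]) for img in images]


def download_images(response_json, directory="."):
    """Download each image into directory and return the local paths."""
    paths = []
    for filename, url in image_downloads(response_json):
        # basename() guards against path separators in the server-supplied name.
        path = os.path.join(directory, os.path.basename(filename))
        # Presigned URLs are temporary, so download soon after fetching the job.
        urllib.request.urlretrieve(url, path)
        paths.append(path)
    return paths
```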

Note: Presigned URLs are temporary and expire after a limited time. Download the files promptly after receiving the URLs.

The v1 endpoint will remain available for the foreseeable future, so you can migrate at your own pace. However, new features and improvements will be focused on the v2 endpoint structure.