Crop Box

The crop box parameter allows you to define a specific rectangular area of each page to parse, effectively cropping out unwanted content from the margins.

The crop box is defined using four ratio values (between 0.0 and 1.0) that represent the boundaries of the area to parse:

  • top: Fraction of the page height to exclude, measured from the top edge
  • right: Fraction of the page width to exclude, measured from the right edge
  • bottom: Fraction of the page height to exclude, measured from the bottom edge
  • left: Fraction of the page width to exclude, measured from the left edge
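To make the geometry concrete, here is a minimal Python sketch that turns the four ratios into the rectangle that remains after cropping. It is an illustration only, not the service's implementation: the `CropBox` class and `kept_region` function are hypothetical names, and it assumes that each ratio is the fraction trimmed from its edge, that an omitted edge trims nothing, and that coordinates are measured from the top-left corner of the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CropBox:
    """Ratios (0.0 to 1.0) trimmed from each page edge (hypothetical helper)."""

    # Assumption: an edge that is not specified trims nothing.
    top: float = 0.0
    right: float = 0.0
    bottom: float = 0.0
    left: float = 0.0

    def __post_init__(self) -> None:
        for name in ("top", "right", "bottom", "left"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        # Assumption: opposite edges must leave some area between them.
        if self.top + self.bottom >= 1.0 or self.left + self.right >= 1.0:
            raise ValueError("crop box leaves no area to parse")


def kept_region(box: CropBox, page_width: float, page_height: float):
    """Return (x0, y0, x1, y1) of the kept area, origin at the top-left corner."""
    x0 = box.left * page_width
    y0 = box.top * page_height
    x1 = page_width - box.right * page_width
    y1 = page_height - box.bottom * page_height
    return (x0, y0, x1, y1)
```

For a US Letter page (612 × 792 points), `kept_region(CropBox(top=0.1, right=0.05, bottom=0.15, left=0.05), 612, 792)` gives a rectangle of approximately (30.6, 79.2, 581.4, 673.2). Because the values are ratios, the same crop box scales with pages of any size.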

The crop box applies to every page in the document.

Common use cases:

  • Remove headers and footers: Crop out page headers and footers that repeat on every page
  • Focus on main content: Extract only the central content area of a document
  • Remove margin text: Exclude annotations, page numbers, or watermarks in the margins

The crop box should be included at the root level of your parsing request:

{
  "crop_box": {
    "top": 0.1,
    "right": 0.05,
    "bottom": 0.15,
    "left": 0.05
  }
}
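When request bodies are assembled in code, a small helper can keep the crop box at the root level and catch out-of-range values before the request is sent. The following is a sketch under assumptions: `with_crop_box` is a hypothetical helper name, not part of any SDK, and the range check simply mirrors the 0.0 to 1.0 rule stated above.

```python
import json

EDGES = ("top", "right", "bottom", "left")


def with_crop_box(request_body: dict, **ratios: float) -> dict:
    """Return a copy of request_body with a root-level "crop_box" entry."""
    unknown = set(ratios) - set(EDGES)
    if unknown:
        raise ValueError(f"unknown crop_box keys: {sorted(unknown)}")
    for edge, value in ratios.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{edge} must be between 0.0 and 1.0, got {value}")
    body = dict(request_body)  # shallow copy so the caller's dict is untouched
    # Only the edges that were passed in are written to the request.
    body["crop_box"] = {edge: ratios[edge] for edge in EDGES if edge in ratios}
    return body


body = with_crop_box(
    {"file_id": "<file_id>"}, top=0.1, right=0.05, bottom=0.15, left=0.05
)
print(json.dumps(body, indent=2))
```

The helper returns a new dictionary rather than mutating its input, so one base request body can be reused with different crop boxes.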

Exclude the top 10% and bottom 15% of each page:

{
  "crop_box": {
    "top": 0.1,
    "bottom": 0.15
  }
}

Complete API request:

curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/v2/parse' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --data '{
    "file_id": "<file_id>",
    "tier": "agentic",
    "version": "latest",
    "crop_box": {
      "top": 0.1,
      "bottom": 0.15,
      "left": 0.05,
      "right": 0.05
    }
  }'
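
The same request can be prepared from Python using only the standard library. This is a rough sketch rather than an official client: `build_parse_request` and `send_parse_request` are hypothetical names, the endpoint, headers, and body fields are copied from the curl command, and the response is assumed to be JSON because the request asks for `application/json`.

```python
import json
import urllib.request

PARSE_URL = "https://api.cloud.llamaindex.ai/api/v2/parse"


def build_parse_request(file_id: str, crop_box: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a parse request with a root-level crop_box."""
    body = {
        "file_id": file_id,
        "tier": "agentic",
        "version": "latest",
        "crop_box": crop_box,
    }
    return urllib.request.Request(
        PARSE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_parse_request(request: urllib.request.Request) -> dict:
    """Send the prepared request and decode the response body as JSON."""
    # Assumption: the service replies with a JSON document.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A typical call reads the key from the environment, for example `build_parse_request("<file_id>", {"top": 0.1, "bottom": 0.15}, os.environ["LLAMA_CLOUD_API_KEY"])` after `import os`, and passes the result to `send_parse_request`. Building and sending are kept separate so the request can be inspected or tested without network access.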