Granular Bounding Boxes: Word, Line, and Cell Grounding
This example shows how to get per-word, per-line, and per-table-cell bounding boxes alongside the regular item-level layout boxes Parse returns, and how to fetch and walk the JSONL sidecar that carries them.
Use this when you need to:
- Highlight individual words or lines on a PDF viewer for citation back-references.
- Ground extracted answers down to the exact word or line rather than the whole paragraph.
- Build a side-by-side preview that hover-syncs from markdown text → highlighted region on the source document.
Granular bounding boxes are not delivered inline on the parse-result response — they live in a separate JSONL sidecar that the result links to via a presigned URL. This is a deliberate split: the sidecar can be many MB on a long document, and most callers don’t need it. The flow is two steps: parse with granular_bboxes set, then download the sidecar URL.
1. Setup

    pip install "llama-cloud>=1.0" httpx

    import os
    from getpass import getpass

    os.environ["LLAMA_CLOUD_API_KEY"] = getpass("Llama Cloud API Key: ")

    from llama_cloud import AsyncLlamaCloud

    client = AsyncLlamaCloud()

2. Parse with granular_bboxes
Set output_options.granular_bboxes to any subset of "word", "line", "cell". You can request just one level or all three. Parse produces the JSONL sidecar automatically; there is no corresponding expand value to add.

    # 1) Upload the file
    file_obj = await client.files.create(
        file="executive-summary-2024.pdf",
        purpose="parse",
    )

    # 2) Parse with word + line + cell grounding
    result = await client.parsing.parse(
        file_id=file_obj.id,
        tier="agentic",
        version="latest",
        output_options={
            "granular_bboxes": ["word", "line", "cell"],
        },
        # `items` is optional; we ask for it so we can compare the inline items tree
        # to the sidecar later. The sidecar URL itself is auto-included on the result.
        expand=["items"],
    )

3. Find the sidecar URL
When granular_bboxes is set, the result auto-includes a grounded_items entry under result_content_metadata. The entry carries size_bytes, an exists flag, and a presigned_url.

    sidecar = (result.result_content_metadata or {}).get("grounded_items")
    if sidecar is None:
        raise RuntimeError("Sidecar missing: was `granular_bboxes` set on the parse request?")

    print(f"Sidecar: {sidecar.size_bytes} bytes")
    print(f"URL: {sidecar.presigned_url}")

Presigned URLs are temporary. Download promptly, or call client.parsing.get(job_id=...) again to mint a fresh URL.
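Before downloading, it can help to validate the entry rather than only checking for None. The sketch below is a hypothetical helper (sidecar_download_url is not part of the SDK). It assumes the entry exposes exists and presigned_url as attributes, as the snippet above does, and that a falsy exists flag means no sidecar file was written.

```python
from types import SimpleNamespace


def sidecar_download_url(entry):
    """Return the presigned URL for a grounded_items entry, or raise.

    Hypothetical helper, not an SDK function. Assumes `exists` is falsy
    when no sidecar file was produced for the job.
    """
    if entry is None:
        raise RuntimeError("No grounded_items entry; was granular_bboxes set?")
    if not getattr(entry, "exists", False):
        raise RuntimeError("grounded_items is listed but the sidecar does not exist")
    url = getattr(entry, "presigned_url", None)
    if not url:
        raise RuntimeError("grounded_items entry has no presigned_url")
    return url


# Stand-in object for illustration; a real entry comes from result_content_metadata.
example = SimpleNamespace(
    exists=True,
    size_bytes=2048,
    presigned_url="https://example.com/sidecar.jsonl",
)
print(sidecar_download_url(example))
```

Failing early here gives a clearer error than an HTTP failure from the download step.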
4. Download and parse the JSONL
The sidecar is JSONL (one JSON object per line, one line per page), not a single JSON array. Read it line by line.

    import json
    import httpx

    async with httpx.AsyncClient() as http:
        response = await http.get(sidecar.presigned_url)
        response.raise_for_status()

    # Each non-empty line is one page row.
    pages = [json.loads(line) for line in response.text.splitlines() if line.strip()]
    print(f"Pages in sidecar: {len(pages)}")

Each page row is one of two shapes:

    # Success
    {
        "page_number": 1,
        "page_width": 612,
        "page_height": 792,
        "success": True,
        "items": [...],
    }

    # Failure: grounding could not be produced for this page
    {
        "page_number": 2,
        "success": False,
        "error": "...",
    }

Always check success before drilling in:

    for page in pages:
        if not page["success"]:
            print(f"Page {page['page_number']} failed: {page['error']}")
            continue
        print(f"Page {page['page_number']}: {len(page['items'])} items")

5. Walk word-level grounding
Each item has the same type / md / bbox shape as the regular items response, plus an optional grounding block. For text-shaped items (paragraphs, headings, captions), grounding is a GroundedTextSupport:

    {
        "source": "md",  # or "caption": which surface the spans index into
        "lines": [
            {
                "span": [0, 11],  # [start, end) byte range into item["md"]
                "bbox": {"x": 72.0, "y": 100.0, "w": 200.0, "h": 12.0},
                "words": [
                    {
                        "span": [0, 5],
                        "bbox": {"x": 72.0, "y": 100.0, "w": 35.0, "h": 12.0},
                    },
                    # ...
                ],
            },
            # ...
        ],
    }

To highlight each word on page 1:
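
    page = next(p for p in pages if p["success"] and p["page_number"] == 1)

    for item in page["items"]:
        grounding = item.get("grounding")
        # Only "md"-sourced spans index into item["md"]; "caption" spans index
        # into the caption text instead, so they are skipped here.
        if not grounding or grounding.get("source") != "md":
            continue

        # Spans are UTF-8 byte offsets, so slice the encoded bytes, not the str.
        md_bytes = item["md"].encode("utf-8")
        for line in grounding["lines"]:
            for word in line.get("words") or []:
                start, end = word["span"]
                text = md_bytes[start:end].decode("utf-8")
                box = word["bbox"]
                print(f"  word {text!r} at ({box['x']:.0f}, {box['y']:.0f}) "
                      f"{box['w']:.0f}×{box['h']:.0f}")

The span is a [start, end) UTF-8 byte range into item["md"]. Python strings index by code point, so item["md"][start:end] matches the word only for pure-ASCII text. Slice the UTF-8 encoded bytes and decode the result to get exactly the word's source text.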
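For reuse in a viewer, the walk can be wrapped in a generator. This is a minimal sketch, assuming the GroundedTextSupport shape shown earlier; slice_utf8, iter_words, and the sample page are illustrative names and hand-made data, not SDK functions or real Parse output. The sample text is non-ASCII on purpose, to show where byte offsets and string indices diverge.

```python
def slice_utf8(text, span):
    """Return the substring covered by a [start, end) UTF-8 byte span."""
    start, end = span
    return text.encode("utf-8")[start:end].decode("utf-8")


def iter_words(page):
    """Yield (text, bbox) for every md-sourced word on a successful page row."""
    for item in page.get("items") or []:
        grounding = item.get("grounding") or {}
        if grounding.get("source") != "md":
            continue  # table grounding and caption-sourced spans are skipped
        for line in grounding.get("lines") or []:
            for word in line.get("words") or []:
                yield slice_utf8(item["md"], word["span"]), word["bbox"]


# Hand-made page row for illustration only.
sample_page = {
    "page_number": 1,
    "success": True,
    "items": [{
        "type": "text",
        "md": "café au lait",
        "grounding": {
            "source": "md",
            "lines": [{
                "span": [0, 13],
                "bbox": {"x": 72.0, "y": 100.0, "w": 90.0, "h": 12.0},
                "words": [
                    {"span": [0, 5], "bbox": {"x": 72.0, "y": 100.0, "w": 30.0, "h": 12.0}},
                    {"span": [6, 8], "bbox": {"x": 106.0, "y": 100.0, "w": 16.0, "h": 12.0}},
                    {"span": [9, 13], "bbox": {"x": 126.0, "y": 100.0, "w": 36.0, "h": 12.0}},
                ],
            }],
        },
    }],
}

words = [text for text, _bbox in iter_words(sample_page)]
print(words)
```

Here "é" takes two bytes, so the byte span [6, 8) covers "au", while the plain string slice "café au lait"[6:8] would return "u ".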
6. Walk table-cell grounding
For table items, grounding is a GroundedTableSupport instead. It carries per-cell boxes and spans, plus row- and column-level boxes:

    {
        "rows": [  # rows[row][col] is a cell or null
            [
                {
                    "span": [42, 56],  # optional, into the table cell text
                    "lines": [...],    # optional per-line grounding inside the cell
                    "bbox": [          # one or more boxes covering the cell
                        {"x": 100.0, "y": 200.0, "w": 50.0, "h": 16.0},
                    ],
                },
                None,  # missing/empty cell
                # ...
            ],
            # ...
        ],
        "row_bboxes": [[{...}], ...],     # boxes per row (a row may span multiple)
        "column_bboxes": [[{...}], ...],  # boxes per column
    }

To find the bbox of the cell at row 0, column 1 on page 1:

    for item in page["items"]:
        if item["type"] != "table":
            continue
        grounding = item.get("grounding")
        if not grounding or not grounding.get("rows"):
            continue

        # Guard against tables whose first row has fewer than two columns.
        first_row = grounding["rows"][0]
        cell = first_row[1] if len(first_row) > 1 else None
        if cell is None:
            print("cell (0, 1) is empty or missing")
            continue

        for box in cell.get("bbox") or []:
            print(f"cell (0, 1) box: ({box['x']:.0f}, {box['y']:.0f}) "
                  f"{box['w']:.0f}×{box['h']:.0f}")

7. Render boxes on the page screenshot
The sidecar carries page_width and page_height for every successful page row; these match the coordinate space of the bboxes. If you also request images_to_save: ["screenshot"], you can overlay the bboxes onto each page screenshot directly. The screenshot's pixel dimensions may differ from page_width / page_height (PDF points vs. image pixels), so scale the box coordinates accordingly:

    scale_x = screenshot_pixel_width / page["page_width"]
    scale_y = screenshot_pixel_height / page["page_height"]

    x_px = box["x"] * scale_x
    y_px = box["y"] * scale_y
    w_px = box["w"] * scale_x
    h_px = box["h"] * scale_y

If box["r"] is present and non-zero, the box should be rotated by r degrees around its center to recover the visual quad; x/y/w/h describe the axis-aligned bounding rect of the unrotated content.
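The scaling and rotation steps can be combined into two small helpers. This is a sketch under stated assumptions: scale_box and box_to_quad are hypothetical names, the screenshot is assumed to scale roughly uniformly in x and y, and the sign convention for r (a positive angle applied with the standard rotation matrix in image coordinates, where y grows downward) is a guess that may need flipping for your renderer.

```python
import math


def scale_box(box, page_w, page_h, img_w, img_h):
    """Convert a box from page units to screenshot pixels."""
    sx = img_w / page_w
    sy = img_h / page_h
    scaled = {
        "x": box["x"] * sx,
        "y": box["y"] * sy,
        "w": box["w"] * sx,
        "h": box["h"] * sy,
    }
    if box.get("r"):
        scaled["r"] = box["r"]  # the angle is unaffected by scaling
    return scaled


def box_to_quad(box):
    """Return the four corners of the box after rotating by r about its center.

    Assumption: r is in degrees and uses the standard rotation matrix.
    Corner order is top-left, top-right, bottom-right, bottom-left
    of the unrotated rect.
    """
    cx = box["x"] + box["w"] / 2
    cy = box["y"] + box["h"] / 2
    theta = math.radians(box.get("r") or 0.0)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = [
        (box["x"], box["y"]),
        (box["x"] + box["w"], box["y"]),
        (box["x"] + box["w"], box["y"] + box["h"]),
        (box["x"], box["y"] + box["h"]),
    ]
    quad = []
    for px, py in corners:
        dx, dy = px - cx, py - cy
        quad.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return quad


# A 612x792 page rendered as a 1224x1584 screenshot (illustrative numbers).
scaled = scale_box({"x": 72.0, "y": 100.0, "w": 200.0, "h": 12.0}, 612, 792, 1224, 1584)
quad = box_to_quad(scaled)
print(scaled, quad)
```

Drawing the quad as a polygon, rather than a rectangle, keeps rotated text highlighted correctly.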
See also
- Configuring Parse → Granular bounding boxes: request-side configuration
- Retrieving Results → Grounded items content metadata: response-side schema reference
- Parse a PDF & Interpret Outputs: the regular items tree (item-level boxes only)