Granular Bounding Boxes: Word, Line, and Cell Grounding

Request per-word, per-line, and per-cell bounding boxes from LlamaParse and walk the JSONL sidecar to ground text and table cells for citation highlighting.

This example shows how to get per-word, per-line, and per-table-cell bounding boxes alongside the regular item-level layout boxes Parse returns, and how to fetch and walk the JSONL sidecar that carries them.

Use this when you need to:

Highlight individual words or lines on a PDF viewer for citation back-references.
Ground extracted answers down to the exact word, line, or table cell rather than the whole paragraph.
Build a side-by-side preview that hover-syncs from markdown text → highlighted region on the source document.

Granular bounding boxes are not delivered inline on the parse-result response — they live in a separate JSONL sidecar that the result links to via a presigned URL. This is a deliberate split: the sidecar can be many MB on a long document, and most callers don’t need it. The flow is two steps: parse with granular_bboxes set, then download the sidecar URL.

1. Setup

# Quote the requirement so the shell does not treat ">" as an output redirect
pip install "llama-cloud>=1.0" httpx

import os
from getpass import getpass

os.environ["LLAMA_CLOUD_API_KEY"] = getpass("Llama Cloud API Key: ")

from llama_cloud import AsyncLlamaCloud

client = AsyncLlamaCloud()

2. Parse with `granular_bboxes`

Set output_options.granular_bboxes to any subset of "word", "line", "cell". You can request just one level or all three. Parse will produce the JSONL sidecar automatically — there is no corresponding expand value to add.

# 1) Upload the file
file_obj = await client.files.create(
    file="executive-summary-2024.pdf",
    purpose="parse",
)

# 2) Parse with word + line + cell grounding
result = await client.parsing.parse(
    file_id=file_obj.id,
    tier="agentic",
    version="latest",
    output_options={
        "granular_bboxes": ["word", "line", "cell"],
    },
    # `items` is optional — we ask for it so we can compare the inline items tree
    # to the sidecar later. The sidecar URL itself is auto-included on the result.
    expand=["items"],
)

3. Find the sidecar URL

When granular_bboxes is set, the result automatically includes a grounded_items entry under result_content_metadata. The entry carries size_bytes, an exists flag, and a presigned_url.

sidecar = (result.result_content_metadata or {}).get("grounded_items")
if sidecar is None:
    raise RuntimeError("Sidecar missing — was `granular_bboxes` set on the parse request?")

print(f"Sidecar: {sidecar.size_bytes} bytes")
print(f"URL:     {sidecar.presigned_url}")

Presigned URLs are temporary. Download promptly, or call client.parsing.get(job_id=...) again to mint a fresh URL.

4. Download and parse the JSONL

The sidecar is JSONL (one JSON object per line, one line per page), not a single JSON array. Parse it line by line rather than as one document.

import json
import httpx

async with httpx.AsyncClient() as http:
    response = await http.get(sidecar.presigned_url)
    response.raise_for_status()

# Each non-empty line is one page row.
pages = [json.loads(line) for line in response.text.splitlines() if line.strip()]
print(f"Pages in sidecar: {len(pages)}")
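The snippet above reads the whole response body into memory before splitting it. For a sidecar of many MB, a streaming variant may be preferable. The following is a minimal sketch, assuming httpx's `AsyncClient.stream` and `Response.aiter_lines`; the helper names `parse_jsonl_lines` and `stream_sidecar_pages` are hypothetical and not part of the LlamaCloud SDK.

```python
import json
from typing import Any, AsyncIterator, Iterable, Iterator


def parse_jsonl_lines(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one decoded page row per non-empty JSONL line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


async def stream_sidecar_pages(url: str) -> AsyncIterator[dict[str, Any]]:
    """Yield page rows as they arrive instead of buffering the whole body.

    Hypothetical helper: `url` is expected to be the sidecar's presigned URL.
    """
    import httpx  # imported lazily so parse_jsonl_lines has no httpx dependency

    async with httpx.AsyncClient() as http:
        async with http.stream("GET", url) as response:
            response.raise_for_status()
            async for line in response.aiter_lines():
                if line.strip():
                    yield json.loads(line)
```

With this in place, `pages = [row async for row in stream_sidecar_pages(sidecar.presigned_url)]` would build the same list as above, and a caller that only needs a few pages can stop iterating early.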

Each page row is one of two shapes:

# Success
{
    "page_number": 1,
    "page_width": 612,
    "page_height": 792,
    "success": True,
    "items": [...],
}

# Failure — grounding could not be produced for this page
{
    "page_number": 2,
    "success": False,
    "error": "...",
}

Always check success before drilling in:

for page in pages:
    if not page["success"]:
        print(f"Page {page['page_number']} failed: {page['error']}")
        continue
    print(f"Page {page['page_number']}: {len(page['items'])} items")
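If you want to keep the failures around for reporting or a later retry instead of just printing them, the same check can be folded into a small helper. This is a sketch only; `split_pages` is a hypothetical name, and it relies solely on the `page_number`, `success`, and `error` fields shown in the two row shapes above.

```python
from typing import Any


def split_pages(
    pages: list[dict[str, Any]],
) -> tuple[dict[int, dict[str, Any]], dict[int, str]]:
    """Separate grounded pages from failed ones, both keyed by page_number."""
    grounded: dict[int, dict[str, Any]] = {}
    failed: dict[int, str] = {}
    for page in pages:
        if page.get("success"):
            grounded[page["page_number"]] = page
        else:
            # Failure rows carry an error message instead of items.
            failed[page["page_number"]] = page.get("error", "unknown error")
    return grounded, failed
```

Keying by `page_number` also makes later lookups such as `grounded[1]` cheaper than scanning the list each time.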

5. Walk word-level grounding

Each item has the same type / md / bbox shape as the regular items response, plus an optional grounding block. For text-shaped items (paragraphs, headings, captions), grounding is a GroundedTextSupport:

{
    "source": "md",           # or "caption" — which surface the spans index into
    "lines": [
        {
            "span": [0, 11],  # [start, end) character range into item["md"]
            "bbox": { "x": 72.0, "y": 100.0, "w": 200.0, "h": 12.0 },
            "words": [
                {
                    "span": [0, 5],
                    "bbox": { "x": 72.0, "y": 100.0, "w": 35.0, "h": 12.0 },
                },
                # ...
            ],
        },
        # ...
    ],
}

To highlight each word on page 1:

page = next(p for p in pages if p["success"] and p["page_number"] == 1)

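for item in page["items"]:
    grounding = item.get("grounding")
    # Spans index into the surface named by `source`. This loop slices
    # item["md"], so it only handles "md"; spans with source "caption"
    # index into the caption text and must not be applied to item["md"].
    if not grounding or grounding.get("source") != "md":
        continue

    for line in grounding.get("lines") or []:
        for word in line.get("words") or []:
            start, end = word["span"]
            text = item["md"][start:end]
            box = word["bbox"]
            print(f"  word {text!r} at ({box['x']:.0f}, {box['y']:.0f}) "
                  f"{box['w']:.0f}×{box['h']:.0f}")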

The span is a [start, end) character range into item["md"], so item["md"][start:end] slices out exactly the word’s source text.
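The same spans can drive the hover-sync use case in the other direction: given a character offset in the item's text, find the word box to highlight. The sketch below assumes only the `lines` / `words` / `span` / `bbox` shape shown above; `word_at_offset` is a hypothetical helper name, and the offset is expected to be relative to whichever surface `source` names.

```python
from typing import Any, Optional


def word_at_offset(item: dict[str, Any], offset: int) -> Optional[dict[str, Any]]:
    """Return the grounded word whose [start, end) span contains `offset`.

    Returns None when the offset falls between words (for example on a
    space) or when the item has no text-shaped grounding.
    """
    grounding = item.get("grounding") or {}
    for line in grounding.get("lines") or []:
        for word in line.get("words") or []:
            start, end = word["span"]
            if start <= offset < end:
                return word
    return None
```

Because spans are half-open, an offset equal to a word's `end` belongs to whatever follows the word, not to the word itself.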

6. Walk table-cell grounding

For table items, grounding is a GroundedTableSupport instead — it carries per-cell boxes and spans, plus row- and column-level boxes:

{
    "rows": [
        # rows[row][col] is a cell or null
        [
            {
                "span": [42, 56],            # optional, into the table cell text
                "lines": [...],              # optional per-line grounding inside the cell
                "bbox": [                    # one or more boxes covering the cell
                    { "x": 100.0, "y": 200.0, "w": 50.0, "h": 16.0 },
                ],
            },
            None,                            # missing/empty cell
            # ...
        ],
        # ...
    ],
    "row_bboxes":    [[{...}], ...],         # boxes per row (a row may span multiple)
    "column_bboxes": [[{...}], ...],         # boxes per column
}

To find the bbox of the cell at row 0, column 1 on page 1:

for item in page["items"]:
    if item["type"] != "table":
        continue
    grounding = item.get("grounding")
    if not grounding or not grounding.get("rows"):
        continue

    first_row = grounding["rows"][0]
    if len(first_row) < 2:
        print("table has no column 1 in row 0")
        continue

    cell = first_row[1]
    if cell is None:
        print("cell (0, 1) is empty")
        continue

    for box in cell.get("bbox") or []:
        print(f"cell (0, 1) box: ({box['x']:.0f}, {box['y']:.0f}) "
              f"{box['w']:.0f}×{box['h']:.0f}")
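To highlight a whole table rather than a single cell, you can walk every entry in `rows`. The generator below is a sketch under the assumption that `rows[row][col]` is either a cell object with a `bbox` list or null, as in the shape above; `iter_cell_boxes` is a hypothetical helper name.

```python
from typing import Any, Iterator


def iter_cell_boxes(
    grounding: dict[str, Any],
) -> Iterator[tuple[int, int, dict[str, Any]]]:
    """Yield (row_index, col_index, box) for every box of every non-empty cell."""
    for r, row in enumerate(grounding.get("rows") or []):
        for c, cell in enumerate(row or []):
            if cell is None:
                continue  # missing or empty cell
            # A cell may be covered by more than one box.
            for box in cell.get("bbox") or []:
                yield r, c, box
```

A cell that is covered by several boxes yields once per box with the same `(row_index, col_index)`, so callers that need one highlight per cell should group on those indices.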

7. Render boxes on the page screenshot

The sidecar carries page_width and page_height for every page — these match the coordinate space of the bboxes. If you also request images_to_save: ["screenshot"], you can overlay the bboxes onto each page screenshot directly. The screenshot’s pixel dimensions may differ from page_width / page_height (PDF points vs. image pixels), so scale the box coordinates accordingly:

scale_x = screenshot_pixel_width  / page["page_width"]
scale_y = screenshot_pixel_height / page["page_height"]

x_px = box["x"] * scale_x
y_px = box["y"] * scale_y
w_px = box["w"] * scale_x
h_px = box["h"] * scale_y

If box["r"] is present and non-zero, the box should be rotated by r degrees around its center to recover the visual quad — x/y/w/h describe the axis-aligned bounding rect of the unrotated content.
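Scaling and rotation can be combined into one step that returns the four corners to draw. The following is a sketch, not a reference implementation: `box_to_pixel_quad` is a hypothetical helper, and it assumes `r` is applied with the standard rotation matrix in the page's coordinate space, which appears clockwise when the y axis points down. If your overlays come out mirrored, negate the angle.

```python
import math
from typing import Any


def box_to_pixel_quad(
    box: dict[str, Any],
    page_width: float,
    page_height: float,
    px_width: float,
    px_height: float,
) -> list[tuple[float, float]]:
    """Return the box corners in screenshot pixels, rotated about the box center.

    Corner order is top-left, top-right, bottom-right, bottom-left of the
    unrotated rect. The rotation direction is an assumption (see lead-in).
    """
    scale_x = px_width / page_width
    scale_y = px_height / page_height

    x, y, w, h = box["x"], box["y"], box["w"], box["h"]
    cx, cy = x + w / 2, y + h / 2
    theta = math.radians(box.get("r") or 0.0)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    quad = []
    for corner_x, corner_y in [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]:
        dx, dy = corner_x - cx, corner_y - cy
        # Rotate in page space first, then scale into pixel space.
        rotated_x = cx + dx * cos_t - dy * sin_t
        rotated_y = cy + dx * sin_t + dy * cos_t
        quad.append((rotated_x * scale_x, rotated_y * scale_y))
    return quad
```

Rotating before scaling keeps the geometry correct even when `scale_x` and `scale_y` differ slightly. The returned list can be passed to any polygon-drawing call in your imaging library of choice.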
