Library Usage

Use LiteParse programmatically from TypeScript or Python.

LiteParse can be used as a library in your own code, not just from the CLI. There are packages for both TypeScript and Python.

TypeScript

Install as a project dependency:

npm install @llamaindex/liteparse
# or
pnpm add @llamaindex/liteparse

Parsing a document

import { LiteParse } from "@llamaindex/liteparse";

const parser = new LiteParse({ ocrEnabled: true });
const result = await parser.parse("document.pdf");

// Full document text with layout preserved
console.log(result.text);

// Per-page data
for (const page of result.pages) {
  console.log(`Page ${page.pageNum}: ${page.textItems.length} text items`);
}

JSON output with bounding boxes

const parser = new LiteParse({ outputFormat: "json" });
const result = await parser.parse("document.pdf");

for (const page of result.json.pages) {
  for (const bbox of page.boundingBoxes) {
    console.log(`[${bbox.x1}, ${bbox.y1}] → [${bbox.x2}, ${bbox.y2}]`);
  }
}

Configuration

Pass any config options to the constructor. You only need to specify what you want to override:

const parser = new LiteParse({
  ocrEnabled: true,
  ocrServerUrl: "http://localhost:8828/ocr",
  ocrLanguage: "fra",
  dpi: 300,
  outputFormat: "json",
  targetPages: "1-10",
});

Screenshots

Generate page images as buffers, which is useful for sending pages to LLMs or saving them to disk:

const parser = new LiteParse();
const screenshots = await parser.screenshot("document.pdf");

for (const shot of screenshots) {
  console.log(`Page ${shot.pageNum}: ${shot.width}x${shot.height}`);
  // shot.imageBuffer contains the raw PNG/JPG data
}

See the API reference for full type details.

Python

The Python package wraps the LiteParse CLI, so the Node.js CLI must be installed first.

Installation

# 1. Install the CLI (required)
npm install -g @llamaindex/liteparse

# 2. Install the Python package
pip install liteparse

Parsing a document

from liteparse import LiteParse

parser = LiteParse()
result = parser.parse("document.pdf")

# Full document text
print(result.text)

# Per-page data
for page in result.pages:
    print(f"Page {page.pageNum}: {len(page.textItems)} text items")
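One use for the per-page data is spotting pages that produced no text at all, which may be scanned images worth re-parsing with OCR enabled. The following is a minimal sketch, not part of the LiteParse API: `pages_without_text` is a hypothetical helper, and it assumes only the `pageNum` and `textItems` attributes shown above. The sample pages are stand-in objects so the snippet runs without a real document.

```python
from types import SimpleNamespace


def pages_without_text(pages):
    """Return the page numbers whose textItems collection is empty."""
    return [page.pageNum for page in pages if len(page.textItems) == 0]


# Stand-in objects mimicking result.pages (hypothetical data, not real output).
sample_pages = [
    SimpleNamespace(pageNum=1, textItems=["Title", "Body"]),
    SimpleNamespace(pageNum=2, textItems=[]),
]

print(pages_without_text(sample_pages))  # [2]
```

With a real parse result, you would call `pages_without_text(result.pages)` instead.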

Configuration

Options can be set on the constructor (applied to all parse calls) or per-call:

# Constructor-level defaults
parser = LiteParse(
    ocr_enabled=True,
    ocr_server_url="http://localhost:8828/ocr",
    ocr_language="fra",
    dpi=300,
)

# Per-call options
result = parser.parse("document.pdf", target_pages="1-5")
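Per-call options make it possible to work through a long document in slices. The sketch below builds `target_pages` strings in the `"start-end"` form used above. `page_ranges` is a hypothetical helper, and it assumes that a single-page range such as `"11-11"` is accepted, which the text above does not confirm.

```python
def page_ranges(total_pages: int, chunk_size: int) -> list[str]:
    """Split pages 1..total_pages into consecutive "start-end" range strings."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    ranges = []
    for start in range(1, total_pages + 1, chunk_size):
        # Clamp the last chunk so it never runs past the final page.
        end = min(start + chunk_size - 1, total_pages)
        ranges.append(f"{start}-{end}")
    return ranges


print(page_ranges(12, 5))  # ['1-5', '6-10', '11-12']
```

Each string could then be passed as `parser.parse("document.pdf", target_pages=page_range)` inside a loop.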

Parsing from bytes

If you already have the file contents in memory (e.g. from a web upload), pass them to parse_bytes along with a filename:

result = parser.parse_bytes(pdf_bytes, filename="upload.pdf")
print(result.text)
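Uploads often arrive as a binary stream instead of a `bytes` object. As a rough sketch, the hypothetical helper below reads the stream, rejects empty input, and forwards the data to `parse_bytes` with the same `filename` keyword shown above. It takes the parser as an argument, so it assumes nothing about LiteParse beyond that one call.

```python
import io
from typing import BinaryIO


def parse_stream(parser, stream: BinaryIO, filename: str):
    """Read a binary stream fully and hand its contents to parser.parse_bytes."""
    data = stream.read()
    if not data:
        # Fail early instead of sending an empty payload to the parser.
        raise ValueError(f"{filename} is empty")
    return parser.parse_bytes(data, filename=filename)


# Any binary file-like object works, for example an in-memory buffer:
upload = io.BytesIO(b"%PDF-1.7 ...")
# result = parse_stream(parser, upload, "upload.pdf")
```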

Batch parsing

For multiple files, batch mode reuses the PDF engine across files, which is significantly faster than parsing each file in a separate call:

result = parser.batch_parse(
    input_dir="./documents",
    output_dir="./output",
    recursive=True,
    extension_filter=".pdf",
)

print(f"Parsed {result.success_count} files in {result.total_time_seconds}s")
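To get a feel for batch throughput, the two fields printed above can be combined into an average time per file. This is a small sketch using a hypothetical helper, `average_seconds_per_file`. It assumes only the `success_count` and `total_time_seconds` attributes, and the sample result is a stand-in object, not real output.

```python
from types import SimpleNamespace
from typing import Optional


def average_seconds_per_file(result) -> Optional[float]:
    """Return the mean time per successfully parsed file, or None if none succeeded."""
    if result.success_count == 0:
        return None
    return result.total_time_seconds / result.success_count


# Stand-in for a batch result (hypothetical numbers).
fake_result = SimpleNamespace(success_count=4, total_time_seconds=6.0)
print(average_seconds_per_file(fake_result))  # 1.5
```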