
Library Usage

Use LiteParse programmatically from TypeScript or Python.

LiteParse can be used as a library in your own code, not just from the CLI. There are packages for both TypeScript and Python.

Install as a project dependency:

npm install @llamaindex/liteparse
# or
pnpm add @llamaindex/liteparse

import { LiteParse } from "@llamaindex/liteparse";

const parser = new LiteParse({ ocrEnabled: true });
const result = await parser.parse("document.pdf");

// Full document text with layout preserved
console.log(result.text);

// Per-page data
for (const page of result.pages) {
  console.log(`Page ${page.pageNum}: ${page.textItems.length} text items`);
}

Setting outputFormat to "json" gives access to per-page bounding boxes:

const parser = new LiteParse({ outputFormat: "json" });
const result = await parser.parse("document.pdf");

for (const page of result.json.pages) {
  for (const bbox of page.boundingBoxes) {
    console.log(`[${bbox.x1}, ${bbox.y1}] → [${bbox.x2}, ${bbox.y2}]`);
  }
}

Pass any config options to the constructor. You only need to specify what you want to override:

const parser = new LiteParse({
  ocrEnabled: true,
  ocrServerUrl: "http://localhost:8828/ocr",
  ocrLanguage: "fra",
  dpi: 300,
  outputFormat: "json",
  targetPages: "1-10",
});

Generate page images as buffers, which is useful for sending pages to LLMs or saving them to disk:

const parser = new LiteParse();
const screenshots = await parser.screenshot("document.pdf");

for (const shot of screenshots) {
  console.log(`Page ${shot.pageNum}: ${shot.width}x${shot.height}`);
  // shot.imageBuffer contains the raw PNG/JPG data
}

See the API reference for full type details.


The Python package wraps the LiteParse CLI, so the Node.js CLI must be installed first.

# 1. Install the CLI (required)
npm install -g @llamaindex/liteparse
# 2. Install the Python package
pip install liteparse

from liteparse import LiteParse

parser = LiteParse()
result = parser.parse("document.pdf")

# Full document text
print(result.text)

# Per-page data
for page in result.pages:
    print(f"Page {page.pageNum}: {len(page.textItems)} text items")
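
Because the Python package wraps the CLI, a parse call can only succeed when the CLI executable is on PATH. The sketch below is a small preflight check built on the standard library's `shutil.which`; the helper name `require_executable` is hypothetical and is not part of the liteparse package, and the command name to pass in depends on your installation.

```python
import shutil


def require_executable(name: str) -> str:
    """Return the resolved path of `name`, or raise if it cannot be found.

    Hypothetical helper, not part of liteparse: it only checks that a
    command is reachable before any parsing is attempted.
    """
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name!r} was not found on PATH; install the LiteParse CLI first "
            "(npm install -g @llamaindex/liteparse)."
        )
    return path
```

Calling it once at startup, for example `require_executable("lit")` if your installation exposes the CLI under that name (an assumption, check your install), turns a confusing subprocess failure into a clear error message.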

Options can be set on the constructor (applied to all parse calls) or per-call:

# Constructor-level defaults
parser = LiteParse(
    ocr_enabled=True,
    ocr_server_url="http://localhost:8828/ocr",
    ocr_language="fra",
    dpi=300,
)

# Per-call options
result = parser.parse("document.pdf", target_pages="1-5")
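
One way to use the per-call `target_pages` option is to walk a long document in fixed-size chunks. The following is a minimal sketch of a generator that produces range strings in the same "start-end" form shown above; `page_ranges` is a hypothetical helper, and it assumes a single-page range such as "11-11" is accepted, which is not confirmed here.

```python
from typing import Iterator


def page_ranges(total_pages: int, chunk_size: int) -> Iterator[str]:
    """Yield 1-based inclusive page ranges such as "1-5", "6-10", ...

    Hypothetical helper: the "start-end" string format mirrors the
    target_pages values used in this guide.
    """
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    for start in range(1, total_pages + 1, chunk_size):
        # Clamp the last chunk so it never runs past the final page.
        end = min(start + chunk_size - 1, total_pages)
        yield f"{start}-{end}"


# Hypothetical usage with a parser configured as above:
# for pages in page_ranges(total_pages=12, chunk_size=5):
#     result = parser.parse("document.pdf", target_pages=pages)
```

Constructor-level settings such as OCR language stay in effect for every chunk, so only the page range changes between calls.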

If you already have file contents in memory (e.g. from a web upload):

result = parser.parse_bytes(pdf_bytes, filename="upload.pdf")
print(result.text)
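
When the bytes come from an untrusted upload, it can help to validate them before handing them to the parser. This sketch assumes uploads are expected to be PDFs, checks for the `%PDF-` file header, and strips directory components from the client-supplied filename. `parse_upload` and the `BytesParser` protocol are hypothetical names; the only liteparse call relied on is `parse_bytes(data, filename=...)` as shown above.

```python
from pathlib import PurePosixPath
from typing import Any, Protocol


class BytesParser(Protocol):
    """Anything exposing parse_bytes(data, filename=...), like the parser above."""

    def parse_bytes(self, data: bytes, filename: str) -> Any: ...


def parse_upload(parser: BytesParser, data: bytes, filename: str) -> Any:
    """Validate an in-memory PDF upload, then pass it to parser.parse_bytes."""
    # Strict check: PDF files begin with the "%PDF-" header.
    if not data.startswith(b"%PDF-"):
        raise ValueError("upload does not look like a PDF")
    # Keep only the final path component of the client-supplied name.
    safe_name = PurePosixPath(filename.replace("\\", "/")).name
    if safe_name in ("", ".", ".."):
        safe_name = "upload.pdf"
    return parser.parse_bytes(data, filename=safe_name)
```

In a web handler this would be called as `parse_upload(parser, pdf_bytes, uploaded_name)`. If you also accept other file types, the header check would need to be relaxed.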

For multiple files, batch mode reuses the PDF engine and is significantly faster:

result = parser.batch_parse(
    input_dir="./documents",
    output_dir="./output",
    recursive=True,
    extension_filter=".pdf",
)

print(f"Parsed {result.success_count} files in {result.total_time_seconds}s")
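
To tell whether a batch run covered everything, you can count the candidate files yourself and compare that number with `success_count`. The sketch below uses only `pathlib`; `count_matching_files` is a hypothetical helper, and it assumes the extension filter matches on file suffix regardless of case, which may differ from how `batch_parse` actually selects files.

```python
from pathlib import Path


def count_matching_files(input_dir: str, extension: str, recursive: bool = True) -> int:
    """Count files under input_dir whose suffix equals extension (case-insensitive).

    Hypothetical helper for sanity-checking a batch run; it does not call
    liteparse and may not match batch_parse's own selection rules exactly.
    """
    root = Path(input_dir)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sum(
        1
        for p in candidates
        if p.is_file() and p.suffix.lower() == extension.lower()
    )


# Hypothetical usage alongside the batch call above:
# expected = count_matching_files("./documents", ".pdf", recursive=True)
# if result.success_count < expected:
#     print(f"{expected - result.success_count} files were not parsed")
```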