Library Usage
Use LiteParse programmatically from TypeScript or Python.
LiteParse can be used as a library in your own code, not just from the CLI. There are packages for both TypeScript and Python.
TypeScript
Install as a project dependency:
    npm install @llamaindex/liteparse
    # or
    pnpm add @llamaindex/liteparse
Parsing a document
    import { LiteParse } from "@llamaindex/liteparse";

    const parser = new LiteParse({ ocrEnabled: true });
    const result = await parser.parse("document.pdf");

    // Full document text with layout preserved
    console.log(result.text);

    // Per-page data
    for (const page of result.pages) {
      console.log(`Page ${page.pageNum}: ${page.textItems.length} text items`);
    }
JSON output with bounding boxes
    const parser = new LiteParse({ outputFormat: "json" });
    const result = await parser.parse("document.pdf");

    for (const page of result.json.pages) {
      for (const bbox of page.boundingBoxes) {
        console.log(`[${bbox.x1}, ${bbox.y1}] → [${bbox.x2}, ${bbox.y2}]`);
      }
    }
Configuration
Pass any config options to the constructor. You only need to specify what you want to override:
    const parser = new LiteParse({
      ocrEnabled: true,
      ocrServerUrl: "http://localhost:8828/ocr",
      ocrLanguage: "fra",
      dpi: 300,
      outputFormat: "json",
      targetPages: "1-10",
    });
Screenshots
Generate page images as buffers, useful for sending to LLMs or saving to disk:
    const parser = new LiteParse();
    const screenshots = await parser.screenshot("document.pdf");

    for (const shot of screenshots) {
      console.log(`Page ${shot.pageNum}: ${shot.width}x${shot.height}`);
      // shot.imageBuffer contains the raw PNG/JPG data
    }
See the API reference for full type details.
Python
The Python package wraps the LiteParse CLI, so the Node.js CLI must be installed first.
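Because the Python package depends on the CLI being present, a pre-flight check can give a clearer error than a failed parse call. The following is a minimal sketch using only the standard library. The helper `cli_available` is a hypothetical name, and the executable name `liteparse` is an assumption: this page does not state what the npm package installs on your PATH, so substitute the real name.

```python
import shutil


def cli_available(binary_name):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(binary_name) is not None


# "liteparse" is an ASSUMED executable name, not confirmed by this page;
# replace it with whatever the npm package actually installs.
if not cli_available("liteparse"):
    print("LiteParse CLI not found on PATH; install it with npm first.")
```

Running this check once at startup keeps the failure close to its cause instead of surfacing later inside a parse call.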
Installation
    # 1. Install the CLI (required)
    npm install -g @llamaindex/liteparse

    # 2. Install the Python package
    pip install liteparse
Parsing a document
    from liteparse import LiteParse

    parser = LiteParse()
    result = parser.parse("document.pdf")

    # Full document text
    print(result.text)

    # Per-page data
    for page in result.pages:
        print(f"Page {page.pageNum}: {len(page.textItems)} text items")
Configuration
Options can be set on the constructor (applied to all parse calls) or per-call:
    # Constructor-level defaults
    parser = LiteParse(
        ocr_enabled=True,
        ocr_server_url="http://localhost:8828/ocr",
        ocr_language="fra",
        dpi=300,
    )

    # Per-call options
    result = parser.parse("document.pdf", target_pages="1-5")
Parsing from bytes
If you already have file contents in memory (e.g. from a web upload):
    result = parser.parse_bytes(pdf_bytes, filename="upload.pdf")
    print(result.text)
Batch parsing
For multiple files, batch mode reuses the PDF engine and is significantly faster than parsing files one at a time:
    result = parser.batch_parse(
        input_dir="./documents",
        output_dir="./output",
        recursive=True,
        extension_filter=".pdf",
    )

    print(f"Parsed {result.success_count} files in {result.total_time_seconds}s")
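Batch mode writes results to an output directory and reports aggregate counts. When you instead need each file's text in memory, a plain loop over `parse` is one alternative, at the cost of the batch speedup. The sketch below assumes that `parse` raises an exception on failure, which this page does not confirm. `parse_each` and `StubParser` are hypothetical names; the stub stands in for `LiteParse` so the example runs without the CLI installed.

```python
from pathlib import Path
from types import SimpleNamespace


def parse_each(parser, input_dir, extension=".pdf"):
    """Parse every matching file under input_dir, one parse() call per file.

    Returns (texts, errors): texts maps file path to extracted text,
    errors maps file path to the error message for files that failed.
    """
    texts, errors = {}, {}
    # rglob walks subdirectories too, similar in spirit to recursive=True.
    for path in sorted(Path(input_dir).rglob(f"*{extension}")):
        try:
            texts[str(path)] = parser.parse(str(path)).text
        except Exception as exc:  # ASSUMPTION: failures surface as exceptions
            errors[str(path)] = str(exc)
    return texts, errors


class StubParser:
    """Hypothetical stand-in for liteparse.LiteParse, for demonstration only."""

    def parse(self, path):
        if Path(path).name.startswith("broken"):
            raise RuntimeError("could not parse")
        return SimpleNamespace(text=f"text of {Path(path).name}")
```

With the real package you would pass a `LiteParse()` instance in place of `StubParser()`, for example `texts, errors = parse_each(LiteParse(), "./documents")`.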