Library Usage

Use LiteParse programmatically from TypeScript or Python.

LiteParse can be used as a library in your own code, not just from the CLI. There are native packages for both TypeScript and Python.

Install as a project dependency:

npm install @llamaindex/liteparse
# or
pnpm add @llamaindex/liteparse

Then parse a document and read the extracted text and per-page data:

import { LiteParse } from "@llamaindex/liteparse";

const parser = new LiteParse({ ocrEnabled: true });
const result = await parser.parse("document.pdf");

// Full document text with layout preserved
console.log(result.text);

// Per-page data
for (const page of result.pages) {
  console.log(`Page ${page.pageNum}: ${page.textItems.length} text items`);
}
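
The per-page structure makes it easy to locate content by page. The sketch below searches text items for a term and returns the matching page numbers. The `TextItem` and `ParsedPage` interfaces and the `findPagesContaining` helper are hypothetical; they mirror only the fields used in this guide and are not LiteParse's published types.

```typescript
// Shapes below mirror only the fields used in this guide; they are
// assumptions for illustration, not LiteParse's published type definitions.
interface TextItem {
  text: string;
}

interface ParsedPage {
  pageNum: number;
  textItems: TextItem[];
}

// Return the page numbers whose text items mention `term` (case-insensitive).
function findPagesContaining(pages: ParsedPage[], term: string): number[] {
  const needle = term.toLowerCase();
  return pages
    .filter((page) =>
      page.textItems.some((item) => item.text.toLowerCase().includes(needle)),
    )
    .map((page) => page.pageNum);
}

// Stand-in data; in real use, pass `result.pages` from `parser.parse(...)`.
const samplePages: ParsedPage[] = [
  { pageNum: 1, textItems: [{ text: "Invoice #1042" }, { text: "Total due" }] },
  { pageNum: 2, textItems: [{ text: "Terms and conditions" }] },
  { pageNum: 3, textItems: [{ text: "INVOICE summary" }] },
];

console.log(findPagesContaining(samplePages, "invoice")); // [ 1, 3 ]
```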

Text items include spatial coordinates (x, y, width, height) in PDF points:

const parser = new LiteParse({ outputFormat: "json" });
const result = await parser.parse("document.pdf");

for (const page of result.pages) {
  for (const item of page.textItems) {
    console.log(`[${item.x}, ${item.y}] ${item.width}×${item.height} "${item.text}"`);
  }
}
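
Because coordinates are in PDF points (1/72 of an inch), mapping a text item onto a rendered page image means scaling by the image's DPI. The sketch below shows that conversion. The `Box` interface and `pointsToPixels` helper are hypothetical, and it assumes the image shares the same origin as the text coordinates and was rendered at the DPI you pass in; this guide does not state how LiteParse renders images.

```typescript
// One PDF point is 1/72 of an inch, so pixels = points * dpi / 72.
const POINTS_PER_INCH = 72;

// Hypothetical shape covering only the coordinate fields named above.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Scale a box from PDF points to pixels at the given DPI.
function pointsToPixels(box: Box, dpi: number): Box {
  const scale = dpi / POINTS_PER_INCH;
  return {
    x: Math.round(box.x * scale),
    y: Math.round(box.y * scale),
    width: Math.round(box.width * scale),
    height: Math.round(box.height * scale),
  };
}

// A 1-inch-wide item placed 1 inch from the origin, rendered at 300 DPI.
const pixelBox = pointsToPixels({ x: 72, y: 72, width: 72, height: 12 }, 300);
console.log(pixelBox); // { x: 300, y: 300, width: 300, height: 50 }
```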

Pass any config options to the constructor. You only need to specify what you want to override:

const parser = new LiteParse({
  ocrEnabled: true,
  ocrServerUrl: "http://localhost:8828/ocr",
  ocrLanguage: "fra",
  dpi: 300,
  outputFormat: "json",
  targetPages: "1-10",
  password: "secret",
});
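
If several parsers in a project share most settings, the same override idea can be applied one level up. The sketch below keeps a shared base object and spreads per-call overrides on top of it. The `ParserOptions` interface, `baseOptions`, and `withOverrides` are hypothetical names; the option keys come from the example above, while the values are illustrative and are not LiteParse's defaults.

```typescript
// Option names are taken from the example above; the interface itself is a
// local assumption, not LiteParse's exported config type.
interface ParserOptions {
  ocrEnabled?: boolean;
  ocrServerUrl?: string;
  ocrLanguage?: string;
  dpi?: number;
  outputFormat?: string;
  targetPages?: string;
  password?: string;
}

// Project-level settings shared by every parser (values are illustrative).
const baseOptions: ParserOptions = {
  ocrEnabled: true,
  dpi: 300,
  outputFormat: "json",
};

// Later keys win, so each call site lists only what differs from the base.
function withOverrides(overrides: ParserOptions = {}): ParserOptions {
  return { ...baseOptions, ...overrides };
}

const frenchOptions = withOverrides({ ocrLanguage: "fra", targetPages: "1-10" });
console.log(frenchOptions);
// In real use, the merged object would be passed on: new LiteParse(frenchOptions)
```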

You can pass raw bytes directly instead of a file path:

import { readFile } from "fs/promises";
const parser = new LiteParse();
// From a file read
const pdfBytes = await readFile("document.pdf");
const result = await parser.parse(pdfBytes);
// From an HTTP response
const response = await fetch("https://example.com/document.pdf");
const buffer = Buffer.from(await response.arrayBuffer());
const result2 = await parser.parse(buffer);
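
When bytes come from the network, the response may turn out to be an error page, not a document. If you expect a PDF, a quick check for the `%PDF-` file marker before parsing can catch that early. The sketch below is a minimal version of that check; `looksLikePdf` and `asciiBytes` are hypothetical helpers, not part of LiteParse, and the check does not apply to other input formats.

```typescript
// PDF files begin with the ASCII marker "%PDF-" followed by a version number.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

// True when the bytes start with the PDF marker. In Node.js, Buffer is a
// Uint8Array subclass, so buffers from readFile or fetch can be passed directly.
function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((value, index) => bytes[index] === value);
}

// Helper for building sample byte arrays from ASCII text.
const asciiBytes = (text: string): Uint8Array =>
  Uint8Array.from(text, (ch) => ch.charCodeAt(0));

console.log(looksLikePdf(asciiBytes("%PDF-1.7\n"))); // true
console.log(looksLikePdf(asciiBytes("<!doctype html>"))); // false
```

A caller could run this on the buffer and throw a descriptive error before calling `parser.parse(buffer)`.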

Generate page images as buffers — useful for sending to LLMs or saving to disk:

const parser = new LiteParse();

// screenshot() is awaited here, like parse(), before iterating the results
const screenshots = await parser.screenshot("document.pdf");
for (const shot of screenshots) {
  console.log(`Page ${shot.pageNum}: ${shot.width}x${shot.height}`);
  // shot.imageBuffer contains the raw PNG data
}

// Screenshot specific pages
const shots = await parser.screenshot("document.pdf", [1, 2, 3]);
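
The `targetPages` option takes a range string such as "1-10", while `screenshot` takes an array of page numbers. The sketch below converts one form to the other. `expandPageRange` is a hypothetical helper, and the grammar it accepts (comma-separated pages and inclusive ranges) is an assumption; this guide only shows the form "1-10".

```typescript
// Expand a page-range string such as "1-3,5" into [1, 2, 3, 5].
// The accepted grammar is an assumption, not LiteParse's documented format.
function expandPageRange(spec: string): number[] {
  const pages = new Set<number>();
  for (const part of spec.split(",")) {
    // Matches "N" or "N-M", allowing surrounding whitespace.
    const match = /^\s*(\d+)\s*(?:-\s*(\d+))?\s*$/.exec(part);
    if (!match) throw new Error(`Invalid page range: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    for (let page = start; page <= end; page++) pages.add(page);
  }
  // Deduplicated and sorted ascending.
  return Array.from(pages).sort((a, b) => a - b);
}

console.log(expandPageRange("1-3,5")); // [ 1, 2, 3, 5 ]
```

The resulting array could then be passed as the second argument to `screenshot`.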

Environment variables:

| Variable | Description |
| --- | --- |
| TESSDATA_PREFIX | Path to a directory containing Tesseract .traineddata files. For offline/air-gapped environments. Also available as the tessdataPath config option. |