
Browser Usage (WASM)

Run LiteParse entirely in the browser with the WASM package.

LiteParse ships a WebAssembly package that runs entirely in the browser — no server, no cloud calls. It supports PDF parsing and custom OCR engines implemented in JavaScript.

```shell
npm install @llamaindex/liteparse-wasm
```
```javascript
import init, { LiteParse } from "@llamaindex/liteparse-wasm";

// Load the WASM module once, before creating a parser
await init();

const parser = new LiteParse({
  ocrEnabled: false,
  outputFormat: "json",
});

// parse() takes a Uint8Array. Here `file` is a File object,
// e.g. from <input type="file">; bytes from fetch() work the same way.
const bytes = new Uint8Array(await file.arrayBuffer());
const result = await parser.parse(bytes);

console.log(result.text);
console.log(result.pages[0]);
```
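The quick start assumes you already have a File in hand. If your bytes can come from several places (a file picker, a fetch() response, or a raw ArrayBuffer), a small helper can normalize them before calling parser.parse(). The toBytes function below is a hypothetical sketch, not part of LiteParse; it relies only on the standard Blob.arrayBuffer() and Response.arrayBuffer() methods.

```javascript
// Hypothetical helper (not part of LiteParse): turn common browser byte
// sources into the Uint8Array that parser.parse() expects.
async function toBytes(source) {
  if (source instanceof Uint8Array) return source;
  if (source instanceof ArrayBuffer) return new Uint8Array(source);

  // File extends Blob, so this also covers <input type="file"> selections
  if (typeof Blob !== "undefined" && source instanceof Blob) {
    return new Uint8Array(await source.arrayBuffer());
  }

  // A fetch() Response: reject HTTP errors so an error page is never parsed
  if (typeof Response !== "undefined" && source instanceof Response) {
    if (!source.ok) throw new Error(`HTTP ${source.status} while fetching document`);
    return new Uint8Array(await source.arrayBuffer());
  }

  throw new TypeError("Expected a Uint8Array, ArrayBuffer, Blob/File, or Response");
}
```

For example, `await parser.parse(await toBytes(await fetch(url)))` parses a PDF fetched from `url`, and `await parser.parse(await toBytes(file))` handles a picked file.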
Supported:

  • PDF parsing from Uint8Array input (use file.arrayBuffer() to get bytes from a file picker, for example)
  • Custom OCR via the ocrEngine callback interface (see below)
  • Text and JSON output formats

Not supported:

  • File path input: pass a Uint8Array instead
  • DOCX/XLSX/PPTX/image conversion: requires LibreOffice/ImageMagick, which aren’t available in the browser
  • Built-in Tesseract or HTTP OCR: use the custom ocrEngine interface instead
  • Screenshots: not available in the WASM build
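Because the browser build only parses PDFs, it can help to reject other files before handing them to the parser. Every PDF begins with the ASCII header "%PDF-", while DOCX/XLSX/PPTX files are ZIP containers that begin with "PK". The looksLikePdf function below is a hypothetical guard, not a LiteParse API, and it expects the header at byte offset 0.

```javascript
// ASCII bytes of the PDF file header "%PDF-"
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Hypothetical guard (not part of LiteParse): true if the bytes start with
// the PDF header. Strict check: the header must be at offset 0.
function looksLikePdf(bytes) {
  if (!(bytes instanceof Uint8Array) || bytes.length < PDF_MAGIC.length) {
    return false;
  }
  return PDF_MAGIC.every((byte, i) => bytes[i] === byte);
}
```

A caller might run `if (!looksLikePdf(bytes)) throw new Error("Only PDF input is supported in the browser")` before `parser.parse(bytes)`, giving users a clear message instead of a parse failure.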

The native Tesseract and HTTP OCR backends are not available in WASM. To use OCR, pass a custom ocrEngine object with a recognize method:

```javascript
const parser = new LiteParse({
  ocrEnabled: true,
  ocrLanguage: "eng",
  ocrEngine: {
    /**
     * @param imageData PNG-encoded image bytes
     * @param width rendered page width in pixels
     * @param height rendered page height in pixels
     * @param language e.g. "eng"
     * @returns array of { text, bbox: [x1, y1, x2, y2], confidence }
     */
    async recognize(imageData, width, height, language) {
      // e.g. call a Web Worker wrapping tesseract.js, or a remote OCR service.
      // Placeholder result for illustration:
      return [
        { text: "Hello", bbox: [10, 20, 80, 40], confidence: 0.98 },
      ];
    },
  },
});
```

This lets you plug in any OCR implementation — a Web Worker running tesseract.js, a cloud OCR API, or anything else that returns text with bounding boxes.
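Running OCR in a Web Worker keeps the page responsive, but a worker talks through messages rather than function calls. One way to bridge the two is to give each request an id and resolve the matching promise when the worker replies. The sketch below assumes a hypothetical message protocol of our own ({ id, imageData, width, height, language } in, { id, words } or { id, error } out); neither the protocol nor createWorkerOcrEngine is part of LiteParse, and the worker script that actually runs the OCR is left to you.

```javascript
// Hypothetical factory (not part of LiteParse): wrap a Web Worker, or any
// object with postMessage() and addEventListener("message", ...), as an
// ocrEngine. Assumed protocol:
//   request:  { id, imageData, width, height, language }
//   response: { id, words } on success, { id, error } on failure
function createWorkerOcrEngine(worker) {
  let nextId = 0;
  const pending = new Map(); // request id -> { resolve, reject }

  worker.addEventListener("message", (event) => {
    const { id, words, error } = event.data;
    const entry = pending.get(id);
    if (!entry) return; // not a reply to one of our requests
    pending.delete(id);
    if (error) entry.reject(new Error(error));
    else entry.resolve(words);
  });

  return {
    // Same signature as recognize() above; resolves to
    // [{ text, bbox: [x1, y1, x2, y2], confidence }, ...]
    recognize(imageData, width, height, language) {
      const id = nextId++;
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject });
        // imageData is structured-cloned (copied), so the caller's buffer stays usable
        worker.postMessage({ id, imageData, width, height, language });
      });
    },
  };
}
```

In the browser you would pass `ocrEngine: createWorkerOcrEngine(new Worker(workerUrl))`, where `workerUrl` points at your own OCR worker script. Keying replies by id lets several pages be recognized concurrently without their results getting mixed up.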

Constructor options (all optional, camelCase):

| Option | Type | Default | Description |
|---|---|---|---|
| ocrLanguage | string | "eng" | Language code passed to the OCR engine |
| ocrEnabled | boolean | true | Run OCR on text-sparse pages |
| maxPages | number | 1000 | Stop after this many pages |
| targetPages | string | | Pages to process, e.g. "1-5,10,15-20" |
| dpi | number | 150 | Render DPI for OCR |
| outputFormat | "json" \| "text" | "json" | Format used by parser.format(...) |
| preserveVerySmallText | boolean | false | Keep tiny text that’s normally filtered |
| password | string | | Password for protected PDFs |
| quiet | boolean | false | Suppress progress logging |
| ocrEngine | object | | Custom JS-side OCR engine (see above) |
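The table describes targetPages only by example. To make the format concrete, the sketch below expands such a string into the page numbers it denotes, assuming comma-separated entries, inclusive ranges, and 1-based page numbers (assumptions drawn from the example, not from LiteParse's documentation). expandPageRanges is a hypothetical helper for illustration or client-side validation; LiteParse parses the string itself, and its rules may differ.

```javascript
// Hypothetical helper (not part of LiteParse): expand a targetPages-style
// string such as "1-5,10,15-20" into sorted, de-duplicated page numbers.
// Assumes inclusive ranges and 1-based pages; pages above pageCount are dropped.
function expandPageRanges(spec, pageCount = Infinity) {
  const pages = new Set();
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue;

    // Either a single page ("10") or a range ("15-20")
    const match = /^(\d+)(?:-(\d+))?$/.exec(token);
    if (!match) throw new Error(`Invalid page range: "${token}"`);

    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start) throw new Error(`Invalid page range: "${token}"`);

    for (let p = start; p <= Math.min(end, pageCount); p++) pages.add(p);
  }
  return [...pages].sort((a, b) => a - b);
}
```

With these assumptions, "1-5,10,15-20" selects 12 pages: 1 through 5, 10, and 15 through 20.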