What is LiteParse?
Fast, local PDF parsing with spatial text parsing, OCR, and bounding boxes.
LiteParse is an open-source document parsing library that parses text with spatial layout information and bounding boxes. Written in Rust for speed and reliability, it runs entirely on your machine with no cloud dependencies, no LLMs, and no API keys.
LiteParse is designed specifically for use cases that require fast, accurate text parsing: real-time applications, coding agents, and local workflows. It provides a simple CLI and library API for parsing PDFs, Office documents, and images, with built-in OCR support.
What can LiteParse do?
Section titled “What can LiteParse do?”- Parse PDFs with precise spatial layout. Text comes back positioned where it appears on the page
- Extract bounding boxes for every text line, ready for downstream processing or visualization
- OCR scanned documents using built-in Tesseract or plug in your own OCR server
- Parse Office files and images with support for DOCX, XLSX, PPTX, PNG, JPG, and more via automatic conversion
- Screenshot PDF pages as high-quality images for LLM-based workflows
- Use from Node.js/TypeScript, Python, Rust, or the browser (WASM) — whatever fits your stack
Get started
Section titled “Get started”- Getting started: Install LiteParse and parse your first document.
- Library usage: Use LiteParse from TypeScript or Python code.
- Browser usage (WASM): Run LiteParse in the browser with zero server dependencies.
- CLI reference: Complete command and option reference.
- API reference: Detailed API documentation (rust) for all public types and functions. The same types apply across all language bindings.