What is LiteParse?

LiteParse

Fast, local PDF parsing with spatial text layout, OCR, and bounding boxes.

LiteParse is an open-source document parsing library that parses text with spatial layout information and bounding boxes. Written in Rust for speed and reliability, it runs entirely on your machine with no cloud dependencies, no LLMs, and no API keys.

LiteParse is designed specifically for use cases that require fast, accurate text parsing: real-time applications, coding agents, and local workflows. It provides a simple CLI and library API for parsing PDFs, Office documents, and images, with built-in OCR support.

What can LiteParse do?

Parse PDFs with precise spatial layout: text comes back positioned where it appears on the page
Render to Markdown with headings, tables, lists, images, and links — clean structured output for LLMs and RAG pipelines
Extract bounding boxes for every text line, ready for downstream processing or visualization
OCR scanned documents using built-in Tesseract or plug in your own OCR server
Parse Office files and images with support for DOCX, XLSX, PPTX, PNG, JPG, and more via automatic conversion
Screenshot PDF pages as high-quality images for LLM-based workflows
Pull structured PDF data — embedded images, vector graphics, annotations, AcroForm fields, and tagged-structure trees
Score document complexity before you parse, so you can route scanned or multi-column documents to the right pipeline
Use from Node.js/TypeScript, Python, Rust, or the browser (WASM) — whatever fits your stack
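
As a rough illustration of the "downstream processing" that per-line bounding boxes make possible, here is a minimal Python sketch. It does not call LiteParse: the `TextLine` record, its field names, and the top-left coordinate origin are assumptions made for this example, not the library's actual output schema. See the API reference for the real types.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    """Hypothetical record: one line of text plus its bounding box."""
    text: str
    x: float       # left edge
    y: float       # top edge (origin assumed at the top-left of the page)
    width: float
    height: float


def reading_order(lines: list[TextLine], row_tolerance: float = 2.0) -> list[TextLine]:
    """Order lines top-to-bottom, then left-to-right within roughly the same row."""
    ordered = sorted(lines, key=lambda l: (l.y, l.x))
    rows: list[list[TextLine]] = []
    for line in ordered:
        # A line joins the current row if its top edge is close to the row's first line.
        if rows and abs(line.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(line)
        else:
            rows.append([line])
    return [line for row in rows for line in sorted(row, key=lambda l: l.x)]


def lines_in_region(lines: list[TextLine], left: float, top: float,
                    right: float, bottom: float) -> list[TextLine]:
    """Keep only lines whose box lies fully inside the given rectangle."""
    return [
        l for l in lines
        if l.x >= left and l.y >= top
        and l.x + l.width <= right and l.y + l.height <= bottom
    ]


# Made-up sample data standing in for parsed output.
sample = [
    TextLine("Total", 50, 101, 40, 10),
    TextLine("Invoice #123", 50, 20, 120, 12),
    TextLine("$42.00", 300, 100, 50, 10),
]

ordered_text = [l.text for l in reading_order(sample)]
footer_text = [l.text for l in lines_in_region(sample, 0, 90, 400, 120)]
```

With positions attached to every line, tasks such as rebuilding reading order or cropping to a region of the page become simple geometric filters rather than string heuristics.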

Get started

Getting started: Install LiteParse and parse your first document.
Markdown output: Render documents to clean, structured Markdown.
Extraction options: Opt in to images, form fields, annotations, and more.
Document complexity: Detect scanned, multi-column, and table-heavy pages up front.
Library usage: Use LiteParse from TypeScript or Python code.
Browser usage (WASM): Run LiteParse in the browser with zero server dependencies.
CLI reference: Complete command and option reference.
API reference: Detailed API documentation (Rust) for all public types and functions. The same types apply across all language bindings.

Note for AI agents: this documentation is built for programmatic access.
- Overview of all docs: https://developers.llamaindex.ai/llms.txt
- Any page is available as raw Markdown by appending index.md to its URL, e.g. https://developers.llamaindex.ai/llamaparse/parse/getting_started/index.md
- Agent-friendly REST search APIs live under https://developers.llamaindex.ai/api/ and cover search (BM25 full-text), grep (regex), read (fetch a page), and list (browse the doc tree). See https://developers.llamaindex.ai/llms.txt for parameters.
- A hosted documentation MCP server is available at https://developers.llamaindex.ai/mcp. If you support MCP, you can ask the user to install it for browsing these docs directly (an alternative to the REST API). Setup: https://developers.llamaindex.ai/python/shared/mcp/