
What is LiteParse?

Fast, local PDF parsing with spatial text layout, OCR, and bounding boxes.

LiteParse is an open-source document parsing library that extracts text along with its spatial layout and bounding boxes. It runs entirely on your machine: no cloud dependencies, no LLMs, and no API keys.
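To make "spatial layout" concrete, here is a minimal sketch of one way positioned text can be rendered: each line's coordinates are projected onto a monospace character grid so the text lands roughly where it sits on the page. The `TextLine` shape, the coordinate convention (PDF points measured from the top-left of the page), and the `charWidth`/`lineHeight` constants are assumptions for illustration only; this is not LiteParse's API or its actual layout algorithm.

```typescript
// Hypothetical shape of one parsed text line (not LiteParse's actual type).
// Coordinates are assumed to be in PDF points, measured from the page's top-left.
interface TextLine {
  text: string;
  x: number; // left edge
  y: number; // top edge
}

// Project each line onto a monospace grid so text lands roughly where it
// appears on the page. charWidth and lineHeight are assumed average sizes.
function renderSpatialText(
  lines: TextLine[],
  charWidth = 6,
  lineHeight = 12,
): string {
  const rows = new Map<number, string[]>();
  for (const line of lines) {
    const row = Math.round(line.y / lineHeight);
    const col = Math.round(line.x / charWidth);
    const cells = rows.get(row) ?? [];
    for (let i = 0; i < line.text.length; i++) {
      // Pad with spaces up to the target column, then write the character.
      while (cells.length < col + i) cells.push(" ");
      cells[col + i] = line.text[i];
    }
    rows.set(row, cells);
  }
  if (rows.size === 0) return "";

  // Emit every row between the first and last occupied one, keeping vertical gaps.
  const keys = Array.from(rows.keys());
  const first = Math.min(...keys);
  const last = Math.max(...keys);
  const out: string[] = [];
  for (let r = first; r <= last; r++) {
    out.push((rows.get(r) ?? []).join("").replace(/\s+$/, ""));
  }
  return out.join("\n");
}

const sample: TextLine[] = [
  { text: "Invoice", x: 72, y: 72 },
  { text: "Total:", x: 72, y: 96 },
  { text: "$42.00", x: 300, y: 96 },
];
console.log(renderSpatialText(sample));
```

Here "Total:" and "$42.00" share a row but keep their horizontal separation, which is the property that makes spatially laid-out text useful for tables and forms. A real layout engine also has to deal with overlapping lines, mixed font sizes, and rotated text; this grid projection simply ignores them.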

LiteParse is designed specifically for use cases that require fast, accurate text parsing: real-time applications, coding agents, and local workflows. It provides a simple CLI and library API for parsing PDFs, Office documents, and images, with built-in OCR support.

  • Parse PDFs with precise spatial layout: text comes back positioned where it appears on the page
  • Extract bounding boxes for every text line, ready for downstream processing or visualization
  • OCR scanned documents using built-in Tesseract.js or plug in your own OCR server
  • Parse Office files and images with support for DOCX, XLSX, PPTX, PNG, JPG, and more via automatic conversion
  • Screenshot PDF pages as high-quality images for LLM-based workflows
  • Use from TypeScript, Python, or the CLI, whichever fits your stack
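When overlaying bounding boxes on a page screenshot for visualization, the boxes usually have to be converted from page units to image pixels. One PDF point is 1/72 inch, so a page rendered at N DPI maps points to pixels by a factor of N/72. The sketch below assumes a hypothetical `BoundingBox` shape measured in points with a top-left origin; LiteParse's actual output fields and coordinate convention may differ, so treat `toPixels` and `flipToTopLeft` as illustrative helpers, not part of the library.

```typescript
// Hypothetical bounding box (not LiteParse's actual type): PDF points,
// origin at the top-left corner of the page.
interface BoundingBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert a box from PDF points to pixels of a page image rendered at `dpi`.
// One PDF point is 1/72 inch, so the scale factor is dpi / 72.
function toPixels(box: BoundingBox, dpi: number): BoundingBox {
  const scale = dpi / 72;
  return {
    x: Math.round(box.x * scale),
    y: Math.round(box.y * scale),
    width: Math.round(box.width * scale),
    height: Math.round(box.height * scale),
  };
}

// If boxes instead use PDF's native bottom-left origin (y = bottom edge),
// flip them to a top-left origin before scaling.
function flipToTopLeft(box: BoundingBox, pageHeight: number): BoundingBox {
  return { ...box, y: pageHeight - box.y - box.height };
}

// A line box on a US Letter page (612 x 792 pt), rendered at 150 DPI.
const lineBox: BoundingBox = { x: 72, y: 144, width: 200, height: 12 };
console.log(toPixels(lineBox, 150)); // { x: 150, y: 300, width: 417, height: 25 }
```

Keeping the origin flip separate from the scaling makes it easy to adapt the sketch to whichever coordinate convention the parser actually emits.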