Getting Started
Install LiteParse and parse your first document in under a minute.
LiteParse is available for Node.js, Python, and as a standalone Rust binary.
All versions (except WASM) ship the same CLI and core library capabilities.
Installation
Node.js:
    npm i -g @llamaindex/liteparse
Python:
    pip install liteparse
Rust:
    cargo install liteparse
WASM (browser):
    npm i @llamaindex/liteparse-wasm
Because the WASM package is designed for browser usage, it does not include the lit CLI command. Instead, call its exported functions directly from JavaScript to parse documents in the browser.
For full browser usage and limitations, see the WASM package guide.
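Before moving on, it can help to confirm that the install step actually put the lit command on your PATH. The snippet below is a minimal sketch of that check using only Python's standard library; the helper name find_lit_cli is hypothetical and not part of LiteParse, and it assumes the non-WASM install exposes the CLI under the name lit.

```python
import shutil
from typing import Optional


def find_lit_cli(command: str = "lit") -> Optional[str]:
    """Return the full path of the CLI executable, or None if it is not on PATH.

    Hypothetical helper for illustration; it only inspects PATH and never
    runs the command.
    """
    return shutil.which(command)


if __name__ == "__main__":
    path = find_lit_cli()
    if path is None:
        print("lit not found on PATH; check that the install step succeeded")
    else:
        print(f"lit found at {path}")
```

If the command is not found after a global npm, pip, or cargo install, the usual cause is that the package manager's bin directory is missing from PATH.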
Quick start
Once installed, parse from the command line:
    # Parse a PDF and print text to stdout
    lit parse document.pdf
    # Save output to a file
    lit parse document.pdf -o output.txt
    # Get structured JSON with bounding boxes
    lit parse document.pdf --format json -o output.json
    # Parse only specific pages
    lit parse document.pdf --target-pages "1-5,10,15-20"
Batch parsing
Parse an entire directory of documents at once:
    lit batch-parse ./pdfs ./outputs
Screenshots
Generate page images from a PDF for LLM agents or visual workflows:
    lit screenshot document.pdf -o ./screenshots
Next steps
- Library usage: Use LiteParse programmatically from TypeScript or Python.
- OCR configuration: Configure Tesseract, use an external OCR server, or bring your own.
- Multi-format support: Parse DOCX, XLSX, PPTX, images, and more.
- Browser usage (WASM): Run LiteParse entirely in the browser.
- Agent skill: Add LiteParse as a skill for coding agents.
- CLI reference: Complete command and option reference.
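Until you reach the library guide, the quick-start commands can also be scripted by calling the CLI from another program. The following is a rough sketch in Python that builds a lit parse command from the flags shown in the quick start (--format, --target-pages, -o) and runs it with subprocess. The function names build_parse_command and run_parse are hypothetical, this is not LiteParse's Python API, and combining --target-pages with the other flags in a single call is an assumption to verify against the CLI reference.

```python
import shutil
import subprocess
from typing import List, Optional


def build_parse_command(
    pdf_path: str,
    output_path: Optional[str] = None,
    output_format: Optional[str] = None,
    target_pages: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `lit parse`.

    Uses only the flags that appear in the quick start. Hypothetical helper,
    not part of LiteParse itself.
    """
    cmd = ["lit", "parse", pdf_path]
    if output_format is not None:
        cmd += ["--format", output_format]      # e.g. "json"
    if target_pages is not None:
        cmd += ["--target-pages", target_pages]  # e.g. "1-5,10,15-20"
    if output_path is not None:
        cmd += ["-o", output_path]
    return cmd


def run_parse(cmd: List[str]) -> str:
    """Run the command and return its stdout.

    Raises FileNotFoundError if the executable is not on PATH and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not on PATH")
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    command = build_parse_command(
        "document.pdf", output_path="output.json", output_format="json"
    )
    print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with page ranges such as "1-5,10,15-20" and with file names that contain spaces.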