LiteParse provides the lit CLI with three commands: parse, batch-parse, and screenshot. The CLI is the same whether installed via npm, pip, or built from Rust source.
The parse command parses a single document.
lit parse [options] <file>
| Argument | Description |
| --- | --- |
| `file` | Path to the document file, or `-` to read from stdin |

| Option | Description | Default |
| --- | --- | --- |
| `-o, --output <file>` | Write output to a file instead of stdout | none |
| `--format <format>` | Output format: `json` or `text` | `text` |
| `--no-ocr` | Disable OCR entirely | none |
| `--ocr-language <lang>` | OCR language code (Tesseract format) | `eng` |
| `--ocr-server-url <url>` | HTTP OCR server URL | none (uses Tesseract) |
| `--tessdata-path <path>` | Path to tessdata directory | none (uses `TESSDATA_PREFIX` env var) |
| `--num-workers <n>` | Pages to OCR in parallel | CPU cores - 1 |
| `--max-pages <n>` | Maximum pages to parse | 1000 |
| `--target-pages <pages>` | Pages to parse (e.g., "1-5,10") | none (all pages) |
| `--dpi <dpi>` | Rendering DPI | 150 |
| `--preserve-small-text` | Keep very small text | none |
| `--password <password>` | Password for encrypted/protected documents | none |
| `-q, --quiet` | Suppress progress output | none |
# JSON output with bounding boxes
lit parse report.pdf --format json -o report.json
# Parse pages 1-5 only, no OCR
lit parse report.pdf --target-pages "1-5" --no-ocr
# High-DPI rendering with French OCR
lit parse report.pdf --dpi 300 --ocr-language fra
# Use an external OCR server
lit parse report.pdf --ocr-server-url http://localhost:8828/ocr
# Pipe output to another tool
lit parse report.pdf -q | wc -l
# Parse a remote file via stdin
curl -sL https://example.com/report.pdf | lit parse --no-ocr -
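A common pattern is to write the JSON output next to the source file. The following is a minimal sketch that wraps the documented `--format`, `-o`, and `-q` flags in a shell function. `parse_to_json` is a hypothetical helper name, not part of LiteParse, and the error branch assumes that `lit` exits with a non-zero status when parsing fails, which this reference does not state.

```shell
#!/bin/sh
# parse_to_json: hypothetical helper, not part of LiteParse.
# Usage: parse_to_json <file>
parse_to_json() {
    src=$1
    # Swap the last extension for .json (report.pdf -> report.json).
    # Assumes the file name itself contains an extension.
    out="${src%.*}.json"
    # Only flags from the option table are used: --format, -o, -q.
    lit parse "$src" --format json -o "$out" -q || {
        # Assumption: lit signals failure with a non-zero exit status.
        echo "parse failed: $src" >&2
        return 1
    }
    # Print the output path so callers can chain further steps.
    printf '%s\n' "$out"
}
```

With this helper, `parse_to_json docs/report.pdf` asks `lit` to write `docs/report.json` and prints that path on success.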
The batch-parse command parses multiple documents in a directory.
lit batch-parse [options] <input-dir> <output-dir>
| Argument | Description |
| --- | --- |
| `input-dir` | Directory containing documents to parse |
| `output-dir` | Directory for output files |

| Option | Description | Default |
| --- | --- | --- |
| `--format <format>` | Output format: `json` or `text` | `text` |
| `--no-ocr` | Disable OCR entirely | none |
| `--ocr-language <lang>` | OCR language code | `eng` |
| `--ocr-server-url <url>` | HTTP OCR server URL | none (uses Tesseract) |
| `--tessdata-path <path>` | Path to tessdata directory | none |
| `--num-workers <n>` | Pages to OCR in parallel | CPU cores - 1 |
| `--max-pages <n>` | Maximum pages per file | 1000 |
| `--dpi <dpi>` | Rendering DPI | 150 |
| `--recursive` | Search subdirectories | none |
| `--extension <ext>` | Only process this extension (e.g., ".pdf") | none (all supported) |
| `--password <password>` | Password for encrypted/protected documents (applied to all files) | none |
| `-q, --quiet` | Suppress progress output | none |
# Parse all supported files in a directory
lit batch-parse ./documents ./output
# Recursively parse only PDFs
lit batch-parse ./documents ./output --recursive --extension ".pdf"
# Batch parse with JSON output and no OCR
lit batch-parse ./documents ./output --format json --no-ocr
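batch-parse applies one set of options to every file. When each file needs its own handling, such as a per-file failure report, one possible approach is to loop over `lit parse` instead. The sketch below is non-recursive and handles PDFs only. `parse_each` is a hypothetical helper name, the `<name>.json` output naming is a choice made here (not necessarily what batch-parse produces), and the failure check assumes `lit` exits with a non-zero status on error.

```shell
#!/bin/sh
# parse_each: hypothetical helper, not part of LiteParse.
# Usage: parse_each <input-dir> <output-dir>
parse_each() {
    in_dir=$1
    out_dir=$2
    failed=0
    mkdir -p "$out_dir"
    for src in "$in_dir"/*.pdf; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        [ -e "$src" ] || continue
        name=$(basename "$src" .pdf)
        # Assumption: lit signals failure with a non-zero exit status.
        if ! lit parse "$src" --format json -o "$out_dir/$name.json" -q; then
            echo "failed: $src" >&2
            failed=$((failed + 1))
        fi
    done
    echo "$failed file(s) failed" >&2
    # Succeed only if every file parsed.
    [ "$failed" -eq 0 ]
}
```

Calling `parse_each ./documents ./output` keeps going after a failed file and returns a non-zero status if any file failed, which makes it usable in scripts that need to react to partial failures.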
The screenshot command generates page images from a document (PDF, DOCX, XLSX, images, etc.).
lit screenshot [options] <file>
| Argument | Description |
| --- | --- |
| `file` | Path to the document file |

| Option | Description | Default |
| --- | --- | --- |
| `-o, --output-dir <dir>` | Output directory | `./screenshots` |
| `--target-pages <pages>` | Pages to screenshot (e.g., "1,3,5" or "1-5") | none (all pages) |
| `--dpi <dpi>` | Rendering DPI | 150 |
| `--password <password>` | Password for encrypted/protected documents | none |
| `-q, --quiet` | Suppress progress output | none |
# Screenshot all pages of a PDF
lit screenshot document.pdf -o ./pages
# Screenshot a Word document (requires LibreOffice)
lit screenshot report.docx -o ./pages
# First 5 pages at high DPI
lit screenshot document.pdf --target-pages "1-5" --dpi 300 -o ./pages
# Screenshot specific pages
lit screenshot document.pdf --target-pages "1,5,10" -o ./pages
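A `--target-pages` value can mix single pages and ranges. To illustrate how such a value reads, the sketch below expands one into an explicit page list, for example to estimate how many images a screenshot run will produce. `expand_pages` is a hypothetical helper, not part of LiteParse. It assumes ranges are inclusive, as the "pages 1-5" examples suggest, and it does not validate input. How `lit` parses the value internally is not stated in this reference.

```shell
#!/bin/sh
# expand_pages: hypothetical helper, not part of LiteParse.
# Usage: expand_pages "1-5,10"
# The body runs in a subshell ( ... ) so the IFS change does not leak.
expand_pages() (
    IFS=,
    # Split the spec on commas into positional parameters.
    set -- $1
    pages=""
    for part in "$@"; do
        case $part in
            *-*)
                # Range such as 1-5; assumed inclusive on both ends.
                start=${part%-*}
                end=${part#*-}
                i=$start
                while [ "$i" -le "$end" ]; do
                    pages="$pages $i"
                    i=$((i + 1))
                done
                ;;
            *)
                # Single page such as 10.
                pages="$pages $part"
                ;;
        esac
    done
    # Drop the leading space before printing.
    echo "${pages# }"
)
```

Under these assumptions, `expand_pages "1-5,10"` prints `1 2 3 4 5 10`, so that value selects six pages.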
These options are available on all commands:
| Option | Description |
| --- | --- |
| `-h, --help` | Show help for a command |
| `-V, --version` | Show version number |