LiteParse

CLI Reference

Complete reference for all LiteParse CLI commands and options.

LiteParse provides the lit CLI with three commands: parse, batch-parse, and screenshot. The CLI behaves the same whether it was installed via npm or pip, or built from Rust source.

lit parse

Parse a single document.

lit parse [options] <file>

| Argument | Description |
| --- | --- |
| `file` | Path to the document file, or `-` to read from stdin |

| Option | Description | Default |
| --- | --- | --- |
| `-o, --output <file>` | Write output to a file instead of stdout | |
| `--format <format>` | Output format: `json` or `text` | `text` |
| `--no-ocr` | Disable OCR entirely | |
| `--ocr-language <lang>` | OCR language code (Tesseract format) | `eng` |
| `--ocr-server-url <url>` | HTTP OCR server URL | none (uses Tesseract) |
| `--tessdata-path <path>` | Path to tessdata directory | none (uses `TESSDATA_PREFIX` env var) |
| `--num-workers <n>` | Pages to OCR in parallel | CPU cores - 1 |
| `--max-pages <n>` | Maximum pages to parse | `1000` |
| `--target-pages <pages>` | Pages to parse (e.g., `"1-5,10"`) | all pages |
| `--dpi <dpi>` | Rendering DPI | `150` |
| `--preserve-small-text` | Keep very small text | |
| `--password <password>` | Password for encrypted/protected documents | |
| `-q, --quiet` | Suppress progress output | |

# Basic text parsing
lit parse report.pdf
# JSON output with bounding boxes
lit parse report.pdf --format json -o report.json
# Parse pages 1-5 only, no OCR
lit parse report.pdf --target-pages "1-5" --no-ocr
# High-DPI rendering with French OCR
lit parse report.pdf --dpi 300 --ocr-language fra
# Use an external OCR server
lit parse report.pdf --ocr-server-url http://localhost:8828/ocr
# Pipe output to another tool
lit parse report.pdf -q | wc -l
# Parse a remote file via stdin
curl -sL https://example.com/report.pdf | lit parse --no-ocr -
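
The --target-pages value mixes single pages and ranges separated by commas. As a rough illustration of how such a spec reads, the sketch below expands one into individual page numbers. The `expand_pages` function is hypothetical and not part of LiteParse, and it assumes ranges are inclusive and pages are 1-based, which this reference does not state explicitly.

```shell
# Hypothetical helper (not part of LiteParse): expand a page spec such as
# "1-5,10" into one page number per line.
# Assumption: ranges are inclusive and pages are 1-based.
expand_pages() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r part; do
    case $part in
      *-*)
        start=${part%-*}   # text before the dash
        end=${part#*-}     # text after the dash
        i=$start
        while [ "$i" -le "$end" ]; do
          echo "$i"
          i=$((i + 1))
        done
        ;;
      *)
        echo "$part"
        ;;
    esac
  done
}

# Prints 1, 2, 3, 4, 5 and 10, one per line
expand_pages "1-5,10"
```

A helper like this can be handy for checking a spec before passing it to `lit parse --target-pages`; LiteParse does its own parsing of the value.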

lit batch-parse

Parse multiple documents in a directory.

lit batch-parse [options] <input-dir> <output-dir>

| Argument | Description |
| --- | --- |
| `input-dir` | Directory containing documents to parse |
| `output-dir` | Directory for output files |

| Option | Description | Default |
| --- | --- | --- |
| `--format <format>` | Output format: `json` or `text` | `text` |
| `--no-ocr` | Disable OCR entirely | |
| `--ocr-language <lang>` | OCR language code | `eng` |
| `--ocr-server-url <url>` | HTTP OCR server URL | none (uses Tesseract) |
| `--tessdata-path <path>` | Path to tessdata directory | |
| `--num-workers <n>` | Pages to OCR in parallel | CPU cores - 1 |
| `--max-pages <n>` | Maximum pages per file | `1000` |
| `--dpi <dpi>` | Rendering DPI | `150` |
| `--recursive` | Search subdirectories | |
| `--extension <ext>` | Only process this extension (e.g., `".pdf"`) | all supported |
| `--password <password>` | Password for encrypted/protected documents (applied to all files) | |
| `-q, --quiet` | Suppress progress output | |

# Parse all supported files in a directory
lit batch-parse ./documents ./output
# Recursively parse only PDFs
lit batch-parse ./documents ./output --recursive --extension ".pdf"
# Batch parse with JSON output and no OCR
lit batch-parse ./documents ./output --format json --no-ocr
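
For scripted runs, the batch options above can be wrapped in a small function. The sketch below is one possible arrangement rather than a recommended setup: the function name `batch_pdfs_to_json` is hypothetical, and it uses only flags listed in the table above.

```shell
# Hypothetical helper: parse every PDF under a directory tree to JSON,
# without OCR and without progress output.
batch_pdfs_to_json() {
  in_dir=$1
  out_dir=$2
  # Fail early with a clear message if the CLI is missing.
  if ! command -v lit >/dev/null 2>&1; then
    echo "lit not found on PATH" >&2
    return 127
  fi
  lit batch-parse "$in_dir" "$out_dir" \
    --recursive --extension ".pdf" \
    --format json --no-ocr --quiet
}

# Usage (requires lit to be installed):
#   batch_pdfs_to_json ./documents ./output
```

Keeping the flag set in one place makes it easier to rerun the same batch with identical settings.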

lit screenshot

Generate page images from a document (PDF, DOCX, XLSX, images, etc.).

lit screenshot [options] <file>

| Argument | Description |
| --- | --- |
| `file` | Path to the document file |

| Option | Description | Default |
| --- | --- | --- |
| `-o, --output-dir <dir>` | Output directory | `./screenshots` |
| `--target-pages <pages>` | Pages to screenshot (e.g., `"1,3,5"` or `"1-5"`) | all pages |
| `--dpi <dpi>` | Rendering DPI | `150` |
| `--password <password>` | Password for encrypted/protected documents | |
| `-q, --quiet` | Suppress progress output | |

# Screenshot all pages of a PDF
lit screenshot document.pdf -o ./pages
# Screenshot a Word document (requires LibreOffice)
lit screenshot report.docx -o ./pages
# First 5 pages at high DPI
lit screenshot document.pdf --target-pages "1-5" --dpi 300 -o ./pages
# Specific pages only
lit screenshot document.pdf --target-pages "1,5,10" -o ./pages
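
The parse and screenshot commands can be combined to produce both structured text and page images for the same document. The following is a minimal sketch under stated assumptions: the function name `process_doc` and the output layout (`<out-root>/<name>/<name>.json` plus a `pages` directory) are illustrative choices, not something LiteParse prescribes.

```shell
# Hypothetical helper: write JSON output and page images for one document
# into a per-document directory. The directory layout is an assumption.
process_doc() {
  file=$1
  out_root=${2:-./out}
  dpi=${3:-150}              # 150 matches the documented default DPI
  name=$(basename "$file")
  name=${name%.*}            # strip the extension: report.pdf -> report
  mkdir -p "$out_root/$name"
  lit parse "$file" --format json -o "$out_root/$name/$name.json" --quiet &&
    lit screenshot "$file" -o "$out_root/$name/pages" --dpi "$dpi" --quiet
}

# Usage (requires lit to be installed):
#   process_doc report.pdf ./out 300
```

The screenshot step only runs if parsing succeeds, so a failed parse does not leave a directory of images without matching JSON.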

Global options

These options are available on all commands:

| Option | Description |
| --- | --- |
| `-h, --help` | Show help for a command |
| `-V, --version` | Show version number |
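
In scripts, the --version flag doubles as a quick availability check. A minimal sketch, assuming a hypothetical guard function named `require_lit` placed at the top of a script:

```shell
# Hypothetical guard: confirm the lit CLI is available and print its version.
require_lit() {
  if ! command -v lit >/dev/null 2>&1; then
    echo "lit is not on PATH; install LiteParse first" >&2
    return 1
  fi
  lit --version
}

# Usage (stops the script early if lit is missing):
#   require_lit || exit 1
```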