Multi-Format Support
Parse Word documents, spreadsheets, presentations, and images with LiteParse.
LiteParse automatically converts non-PDF formats to PDF before parsing. This lets you use the same parsing pipeline for Office documents, images, and more.
Supported formats
Office documents (via LibreOffice)

| Category | Extensions |
|---|---|
| Word | .doc, .docx, .docm, .odt, .rtf |
| PowerPoint | .ppt, .pptx, .pptm, .odp |
| Spreadsheets | .xls, .xlsx, .xlsm, .ods, .csv, .tsv |
Images (via ImageMagick)
.jpg, .jpeg, .png, .gif, .bmp, .tiff, .webp, .svg
Images are converted to PDF and then parsed with OCR to extract text.
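The extension lists above amount to a lookup from file extension to conversion tool. A minimal shell sketch of that lookup follows. The function name `converter_for` is hypothetical and not part of LiteParse, and treating extensions case-insensitively is an assumption rather than documented behavior.

```shell
# Hypothetical helper (not part of LiteParse): report which system tool
# a file would need, based on the extension lists in this section.
converter_for() {
  local name ext
  name="${1##*/}"   # strip any directory part
  # Lowercase the extension (assumption: matching is case-insensitive).
  ext="$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    pdf)
      echo "none" ;;
    doc|docx|docm|odt|rtf|ppt|pptx|pptm|odp|xls|xlsx|xlsm|ods|csv|tsv)
      echo "libreoffice" ;;
    jpg|jpeg|png|gif|bmp|tiff|webp|svg)
      echo "imagemagick" ;;
    *)
      echo "unsupported" ;;
  esac
}

converter_for report.docx   # prints: libreoffice
converter_for scan.PNG      # prints: imagemagick
```

A check like this can tell you in advance which dependency a given folder of files will require.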
Installing dependencies
Format conversion uses standard system tools. Install the ones you need:
LibreOffice (for Office documents)

    # macOS
    brew install --cask libreoffice

    # Ubuntu/Debian
    apt-get install libreoffice

    # Windows
    choco install libreoffice-fresh

On Windows, you may need to add the LibreOffice CLI directory (typically C:\Program Files\LibreOffice\program) to your PATH and restart.
ImageMagick (for images)

    # macOS
    brew install imagemagick

    # Ubuntu/Debian
    apt-get install imagemagick

    # Windows
    choco install imagemagick.app

Once the dependencies are installed, just pass any supported file to lit parse:

    lit parse report.docx
    lit parse slides.pptx --format json
    lit parse spreadsheet.xlsx -o output.txt
    lit parse scan.png

Batch mode also handles mixed formats:

    lit batch-parse ./documents ./output --recursive

How it works
- LiteParse detects the file extension
- If the file isn’t a PDF, LiteParse converts it to PDF using the appropriate tool (LibreOffice or ImageMagick)
- The resulting PDF is parsed normally
- Temporary conversion files are cleaned up automatically
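The steps above can be sketched as a dry run that prints, rather than executes, one plausible command sequence. The function name `plan_parse`, the scratch directory, and the specific `soffice` and `magick` invocations are assumptions for illustration. The source does not say which commands LiteParse runs internally.

```shell
# Hypothetical dry run (not LiteParse's actual implementation): print the
# commands a convert-then-parse pipeline could run for one input file.
plan_parse() {
  local file tmp base stem ext
  file="$1"
  tmp="${2:-/tmp/liteparse-demo}"   # assumed scratch directory
  base="${file##*/}"                # file name without directories
  stem="${base%.*}"                 # file name without its extension
  ext="$(printf '%s' "${base##*.}" | tr '[:upper:]' '[:lower:]')"

  case "$ext" in
    pdf)
      echo "lit parse $file"        # PDFs need no conversion
      return 0 ;;
    jpg|jpeg|png|gif|bmp|tiff|webp|svg)
      # Assumed ImageMagick 7 invocation: input image, output PDF.
      echo "magick $file $tmp/$stem.pdf" ;;
    doc|docx|docm|odt|rtf|ppt|pptx|pptm|odp|xls|xlsx|xlsm|ods|csv|tsv)
      # Assumed LibreOffice headless conversion into the scratch directory.
      echo "soffice --headless --convert-to pdf --outdir $tmp $file" ;;
    *)
      echo "unsupported extension: .$ext" >&2
      return 1 ;;
  esac

  echo "lit parse $tmp/$stem.pdf"   # parse the converted PDF normally
  echo "rm -rf $tmp"                # clean up temporary conversion files
}

plan_parse slides.pptx
# prints:
#   soffice --headless --convert-to pdf --outdir /tmp/liteparse-demo slides.pptx
#   lit parse /tmp/liteparse-demo/slides.pdf
#   rm -rf /tmp/liteparse-demo
```

Printing the plan instead of running it keeps the sketch safe to try on a machine where neither tool is installed.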
If the required conversion tool isn’t installed, LiteParse will return an error explaining which dependency is needed.
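To catch a missing dependency before parsing, you could check your PATH yourself. The sketch below is a hypothetical pre-flight check, not a LiteParse feature. The binary names are assumptions: LibreOffice typically installs `soffice` (and often `libreoffice` on Linux), while ImageMagick installs `magick` (version 7) or `convert` (version 6).

```shell
# Hypothetical pre-flight check (not part of LiteParse).
# Succeeds if at least one of the given command names is on PATH.
have_any() {
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      return 0
    fi
  done
  return 1
}

# Report the status of each conversion tool (binary names are assumed).
check_deps() {
  if have_any soffice libreoffice; then
    echo "LibreOffice: found"
  else
    echo "LibreOffice: missing (needed for Office documents)"
  fi
  if have_any magick convert; then
    echo "ImageMagick: found"
  else
    echo "ImageMagick: missing (needed for images)"
  fi
}

check_deps
```

On Windows, `convert` may resolve to an unrelated system utility, so checking for `magick` alone is the safer test there.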