
Parsing URLs

Parse remote documents by reading URLs.

LiteParse does not fetch URLs itself. To parse a remote file, download it and hand the bytes to LiteParse, which accepts bytes and streams in both the CLI and the libraries. On the command line, download the file with any tool you like and pipe it to lit parse, using - as the file argument. In the libraries, fetch the bytes in your own code and pass them to the parser.

# Parse a remote PDF
curl -sL https://example.com/report.pdf | lit parse -
# With options
curl -sL https://example.com/report.pdf | lit parse --no-ocr --format json -
# Save to a file
curl -sL https://example.com/report.pdf | lit parse -o report.txt -

The - argument tells LiteParse to read from stdin instead of a file path. Any tool that can write the file to stdout works: curl, wget -O -, aws s3 cp with - as the destination, and so on.

The TypeScript library accepts Buffer/Uint8Array directly, so you can handle the download however you like.

For example, using fetch in a Node.js environment:

import { LiteParse } from "@llamaindex/liteparse";

// Download the document and collect the response body into a Buffer.
const response = await fetch("https://example.com/report.pdf");
const buffer = Buffer.from(await response.arrayBuffer());

// Parse the in-memory bytes; OCR is disabled for this run.
const parser = new LiteParse({ ocrEnabled: false });
const result = await parser.parse(buffer);
console.log(result.text);
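
The fetch example above hands whatever the server returns to the parser, which on a 404 or 500 may be an HTML error page rather than a PDF. A minimal sketch of a more defensive download step is shown here. The names fetchBytes, timeoutMs, and fetchImpl are hypothetical and introduced only for illustration; none of them are part of LiteParse. The sketch assumes Node.js 18 or later, where fetch, Response, and AbortSignal.timeout are available globally.

```typescript
// Hypothetical helper, not part of LiteParse: download a URL into a Buffer,
// failing early on HTTP errors and on slow responses.
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<Response>;

async function fetchBytes(
  url: string,
  timeoutMs: number = 30_000,
  fetchImpl: FetchLike = fetch, // injectable so the helper can be tested offline
): Promise<Buffer> {
  // Abort the request if it has not completed within timeoutMs.
  const response = await fetchImpl(url, {
    signal: AbortSignal.timeout(timeoutMs),
  });

  // response.ok is true only for 2xx status codes.
  if (!response.ok) {
    throw new Error(
      `Download failed: ${response.status} ${response.statusText} (${url})`,
    );
  }

  // Collect the whole body in memory as a Buffer.
  return Buffer.from(await response.arrayBuffer());
}
```

The returned Buffer can be passed to parser.parse(buffer) exactly as in the example above. The whole body is held in memory, so this approach suits documents of moderate size.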