Markdown Output

Render documents to clean, structured Markdown for LLMs and RAG pipelines.

LiteParse can render documents directly to Markdown, reconstructing headings, tables, lists, images, and links from the spatial layout. This is ideal for feeding documents to LLMs and RAG pipelines, where clean, structured text matters more than exact visual fidelity.

Markdown is a first-class output format alongside text and json.

# Render a document to Markdown
lit parse document.pdf --format markdown -o output.md
# Print Markdown to stdout
lit parse document.pdf --format markdown

By default, raster images are emitted as Markdown placeholders (![](image_pN_K.png)) in reading order. Control this with --image-mode:

| Mode | Behavior |
| --- | --- |
| placeholder (default) | Emit ![](image_pN_K.png) references in reading order |
| off | Strip images entirely |
| embed | Write each image’s PNG bytes to --image-output-dir and reference them |

# Strip images
lit parse document.pdf --format markdown --image-mode off
# Extract embedded images to disk and reference them from the markdown
lit parse document.pdf --format markdown --image-mode embed --image-output-dir ./images
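When post-processing placeholder output, it can help to locate the image references in the rendered Markdown. The following is a minimal sketch, not part of LiteParse: `find_image_placeholders` is a hypothetical helper, and it assumes that N and K in image_pN_K.png are decimal integers, which this page does not spell out.

```python
import re

# Assumption: N and K in image_pN_K.png are decimal integers.
# The page does not define them, so they are returned as plain numbers.
PLACEHOLDER_RE = re.compile(r"!\[[^\]]*\]\((image_p(\d+)_(\d+)\.png)\)")


def find_image_placeholders(markdown: str) -> list[tuple[str, int, int]]:
    """Return (filename, N, K) for each placeholder, in document order."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in PLACEHOLDER_RE.finditer(markdown)
    ]


sample = "# Report\n\n![](image_p1_0.png)\n\nSome text.\n\n![](image_p2_3.png)\n"
print(find_image_placeholders(sample))
```

Because placeholders are emitted in reading order, the returned list preserves the order in which images appear in the document.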

Hyperlink annotations are rendered as [text](url) by default. Pass --no-links to emit the anchor text as plain text instead:

lit parse document.pdf --format markdown --no-links
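For batch scripts, the flags shown so far can be assembled programmatically. This is a rough sketch using only the options documented on this page; `build_lit_command` is a hypothetical helper name, and the argument order simply mirrors the examples above.

```python
from typing import Optional


def build_lit_command(
    source: str,
    output: Optional[str] = None,
    image_mode: str = "placeholder",
    image_output_dir: Optional[str] = None,
    links: bool = True,
) -> list[str]:
    """Assemble a `lit parse ... --format markdown` argument list."""
    if image_mode not in ("placeholder", "off", "embed"):
        raise ValueError(f"unknown image mode: {image_mode}")
    cmd = ["lit", "parse", source, "--format", "markdown"]
    if output is not None:
        cmd += ["-o", output]
    if image_mode != "placeholder":  # placeholder is the documented default
        cmd += ["--image-mode", image_mode]
    if image_output_dir is not None:
        cmd += ["--image-output-dir", image_output_dir]
    if not links:
        cmd.append("--no-links")
    return cmd


print(build_lit_command("document.pdf", image_mode="embed", image_output_dir="./images"))
```

If the lit executable is on the PATH, the returned list can be passed to `subprocess.run(cmd, check=True)`.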

When LiteParse is used as a library, set the output format to Markdown. The rendered Markdown is returned on result.text.

TypeScript

import { LiteParse } from "@llamaindex/liteparse";

const parser = new LiteParse({
  outputFormat: "markdown", // "json" | "text" | "markdown"
  imageMode: "placeholder", // "placeholder" | "off" | "embed" (default: "placeholder")
  extractLinks: true, // render [text](url) link syntax (default: true)
});

const result = await parser.parse("document.pdf");
console.log(result.text); // rendered Markdown

Python

from liteparse import LiteParse

parser = LiteParse(
    output_format="markdown",  # "json" | "text" | "markdown"
    image_mode="placeholder",  # "placeholder" | "off" | "embed"
    extract_links=True,  # render [text](url) link syntax (default: True)
)

result = parser.parse("document.pdf")
print(result.text)  # rendered Markdown

Rust

use liteparse::config::{ImageMode, LiteParseConfig, OutputFormat};
use liteparse::LiteParse;

let config = LiteParseConfig {
    output_format: OutputFormat::Markdown,
    image_mode: ImageMode::Placeholder,
    extract_links: true,
    ..Default::default()
};

let result = LiteParse::new(config).parse("document.pdf").await?;
println!("{}", result.text); // rendered Markdown
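For RAG pipelines, the string on result.text is often split into sections before indexing. The sketch below is one possible approach, not a LiteParse feature: `split_by_headings` is a hypothetical helper, and it assumes headings are emitted in ATX style (lines starting with #), which this page does not state. It does not account for # lines inside code fences.

```python
import re

# Assumption: headings in the rendered Markdown use ATX style ("# Title").
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into sections, one per heading.

    Text before the first heading becomes a section with level 0
    and an empty title.
    """
    sections = []
    current = {"level": 0, "title": "", "lines": []}

    def flush(section: dict) -> None:
        # Keep a section only if it has a title or some non-blank text.
        if section["title"] or any(line.strip() for line in section["lines"]):
            sections.append(
                {
                    "level": section["level"],
                    "title": section["title"],
                    "body": "\n".join(section["lines"]).strip(),
                }
            )

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush(current)
            current = {
                "level": len(match.group(1)),
                "title": match.group(2).strip(),
                "lines": [],
            }
        else:
            current["lines"].append(line)
    flush(current)
    return sections


sample = (
    "Preamble.\n\n# Title\n\nIntro text.\n\n"
    "## Details\n\n| a | b |\n|---|---|\n| 1 | 2 |\n"
)
for section in split_by_headings(sample):
    print(section["level"], repr(section["title"]), len(section["body"]))
```

Splitting on headings keeps each table or list together with the heading that introduces it, which tends to produce more coherent chunks than fixed-size windows.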

Markdown reconstruction quality varies with document complexity. LiteParse does a strong job on typical documents, handling prose, headings, simple-to-moderate tables, and lists. Reconstruction runs entirely locally and relies on rule-based heuristics rather than models. For the hardest documents (dense or multi-level tables, complex multi-column layouts, charts, and scans), LlamaParse remains the most accurate option.
