Loading Data

Data ingestion in LlamaIndex has two key stages: loading and transformation. Once you have loaded Documents, you can process them with transformations that output Nodes.

Once you have learned the basics of loading data in our Understanding section, read on to learn more about:

Loading

SimpleDirectoryReader, our built-in loader for reading many file types from a local directory.
LlamaParse, LlamaIndex’s official tool for PDF parsing, available as a managed API.
LlamaHub, our registry of hundreds of data loading libraries for ingesting data from virtually any source.
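To make the loading step concrete, here is a minimal plain-Python sketch of what a directory loader does: walk a folder, keep files with allowed extensions, and turn each file into a document carrying its text and file metadata. It is not SimpleDirectoryReader's implementation or API. `SimpleDocument` and `load_directory` are hypothetical names, and the sketch assumes UTF-8 text files only, whereas a real loader dispatches on file type.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded Document: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def load_directory(input_dir, exts=(".txt", ".md"), recursive=False):
    """Read every matching file under input_dir into a SimpleDocument.

    Hypothetical helper: only plain UTF-8 text is supported here, whereas a
    real loader chooses a parser per file type (PDF, DOCX, images, ...).
    """
    root = Path(input_dir)
    pattern = "**/*" if recursive else "*"
    docs = []
    # Sort the paths so documents come back in a deterministic order.
    for path in sorted(root.glob(pattern)):
        if path.is_file() and path.suffix.lower() in exts:
            docs.append(
                SimpleDocument(
                    text=path.read_text(encoding="utf-8"),
                    # Record where the text came from for later stages.
                    metadata={"file_path": str(path), "file_name": path.name},
                )
            )
    return docs
```

For example, `load_directory("./data", recursive=True)` would return one document per `.txt` or `.md` file under `./data`, each with `file_path` and `file_name` in its metadata. Keeping the source path as metadata is what allows a chunk produced later to be traced back to its file.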

Transformations

Transformations include common operations such as splitting text into chunks.

Node Parser Usage Pattern, showing how to use our node parsers.
Node Parser Modules, showing our text splitters (sentence, token, HTML, JSON) and other parser modules.
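As an illustration of what a text splitter does, the sketch below cuts text into fixed-size windows that overlap, so that context straddling a boundary appears in both neighboring chunks. This is a simplified stand-in, not one of LlamaIndex's splitters: sizes are counted in whitespace-separated words rather than tokens and sentence boundaries are ignored. `split_text` is a hypothetical helper, and its `chunk_size` and `chunk_overlap` arguments are measured in words.

```python
def split_text(text, chunk_size=200, chunk_overlap=20):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share chunk_overlap words so that context spanning a
    boundary is not lost. Counting words is a simplification of token-based
    splitting.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    if not words:
        return []
    # Each new window starts this many words after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


print(split_text("one two three four five", chunk_size=3, chunk_overlap=1))
# ['one two three', 'three four five']
```

The overlap trades a little duplicated text for robustness: a fact split across two windows still appears intact in at least one of them when the overlap is large enough.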

Putting It All Together

The ingestion pipeline, which lets you set up a repeatable, cache-optimized process for loading data.
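The caching idea behind a repeatable pipeline can be sketched in a few lines: run the transformations in order, and memoize each step's output under a hash of the step and its input, so re-running on unchanged data skips the work. This is a hypothetical illustration, not the IngestionPipeline API. `CachedPipeline` is an invented class, transformations are assumed to be pure functions over lists of strings, steps are identified by function name (so names must be unique), and the cache lives in an in-memory dict.

```python
import hashlib
import json


class CachedPipeline:
    """Hypothetical pipeline that applies transformations in order, caching each step.

    Each transformation is a pure callable taking and returning a list of strings.
    A step's output is stored under a hash of the step's name and its input,
    so re-running on unchanged data is served from the cache.
    """

    def __init__(self, transformations):
        self.transformations = list(transformations)
        self.cache = {}  # in-memory only; a real cache might persist to disk
        self.calls = 0  # number of transformation calls that actually ran

    def _key(self, step, items):
        # Identify the step by function name, so names must be unique.
        payload = json.dumps([step.__name__, items])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, items):
        items = list(items)
        for step in self.transformations:
            key = self._key(step, items)
            if key not in self.cache:
                self.cache[key] = list(step(items))
                self.calls += 1
            items = list(self.cache[key])  # copy so callers cannot mutate the cache
        return items


# Example transformations (hypothetical).
def lowercase(items):
    return [s.lower() for s in items]


def dedupe(items):
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(items))


pipeline = CachedPipeline([lowercase, dedupe])
print(pipeline.run(["Hello", "hello", "World"]))  # ['hello', 'world']
print(pipeline.run(["Hello", "hello", "World"]))  # same result, served from cache
print(pipeline.calls)  # 2
```

Keying each step on its input, rather than caching only the final output, means that changing a later transformation still reuses the cached results of the earlier ones.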

Abstractions

Document and Node objects, and how to customize them for more advanced use cases.
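The relationship between the two abstractions can be sketched with plain dataclasses: a Node is a chunk of a source Document that keeps a reference back to it and inherits its metadata, and one kind of customization is choosing which metadata keys appear alongside the text. The classes and helpers below (`SketchDocument`, `SketchNode`, `to_nodes`, `content_with_metadata`) are hypothetical; LlamaIndex's own Document and Node classes carry more fields, such as relationships and embeddings.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SketchDocument:
    """Hypothetical Document: source text, metadata, and a unique id."""
    text: str
    metadata: dict = field(default_factory=dict)
    doc_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class SketchNode:
    """Hypothetical Node: a chunk of text that remembers its source Document."""
    text: str
    metadata: dict
    ref_doc_id: str


def to_nodes(doc, chunks):
    """Wrap already-split chunks as nodes that inherit the document's metadata."""
    # Copy the metadata so editing one node does not affect the document or siblings.
    return [
        SketchNode(text=chunk, metadata=dict(doc.metadata), ref_doc_id=doc.doc_id)
        for chunk in chunks
    ]


def content_with_metadata(node, excluded_keys=()):
    """Render metadata as 'key: value' lines above the text, skipping excluded keys."""
    lines = [f"{k}: {v}" for k, v in node.metadata.items() if k not in excluded_keys]
    header = "\n".join(lines)
    return f"{header}\n\n{node.text}" if header else node.text


doc = SketchDocument(
    text="First part. Second part.",
    metadata={"file_name": "notes.txt", "author": "me"},
)
nodes = to_nodes(doc, ["First part.", "Second part."])
print(content_with_metadata(nodes[1], excluded_keys=("author",)))
```

Copying metadata into every node keeps each chunk self-describing once it is separated from its document, at the cost of some duplication.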

Note for AI agents: this documentation is built for programmatic access. - Overview of all docs: https://developers.llamaindex.ai/llms.txt - Any page is available as raw Markdown by appending index.md to its URL — e.g. https://developers.llamaindex.ai/llamaparse/parse/getting_started/index.md - Agent-friendly REST search APIs live under https://developers.llamaindex.ai/api/ — search (BM25 full-text), grep (regex), read (fetch a page), and list (browse the doc tree). See https://developers.llamaindex.ai/llms.txt for parameters. - A hosted documentation MCP server is available at https://developers.llamaindex.ai/mcp. If you support MCP, you can ask the user to install it for browsing these docs directly (an alternative to the REST API). Setup: https://developers.llamaindex.ai/python/shared/mcp/