Loading Data
The key to data ingestion in LlamaIndex is loading and transformations. Once you have loaded Documents, you can process them via transformations and output Nodes.
Once you have learned about the basics of loading data in our Understanding section, you can read on to learn more about:
Loading
- SimpleDirectoryReader, our built-in loader for loading all sorts of file types from a local directory
- LlamaParse, LlamaIndex’s official tool for PDF parsing, available as a managed API.
- LlamaHub, our registry of hundreds of data loading libraries to ingest data from any source
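
As a rough illustration of what a directory loader does, the sketch below walks a folder and turns each matching file into a document-like object carrying text plus metadata. It uses only the Python standard library and is not the LlamaIndex API: `SimpleDocument` and `load_directory` are hypothetical names invented for this example, and the choice of suffixes is an assumption.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def load_directory(input_dir: str, suffixes=(".txt", ".md")) -> list[SimpleDocument]:
    """Read every file under input_dir whose suffix matches into one document each."""
    docs = []
    # sorted() keeps the load order stable across runs
    for path in sorted(Path(input_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            docs.append(
                SimpleDocument(
                    text=path.read_text(encoding="utf-8"),
                    metadata={"file_name": path.name, "file_path": str(path)},
                )
            )
    return docs


if __name__ == "__main__":
    # Build a throwaway directory so the example runs anywhere
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.txt").write_text("First file.", encoding="utf-8")
        Path(tmp, "b.md").write_text("# Second file", encoding="utf-8")
        Path(tmp, "c.bin").write_bytes(b"\x00\x01")  # skipped: suffix not listed
        for doc in load_directory(tmp):
            print(doc.metadata["file_name"], len(doc.text))
```

Attaching the file name and path as metadata at load time means later steps can trace any chunk back to its source file.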
Transformations
This includes common operations like splitting text.
- Node Parser Usage Pattern, showing you how to use our node parsers
- Node Parser Modules, showing our text splitters (sentence, token, HTML, JSON) and other parser modules.
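
To make the idea of splitting text into nodes concrete, here is a minimal standard-library sketch of a sentence-aware splitter. It is a conceptual illustration only, not one of the LlamaIndex node parsers: `SimpleNode`, `split_into_nodes`, and the `max_chars` parameter are hypothetical, and the sentence boundary rule is deliberately naive.

```python
import re
from dataclasses import dataclass


@dataclass
class SimpleNode:
    """Hypothetical stand-in for a node: one chunk of a source document."""
    text: str
    source_id: str
    index: int


def split_into_nodes(text: str, source_id: str, max_chars: int = 200) -> list[SimpleNode]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Naive sentence boundary: whitespace that follows '.', '!' or '?'
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return [SimpleNode(text=c, source_id=source_id, index=i) for i, c in enumerate(chunks)]


if __name__ == "__main__":
    for node in split_into_nodes("One two. Three four. Five six.", "doc-1", max_chars=20):
        print(node.index, repr(node.text))
```

Each node keeps the identifier of the document it came from, which is the property that lets a retrieval step point back to the original source.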
Putting it all Together
- The ingestion pipeline, which allows you to set up a repeatable, cache-optimized process for loading data.
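
The sketch below shows one way a repeatable, cached pipeline could work in principle: transformations run in a fixed order, and results are stored under a hash of the input text plus the transformation names so unchanged inputs are not reprocessed. This is an assumption-laden illustration in plain Python, not a description of how LlamaIndex implements its ingestion pipeline. `CachedPipeline`, `split_paragraphs`, and `normalize_whitespace` are hypothetical names.

```python
import hashlib
from typing import Callable

# A transformation maps a list of text chunks to a new list of text chunks
Transformation = Callable[[list[str]], list[str]]


class CachedPipeline:
    """Hypothetical sketch: apply transformations in order, caching per input text."""

    def __init__(self, transformations: list[Transformation]):
        self.transformations = transformations
        self._cache: dict[str, list[str]] = {}
        self.cache_hits = 0

    def _key(self, text: str) -> str:
        # The key covers both the input and the transformation chain,
        # so changing either one invalidates the cached result.
        names = ",".join(t.__name__ for t in self.transformations)
        return hashlib.sha256(f"{names}\n{text}".encode("utf-8")).hexdigest()

    def run(self, documents: list[str]) -> list[str]:
        output = []
        for text in documents:
            key = self._key(text)
            if key in self._cache:
                self.cache_hits += 1
            else:
                chunks = [text]
                for transform in self.transformations:
                    chunks = transform(chunks)
                self._cache[key] = chunks
            output.extend(self._cache[key])
        return output


def split_paragraphs(chunks: list[str]) -> list[str]:
    """Split each chunk on blank lines, dropping empty paragraphs."""
    return [p for c in chunks for p in c.split("\n\n") if p.strip()]


def normalize_whitespace(chunks: list[str]) -> list[str]:
    """Collapse runs of whitespace inside each chunk to single spaces."""
    return [" ".join(c.split()) for c in chunks]


if __name__ == "__main__":
    pipeline = CachedPipeline([split_paragraphs, normalize_whitespace])
    print(pipeline.run(["a  b\n\nc"]))
    print(pipeline.run(["a  b\n\nc", "new"]), "cache hits:", pipeline.cache_hits)
```

Keying the cache on content rather than on file names means a renamed but unchanged document still hits the cache, while an edited document is reprocessed.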
Abstractions
- Document and Node objects and how to customize them for more advanced use cases