Metadata Extraction Usage Pattern
You can use LLMs to automate metadata extraction with our Metadata Extractor modules.
Our metadata extractor modules include the following “feature extractors”:
- SummaryExtractor: automatically extracts a summary over a set of Nodes
- QuestionsAnsweredExtractor: extracts a set of questions that each Node can answer
- TitleExtractor: extracts a title over the context of each Node
- EntityExtractor: extracts entities (i.e. names of places, people, things) mentioned in the content of each Node
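To make the idea of a feature extractor concrete, here is a minimal, library-free sketch of the pattern: each extractor returns one metadata dictionary per node, and the results are merged into the nodes in order. The `Node` class, the two toy extractors, and `run_extractors` are hypothetical names used only for illustration. They are not LlamaIndex APIs, and the real modules call an LLM rather than the simple heuristics used here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a text chunk (not the LlamaIndex node class)."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# In this sketch, an extractor is any callable that returns one dict per node.
Extractor = Callable[[List[Node]], List[Dict[str, Any]]]


def first_sentence_title(nodes: List[Node]) -> List[Dict[str, Any]]:
    # Toy heuristic standing in for an LLM call: text before the first period.
    return [{"title": n.text.split(".")[0].strip()} for n in nodes]


def word_count(nodes: List[Node]) -> List[Dict[str, Any]]:
    # Second toy extractor: number of whitespace-separated words.
    return [{"word_count": len(n.text.split())} for n in nodes]


def run_extractors(nodes: List[Node], extractors: List[Extractor]) -> List[Node]:
    # Apply each extractor in turn and merge its output into node.metadata.
    for extract in extractors:
        for node, extra in zip(nodes, extract(nodes)):
            node.metadata.update(extra)
    return nodes


nodes = [
    Node("Paris is the capital of France. It lies on the Seine."),
    Node("Llamas are camelids. They live in the Andes."),
]
enriched = run_extractors(nodes, [first_sentence_title, word_count])
for node in enriched:
    print(node.metadata)
```

In this sketch every extractor only adds keys to `metadata`, so extractors can be combined freely and each node ends up carrying the union of what they produced.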
Then you can chain the Metadata Extractors with our node parser:
from llama_index.core.extractors import (
    TitleExtractor,
    QuestionsAnsweredExtractor,
)
from llama_index.core.node_parser import TokenTextSplitter

text_splitter = TokenTextSplitter(
    separator=" ", chunk_size=512, chunk_overlap=128
)
title_extractor = TitleExtractor(nodes=5)
qa_extractor = QuestionsAnsweredExtractor(questions=3)

# assume documents are defined -> extract nodes
from llama_index.core.ingestion import IngestionPipeline

pipeline = IngestionPipeline(
    transformations=[text_splitter, title_extractor, qa_extractor]
)

nodes = pipeline.run(
    documents=documents,
    in_place=True,
    show_progress=True,
)

or insert into an index:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents, transformations=[text_splitter, title_extractor, qa_extractor]
)

Resources