Metadata Extraction Usage Pattern
You can use LLMs to automate metadata extraction with our Metadata Extractor modules.
Our metadata extractor modules include the following “feature extractors”:
- SummaryExtractor - automatically extracts a summary over a set of Nodes
- QuestionsAnsweredExtractor - extracts a set of questions that each Node can answer
- TitleExtractor - extracts a title over the context of each Node
- EntityExtractor - extracts entities (e.g., names of places, people, and things) mentioned in the content of each Node
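Conceptually, each extractor in the list above takes a batch of nodes and produces one metadata dictionary per node, and those dictionaries are merged into each node's metadata. The plain-Python sketch below illustrates only that contract, under stated assumptions: `Node`, `keyword_extractor`, `char_count_extractor`, and `apply_extractors` are hypothetical names, not LlamaIndex APIs, and a word-frequency heuristic stands in for the LLM call a real extractor would make.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a node: a chunk of text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def keyword_extractor(nodes, top_k=2):
    """Hypothetical extractor returning one metadata dict per node.

    A real extractor would prompt an LLM; here the most frequent words longer
    than three characters stand in for the extracted keywords.
    """
    results = []
    for node in nodes:
        words = [w.strip(".,").lower() for w in node.text.split()]
        counts = Counter(w for w in words if len(w) > 3)
        results.append({"keywords": [w for w, _ in counts.most_common(top_k)]})
    return results


def char_count_extractor(nodes):
    """Hypothetical extractor recording the length of each node's text."""
    return [{"num_chars": len(node.text)} for node in nodes]


def apply_extractors(nodes, extractors):
    """Run each extractor over the whole batch and merge its dicts into node.metadata."""
    for extractor in extractors:
        for node, extracted in zip(nodes, extractor(nodes)):
            node.metadata.update(extracted)
    return nodes


nodes = [
    Node("Paris is the capital of France. Paris hosts the Louvre museum."),
    Node("Metadata extraction attaches extra context to each chunk of text."),
]
apply_extractors(nodes, [keyword_extractor, char_count_extractor])
print(nodes[0].metadata["keywords"])  # ['paris', 'capital']
print(sorted(nodes[1].metadata))  # ['keywords', 'num_chars']
```

Because each extractor in this sketch only adds keys, several extractors can be combined freely; if two of them wrote the same key, the later one would overwrite the earlier value.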
Then you can chain the Metadata Extractors with our node parser:
from llama_index.core.extractors import (
    TitleExtractor,
    QuestionsAnsweredExtractor,
)
from llama_index.core.node_parser import TokenTextSplitter

text_splitter = TokenTextSplitter(
    separator=" ", chunk_size=512, chunk_overlap=128
)
title_extractor = TitleExtractor(nodes=5)
qa_extractor = QuestionsAnsweredExtractor(questions=3)

# assume documents are defined -> extract nodes
from llama_index.core.ingestion import IngestionPipeline

pipeline = IngestionPipeline(
    transformations=[text_splitter, title_extractor, qa_extractor]
)

nodes = pipeline.run(
    documents=documents,
    in_place=True,
    show_progress=True,
)

Or insert into an index:
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(
    documents, transformations=[text_splitter, title_extractor, qa_extractor]
)
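In both snippets the same ordered `transformations` list is handed to LlamaIndex: the splitter turns documents into nodes, and each extractor then enriches those nodes. The following sketch imitates that flow in plain Python so the ordering and the `chunk_size`/`chunk_overlap` arithmetic are easy to inspect. It is a simplification, not the library's implementation: `split_tokens`, `position_extractor`, and `run_pipeline` are hypothetical, and whitespace-separated words stand in for the tokenizer tokens that `TokenTextSplitter` counts.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a node: a chunk of text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_tokens(documents, chunk_size=512, chunk_overlap=128):
    """Simplified sliding-window splitter; whitespace words stand in for tokens.

    Consecutive windows start chunk_size - chunk_overlap tokens apart, so each
    chunk repeats the last chunk_overlap tokens of the chunk before it.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    nodes = []
    for doc in documents:
        tokens = doc.split()
        if not tokens:
            continue  # skip empty documents
        # Open a new window only while the previous one has not reached the end.
        for start in range(0, max(len(tokens) - chunk_overlap, 1), step):
            nodes.append(Node(" ".join(tokens[start:start + chunk_size])))
    return nodes


def position_extractor(nodes):
    """Hypothetical extractor: takes nodes, adds metadata, returns the nodes."""
    for i, node in enumerate(nodes):
        node.metadata.update({"chunk_index": i, "num_tokens": len(node.text.split())})
    return nodes


def run_pipeline(documents, transformations):
    """Apply transformations in order, feeding each one the previous output."""
    data = documents
    for transform in transformations:
        data = transform(data)
    return data


document = " ".join(f"w{i}" for i in range(1000))  # a 1,000-"token" document
nodes = run_pipeline([document], [split_tokens, position_extractor])
print([n.metadata["num_tokens"] for n in nodes])  # [512, 512, 232]
```

With `chunk_size=512` and `chunk_overlap=128`, consecutive windows start 512 - 128 = 384 tokens apart, so the 1,000-token document above yields windows beginning at tokens 0, 384, and 768, the last one holding the remaining 232 tokens.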