File Based Node Parsers
The SimpleFileNodeParser and FlatReader are designed to open a variety of file types and automatically select the best NodeParser to process each file. The FlatReader loads the file as raw text and attaches the file information to the metadata; the SimpleFileNodeParser then maps file types to the node parsers in node_parser/file, selecting the best node parser for the job.
The SimpleFileNodeParser does not perform token-based chunking of the text, and is intended to be used in combination with a token node parser.
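The division of labour described above (a reader that records file information, then a parser chosen by file type) can be sketched in plain Python. This is only an illustration of the dispatch idea, not LlamaIndex's implementation: `load_flat`, `parse_markdown`, `parse_plain`, `PARSERS_BY_EXTENSION`, and `parse_by_extension` are hypothetical names, and documents are plain dictionaries rather than Document objects.

```python
from pathlib import Path


def parse_markdown(text: str) -> list[str]:
    """Hypothetical Markdown parser: start a new section at every header line."""
    sections, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections


def parse_plain(text: str) -> list[str]:
    """Hypothetical fallback parser: keep the whole document as one section."""
    return [text]


# Assumed mapping from file extension to parser; unknown extensions fall back.
PARSERS_BY_EXTENSION = {".md": parse_markdown}


def load_flat(path: Path) -> dict:
    """Read the raw text and record the file information as metadata."""
    return {
        "text": path.read_text(encoding="utf-8"),
        "metadata": {"filename": path.name, "extension": path.suffix},
    }


def parse_by_extension(document: dict) -> list[str]:
    """Pick a parser from the extension stored in the document metadata."""
    extension = document["metadata"]["extension"]
    parser = PARSERS_BY_EXTENSION.get(extension, parse_plain)
    return parser(document["text"])
```

The point of the sketch is that the parser never inspects the file itself: it relies only on the metadata the reader attached, which is why the reader and the parser are used as a pair.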
Let's look at an example of using the FlatReader and SimpleFileNodeParser to load content. For the README file I will be using the LlamaIndex README, and for the HTML file the Stack Overflow landing page; any README and HTML file will work.
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-readers-file
!pip install llama-index
from llama_index.core.node_parser import SimpleFileNodeParser
from llama_index.readers.file import FlatReader
from pathlib import Path
reader = FlatReader()
html_file = reader.load_data(Path("./stack-overflow.html"))
md_file = reader.load_data(Path("./README.md"))
print(html_file[0].metadata)
print(html_file[0])
print("----")
print(md_file[0].metadata)
print(md_file[0])
{'filename': 'stack-overflow.html', 'extension': '.html'}
Doc ID: a6750408-b0fa-466d-be28-ff2fcbcbaa97
Text: <!DOCTYPE html> <html class="html__responsivehtml__unpinned-leftnav" lang="en"> <head> <title>StackOverflow - Where Developers Learn, Share, & Build Careers</title><link rel="shortcut icon" href="https://cdn.sstatic.net/Sites/stackoverflow/Img/favicon.ico?v=ec617d715196"> <link rel="apple-touch-icon" hr...
----
{'filename': 'README.md', 'extension': '.md'}
Doc ID: 1d872f44-2bb3-4693-a1b8-a59392c23be2
Text: # 🗂️ LlamaIndex 🦙 [](https://pypi.org/project/llama-index/) [](https://github.com/jerryjliu/llama_index/graphs/contributors) [](https:...
Parsing the files
The flat reader has simply loaded the content of the files into Document objects for further processing. We can see that the file information is retained in the metadata. Let's pass the documents to the node parser to see the parsing.
parser = SimpleFileNodeParser()
md_nodes = parser.get_nodes_from_documents(md_file)
html_nodes = parser.get_nodes_from_documents(html_file)
print(md_nodes[0].metadata)
print(md_nodes[0].text)
print(md_nodes[1].metadata)
print(md_nodes[1].text)
print("----")
print(html_nodes[0].metadata)
print(html_nodes[0].text)
{'filename': 'README.md', 'extension': '.md', 'Header 1': '🗂️ LlamaIndex 🦙'}
🗂️ LlamaIndex 🦙
[](https://pypi.org/project/llama-index/)
[](https://github.com/jerryjliu/llama_index/graphs/contributors)
[](https://discord.gg/dGcwcsnxhU)
LlamaIndex (GPT Index) is a data framework for your LLM application.
PyPI:
- LlamaIndex: https://pypi.org/project/llama-index/.
- GPT Index (duplicate): https://pypi.org/project/gpt-index/.
LlamaIndex.TS (Typescript/Javascript): https://github.com/run-llama/LlamaIndexTS.
Documentation: https://gpt-index.readthedocs.io/.
Twitter: https://twitter.com/llama_index.
Discord: https://discord.gg/dGcwcsnxhU.
{'filename': 'README.md', 'extension': '.md', 'Header 1': '🗂️ LlamaIndex 🦙', 'Header 3': 'Ecosystem'}
Ecosystem
- LlamaHub (community library of data loaders): https://llamahub.ai
- LlamaLab (cutting-edge AGI projects using LlamaIndex): https://github.com/run-llama/llama-lab
----
{'filename': 'stack-overflow.html', 'extension': '.html', 'tag': 'li'}
AboutProductsFor TeamsStack OverflowPublic questions & answersStack Overflow for TeamsWhere developers & technologists share private knowledge with coworkersTalent
Build your employer brandAdvertisingReach developers & technologists worldwideLabsThe future of collective knowledge sharingAbout the companycurrent community
Stack Overflow
helpchat
Meta Stack Overflow
your communities
Sign up or log in to customize your list.
more stack exchange communities
company blog
Further processing of files
We can see that the Markdown and HTML files have been split into chunks based on the structure of the document. The Markdown node parser splits on any headers and attaches the hierarchy of headers to the metadata. The HTML node parser extracts text from common text elements to simplify the HTML file, and combines neighbouring nodes of the same element. Compared to working with raw HTML, this is already a big improvement in terms of retrieving meaningful text content.
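To make the header-based splitting concrete, here is a minimal pure-Python sketch of the idea: cut the text at each Markdown header and record the current header path as `Header N` metadata keys, as seen in the output above. It is an illustration under simplifying assumptions (only `#`-style headers, no handling of code fences), and `split_markdown_by_headers` is a hypothetical helper, not the library's Markdown node parser.

```python
import re

# Matches ATX-style headers such as "# Title" or "### Section".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_by_headers(text: str, base_metadata: dict) -> list[dict]:
    """Split Markdown at headers and attach the header hierarchy as metadata."""
    nodes = []
    headers = {}  # header level -> title, for the section being collected
    lines = []    # lines of the section being collected

    def flush():
        body = "\n".join(lines).strip()
        if body:
            metadata = dict(base_metadata)
            for level in sorted(headers):
                metadata[f"Header {level}"] = headers[level]
            nodes.append({"text": body, "metadata": metadata})
        lines.clear()

    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()  # close the previous section before changing the header path
            level = len(match.group(1))
            # A new header replaces any header at the same or a deeper level.
            for stale in [lvl for lvl in headers if lvl >= level]:
                del headers[stale]
            title = match.group(2).strip()
            headers[level] = title
            lines.append(title)  # keep the title as the first line of the text
        else:
            lines.append(line)
    flush()
    return nodes
```

Because a header clears every deeper level, a level-3 section that follows a level-1 title carries `Header 1` and `Header 3` only, and a later level-2 header drops the old `Header 3` again.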
Because these files were only split according to the structure of the file, we can apply further processing with a text splitter to prepare the content into nodes of limited token length.
from llama_index.core.node_parser import SentenceSplitter
# For clarity in the demo, make small splits without overlap
splitting_parser = SentenceSplitter(chunk_size=200, chunk_overlap=0)

html_chunked_nodes = splitting_parser(html_nodes)
md_chunked_nodes = splitting_parser(md_nodes)

print(f"\n\nHTML parsed nodes: {len(html_nodes)}")
print(html_nodes[0].text)

print(f"\n\nHTML chunked nodes: {len(html_chunked_nodes)}")
print(html_chunked_nodes[0].text)

print(f"\n\nMD parsed nodes: {len(md_nodes)}")
print(md_nodes[0].text)

print(f"\n\nMD chunked nodes: {len(md_chunked_nodes)}")
print(md_chunked_nodes[0].text)
HTML parsed nodes: 67
AboutProductsFor TeamsStack OverflowPublic questions & answersStack Overflow for TeamsWhere developers & technologists share private knowledge with coworkersTalent
Build your employer brandAdvertisingReach developers & technologists worldwideLabsThe future of collective knowledge sharingAbout the companycurrent community
Stack Overflow
helpchat
Meta Stack Overflow
your communities
Sign up or log in to customize your list.
more stack exchange communities
company blog
HTML chunked nodes: 87
AboutProductsFor TeamsStack OverflowPublic questions & answersStack Overflow for TeamsWhere developers & technologists share private knowledge with coworkersTalent
Build your employer brandAdvertisingReach developers & technologists worldwideLabsThe future of collective knowledge sharingAbout the companycurrent community
Stack Overflow
helpchat
Meta Stack Overflow
your communities
MD parsed nodes: 10
🗂️ LlamaIndex 🦙
[](https://pypi.org/project/llama-index/)
[](https://github.com/jerryjliu/llama_index/graphs/contributors)
[](https://discord.gg/dGcwcsnxhU)
LlamaIndex (GPT Index) is a data framework for your LLM application.
PyPI:
- LlamaIndex: https://pypi.org/project/llama-index/.
- GPT Index (duplicate): https://pypi.org/project/gpt-index/.
LlamaIndex.TS (Typescript/Javascript): https://github.com/run-llama/LlamaIndexTS.
Documentation: https://gpt-index.readthedocs.io/.
Twitter: https://twitter.com/llama_index.
Discord: https://discord.gg/dGcwcsnxhU.
MD chunked nodes: 13
🗂️ LlamaIndex 🦙
[](https://pypi.org/project/llama-index/)
[](https://github.com/jerryjliu/llama_index/graphs/contributors)
[](https://discord.gg/dGcwcsnxhU)
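The node counts grow (67 to 87 for HTML, 10 to 13 for Markdown) because each structural node is chunked on its own: a node that exceeds the size limit becomes several chunks, while chunks never span two nodes. The following sketch shows that behaviour in plain Python. It is not how SentenceSplitter is implemented: whitespace-separated words stand in for tokens, sentence boundaries are found with a simple regular expression, and `chunk_by_sentences` and `chunk_nodes` are hypothetical helpers operating on dictionaries.

```python
import re


def chunk_by_sentences(text: str, chunk_size: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size words.

    Words are a stand-in for tokens. A single sentence longer than
    chunk_size is kept whole rather than cut.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Close the current chunk if this sentence would overflow it.
        if current and current_len + n_words > chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


def chunk_nodes(nodes: list[dict], chunk_size: int) -> list[dict]:
    """Chunk each node separately and copy its metadata onto every chunk."""
    chunked = []
    for node in nodes:
        for piece in chunk_by_sentences(node["text"], chunk_size):
            chunked.append({"text": piece, "metadata": dict(node["metadata"])})
    return chunked
```

Copying the metadata onto every chunk mirrors what the output shows: chunks produced from the same Markdown section keep that section's filename and header keys.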
Summary
We can see that the files have been further processed within the splits created by SimpleFileNodeParser, and are now ready to be ingested by an index or vector store. The code cell below shows just the chaining of the parsers to go from raw file to chunked nodes:
from llama_index.core.ingestion import IngestionPipeline
pipeline = IngestionPipeline(
    documents=reader.load_data(Path("./README.md")),
    transformations=[
        SimpleFileNodeParser(),
        SentenceSplitter(chunk_size=200, chunk_overlap=0),
    ],
)

md_chunked_nodes = pipeline.run()
print(md_chunked_nodes)
[TextNode(id_='e6236169-45a1-4699-9762-c8d3d89f8fa0', embedding=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='e7bc328f-85c1-430a-9772-425e59909a58', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦'}, hash='e538ad7c04f635f1c707eba290b55618a9f0942211c4b5ca2a4e54e1fdf04973'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='51b40b54-dfd3-48ed-b377-5ca58a0f48a3', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦'}, hash='ca9e3590b951f1fca38687fd12bb43fbccd0133a38020c94800586b3579c3218')}, hash='ec733c85ad1dca248ae583ece341428ee20e4d796bc11adea1618c8e4ed9246a', text='ποΈ LlamaIndex π¦\n[](https://pypi.org/project/llama-index/)\n[](https://github.com/jerryjliu/llama_index/graphs/contributors)\n[](https://discord.gg/dGcwcsnxhU)', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), TextNode(id_='51b40b54-dfd3-48ed-b377-5ca58a0f48a3', embedding=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='e7bc328f-85c1-430a-9772-425e59909a58', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦'}, hash='e538ad7c04f635f1c707eba290b55618a9f0942211c4b5ca2a4e54e1fdf04973'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='e6236169-45a1-4699-9762-c8d3d89f8fa0', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦'}, hash='ec733c85ad1dca248ae583ece341428ee20e4d796bc11adea1618c8e4ed9246a')}, 
hash='ca9e3590b951f1fca38687fd12bb43fbccd0133a38020c94800586b3579c3218', text='LlamaIndex (GPT Index) is a data framework for your LLM application.\n\nPyPI: \n- LlamaIndex: https://pypi.org/project/llama-index/.\n- GPT Index (duplicate): https://pypi.org/project/gpt-index/.\n\nLlamaIndex.TS (Typescript/Javascript): https://github.com/run-llama/LlamaIndexTS.\n\nDocumentation: https://gpt-index.readthedocs.io/.\n\nTwitter: https://twitter.com/llama_index.\n\nDiscord: https://discord.gg/dGcwcsnxhU.', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), TextNode(id_='ce269047-4718-4a08-b170-34fef19cdafe', embedding=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 3': 'Ecosystem'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='953934dc-dd4f-4069-9e2a-326ee8a593bf', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 3': 'Ecosystem'}, hash='ede2843c0f18e0f409ae9e2bb4090bca4409eaa992fe8ca149295406d3d7adac')}, hash='52b03025c73d7218bd4d66b9812f6e1f6fab6ccf64e5660dc31d123bf1caf5be', text='Ecosystem\n\n- LlamaHub (community library of data loaders): https://llamahub.ai\n- LlamaLab (cutting-edge AGI projects using LlamaIndex): https://github.com/run-llama/llama-lab', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), TextNode(id_='5ef55167-1fa1-4cae-b2b5-4a86beffbef6', embedding=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Overview'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='2223925f-93a8-45db-9044-41838633e8cc', node_type=None, 
metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Overview'}, hash='adc49240ff2bdd007e3462b2c3d3f6b6f3b394abbf043d4c291b1a029302c909')}, hash='dc3f175a9119976866e3e6fb2233a12590e8861dc91c621db131521d84e490c4', text='π Overview\n\n**NOTE**: This README is not updated as frequently as the documentation. Please check out the documentation above for the latest updates!', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), TextNode(id_='8b8e4778-7943-424c-a160-b7da845dd7da', embedding=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Overview', 'Header 3': 'Context'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='c1ea3027-aad7-4a6f-b8dc-460a8ffbc258', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Overview', 'Header 3': 'Context'}, hash='632c76181233b32c03377ccc3d41e458aaec7de845d123a20ace6e3036bbdcd7')}, hash='b867ce7afa1cee176db4e5d0b147276c2e4c724223d590dd5017e68fab3aa29a', text='Context\n- LLMs are a phenomenonal piece of technology for knowledge generation and reasoning. 
They are pre-trained on large amounts of publicly available data.\n- How do we best augment LLMs with our own private data?\n\nWe need a comprehensive toolkit to help perform this data augmentation for LLMs.', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), TextNode(id_='be9d228a-91f6-4c39-845d-b79d3b8fa874', embedding=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Overview', 'Header 3': 'Proposed Solution'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='f57a202a-cb3d-4a74-ab09-70bf93a0bf51', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Overview', 'Header 3': 'Proposed Solution'}, hash='4d338f21570da1564e407877e2fceac4dc9e9f8c90cb3b34876507f85d29f41e'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='a18e1c90-0455-47be-9411-8e098df9c951', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Overview', 'Header 3': 'Proposed Solution'}, hash='7b9bbe433d53e727b353864a38ad8a9e78b74c84dbef4ca931422f0f45a4906d')}, hash='b02a43b52686c62c8c4a2f32aa7b8a5bcf2a9e9ea7a033430645ec492f04a4fd', text='Proposed Solution\n\nThat\'s where **LlamaIndex** comes in. LlamaIndex is a "data framework" to help you build LLM apps. 
It provides the following tools:\n\n- Offers **data connectors** to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.)\n- Provides ways to **structure your data** (indices, graphs) so that this data can be easily used with LLMs.\n- Provides an **advanced retrieval/query interface over your data**: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.\n- Allows easy integrations with your outer application framework (e.g.', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), TextNode(id_='a18e1c90-0455-47be-9411-8e098df9c951', embedding=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Overview', 'Header 3': 'Proposed Solution'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='f57a202a-cb3d-4a74-ab09-70bf93a0bf51', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Overview', 'Header 3': 'Proposed Solution'}, hash='4d338f21570da1564e407877e2fceac4dc9e9f8c90cb3b34876507f85d29f41e'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='be9d228a-91f6-4c39-845d-b79d3b8fa874', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Overview', 'Header 3': 'Proposed Solution'}, hash='b02a43b52686c62c8c4a2f32aa7b8a5bcf2a9e9ea7a033430645ec492f04a4fd')}, hash='7b9bbe433d53e727b353864a38ad8a9e78b74c84dbef4ca931422f0f45a4906d', text='with LangChain, Flask, Docker, ChatGPT, anything else).\n\nLlamaIndex provides tools for both beginner users and advanced users. Our high-level API allows beginner users to use LlamaIndex to ingest and query their data in\n5 lines of code. 
Our lower-level APIs allow advanced users to customize and extend any module (data connectors, indices, retrievers, query engines, reranking modules),\nto fit their needs.', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), TextNode(id_='b3c6544a-6f68-4060-b3ec-27e5d4b9a599', embedding=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π‘ Contributing'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='6abcec78-98c1-4f74-b57b-d8cae4aa7112', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π‘ Contributing'}, hash='cdb950bc1703132df9c05c607702201177c1ad5f8f0de9dcfa3f6154a12a3acd')}, hash='4892fb635ac6b11743ca428676ed492ef7d264e440a205a68a0d752d43e3a19c', text='π‘ Contributing\n\nInterested in contributing? See our [Contribution Guide](CONTRIBUTING.md) for more details.', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), TextNode(id_='e0fc56d6-ec94-476d-a3e4-c007daa2e405', embedding=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Documentation'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='f44afbd2-0bf3-46f5-8662-309e0cf7fa9c', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Documentation'}, hash='b01a7435fcbe2962f9b6a2cb397a07c1fed6632941e06a1814f4c4ea2300dc67')}, hash='f0215c48bf198d05ee1d6dcc74e12f70d9310c43f4b4dcea71452c9aec051612', text='π Documentation\n\nFull documentation can be found here: https://gpt-index.readthedocs.io/en/latest/. 
\n\nPlease check it out for the most up-to-date tutorials, how-to guides, references, and other resources!', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), TextNode(id_='b583e1f6-e696-42e3-9c87-fa1a12af5cc9', embedding=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π» Example Usage'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='f25c47c0-b8bd-451b-81bf-3879c48c55f4', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π» Example Usage'}, hash='dfe232d846ceae9f0ccbf96e053b01a00cf24382ff4f49f1380830522d8ae86c'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='82fcab04-4346-4fba-86ae-612e95285c8a', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π» Example Usage'}, hash='fe6196075f613ebae9f64bf5b1e04d8324c239e8f256d4455653ccade1da5541')}, hash='9073dfc928908788a3e174fe06f4689c081a6eeafe002180134a57c28c640c83', text='π» Example Usage\n\n```\npip install llama-index\n```\n\nExamples are in the `examples` folder. 
Indices are in the `indices` folder (see list of indices below).\n\nTo build a simple vector store index:\n```python\nimport os\nos.environ["OPENAI_API_KEY"] = \'YOUR_OPENAI_API_KEY\'\n\nfrom llama_index import VectorStoreIndex, SimpleDirectoryReader\ndocuments = SimpleDirectoryReader(\'data\').load_data()\nindex = VectorStoreIndex.from_documents(documents)\n```', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), TextNode(id_='82fcab04-4346-4fba-86ae-612e95285c8a', embedding=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π» Example Usage'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='f25c47c0-b8bd-451b-81bf-3879c48c55f4', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π» Example Usage'}, hash='dfe232d846ceae9f0ccbf96e053b01a00cf24382ff4f49f1380830522d8ae86c'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='b583e1f6-e696-42e3-9c87-fa1a12af5cc9', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π» Example Usage'}, hash='9073dfc928908788a3e174fe06f4689c081a6eeafe002180134a57c28c640c83')}, hash='fe6196075f613ebae9f64bf5b1e04d8324c239e8f256d4455653ccade1da5541', text='To query:\n```python\nquery_engine = index.as_query_engine()\nquery_engine.query("<question_text>?")\n```\n\n\nBy default, data is stored in-memory.\nTo persist to disk (under `./storage`):\n\n```python\nindex.storage_context.persist()\n```\n\nTo reload from disk:\n```python\nfrom llama_index import StorageContext, load_index_from_storage\n\n# rebuild storage context\nstorage_context = StorageContext.from_defaults(persist_dir=\'./storage\')\n# load index\nindex = load_index_from_storage(storage_context)\n```', 
start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), TextNode(id_='b2c3437a-7cef-4990-ab3e-6b3f293f3d9f', embedding=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π§ Dependencies'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='0f9e96b7-9a47-4053-8a43-b27a444910ee', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π§ Dependencies'}, hash='3302ab107310e381d572f2410e8994d0b3737b78acc7729c18f8b7f100fd0078')}, hash='28d0ed4496c3bd0a8f0ace18c11be509eadfae4693a3a239c80a5ec1a6eaedd6', text='π§ Dependencies\n\nThe main third-party package requirements are `tiktoken`, `openai`, and `langchain`.\n\nAll requirements should be contained within the `setup.py` file. To run the package locally without building the wheel, simply run `pip install -r requirements.txt`.', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), TextNode(id_='a5af8ac3-57dd-4ed7-ab7f-fab6fb435a42', embedding=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Citation'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='12629a60-c584-4ec9-888d-ea120813f4df', node_type=None, metadata={'filename': 'README.md', 'extension': '.md', 'Header 1': 'ποΈ LlamaIndex π¦', 'Header 2': 'π Citation'}, hash='ad2d72754f9faa42727bd38ba84f71ad43c9d65bc1b12a8c46d5dc951212f863')}, hash='f7df46992fbea69c394e73961c4d17ea0b49a587420b0c9f47986af12f787950', text='π Citation\n\nReference to cite if you use LlamaIndex in a paper:\n\n```\n@software{Liu_LlamaIndex_2022,\nauthor = {Liu, 
Jerry},\ndoi = {10.5281/zenodo.1234},\nmonth = {11},\ntitle = {{LlamaIndex}},\nurl = {https://github.com/jerryjliu/llama_index},\nyear = {2022}\n}\n```', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')]