Usage Pattern
Estimating LLM and Embedding Token Counts
To measure LLM and embedding token counts, you'll need to:
- Set up MockLLM and MockEmbedding objects
from llama_index.core.llms import MockLLM
from llama_index.core import MockEmbedding
# Mock models stand in for real ones, so no API calls are made
llm = MockLLM(max_tokens=256)
embed_model = MockEmbedding(embed_dim=1536)

- Set up the TokenCountingHandler callback
import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
token_counter = TokenCountingHandler(
    # Count tokens with the tokenizer that matches the target model
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
callback_manager = CallbackManager([token_counter])

- Add them to the global Settings
from llama_index.core import Settings
Settings.llm = llm
Settings.embed_model = embed_model
Settings.callback_manager = callback_manager

- Construct an index
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader(
    "./docs/examples/data/paul_graham"
).load_data()
index = VectorStoreIndex.from_documents(documents)

- Measure the counts!
print(
    "Embedding Tokens: ",
    token_counter.total_embedding_token_count,
    "\n",
    "LLM Prompt Tokens: ",
    token_counter.prompt_llm_token_count,
    "\n",
    "LLM Completion Tokens: ",
    token_counter.completion_llm_token_count,
    "\n",
    "Total LLM Token Count: ",
    token_counter.total_llm_token_count,
    "\n",
)
# reset counts
token_counter.reset_counts()

- Run a query, then measure again
query_engine = index.as_query_engine()
response = query_engine.query("query")
print(
    "Embedding Tokens: ",
    token_counter.total_embedding_token_count,
    "\n",
    "LLM Prompt Tokens: ",
    token_counter.prompt_llm_token_count,
    "\n",
    "LLM Completion Tokens: ",
    token_counter.completion_llm_token_count,
    "\n",
    "Total LLM Token Count: ",
    token_counter.total_llm_token_count,
    "\n",
)
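To make the counters printed above easier to interpret, here is a toy, pure-Python model of the bookkeeping. It is a hypothetical sketch, not LlamaIndex's implementation: `ToyTokenCounter`, `whitespace_tokenizer`, `on_llm_call`, and `on_embedding` are illustrative names invented here, and the real TokenCountingHandler receives its events through the CallbackManager rather than through direct method calls. The attribute names mirror the ones printed above. The tokenizer is modeled as any callable that turns a string into a list of tokens, which matches how `tiktoken.encoding_for_model(...).encode` is passed in; a whitespace splitter stands in for it here so the sketch has no dependencies.

```python
from typing import Callable, List


def whitespace_tokenizer(text: str) -> List[str]:
    """Hypothetical stand-in for tiktoken's `encode`: splits on whitespace."""
    return text.split()


class ToyTokenCounter:
    """Hypothetical, simplified counter showing how prompt, completion, and
    embedding token totals can be accumulated. Not LlamaIndex's code."""

    def __init__(self, tokenizer: Callable[[str], List]) -> None:
        self.tokenizer = tokenizer
        self.reset_counts()

    def reset_counts(self) -> None:
        # Zero all running totals, like calling reset_counts() between measurements.
        self.prompt_llm_token_count = 0
        self.completion_llm_token_count = 0
        self.total_embedding_token_count = 0

    @property
    def total_llm_token_count(self) -> int:
        # In this toy, total LLM usage is prompt tokens plus completion tokens.
        return self.prompt_llm_token_count + self.completion_llm_token_count

    def on_llm_call(self, prompt: str, completion: str) -> None:
        # Count tokens on both sides of a single LLM call.
        self.prompt_llm_token_count += len(self.tokenizer(prompt))
        self.completion_llm_token_count += len(self.tokenizer(completion))

    def on_embedding(self, chunks: List[str]) -> None:
        # Every embedded chunk adds its own token count.
        for chunk in chunks:
            self.total_embedding_token_count += len(self.tokenizer(chunk))


counter = ToyTokenCounter(tokenizer=whitespace_tokenizer)
counter.on_embedding(["the quick brown fox", "jumps over the lazy dog"])
counter.on_llm_call(prompt="summarize the essay", completion="it is about startups")
print(counter.total_embedding_token_count)  # 4 + 5 = 9
print(counter.prompt_llm_token_count, counter.completion_llm_token_count,
      counter.total_llm_token_count)  # 3 4 7
counter.reset_counts()
print(counter.total_llm_token_count)  # 0
```

Indexing mostly moves the embedding counter, while querying moves both the embedding counter (the query is embedded) and the LLM counters. Resetting between the two phases, as in the steps above, keeps their numbers separate.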