Text Embeddings Inference
This notebook demonstrates how to configure TextEmbeddingsInference embeddings.
The first step is to deploy the embeddings server. For detailed instructions, see the official repository for Text Embeddings Inference, or the tei-gaudi repository if you are deploying on Habana Gaudi/Gaudi 2.
Once the server is deployed, the code below connects to it and submits text for embedding.
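For orientation, a raw request to the server can be sketched with the standard library alone. This is a minimal sketch, not the LlamaIndex client's implementation. It assumes the server is reachable at http://127.0.0.1:8080 and exposes Text Embeddings Inference's POST /embed route, which accepts a JSON body with an inputs field. The helper name build_embed_request is hypothetical. The request is only constructed here; sending it is left as a commented-out step because it needs a running server.

```python
import json
import urllib.request

# Assumption: the address where your TEI server is reachable; adjust as needed.
BASE_URL = "http://127.0.0.1:8080"


def build_embed_request(texts, base_url=BASE_URL):
    """Build (but do not send) a POST request for the server's /embed route.

    Hypothetical helper for illustration only.
    """
    payload = json.dumps({"inputs": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_embed_request(["Hello World!"])
print(request.get_method(), request.full_url)

# With a server running, the request could be sent like this:
# with urllib.request.urlopen(request, timeout=60) as response:
#     vectors = json.loads(response.read())  # one list of floats per input
```

If the server is deployed on another host or port, only BASE_URL needs to change.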
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-embeddings-text-embeddings-inference
!pip install llama-index

from llama_index.embeddings.text_embeddings_inference import (
    TextEmbeddingsInference,
)
embed_model = TextEmbeddingsInference(
    model_name="BAAI/bge-large-en-v1.5",  # required for formatting inference text
    timeout=60,  # timeout in seconds
    embed_batch_size=10,  # batch size for embedding
)

embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])

1024
[0.010597229, 0.05895996, 0.022445679, -0.012046814, -0.03164673]

embeddings = await embed_model.aget_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])

1024
[0.010597229, 0.05895996, 0.022445679, -0.012046814, -0.03164673]
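The embed_batch_size setting is described in the configuration as the batch size for embedding, which suggests that texts are sent to the server in groups of at most that size. The grouping can be pictured with the following minimal sketch. It illustrates batching in general and is not the library's actual code; the helper name batched_texts is hypothetical.

```python
def batched_texts(texts, batch_size=10):
    """Yield consecutive slices of at most batch_size texts (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


# 25 placeholder texts split with a batch size of 10.
texts = [f"document {i}" for i in range(25)]
batch_lengths = [len(batch) for batch in batched_texts(texts, batch_size=10)]
print(batch_lengths)  # [10, 10, 5]
```

A larger batch size means fewer requests but more work per request, so the timeout may need to grow with it.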