Optimized Embedding Model using Optimum-Intel

LlamaIndex supports loading quantized embedding models for Intel hardware through the Optimum-Intel library.

Optimized models are smaller and faster, with minimal accuracy loss; see the documentation and an optimization guide that uses the IntelLabs/fastRAG library.
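To make "smaller with minimal accuracy loss" concrete, here is a generic sketch of symmetric per-tensor int8 quantization. It is an illustration only, assuming a single shared scale per tensor; it is not the exact scheme Optimum-Intel or neural-compressor applies, and the helper names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
# Generic sketch of symmetric per-tensor int8 quantization.
# NOT the exact scheme used by Optimum-Intel / neural-compressor; the helper
# names below are hypothetical and exist only for illustration.


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 codes in [-127, 127] using one shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # A zero vector gets scale 1.0 to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]


weights = [0.82, -1.37, 0.05, 0.41, -0.66]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Each value now takes 1 byte instead of 4 (float32); the rounding error
# per element is at most scale / 2.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)
print(f"scale={scale:.5f}, max error={max_error:.5f}")
```

The 4x storage reduction comes from the byte width alone; the accuracy cost is bounded by half a quantization step per value, which is why a well-chosen scale keeps the loss small.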

Optimization relies on math instructions available in 4th-generation or newer Xeon® processors.
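A quick way to see whether a Linux machine exposes the relevant instruction-set extensions is to inspect `/proc/cpuinfo`. This is a sketch under assumptions: the flag list (AVX-512 VNNI and the AMX flags, AMX having first shipped in 4th-generation Xeon) is our guess at what matters here, not a requirement stated by Optimum-Intel, and the helper names are hypothetical.

```python
# Sketch: report (on Linux) whether the CPU advertises extensions commonly
# used for int8/bf16 inference. The flags of interest are an ASSUMPTION,
# not an official Optimum-Intel requirement; helper names are hypothetical.
from pathlib import Path

# Assumed flags of interest, as named in Linux /proc/cpuinfo.
FLAGS_OF_INTEREST = ("avx512_vnni", "amx_tile", "amx_int8", "amx_bf16")


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the flags listed on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def detect_accel_flags(cpuinfo_path: str = "/proc/cpuinfo") -> dict[str, bool]:
    """Report which flags of interest are present; all False if the file is missing."""
    path = Path(cpuinfo_path)
    text = path.read_text() if path.exists() else ""
    flags = parse_cpu_flags(text)
    return {flag: flag in flags for flag in FLAGS_OF_INTEREST}


print(detect_accel_flags())
```

On non-Linux systems, or CPUs whose cpuinfo has no `flags` line, every entry simply reports `False`.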

To load and use the quantized models, install the required dependencies: pip install optimum[exporters] optimum-intel neural-compressor intel_extension_for_pytorch.
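Before loading a model, it can help to confirm that the packages from the pip command are importable. This is a minimal sketch; the mapping from pip package names to import names (e.g. `neural-compressor` to `neural_compressor`) is our assumption, and `missing_dependencies` is a hypothetical helper.

```python
# Sketch: check that the dependencies installed by the pip command are
# importable. The pip-name -> import-name mapping is an ASSUMPTION, and
# missing_dependencies is a hypothetical helper.
import importlib.util
from collections.abc import Mapping

REQUIRED_MODULES = {
    "optimum": "optimum",
    "optimum-intel": "optimum.intel",
    "neural-compressor": "neural_compressor",
    "intel_extension_for_pytorch": "intel_extension_for_pytorch",
}


def missing_dependencies(required: Mapping[str, str] = REQUIRED_MODULES) -> list[str]:
    """Return the pip package names whose modules cannot be found."""
    missing = []
    for package, module in required.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            # find_spec raises this for a dotted name whose parent package is absent.
            found = False
        if not found:
            missing.append(package)
    return missing


missing = missing_dependencies()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All Optimum-Intel dependencies found.")
```

Note that checking a dotted name such as `optimum.intel` imports its parent package, so this check is not entirely side-effect free.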

Models are loaded with the IntelEmbedding class, and usage is similar to any local HuggingFace embedding model. For example:

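%pip install llama-index-embeddings-huggingface-optimum-intel
from llama_index.embeddings.huggingface_optimum_intel import IntelEmbedding

# Load the int8 quantized BGE-small model (statically quantized, per its name)
embed_model = IntelEmbedding("Intel/bge-small-en-v1.5-rag-int8-static")
# Embed a single string; the result is a list of floats
embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))  # embedding dimension
print(embeddings[:5])  # first five values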
384
[-0.0032782123889774084, -0.013396517373621464, 0.037944991141557693, -0.04642259329557419, 0.027709005400538445]
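Embedding vectors like the 384-dimensional one above are typically compared with cosine similarity. The sketch below is a pure-Python illustration; in practice the inputs would be lists returned by `embed_model.get_text_embedding(...)`, but short made-up vectors are used so it runs without downloading a model. The helper `cosine_similarity` is hypothetical, not part of the LlamaIndex API.

```python
# Sketch: compare two embedding vectors with cosine similarity.
# Made-up 3-dimensional vectors stand in for real model outputs;
# cosine_similarity is a hypothetical helper, not a LlamaIndex API.
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


query = [0.1, 0.3, -0.2]
doc_similar = [0.2, 0.6, -0.4]    # same direction as query
doc_opposite = [-0.1, -0.3, 0.2]  # opposite direction
print(cosine_similarity(query, doc_similar))
print(cosine_similarity(query, doc_opposite))
```

Scores near 1 indicate closely related texts; because the measure ignores vector length, it is unaffected by any overall scaling of the embeddings.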