Mixedbread AI Embeddings
Explore the capabilities of MixedBread AI’s embedding models with custom encoding formats (binary, int, float, base64, etc.), embedding dimensions (Matryoshka) and context prompts.
If you’re opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-embeddings-mixedbreadai
!pip install llama-index
import osfrom llama_index.embeddings.mixedbreadai import MixedbreadAIEmbedding
# API Key and Embedding Initialization
# You can visit https://www.mixedbread.ai/api-reference#quick-start-guide# to get an api keymixedbread_api_key = os.environ.get("MXBAI_API_KEY", "your-api-key")
# Please check https://www.mixedbread.ai/docs/embeddings/models#whats-new-in-the-mixedbread-embed-model-family# for our embedding modelsmodel_name = "mixedbread-ai/mxbai-embed-large-v1"
oven = MixedbreadAIEmbedding(api_key=mixedbread_api_key, model_name=model_name)
embeddings = oven.get_query_embedding("Why bread is so tasty?")
print(len(embeddings))print(embeddings[:5])
1024[0.01128387451171875, 0.031097412109375, -0.00606536865234375, 0.0291748046875, -0.038604736328125]
Using prompt for contextual embedding
Section titled “Using prompt for contextual embedding”The prompt can improve the model’s understanding of how the embedding will be used in subsequent tasks, which in turn increases the performance. Our experiments show that having domain specific prompts can increase the performance.
prompt_for_retrieval = ( "Represent this sentence for searching relevant passages:")
contextual_oven = MixedbreadAIEmbedding( api_key=mixedbread_api_key, model_name=model_name, prompt=prompt_for_retrieval,)
contextual_embeddings = contextual_oven.get_query_embedding( "What bread is invented in Germany?")
print(len(contextual_embeddings))print(contextual_embeddings[:5])
1024[-0.0235443115234375, -0.0152435302734375, 0.008392333984375, 0.00336456298828125, -0.044647216796875]
Quantization and Matryoshka support
Section titled “Quantization and Matryoshka support”The Mixedbread AI embedding supports quantization and matryoshka to reduce the size of embeddings for better storage while retaining most of the performance. See these posts for more information:
- Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval
- 64 bytes per embedding, yee-haw.
Using different encoding formats
Section titled “Using different encoding formats”The default encoding_format
is float
. We also support float16
, binary
, ubinary
, int8
, uint8
, base64
.
# with `binary` embedding typesbinary_oven = MixedbreadAIEmbedding( api_key=mixedbread_api_key, model_name=model_name, encoding_format="binary",)
binary_embeddings = binary_oven.get_text_embedding( "The bread is tiny but still filling!")
print(len(binary_embeddings))print(binary_embeddings[:5])
128[-121.0, 96.0, -108.0, 111.0, 110.0]
Using different embedding dimensions
Section titled “Using different embedding dimensions”Mixedbread AI embedding models support Matryoshka dimension truncation. The default dimension is set to the model’s maximum. Keep an eye on our website to see what models support Matryoshka.
# with truncated dimensionhalf_oven = MixedbreadAIEmbedding( api_key=mixedbread_api_key, model_name=model_name, dimensions=512, # 1024 is the maximum of `mxbai-embed-large-v1`)
half_embeddings = half_oven.get_text_embedding( "I want the better half of my bread.")
print(len(half_embeddings))print(half_embeddings[:5])
512[-0.014221191, -0.013671875, -0.03314209, 0.025909424, -0.035095215]