Embedding Models¶
HyperBinder uses embedding models to encode SEMANTIC fields: fields where similar text should produce similar search results.
Any embedding model works, and HyperBinder adapts to its output dimension automatically. The default is all-MiniLM-L6-v2 (384 dimensions).
HyperBinder does NOT store raw embeddings
Embeddings are an input to HyperBinder's encoding pipeline. At ingest time, they are transformed into an internal representation, and only that internal representation is stored. At query time, the same transformation runs on the query embedding. Your choice of embedding model affects the quality of this encoding.
Same model for ingest and query
You must use the same embedding model for both ingest and query. Mixing models will produce meaningless results because the internal representation is calibrated to the model used at ingest time.
How Embedding Dimensions Work¶
HyperBinder's internal representation has a fixed capacity (256 dimensions by default for semantic fields). When your embedding model produces vectors of a different size, HyperBinder adapts automatically:
- Smaller embeddings (embed_dim ≤ internal capacity): No fidelity loss. HyperBinder interpolates to fill the representation.
- Larger embeddings (embed_dim > internal capacity): Some compression occurs. HyperBinder auto-detects your model's characteristics and uses the best strategy, but information is necessarily lost.
The compression ratio (embed_dim / internal_dim) determines how much information is preserved:
| Compression Ratio | Quality | Effect |
|---|---|---|
| ≤ 1.0× | Lossless | No information lost |
| 1–2× | Excellent | Semantic ranking well preserved |
| 2–4× | Good | Slight degradation in fine-grained ranking |
| 4×+ | Moderate | Consider using a lower-dimensional model or pre-truncating (see MRL models) |
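The quality bands above follow directly from the ratio embed_dim / internal_dim. As an illustration only (the helper name and thresholds are taken from the table, not from HyperBinder's API), you can classify a candidate model before committing to it:

```python
# Classify expected encoding quality from the compression ratio.
# The 256-dim default internal capacity and the bands mirror the table above.
def compression_quality(embed_dim: int, internal_dim: int = 256) -> str:
    ratio = embed_dim / internal_dim
    if ratio <= 1.0:
        return "lossless"
    if ratio <= 2.0:
        return "excellent"
    if ratio <= 4.0:
        return "good"
    return "moderate"

print(compression_quality(384))   # all-MiniLM-L6-v2: 384/256 = 1.5x
print(compression_quality(1536))  # text-embedding-3-small: 1536/256 = 6x
```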
Recommended Models¶
| Model | Dimensions | Compression @256 | Quality | Notes |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | 1.5× | Excellent | Default, fast, good quality |
| nomic-embed-text-v1.5 | 768 | 3× | Good | MRL-trained, compresses well |
| text-embedding-3-small | 1536 | 6× | Moderate | Use MRL truncation to 256 for best results |
| text-embedding-3-large | 3072 | 12× | Lower | Use MRL truncation to 256–768 first |
Lower-dimensional models often give better end-to-end results because less compression means higher fidelity in HyperBinder's internal representation.
Bringing Your Own Embedding Model¶
There are two ways to provide embeddings to HyperBinder.
encode_fn callback (recommended)¶
Register a function that maps text to embeddings. HyperBinder calls this automatically for both ingest encoding and query encoding.
```python
from sentence_transformers import SentenceTransformer
from hybi import HyperBinder

model = SentenceTransformer("all-MiniLM-L6-v2")
hb = HyperBinder(local=True, encode_fn=model.encode)
hb.ingest(df, collection="docs")

# Queries are automatically encoded using your encode_fn
results = hb.search("machine learning", collection="docs")
```
The encode_fn should accept a list of strings and return a numpy array (or list of lists) of embeddings:
```python
import numpy as np

def my_encode_fn(texts: list[str]) -> np.ndarray:
    # Your embedding logic here
    return embeddings  # shape: (len(texts), embed_dim)
```
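For wiring tests or local experiments without a real model, a deterministic stand-in that satisfies the same contract can be useful. This is a sketch (the hash-based embedder is invented for illustration and has no real semantics, so don't use it for actual search quality):

```python
import hashlib
import numpy as np

def toy_encode_fn(texts: list[str], embed_dim: int = 384) -> np.ndarray:
    """Deterministic stand-in embedder: hashes each text into a seed,
    then draws a unit-norm random vector. Same text -> same vector."""
    out = np.empty((len(texts), embed_dim), dtype=np.float32)
    for i, text in enumerate(texts):
        seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
        rng = np.random.default_rng(seed)
        vec = rng.standard_normal(embed_dim)
        out[i] = vec / np.linalg.norm(vec)
    return out

embs = toy_encode_fn(["doc one", "doc two"])
print(embs.shape)  # (2, 384)
```

Because it is deterministic, ingest-time and query-time encodings stay consistent, which is the one property HyperBinder requires of any encode_fn.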
vector_col parameter¶
For pipelines that pre-compute embeddings externally, pass a column name containing embedding vectors at ingest time:
```python
import pandas as pd

# DataFrame with pre-computed embeddings
df = pd.DataFrame({
    "text": ["doc one", "doc two"],
    "my_embeddings": [embedding_1, embedding_2],  # list[float] per row
})

hb.ingest(df, collection="docs", vector_col="my_embeddings")
```
Note
With vector_col, you are responsible for encoding query vectors yourself. The encode_fn approach handles this automatically and is preferred for most use cases.
MRL (Matryoshka) Models¶
Some embedding models are trained with Matryoshka Representation Learning (MRL), which means you can safely truncate their output to shorter dimensions without retraining. The first N dimensions of an MRL-trained model contain a valid, lower-dimensional embedding.
HyperBinder auto-detects MRL structure — no configuration is needed.
For high-dimensional MRL models, you can pre-truncate embeddings before passing them to HyperBinder. This reduces the compression ratio and preserves more fidelity:
```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# text-embedding-3-small produces 1536-dim vectors but supports MRL truncation.
# (It is an OpenAI API model, so it is called via the OpenAI client rather
# than loaded with sentence-transformers. The API also accepts a `dimensions`
# parameter that performs this truncation server-side.)
def truncated_encode(texts, target_dim=256):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    embeddings = np.array([item.embedding for item in resp.data])
    truncated = embeddings[:, :target_dim]
    # Re-normalize after truncation
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

hb = HyperBinder(local=True, encode_fn=truncated_encode)
# Now the compression ratio is 256/256 = 1.0× (lossless)
```
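The truncate-then-renormalize step can be checked in isolation. A minimal sketch, using synthetic vectors in place of real model output:

```python
import numpy as np

def truncate_and_renormalize(embeddings: np.ndarray, target_dim: int) -> np.ndarray:
    # Keep the first target_dim dimensions (valid for MRL-trained models),
    # then restore unit length so cosine similarity stays well defined.
    truncated = embeddings[:, :target_dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

rng = np.random.default_rng(0)
full = rng.standard_normal((4, 1536))   # stand-in for 1536-dim model output
short = truncate_and_renormalize(full, 256)

print(short.shape)                                       # (4, 256)
print(np.allclose(np.linalg.norm(short, axis=1), 1.0))   # True
```

Re-normalizing matters: after dropping dimensions the rows no longer have unit length, and many downstream similarity computations assume they do.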
Not all models support MRL truncation. Check your model's documentation before truncating. Models that support it include:
- OpenAI text-embedding-3-small and text-embedding-3-large
- nomic-embed-text-v1.5
- Any model explicitly documented as MRL-trained
Encoding quality over time
HyperBinder's encoding quality improves as it sees more data. The first few rows use a general-purpose encoding strategy. After accumulating statistics about your data distribution, the encoding automatically adapts for better fidelity. No action is needed — this is fully automatic.
Best Practices¶
- Always use the same embedding model for ingest and query. Switching models invalidates the internal representation.
- Lower-dimensional models often give better end-to-end results because less compression means higher fidelity.
- For high-dimensional models, pre-truncate if the model supports MRL. This is the single most effective optimization.
- Don't mix embedding models within a single collection. Each collection should use one consistent model.