Semantic Cache¶
High-level LLM response caching with semantic matching and context isolation.
Overview¶
SemanticCache wraps HyperBinder's compositional encoding to provide a put/get cache where queries are matched semantically (different wording, same intent) while context keys are matched exactly, preventing cross-context false positives.
```python
from hybi import HyperBinder, SemanticCache

hb = HyperBinder(local=True, encode_fn=model.encode)
cache = SemanticCache(hb, collection="llm_cache")

# Store a response
cache.put(
    query="What is your return policy?",
    context="order",
    response="Items can be returned within 30 days.",
)

# Retrieve – semantically similar queries hit the cache
hit = cache.get(query="How do I return something?", context="order")
if hit:
    print(hit.response)  # "Items can be returned within 30 days."

# Different context -> no cross-contamination
miss = cache.get(query="How do I return something?", context="billing")
assert miss is None
```
SemanticCache¶
hybi.semantic_cache.SemanticCache
¶
High-level semantic cache for LLM responses.
Uses a Bundle schema internally with four fields:
- query (SEMANTIC): query text — for similarity matching
- context (EXACT): context key — for domain isolation
- response (SEMANTIC, low weight): response text — stored, not searched on
- _cache_key (EXACT, zero weight): content hash for dedup — never searched on
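As a rough illustration of the dedup field, `_cache_key` can be thought of as a deterministic hash over the query/context pair; the sketch below is an assumption about the general idea, not the library's actual hashing scheme:

```python
import hashlib

def cache_key(query: str, context: str) -> str:
    # Hash the query/context pair so a repeated put() for the same pair
    # collapses to one entry (upsert). Hypothetical: the real _cache_key
    # may hash different inputs or use a different digest.
    return hashlib.sha256(f"{query}\x00{context}".encode("utf-8")).hexdigest()

k_order = cache_key("What is your return policy?", "order")
k_billing = cache_key("What is your return policy?", "billing")
```

Because the context participates in the key, identical queries in different contexts remain distinct entries.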
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hb` | `HyperBinder` | A HyperBinder client instance (local or remote). | *required* |
| `collection` | `str` | Name of the backing collection. | `'semantic_cache'` |
| `threshold` | `float` | Default similarity threshold for cache hits. The benchmark-tuned default of 0.65 balances recall (83%) with precision (93%). Use >= 0.75 for strict context isolation. | `0.65` |
| `query_weight` | `float` | Weight for the query (subject) slot. Higher values make query similarity dominate the score. | `2.0` |
| `response_weight` | `float` | Weight for the response (object) slot. Keep low so stored responses don't pollute search. | `0.1` |
| `top_k` | `int` | Number of candidates to scan per `get()`. | `5` |
| `default_ttl` | `Optional[Union[timedelta, int, float]]` | Default time-to-live for cache entries. Entries older than this are treated as misses. None means no expiry. Accepts a `timedelta`, `int`, or `float`. | `None` |
| `should_cache` | `Optional[Callable[[str, str, str], bool]]` | Optional filter callback `(query, context, response) -> bool`. | `None` |
| `embedding_cache_size` | `int` | Max number of query embeddings to cache in-process. Avoids redundant encoding calls for repeated queries. Set to 0 to disable. | `1024` |
Example:

```python
from hybi import HyperBinder, SemanticCache

hb = HyperBinder(local=True)
cache = SemanticCache(hb, collection="llm_cache")

# Store a response
cache.put(
    query="What is your return policy?",
    context="order",
    response="Items can be returned within 30 days.",
)

# Retrieve — semantically similar queries hit the cache
hit = cache.get(query="How do I return something?", context="order")
if hit:
    print(hit.response)  # "Items can be returned within 30 days."

# Different context -> no cross-contamination
miss = cache.get(query="How do I return something?", context="billing")
assert miss is None
```

Example with TTL and filtering:

```python
from datetime import timedelta

cache = SemanticCache(
    hb,
    collection="llm_cache",
    default_ttl=timedelta(hours=4),
    should_cache=lambda q, ctx, r: "personal" not in r.lower(),
)

cache.put("Pricing info?", "billing", "$9.99/mo", ttl=timedelta(hours=1))
```
__init__(hb, collection='semantic_cache', *, threshold=0.65, query_weight=2.0, response_weight=0.1, top_k=5, default_ttl=None, should_cache=None, embedding_cache_size=1024, phase_dim=None)
¶
put(query, context, response, *, ttl=None)
¶
Store a cache entry.
If an entry with the same query+context already exists, it is replaced (upsert semantics).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str` | The query text (matched semantically on lookup). | *required* |
| `context` | `str` | Context key for isolation (matched exactly). | *required* |
| `response` | `str` | The response to cache. | *required* |
| `ttl` | `Optional[Union[timedelta, int, float]]` | Time-to-live for this entry. Overrides `default_ttl`. | `None` |
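The `ttl` argument accepts the same forms as `default_ttl`; a plausible normalisation looks like the helper below (an assumption about internals, reading bare numbers as seconds; `ttl_seconds` is not part of hybi's API):

```python
from datetime import timedelta
from typing import Optional, Union

def ttl_seconds(ttl: Optional[Union[timedelta, int, float]]) -> Optional[float]:
    # Hypothetical helper: normalise a TTL value to seconds.
    if ttl is None:
        return None  # no expiry
    if isinstance(ttl, timedelta):
        return ttl.total_seconds()
    return float(ttl)
```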
get(query, context=None, *, threshold=None, top_k=None)
¶
Look up a cache entry by semantic similarity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str` | The query text to match against cached entries. | *required* |
| `context` | `Optional[str]` | Context key — must match exactly. If None, searches across all contexts. | `None` |
| `threshold` | `Optional[float]` | Override the default similarity threshold. | `None` |
| `top_k` | `Optional[int]` | Override the default number of candidates to scan. | `None` |
Returns:
| Type | Description |
|---|---|
| `Optional[CacheHit]` | A `CacheHit` if a sufficiently similar entry is found, otherwise `None`. |
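Conceptually, `get()` scans up to `top_k` candidates and keeps the best one at or above the threshold; a toy version of that selection rule, operating on plain dicts rather than the library's internals (purely illustrative):

```python
def best_hit(candidates, threshold=0.65):
    # Return the highest-scoring candidate at or above the threshold,
    # or None for a cache miss. Illustrative only.
    best = max(candidates, key=lambda c: c["score"], default=None)
    if best is not None and best["score"] >= threshold:
        return best
    return None

candidates = [
    {"score": 0.71, "response": "30 days"},
    {"score": 0.40, "response": "n/a"},
]
```

Raising the threshold (e.g. to 0.75 for strict isolation) turns borderline matches into misses.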
seed(data)
¶
Batch-populate the cache from a DataFrame or list of dicts.
The data must contain columns: query, context, response.
An optional expires_at column (epoch float) is respected if present.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Union[DataFrame, List[Dict[str, str]]]` | DataFrame or list of dicts with cache entries. | *required* |
Returns:
| Type | Description |
|---|---|
| `int` | Number of entries seeded. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If required columns are missing. |
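A quick sketch of the column check that the documented `ValueError` implies (`validate_rows` is a hypothetical helper; the library's actual validation may differ):

```python
REQUIRED = {"query", "context", "response"}

def validate_rows(rows):
    # Every row must carry the three required columns; an optional
    # expires_at column may appear alongside them.
    for i, row in enumerate(rows):
        missing = REQUIRED - row.keys()
        if missing:
            raise ValueError(f"row {i} missing columns: {sorted(missing)}")
    return len(rows)

rows = [
    {"query": "Return policy?", "context": "order", "response": "30 days."},
    {"query": "Pricing?", "context": "billing", "response": "$9.99/mo"},
]
```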
invalidate(*, context=None)
¶
Remove cache entries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `context` | `Optional[str]` | If provided, remove only entries with this context. If None, remove all entries (same as `clear()`). | `None` |
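The context-scoped behaviour can be pictured over a plain list of entries (illustrative only; the real method operates on the backing collection):

```python
def invalidate_entries(entries, context=None):
    # context=None clears everything (like clear()); otherwise only
    # entries with a matching context key are dropped.
    if context is None:
        return []
    return [e for e in entries if e["context"] != context]

entries = [{"context": "order"}, {"context": "billing"}, {"context": "order"}]
```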
clear()
¶
Remove all cache entries.
stats()
¶
Get cache statistics.
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any]` | Dict with cache statistics, including the entry count and the list of contexts. |
exists()
¶
Check if the backing collection exists.
CacheHit¶
hybi.semantic_cache.CacheHit
dataclass
¶
Result from a successful cache lookup.
Attributes:
| Name | Type | Description |
|---|---|---|
| `response` | `str` | The cached response text. |
| `score` | `float` | Similarity score of the match (0-1). |
| `query` | `str` | The original cached query text that matched. |
| `context` | `str` | The context key that was matched. |
| `data` | `Dict[str, Any]` | Full row data dict for extensibility. |
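For reference, the documented attributes correspond to a dataclass shaped roughly like the stand-in below (the real class is `hybi.semantic_cache.CacheHit`; this copy exists only to show the fields):

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class CacheHit:
    # Stand-in mirroring the documented attributes of hybi's CacheHit.
    response: str
    score: float
    query: str
    context: str
    data: Dict[str, Any] = field(default_factory=dict)

hit = CacheHit(
    response="Items can be returned within 30 days.",
    score=0.82,
    query="What is your return policy?",
    context="order",
)
```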
Constructor Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `hb` | `HyperBinder` | *required* | A HyperBinder client (local or remote) |
| `collection` | `str` | `"semantic_cache"` | Name of the backing collection |
| `threshold` | `float` | `0.65` | Similarity threshold for cache hits |
| `query_weight` | `float` | `2.0` | Weight for query similarity (higher = query dominates) |
| `response_weight` | `float` | `0.1` | Weight for response slot (keep low to avoid search pollution) |
| `top_k` | `int` | `5` | Number of candidates to scan per `get()` |
| `default_ttl` | `timedelta`, `int`, `float`, or `None` | `None` | Default time-to-live. None = no expiry |
| `should_cache` | `Callable` or `None` | `None` | Filter callback `(query, context, response) -> bool` |
| `embedding_cache_size` | `int` | `1024` | In-process embedding LRU cache size. 0 to disable |
Method Summary¶
| Method | Description |
|---|---|
| `put(query, context, response, *, ttl)` | Store a cache entry |
| `get(query, context, *, threshold)` | Look up by semantic similarity |
| `seed(df)` | Batch-populate from a DataFrame |
| `invalidate(*, context)` | Remove entries (all or by context) |
| `clear()` | Remove all entries (alias for `invalidate()`) |
| `stats()` | Get entry count and context list |
| `exists()` | Check if the backing collection exists |