Semantic Cache

High-level LLM response caching with semantic matching and context isolation.

Overview

SemanticCache wraps HyperBinder's compositional encoding to provide a put/get cache where queries are matched semantically (different wording, same intent) while context keys are matched exactly, preventing cross-context false positives.

```python
from hybi import HyperBinder, SemanticCache

hb = HyperBinder(local=True, encode_fn=model.encode)
cache = SemanticCache(hb, collection="llm_cache")

# Store a response
cache.put(query="What is your return policy?", context="order",
          response="Items can be returned within 30 days.")

# Retrieve – semantically similar queries hit the cache
hit = cache.get(query="How do I return something?", context="order")
if hit:
    print(hit.response)  # "Items can be returned within 30 days."

# Different context -> no cross-contamination
miss = cache.get(query="How do I return something?", context="billing")
assert miss is None
```


SemanticCache

hybi.semantic_cache.SemanticCache

High-level semantic cache for LLM responses.

Uses a Bundle schema internally with four fields:

  • query (SEMANTIC): query text — for similarity matching
  • context (EXACT): context key — for domain isolation
  • response (SEMANTIC, low weight): response text — stored, not searched on
  • _cache_key (EXACT, zero weight): content hash for dedup — never searched on

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `hb` | `HyperBinder` | A HyperBinder client instance (local or remote). | required |
| `collection` | `str` | Name of the backing collection. | `'semantic_cache'` |
| `threshold` | `float` | Default similarity threshold for cache hits. The benchmark-tuned default of 0.65 balances recall (83%) with precision (93%). Use >= 0.75 for strict context isolation. | `0.65` |
| `query_weight` | `float` | Weight for the query (subject) slot. Higher values make query similarity dominate the score. | `2.0` |
| `response_weight` | `float` | Weight for the response (object) slot. Keep low so stored responses don't pollute search. | `0.1` |
| `top_k` | `int` | Number of candidates to scan during `get()`. | `5` |
| `default_ttl` | `Optional[Union[timedelta, int, float]]` | Default time-to-live for cache entries; entries older than this are treated as misses. `None` means no expiry. Accepts a `timedelta` or seconds as `int`/`float`. | `None` |
| `should_cache` | `Optional[Callable[[str, str, str], bool]]` | Optional callback `(query, context, response) -> bool` called before storing an entry. Return `False` to skip caching (e.g. for personalized or transactional responses). | `None` |
| `embedding_cache_size` | `int` | Max number of query embeddings to cache in-process; avoids redundant encoding calls for repeated queries. Set to `0` to disable. | `1024` |
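The `embedding_cache_size` behavior can be pictured as a small in-process LRU keyed by query text: repeated queries reuse their cached embedding instead of re-invoking the encoder. Below is a minimal stdlib-only sketch of that idea, not the library's actual implementation; `fake_encode` is a stand-in for a real embedding model.

```python
from collections import OrderedDict


def fake_encode(text):
    # Stand-in for a real (and typically expensive) embedding model call.
    return [float(len(text)), float(text.count(" "))]


class EmbeddingLRU:
    """Tiny LRU wrapper around an encode function (illustrative only)."""

    def __init__(self, encode_fn, maxsize=1024):
        self.encode_fn = encode_fn
        self.maxsize = maxsize
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def encode(self, text):
        if self.maxsize == 0:                  # size 0 disables caching entirely
            return self.encode_fn(text)
        if text in self._cache:
            self.hits += 1
            self._cache.move_to_end(text)      # mark as most recently used
            return self._cache[text]
        self.misses += 1
        vec = self.encode_fn(text)
        self._cache[text] = vec
        if len(self._cache) > self.maxsize:
            self._cache.popitem(last=False)    # evict least recently used
        return vec


lru = EmbeddingLRU(fake_encode, maxsize=2)
lru.encode("What is your return policy?")
lru.encode("What is your return policy?")  # served from the cache
print(lru.hits, lru.misses)  # → 1 1
```

The payoff is largest for workloads where the same handful of queries recur, which is exactly the situation a semantic cache targets.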

Example:

```python
from hybi import HyperBinder, SemanticCache

hb = HyperBinder(local=True)
cache = SemanticCache(hb, collection="llm_cache")

# Store a response
cache.put(
    query="What is your return policy?",
    context="order",
    response="Items can be returned within 30 days.",
)

# Retrieve — semantically similar queries hit the cache
hit = cache.get(query="How do I return something?", context="order")
if hit:
    print(hit.response)  # "Items can be returned within 30 days."

# Different context -> no cross-contamination
miss = cache.get(query="How do I return something?", context="billing")
assert miss is None
```

Example with TTL and filtering:

```python
from datetime import timedelta

cache = SemanticCache(
    hb,
    collection="llm_cache",
    default_ttl=timedelta(hours=4),
    should_cache=lambda q, ctx, r: "personal" not in r.lower(),
)
cache.put("Pricing info?", "billing", "$9.99/mo", ttl=timedelta(hours=1))
```

__init__(hb, collection='semantic_cache', *, threshold=0.65, query_weight=2.0, response_weight=0.1, top_k=5, default_ttl=None, should_cache=None, embedding_cache_size=1024, phase_dim=None)

put(query, context, response, *, ttl=None)

Store a cache entry.

If an entry with the same query+context already exists, it is replaced (upsert semantics).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `query` | `str` | The query text (matched semantically on lookup). | required |
| `context` | `str` | Context key for isolation (matched exactly). | required |
| `response` | `str` | The response to cache. | required |
| `ttl` | `Optional[Union[timedelta, int, float]]` | Time-to-live for this entry; overrides `default_ttl`. Accepts a `timedelta` or seconds as `int`/`float`. `None` uses the cache's `default_ttl`. | `None` |
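Since `ttl` and `default_ttl` accept either a `timedelta` or raw seconds, a resolver along these lines is one plausible way to normalize them. This is a sketch of the documented contract, not the library's internals; the function names are hypothetical.

```python
import time
from datetime import timedelta


def resolve_ttl_seconds(ttl, default_ttl=None):
    """Normalize a per-entry TTL to seconds, falling back to the cache default.

    Returns None when no expiry applies.
    """
    value = ttl if ttl is not None else default_ttl
    if value is None:
        return None
    if isinstance(value, timedelta):
        return value.total_seconds()
    return float(value)  # int or float, interpreted as seconds


def expires_at(ttl, default_ttl=None, now=None):
    """Epoch timestamp after which an entry counts as a miss (None = never)."""
    seconds = resolve_ttl_seconds(ttl, default_ttl)
    if seconds is None:
        return None
    return (now if now is not None else time.time()) + seconds


print(resolve_ttl_seconds(timedelta(hours=1)))    # → 3600.0
print(resolve_ttl_seconds(None, default_ttl=90))  # → 90.0
print(resolve_ttl_seconds(None))                  # → None
```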

get(query, context=None, *, threshold=None, top_k=None)

Look up a cache entry by semantic similarity.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `query` | `str` | The query text to match against cached entries. | required |
| `context` | `Optional[str]` | Context key; must match exactly. If `None`, searches across all contexts. | `None` |
| `threshold` | `Optional[float]` | Override the default similarity threshold. | `None` |
| `top_k` | `Optional[int]` | Override the default number of candidates to scan. | `None` |

Returns:

| Type | Description |
|------|-------------|
| `Optional[CacheHit]` | A `CacheHit` if a matching entry is found, otherwise `None`. |

seed(data)

Batch-populate the cache from a DataFrame or list of dicts.

The data must contain columns: query, context, response. An optional expires_at column (epoch float) is respected if present.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `data` | `Union[DataFrame, List[Dict[str, str]]]` | DataFrame or list of dicts with cache entries. | required |

Returns:

| Type | Description |
|------|-------------|
| `int` | Number of entries seeded. |

Raises:

| Type | Description |
|------|-------------|
| `ValueError` | If required columns are missing. |
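The column check that makes `seed()` raise `ValueError` can be pictured roughly as follows. This is a stdlib-only sketch of the documented contract for the list-of-dicts input form, not the library's actual code; `validate_seed_rows` is a hypothetical helper.

```python
REQUIRED = ("query", "context", "response")


def validate_seed_rows(rows):
    """Check that every row carries the required cache columns.

    `rows` is a list of dicts, mirroring seed()'s list-of-dicts input.
    Returns the number of valid rows, or raises ValueError.
    """
    for i, row in enumerate(rows):
        missing = [col for col in REQUIRED if col not in row]
        if missing:
            raise ValueError(f"row {i} is missing columns: {missing}")
    return len(rows)


rows = [
    {"query": "Pricing info?", "context": "billing", "response": "$9.99/mo"},
    {"query": "Return policy?", "context": "order", "response": "30 days."},
]
print(validate_seed_rows(rows))  # → 2
```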

invalidate(*, context=None)

Remove cache entries.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `context` | `Optional[str]` | If provided, remove only entries with this context. If `None`, remove all entries (same as `clear()`). | `None` |

clear()

Remove all cache entries.

stats()

Get cache statistics.

Returns:

| Type | Description |
|------|-------------|
| `Dict[str, Any]` | Dict with keys: `count`, `contexts`, `collection`. |
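The `stats()` payload could be derived from stored rows roughly like this. A hypothetical in-memory sketch that only reproduces the documented dict shape (`count`, `contexts`, `collection`), not the library's implementation:

```python
def build_stats(entries, collection="semantic_cache"):
    """Summarize cache rows into the documented stats() shape."""
    return {
        "count": len(entries),
        "contexts": sorted({e["context"] for e in entries}),
        "collection": collection,
    }


entries = [
    {"query": "Return policy?", "context": "order", "response": "30 days."},
    {"query": "Pricing info?", "context": "billing", "response": "$9.99/mo"},
    {"query": "Refund status?", "context": "billing", "response": "3-5 days."},
]
print(build_stats(entries, "llm_cache"))
# → {'count': 3, 'contexts': ['billing', 'order'], 'collection': 'llm_cache'}
```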

exists()

Check if the backing collection exists.


CacheHit

hybi.semantic_cache.CacheHit dataclass

Result from a successful cache lookup.

Attributes:

| Name | Type | Description |
|------|------|-------------|
| `response` | `str` | The cached response text. |
| `score` | `float` | Similarity score of the match (0-1). |
| `query` | `str` | The original cached query text that matched. |
| `context` | `str` | The context key that was matched. |
| `data` | `Dict[str, Any]` | Full row data dict for extensibility. |
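A self-contained sketch of the `CacheHit` shape and of how a threshold gate over scored candidates might look. The field names follow the table above; the gating logic and `best_hit` helper are illustrative assumptions, not the library's code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class CacheHit:
    response: str
    score: float
    query: str
    context: str
    data: Dict[str, Any] = field(default_factory=dict)


def best_hit(candidates, threshold=0.65) -> Optional[CacheHit]:
    """Return the highest-scoring candidate at or above threshold, else None."""
    passing = [c for c in candidates if c.score >= threshold]
    return max(passing, key=lambda c: c.score, default=None)


hits = [
    CacheHit("30 days.", 0.81, "Return policy?", "order"),
    CacheHit("$9.99/mo", 0.42, "Pricing info?", "order"),
]
best = best_hit(hits)
print(best.response, best.score)      # → 30 days. 0.81
print(best_hit(hits, threshold=0.9))  # → None
```

Raising the threshold trades recall for precision, which is why the docs suggest >= 0.75 when strict context isolation matters.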


Constructor Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `hb` | `HyperBinder` | required | A HyperBinder client (local or remote) |
| `collection` | `str` | `"semantic_cache"` | Name of the backing collection |
| `threshold` | `float` | `0.65` | Similarity threshold for cache hits |
| `query_weight` | `float` | `2.0` | Weight for query similarity (higher = query dominates) |
| `response_weight` | `float` | `0.1` | Weight for response slot (keep low to avoid search pollution) |
| `top_k` | `int` | `5` | Number of candidates to scan per `get()` |
| `default_ttl` | `timedelta`, `int`, `float`, or `None` | `None` | Default time-to-live; `None` = no expiry |
| `should_cache` | `Callable` or `None` | `None` | Filter callback `(query, context, response) -> bool` |
| `embedding_cache_size` | `int` | `1024` | In-process embedding LRU cache size; `0` to disable |
---
## Method Summary
| Method | Description |
|--------|-------------|
| `put(query, context, response, *, ttl)` | Store a cache entry |
| `get(query, context, *, threshold, top_k)` | Look up by semantic similarity |
| `seed(data)` | Batch-populate from a DataFrame or list of dicts |
| `invalidate(*, context)` | Remove entries (all or by context) |
| `clear()` | Remove all entries (alias for `invalidate()`) |
| `stats()` | Get entry count and context list |
| `exists()` | Check if the backing collection exists |