The Compose System¶
Compose is HyperBinder's schema system that exposes the power of Hyperdimensional Computing (HDC) through a clean, typed API.
Why Compose?¶
The Problem with Traditional Vector DBs¶
Traditional vector databases treat embeddings as black boxes. You can ask "what's similar?" but you cannot:
- Decompose a vector to extract components
- Query specific structural slots
- Perform analogical reasoning
- Traverse relationships semantically
What HDC Enables¶
Hyperdimensional Computing provides compositional operations on vectors that preserve structural relationships:
- Structure encoding: Combine vectors while preserving their relationship
- Structural decomposition: Extract components from composed structures
- Set representation: Represent collections of items in a single vector
Key insight: Unlike traditional embeddings, HyperBinder's encoding is compositional. If you encode a triple (Subject, Predicate, Object), you can later decompose it to extract individual components.
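The decomposition described above can be sketched with generic HDC operations in plain Python. This is a minimal illustration of role-filler binding over random bipolar vectors, not HyperBinder's actual encoder: the helper names (`random_hv`, `bind`, `bundle`, `similarity`), the dimensionality, and the role-vector scheme are all assumptions made for the example.

```python
import random

DIM = 4096  # illustrative dimensionality (assumption)
rng = random.Random(0)

def random_hv():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    """Elementwise multiply; self-inverse when b is bipolar, so it also unbinds."""
    return [x * y for x, y in zip(a, b)]

def bundle(*vectors):
    """Elementwise sum; the result stays similar to every input."""
    return [sum(column) for column in zip(*vectors)]

def similarity(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# Role vectors mark the slots; item vectors stand in for encoded values.
roles = {name: random_hv() for name in ("subject", "predicate", "object")}
items = {name: random_hv()
         for name in ("Einstein", "developed", "Relativity", "Darwin")}

# Encode (Einstein, developed, Relativity) as a bundle of role-filler bindings.
triple = bundle(
    bind(roles["subject"], items["Einstein"]),
    bind(roles["predicate"], items["developed"]),
    bind(roles["object"], items["Relativity"]),
)

# Decompose: unbinding the object role yields a noisy copy of its filler.
noisy_object = bind(triple, roles["object"])
recovered = max(items, key=lambda name: similarity(noisy_object, items[name]))
print(recovered)  # prints "Relativity"
```

Unbinding returns a noisy vector rather than the exact filler, so the last step compares it against the known item vectors and keeps the closest match.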
The Layers¶
flowchart LR
subgraph atoms [ATOMS - Advanced]
a["bind, unbind,<br/>bundle, similarity"]
end
subgraph molecules [MOLECULES]
m["Pair, Triple, Bundle,<br/>Sequence, Tree, Graph"]
end
subgraph compounds [COMPOUNDS]
c["KnowledgeGraph, TimeSeries,<br/>Hierarchy, Document, Network"]
end
atoms -.-> molecules --> compounds
Atoms (bind, unbind, bundle, similarity) are the low-level HDC primitives that molecules are built from. They are an internal implementation detail, described here for completeness.
Molecules (Building Blocks)¶
The core structures you define schemas with. Each molecule defines:
- Structure: The shape of your data
- Slots: Named positions in the structure
- Queries: What operations are supported
| Molecule | Slots | Best For |
|---|---|---|
| Pair | left, right | Key-value, edges |
| Triple | subject, predicate, object | Knowledge graphs |
| Bundle | User-defined fields | Tabular data |
| Sequence | item | Ordered data |
| Tree | child, parent | Hierarchies |
| Graph | source, edge, target | Networks |
from hybi.compose import Triple, Field, Encoding
schema = Triple(
    subject=Field("entity", encoding=Encoding.SEMANTIC),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Field("target", encoding=Encoding.SEMANTIC),
)
Compounds (Domain Templates)¶
Pre-configured molecules optimized for common use cases. They expand to molecules at definition time.
| Compound | Expands To | Use Case |
|---|---|---|
| KnowledgeGraph | Triple | Entity-relation-entity facts |
| TimeSeries | Sequence | Time-ordered data |
| Hierarchy | Tree | Org charts, taxonomies |
| Document | Bundle | Document chunks with metadata |
| Network | Graph | Social graphs, citation networks |
| Catalog | Bundle | Generic tabular data |
from hybi.compose import KnowledgeGraph
# This is equivalent to defining a Triple with sensible defaults
schema = KnowledgeGraph(
    entity_field="entity",
    relation_field="relation",
)
Structured vs Search-Optimized Encoding¶
Molecules use different encoding strategies optimized for different use cases:
Structured Encoding¶
Used by: Pair, Triple, Tree, Graph, Sequence
- Fully decomposable: Extract individual components from queries
- High precision: Exact matches score ~1.0
- Enables: Slot-specific queries, traversal, structural decomposition
Search-Optimized Encoding¶
Used by: Bundle
- Multi-field search: Search across many fields simultaneously
- Similarity-based: Optimized for semantic similarity matching
- Trade-off: Lower precision for faster multi-field search
Bundle limitations
Bundle uses search-optimized encoding, which means:
- Individual fields cannot be reliably extracted
- Bundles cannot be nested inside other molecules
- Similarity scores decrease with field count
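The last limitation can be illustrated with plain superposition. The sketch below assumes a Bundle behaves like an elementwise sum of independent random field vectors, which may not match HyperBinder's internals; `random_hv`, `cosine`, and `field_similarity` are hypothetical helpers written for this example.

```python
import random

DIM = 4096  # illustrative dimensionality (assumption)
rng = random.Random(1)

def random_hv():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def field_similarity(field_count):
    """Cosine between one field vector and a bundle of `field_count` fields."""
    fields = [random_hv() for _ in range(field_count)]
    bundled = [sum(column) for column in zip(*fields)]
    return cosine(fields[0], bundled)

scores = {count: field_similarity(count) for count in (1, 4, 16)}
for count, score in scores.items():
    print(f"{count:>2} fields: similarity to one field = {score:.2f}")
```

Under these assumptions the similarity between a bundle and any single field falls roughly as 1/sqrt(n) for n fields, which is why each added field dilutes the match score.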
Encoding Types¶
Fields can have different encoding strategies:
| Encoding | Behavior | Use When |
|---|---|---|
| SEMANTIC | Similar values → similar vectors | Names, descriptions, text |
| EXACT | Each unique value → distinct vector | IDs, categories, relations |
| NUMERIC | Numbers close in value → similar vectors | Prices, counts, ratings |
from hybi.compose import Field, Encoding
Field("description", encoding=Encoding.SEMANTIC)              # Semantic similarity
Field("category", encoding=Encoding.EXACT)                    # Exact matching
Field("price", encoding=Encoding.NUMERIC, similar_within=50)  # Prices within $50 count as similar
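One way a numeric encoding could make nearby numbers produce similar vectors is randomized bucketing, sketched below. This is an illustration only and not necessarily how `Encoding.NUMERIC` works: it assumes `similar_within` is the distance at which similarity fades to roughly zero, and `encode_numeric`, `_sign`, and `cosine` are hypothetical names.

```python
import hashlib
import random

DIM = 2048
SIMILAR_WITHIN = 50.0  # assumed meaning: distance at which similarity fades to ~0

# One fixed random offset per dimension shifts that dimension's bucket grid.
_rng = random.Random(42)
OFFSETS = [_rng.uniform(0, SIMILAR_WITHIN) for _ in range(DIM)]

def _sign(dim_index, bucket):
    """Deterministic pseudo-random +1/-1 for a (dimension, bucket) pair."""
    digest = hashlib.sha256(f"{dim_index}:{bucket}".encode()).digest()
    return 1 if digest[0] & 1 else -1

def encode_numeric(value):
    """Map a number to a bipolar vector; nearby numbers share most entries."""
    return [
        _sign(i, int((value + OFFSETS[i]) // SIMILAR_WITHIN))
        for i in range(DIM)
    ]

def cosine(a, b):
    """Cosine similarity; every bipolar vector here has norm sqrt(DIM)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

base = encode_numeric(100)
similarities = {price: cosine(base, encode_numeric(price))
                for price in (100, 110, 140, 400)}
for price, score in similarities.items():
    print(price, round(score, 2))
```

Two values land in the same bucket on a given dimension with probability that shrinks linearly with their distance, so similarity to 100 is high at 110, low at 140, and near zero at 400.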
Nesting Molecules¶
Structured molecules can be nested inside each other:
from hybi.compose import Triple, Pair, Field, Encoding

# Typed entity knowledge graph
schema = Triple(
    subject=Pair(
        left=Field("entity_type", encoding=Encoding.EXACT),
        right=Field("entity_name"),
    ),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Pair(
        left=Field("target_type", encoding=Encoding.EXACT),
        right=Field("target_name"),
    ),
)
# Nested structure is fully decomposable!
Nesting rules
- Structured molecules (Pair, Triple, Tree, Graph, Sequence) can nest
- Bundle cannot be nested (uses search-optimized encoding)
- Validation happens at construction time
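The nesting rules above could be enforced along the following lines. This is a sketch with stand-in classes, not the `hybi.compose` implementation: the `Field`, `Bundle`, and `Pair` dataclasses here are simplified stand-ins, and the error messages are invented for the example.

```python
from dataclasses import dataclass

class SchemaError(ValueError):
    """Raised when a schema definition is structurally invalid."""

@dataclass(frozen=True)
class Field:
    name: str

@dataclass(frozen=True)
class Bundle:
    fields: tuple

@dataclass(frozen=True)
class Pair:
    left: object
    right: object

    def __post_init__(self):
        # Runs as soon as the Pair is constructed, before any data is ingested.
        for slot in ("left", "right"):
            value = getattr(self, slot)
            if isinstance(value, Bundle):
                raise SchemaError(f"Pair.{slot}: Bundle cannot be nested")
            if not isinstance(value, (Field, Pair)):
                raise SchemaError(f"Pair.{slot}: expected a Field or structured molecule")

# Structured molecules nest freely.
typed_entity = Pair(left=Field("entity_type"),
                    right=Pair(Field("first"), Field("last")))

# Nesting a Bundle is rejected at construction time.
try:
    Pair(left=Field("id"), right=Bundle(fields=(Field("a"), Field("b"))))
    nesting_error = None
except SchemaError as error:
    nesting_error = str(error)

print(nesting_error)  # prints "Pair.right: Bundle cannot be nested"
```

Checking in the constructor means an invalid schema never exists as an object, so later query code does not need to re-validate it.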
Unique Capabilities¶
These operations depend on compositional encoding, so they have no direct equivalent in a traditional vector database:
Analogical Reasoning¶
Find D where A:B :: C:D
# Einstein:Relativity :: Darwin:?
results = hb.analogy("Einstein", "Relativity", "Darwin",
                     field_name="subject", collection="facts")
# Finds "Evolution": what relates to Darwin the way Relativity relates to Einstein
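A generic HDC treatment of A:B :: C:D recovers the relation linking A and B from a memory of bound facts, then applies that relation to C. The sketch below shows this with random bipolar vectors; it is not a description of how `hb.analogy` is implemented, and the fact list, helper names, and dimensionality are assumptions. Because plain elementwise multiplication is commutative, this simplified version does not distinguish subject from object.

```python
import random

DIM = 4096  # illustrative dimensionality (assumption)
rng = random.Random(7)

def random_hv():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(*vectors):
    """Elementwise product; binding with the same bipolar vector twice cancels out."""
    result = [1] * DIM
    for vector in vectors:
        result = [x * y for x, y in zip(result, vector)]
    return result

def nearest(probe, candidates):
    """Name of the candidate vector with the largest dot product against probe."""
    return max(
        candidates,
        key=lambda name: sum(x * y for x, y in zip(probe, candidates[name])),
    )

entities = {name: random_hv() for name in
            ("Einstein", "Relativity", "Darwin", "Evolution", "Curie", "Paris")}
relations = {name: random_hv() for name in ("developed", "lived_in")}

facts = [
    ("Einstein", "developed", "Relativity"),
    ("Darwin", "developed", "Evolution"),
    ("Curie", "lived_in", "Paris"),
]

# Superimpose every bound fact into one memory vector.
memory = [0] * DIM
for subject, relation, obj in facts:
    fact = bind(entities[subject], relations[relation], entities[obj])
    memory = [m + f for m, f in zip(memory, fact)]

def analogy(a, b, c):
    """Solve a:b :: c:? by recovering the a-b relation and applying it to c."""
    relation = nearest(bind(memory, entities[a], entities[b]), relations)
    return nearest(bind(memory, entities[c], relations[relation]), entities)

print(analogy("Einstein", "Relativity", "Darwin"))  # prints "Evolution"
```

Both steps end with a nearest-neighbour cleanup because unbinding from a superposed memory yields the target vector plus noise from the other facts.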
Slot-Specific Search¶
Search within a single structural slot:
q = hb.query("facts", schema=schema)
# Search only in the subject slot
results = q.search("physicist", slot="subject")
# Multi-slot with different weights
results = q.search_slots({
    "subject": ("Einstein", 2.0),  # Double weight
    "object": ("physics", 1.0),
})
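Weighted multi-slot scoring can be thought of as a weighted average of per-slot similarities. The sketch below assumes that reading; the normalization is a guess, `difflib` string similarity stands in for embedding similarity, and `RECORDS`, `text_similarity`, and this standalone `search_slots` are hypothetical.

```python
from difflib import SequenceMatcher

# Made-up sample records with two slots each.
RECORDS = [
    {"subject": "Einstein", "object": "physics"},
    {"subject": "Darwin", "object": "physics"},
    {"subject": "Einstein", "object": "music"},
]

def text_similarity(a, b):
    """Character-level stand-in for embedding similarity, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def search_slots(slot_queries):
    """Rank records by the weighted average of per-slot similarities."""
    total_weight = sum(weight for _, weight in slot_queries.values())
    scored = []
    for record in RECORDS:
        score = sum(
            weight * text_similarity(text, record[slot])
            for slot, (text, weight) in slot_queries.items()
        ) / total_weight
        scored.append((score, record))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

ranked = search_slots({"subject": ("Einstein", 2.0), "object": ("physics", 1.0)})
for score, record in ranked:
    print(f"{score:.2f}", record)
```

With the subject weighted double, a record that matches only the subject ranks above one that matches only the object.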
Fuzzy Graph Traversal¶
Semantic path following (not just exact matching):
results = q.traverse_fuzzy(
    start="Albert Einstein",
    start_slot="subject",
    path=["worked_at", "located_in"],
    hop_threshold=0.6,  # Minimum similarity per hop
)
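The idea behind fuzzy traversal can be sketched over a small in-memory triple list. This is an assumption-laden illustration rather than the library's algorithm: relations are matched exactly, node names are matched with `difflib` string similarity as a stand-in for semantic similarity, and the triples are made-up sample data.

```python
from difflib import SequenceMatcher

# Made-up sample triples: (subject, predicate, object).
TRIPLES = [
    ("A. Einstein", "worked_at", "Princeton University"),
    ("Princeton University", "located_in", "New Jersey"),
    ("Charles Darwin", "worked_at", "University of Cambridge"),
    ("University of Cambridge", "located_in", "England"),
]

def text_similarity(a, b):
    """Character-level stand-in for embedding similarity, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def traverse_fuzzy(start, path, hop_threshold=0.6):
    """Follow `path` one relation at a time, matching nodes by similarity."""
    frontier = {start: 1.0}  # node -> weakest hop similarity on the way there
    for relation in path:
        next_frontier = {}
        for node, score in frontier.items():
            for subject, predicate, obj in TRIPLES:
                if predicate != relation:
                    continue
                hop = text_similarity(node, subject)
                if hop >= hop_threshold:
                    path_score = min(score, hop)
                    next_frontier[obj] = max(next_frontier.get(obj, 0.0), path_score)
        frontier = next_frontier
    return sorted(frontier.items(), key=lambda item: item[1], reverse=True)

results = traverse_fuzzy("Albert Einstein", ["worked_at", "located_in"])
for node, score in results:
    print(node, round(score, 2))
```

The start node "Albert Einstein" never appears verbatim in the data, yet the first hop still matches "A. Einstein" because its similarity clears the threshold; exact-match traversal would return nothing.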
Prototype Search¶
Find items similar to ANY of multiple examples:
results = hb.search_prototype(
    examples=["Dune", "Foundation", "Neuromancer"],
    field_name="title",
    collection="books",
)
Intersections: The Glue Layer¶
Collections are isolated by default. Intersections declare relationships between collections, enabling cross-collection queries via .join().
flowchart LR
E[Employees] <-->|identity| X[Expertise] <-->|identity| P[Projects]
employees.id = expertise.employee_id · expertise.project_id = projects.id
Declaring Intersections¶
# Connect employees to their expertise
hb.intersect("employees.employee_id", "expertise.subject")
# Connect expertise to projects
hb.intersect("expertise.project_id", "projects.id")
Querying Across Collections¶
# Start in employees, join through expertise to projects
results = (
    hb.query("employees")
    .search("senior engineer")
    .join("expertise")
    .join("projects")
)
for r in results:
    print(f"{r.source['name']} → {r.target['project_name']}")
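Conceptually, a chain of identity intersections behaves like a pair of equality joins. The plain-Python sketch below shows that idea with made-up rows and a hypothetical `identity_join` helper; it says nothing about how `.join()` is executed internally.

```python
# Made-up sample rows for three isolated collections.
employees = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]
expertise = [
    {"employee_id": 1, "project_id": 10},
    {"employee_id": 2, "project_id": 11},
    {"employee_id": 2, "project_id": 10},
]
projects = [
    {"id": 10, "project_name": "Compiler"},
    {"id": 11, "project_name": "Scheduler"},
]

def identity_join(left_rows, left_key, right_rows, right_key):
    """Pair each left row with every right row whose key value is equal."""
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)
    return [
        (left, right)
        for left in left_rows
        for right in index.get(left[left_key], [])
    ]

# employees.id = expertise.employee_id, then expertise.project_id = projects.id
pairs = []
for employee, skill in identity_join(employees, "id", expertise, "employee_id"):
    for _, project in identity_join([skill], "project_id", projects, "id"):
        pairs.append((employee["name"], project["project_name"]))

for name, project_name in pairs:
    print(f"{name} → {project_name}")
```

Each declared intersection supplies the key pair for one hop, so the query only needs to name the collections to pass through.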
Relation Types¶
| Relation | Matching | Use Case |
|---|---|---|
| "identity" | Exact value equality | IDs, foreign keys |
| "semantic" | Embedding similarity | Text content, descriptions |
| "link" | Explicit value mappings | Cross-encoding fields (flexible mode) |
The Bridge Pattern¶
Intersections enable a powerful pattern for connecting heterogeneous data:
flowchart LR
D["Documents<br/>(Fuzzy)"] <-->|semantic| K["Knowledge Graph<br/>(Entity Hub)"] <-->|identity| T["Catalogs<br/>(Exact)"]
The Knowledge Graph acts as a semantic index bridging fuzzy text mentions to exact structured lookups.
See Intersections API for full documentation.
Schema = Contract¶
A schema isn't just metadata - it's a contract that determines:
- What queries are valid: find(subject=X) only works on Triple
- How data is encoded: Chain vs bundle, semantic vs exact
- What results contain: Schema-aware result objects
Invalid queries fail with clear errors:
q = hb.query("facts", schema=triple_schema)
q.children("Alice") # SchemaError: 'children' requires Tree schema
Next Steps¶
- Molecules vs Compounds - Choosing the right schema
- API Reference: Molecules - Full molecule documentation
- API Reference: Fields - Field configuration options
- API Reference: Intersections - Cross-collection queries