The Compose System¶

Compose is HyperBinder's schema system. It exposes Hyperdimensional Computing (HDC) operations through a typed API.

Why Compose?¶

The Problem with Traditional Vector DBs¶

Traditional vector databases treat embeddings as black boxes:

Data → Embedding → Similarity Search

You can ask "what's similar?" but you cannot:

Decompose a vector to extract components
Query specific structural slots
Perform analogical reasoning
Traverse relationships semantically

What HDC Enables¶

Hyperdimensional Computing provides compositional operations on vectors that preserve structural relationships:

Structure encoding: Combine vectors while preserving their relationship
Structural decomposition: Extract components from composed structures
Set representation: Represent collections of items in a single vector

Key insight: Unlike traditional embeddings, HyperBinder's encoding is compositional. If you encode a triple (Subject, Predicate, Object), you can later decompose it to extract individual components.
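To make the compositional idea concrete, here is a minimal sketch of one common HDC scheme (bipolar vectors, elementwise multiplication for binding, majority vote for bundling). It is not HyperBinder's actual encoding: the dimensionality, the role vectors, and the helper names `encode_triple` and `decode_slot` are assumptions chosen for illustration.

```python
import random

DIM = 4096  # hypervector dimensionality (illustrative choice)
rng = random.Random(0)

def random_hv():
    """Random bipolar hypervector with components in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    """Elementwise multiply; self-inverse for bipolar vectors."""
    return [x * y for x, y in zip(a, b)]

def bundle(*vectors):
    """Elementwise majority vote (use an odd number of inputs to avoid ties)."""
    return [1 if sum(col) > 0 else -1 for col in zip(*vectors)]

def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / DIM

# Hypothetical role vectors for the three slots, plus a codebook of known values.
roles = {name: random_hv() for name in ("subject", "predicate", "object")}
codebook = {name: random_hv()
            for name in ("Einstein", "developed", "Relativity", "Darwin")}

def encode_triple(s, p, o):
    """Bind each value to its slot role, then bundle the three bindings."""
    return bundle(
        bind(roles["subject"], codebook[s]),
        bind(roles["predicate"], codebook[p]),
        bind(roles["object"], codebook[o]),
    )

def decode_slot(triple_hv, slot):
    """Unbind a role, then pick the closest codebook entry (cleanup step)."""
    noisy = bind(triple_hv, roles[slot])
    return max(codebook, key=lambda name: similarity(noisy, codebook[name]))

fact = encode_triple("Einstein", "developed", "Relativity")
print(decode_slot(fact, "subject"), decode_slot(fact, "object"))
```

Unbinding yields a noisy copy of the stored value, which is why the sketch ends with a nearest-neighbour lookup against the codebook.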

The Layers¶

flowchart LR
    subgraph atoms [ATOMS - Advanced]
        a["bind, unbind,<br/>bundle, similarity"]
    end

    subgraph molecules [MOLECULES]
        m["Pair, Triple, Bundle,<br/>Sequence, Tree, Graph"]
    end

    subgraph compounds [COMPOUNDS]
        c["KnowledgeGraph, TimeSeries,<br/>Hierarchy, Document, Network"]
    end

    atoms -.-> molecules --> compounds

Atoms (bind, unbind, bundle, similarity) are the low-level HDC primitives that molecules are built from. They are an internal implementation detail, described here for completeness; schemas are defined with molecules and compounds.

Molecules (Building Blocks)¶

The core structures you define schemas with. Each molecule defines:

Structure: The shape of your data
Slots: Named positions in the structure
Queries: What operations are supported

Molecule	Slots	Best For
Pair	left, right	Key-value, edges
Triple	subject, predicate, object	Knowledge graphs
Bundle	User-defined fields	Tabular data
Sequence	item	Ordered data
Tree	child, parent	Hierarchies
Graph	source, edge, target	Networks

from hybi.compose import Triple, Field, Encoding

schema = Triple(
    subject=Field("entity", encoding=Encoding.SEMANTIC),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Field("target", encoding=Encoding.SEMANTIC),
)

Compounds (Domain Templates)¶

Pre-configured molecules optimized for common use cases. They expand to molecules at definition time.

Compound	Expands To	Use Case
KnowledgeGraph	Triple	Entity-relation-entity facts
TimeSeries	Sequence	Time-ordered data
Hierarchy	Tree	Org charts, taxonomies
Document	Bundle	Document chunks with metadata
Network	Graph	Social graphs, citation networks
Catalog	Bundle	Generic tabular data

from hybi.compose import KnowledgeGraph

# This is equivalent to defining a Triple with sensible defaults
schema = KnowledgeGraph(
    entity_field="entity",
    relation_field="relation",
)

Structured vs Search-Optimized Encoding¶

Molecules use different encoding strategies optimized for different use cases:

Structured Encoding¶

Used by: Pair, Triple, Tree, Graph, Sequence

Fully decomposable: Extract individual components from queries
High precision: Exact matches score ~1.0
Enables: Slot-specific queries, traversal, structural decomposition

Search-Optimized Encoding¶

Used by: Bundle

Multi-field search: Search across many fields simultaneously
Similarity-based: Optimized for semantic similarity matching
Trade-off: Lower precision for faster multi-field search

Bundle limitations

Bundle uses search-optimized encoding, which means:

Individual fields cannot be reliably extracted
Bundles cannot be nested inside other molecules
Similarity scores decrease with field count
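The last limitation follows from a general property of superposition, which the sketch below illustrates with plain random vectors. It assumes fields are combined by elementwise summation, which may differ from what Bundle actually does; the dimensionality and helper names are illustrative.

```python
import math
import random

DIM = 4096
rng = random.Random(1)

def random_hv():
    """Random bipolar vector standing in for one encoded field."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def superpose(vectors):
    """Sum field vectors elementwise (unnormalized superposition)."""
    return [sum(col) for col in zip(*vectors)]

scores = {}
for n_fields in (2, 8, 32):
    fields = [random_hv() for _ in range(n_fields)]
    record = superpose(fields)
    # How strongly does a single field still match the whole record?
    scores[n_fields] = cosine(fields[0], record)
    print(n_fields, round(scores[n_fields], 2))
```

For independent random fields the score of any single field against the record falls roughly as 1/sqrt(n), so a match on one field out of many is weaker than a match on one field out of a few.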

Encoding Types¶

Fields can have different encoding strategies:

Encoding	Behavior	Use When
`SEMANTIC`	Similar values → similar vectors	Names, descriptions, text
`EXACT`	Each unique value → distinct vector	IDs, categories, relations
`NUMERIC`	Numbers close in value → similar vectors	Prices, counts, ratings

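from hybi.compose import Field, Encoding

Field("description", encoding=Encoding.SEMANTIC)  # Similar text → similar vectors
Field("category", encoding=Encoding.EXACT)        # Each distinct value → distinct vector
Field("price", encoding=Encoding.NUMERIC, similar_within=50)  # Prices within 50 of each other count as similar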
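The difference between exact and numeric behavior can be sketched with two standard HDC encodings: independent random vectors per value, and a level encoding that flips a growing share of components as the value increases. This is an illustration, not HyperBinder's implementation; `exact_hv`, `numeric_hv`, and the `lo`/`hi` range are hypothetical, and the sketch does not model how `similar_within` is interpreted.

```python
import random

DIM = 2048

def exact_hv(value):
    """EXACT-style: every distinct value gets an unrelated random vector."""
    rng = random.Random(f"exact:{value}")  # string seeds are deterministic
    return [rng.choice((-1, 1)) for _ in range(DIM)]

# NUMERIC-style level encoding: start from one base vector and flip a
# growing prefix of a fixed index order as the value increases.
_rng = random.Random(42)
_BASE = [_rng.choice((-1, 1)) for _ in range(DIM)]
_ORDER = list(range(DIM))
_rng.shuffle(_ORDER)

def numeric_hv(value, lo=0.0, hi=1000.0):
    """Map a value in [lo, hi] to a vector; nearby values share most components."""
    frac = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    n_flips = int(frac * DIM / 2)  # flipping half the components gives similarity 0
    hv = list(_BASE)
    for idx in _ORDER[:n_flips]:
        hv[idx] = -hv[idx]
    return hv

def similarity(a, b):
    return sum(x * y for x, y in zip(a, b)) / DIM

print(similarity(numeric_hv(100), numeric_hv(120)))     # close prices: high
print(similarity(numeric_hv(100), numeric_hv(900)))     # distant prices: low
print(similarity(exact_hv("books"), exact_hv("book")))  # distinct categories: near 0
```

Note that the exact encoding gives "books" and "book" unrelated vectors even though the strings are nearly identical, which is the behavior you want for IDs and category labels.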

Nesting Molecules¶

Structured molecules can be nested inside each other:

from hybi.compose import Pair, Triple, Field, Encoding

# Typed entity knowledge graph
schema = Triple(
    subject=Pair(
        left=Field("entity_type", encoding=Encoding.EXACT),
        right=Field("entity_name"),
    ),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Pair(
        left=Field("target_type", encoding=Encoding.EXACT),
        right=Field("target_name"),
    ),
)
# Nested structure is fully decomposable!

Nesting rules

Structured molecules (Pair, Triple, Tree, Graph, Sequence) can nest
Bundle cannot be nested (uses search-optimized encoding)
Validation happens at construction time

Unique Capabilities¶

These operations depend on compositional encoding, so they are not available in vector databases that treat embeddings as opaque:

Analogical Reasoning¶

Find D where A:B :: C:D

# Einstein:Relativity :: Darwin:?
results = hb.analogy("Einstein", "Relativity", "Darwin",
                     field_name="subject", collection="facts")
# Finds "Evolution": Darwin relates to it the way Einstein relates to Relativity
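One classic way HDC systems answer analogies is to unbind a known value from one record to recover its role, then apply that role to another record. The sketch below shows that technique on two toy records; it is not a description of how `hb.analogy` works internally, and the role names and `analogy` helper are assumptions.

```python
import random

DIM = 4096
rng = random.Random(7)

def random_hv():
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    """Elementwise multiply; self-inverse for bipolar vectors."""
    return [x * y for x, y in zip(a, b)]

def bundle(*vectors):
    """Elementwise majority vote over an odd number of vectors."""
    return [1 if sum(col) > 0 else -1 for col in zip(*vectors)]

def similarity(a, b):
    return sum(x * y for x, y in zip(a, b)) / DIM

# Hypothetical roles and a codebook of known values.
roles = {r: random_hv() for r in ("name", "theory", "field")}
values = {v: random_hv() for v in
          ("Einstein", "Relativity", "Physics", "Darwin", "Evolution", "Biology")}

def record(name, theory, field):
    """One vector per entity: a bundle of role-value bindings."""
    return bundle(bind(roles["name"], values[name]),
                  bind(roles["theory"], values[theory]),
                  bind(roles["field"], values[field]))

einstein = record("Einstein", "Relativity", "Physics")
darwin = record("Darwin", "Evolution", "Biology")

def analogy(source_record, source_value, target_record):
    """Answer 'source_value is to source_record as ? is to target_record'."""
    role_estimate = bind(source_record, values[source_value])  # noisy "theory" role
    answer_estimate = bind(target_record, role_estimate)       # noisy target value
    return max(values, key=lambda v: similarity(answer_estimate, values[v]))

print(analogy(einstein, "Relativity", darwin))
```

The query never names the "theory" role. It is recovered from the first pair and reused on the second, which is what makes the lookup an analogy rather than a slot query.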

Slot-Specific Search¶

Search within a single structural slot:

q = hb.query("facts", schema=schema)

# Search only in the subject slot
results = q.search("physicist", slot="subject")

# Multi-slot with different weights
results = q.search_slots({
    "subject": ("Einstein", 2.0),  # Double weight
    "object": ("physics", 1.0),
})
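The source does not say how per-slot weights are combined. One plausible reading is a weighted average of per-slot similarities, sketched below with a toy token-overlap score standing in for vector similarity. `token_overlap` and `score_record` are hypothetical helpers, and the facts are made-up sample data.

```python
def token_overlap(query, text):
    """Toy stand-in for semantic similarity: Jaccard overlap of lowercase tokens."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0

def score_record(record, slot_queries):
    """Weighted average of per-slot similarities.

    slot_queries maps slot name -> (query text, weight), mirroring the
    dictionary passed to search_slots above.
    """
    total_weight = sum(weight for _, weight in slot_queries.values())
    weighted = sum(
        weight * token_overlap(query, record[slot])
        for slot, (query, weight) in slot_queries.items()
    )
    return weighted / total_weight

facts = [
    {"subject": "Einstein", "predicate": "studied", "object": "physics"},
    {"subject": "Curie", "predicate": "studied", "object": "physics"},
    {"subject": "Einstein", "predicate": "played", "object": "violin"},
]
slot_queries = {"subject": ("Einstein", 2.0), "object": ("physics", 1.0)}

ranked = sorted(facts, key=lambda f: score_record(f, slot_queries), reverse=True)
for fact in ranked:
    print(round(score_record(fact, slot_queries), 2), fact)
```

Under this reading, a record that matches only the double-weighted subject slot outranks one that matches only the object slot.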

Fuzzy Graph Traversal¶

Semantic path following (not just exact matching):

results = q.traverse_fuzzy(
    start="Albert Einstein",
    start_slot="subject",
    path=["worked_at", "located_in"],
    hop_threshold=0.6,  # Minimum similarity per hop
)
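The effect of a per-hop threshold can be shown on a small in-memory list of triples. This is a sketch under stated assumptions: predicates are matched exactly, node matching uses a toy token-overlap score instead of vector similarity, and `follow_path` is a hypothetical helper, not the library method.

```python
def token_overlap(a, b):
    """Toy stand-in for semantic similarity: Jaccard overlap of lowercase tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def follow_path(triples, start, path, hop_threshold):
    """Follow `path` (a list of predicates) from `start`.

    Predicates must match exactly; the current node only has to resemble a
    triple's subject with similarity >= hop_threshold.
    """
    frontier = {start}
    for predicate in path:
        next_frontier = set()
        for node in frontier:
            for subj, pred, obj in triples:
                if pred == predicate and token_overlap(node, subj) >= hop_threshold:
                    next_frontier.add(obj)
        frontier = next_frontier
    return frontier

triples = [
    ("Albert Einstein", "worked_at", "Institute for Advanced Study"),
    ("Institute for Advanced Study Princeton", "located_in", "New Jersey"),
    ("Institute of Physics", "located_in", "London"),
]

print(follow_path(triples, "Albert Einstein", ["worked_at", "located_in"], 0.6))
```

The second hop succeeds even though the institute is spelled differently in the two triples (overlap 0.8), while the loosely related "Institute of Physics" falls below the threshold. Raising the threshold to 0.9 would return no results.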

Prototype Search¶

Find items similar to ANY of multiple examples:

results = hb.search_prototype(
    examples=["Dune", "Foundation", "Neuromancer"],
    field_name="title",
    collection="books"
)
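A common HDC way to search by several examples at once is to bundle them into a single prototype vector, which stays similar to each of its members. The sketch below assumes that approach; whether `search_prototype` works this way is not stated in the source. Item vectors here are random, and the "similar" catalog entry is constructed artificially by flipping 20% of an example's components.

```python
import random

DIM = 4096
rng = random.Random(3)

def random_hv():
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bundle(*vectors):
    """Elementwise majority vote over an odd number of vectors."""
    return [1 if sum(col) > 0 else -1 for col in zip(*vectors)]

def similarity(a, b):
    return sum(x * y for x, y in zip(a, b)) / DIM

def perturb(hv, flip_fraction):
    """Return a copy with a random fraction of components flipped (a 'similar' item)."""
    flipped = set(rng.sample(range(DIM), int(flip_fraction * DIM)))
    return [-x if i in flipped else x for i, x in enumerate(hv)]

examples = {title: random_hv() for title in ("Dune", "Foundation", "Neuromancer")}
prototype = bundle(*examples.values())

catalog = {
    "Dune Messiah": perturb(examples["Dune"], 0.2),  # constructed to resemble "Dune"
    "Cookbook": random_hv(),                          # unrelated item
}
results = {title: similarity(hv, prototype) for title, hv in catalog.items()}
for title, score in results.items():
    print(title, round(score, 2))
```

An item only needs to resemble one of the examples to score well above unrelated items, which matches the "similar to ANY" behavior described above.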

Intersections: The Glue Layer¶

Collections are isolated by default. Intersections declare relationships between collections, enabling cross-collection queries via .join().

flowchart LR
    E[Employees] <-->|identity| X[Expertise] <-->|identity| P[Projects]

employees.id = expertise.employee_id · expertise.project_id = projects.id

Declaring Intersections¶

# Connect employees to their expertise
hb.intersect("employees.id", "expertise.employee_id")

# Connect expertise to projects
hb.intersect("expertise.project_id", "projects.id")

Querying Across Collections¶

# Start in employees, join through expertise to projects
results = (
    hb.query("employees")
    .search("senior engineer")
    .join("expertise")
    .join("projects")
)

for r in results:
    print(f"{r.source['name']} → {r.target['project_name']}")
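Conceptually, a chain of identity intersections behaves like a sequence of equality joins on the declared field pairs. The plain-Python sketch below mirrors the mapping `employees.id = expertise.employee_id` and `expertise.project_id = projects.id`. The `identity_join` helper and the sample rows are made up for illustration and say nothing about how `.join()` is implemented.

```python
def identity_join(left_rows, left_field, right_rows, right_field):
    """Pair rows whose declared fields hold exactly equal values."""
    index = {}
    for row in right_rows:
        index.setdefault(row[right_field], []).append(row)
    return [(l, r) for l in left_rows for r in index.get(l[left_field], [])]

# Made-up sample rows for three isolated collections.
employees = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
expertise = [
    {"employee_id": 1, "project_id": 10, "skill": "compilers"},
    {"employee_id": 2, "project_id": 20, "skill": "databases"},
]
projects = [{"id": 10, "project_name": "Parser"}, {"id": 20, "project_name": "Storage"}]

# employees.id = expertise.employee_id, then expertise.project_id = projects.id
pairs = []
for emp, exp in identity_join(employees, "id", expertise, "employee_id"):
    for _, proj in identity_join([exp], "project_id", projects, "id"):
        pairs.append((emp["name"], proj["project_name"]))

for name, project_name in pairs:
    print(f"{name} → {project_name}")
```

The middle collection never appears in the output. It only supplies the keys that connect the two ends, which is the role `expertise` plays in the query above.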

Relation Types¶

Relation	Matching	Use Case
`"identity"`	Exact value equality	IDs, foreign keys
`"semantic"`	Embedding similarity	Text content, descriptions
`"link"`	Explicit value mappings	Cross-encoding fields (flexible mode)

The Bridge Pattern¶

Intersections enable a powerful pattern for connecting heterogeneous data:

flowchart LR
    D["Documents<br/>(Fuzzy)"] <-->|semantic| K["Knowledge Graph<br/>(Entity Hub)"] <-->|identity| T["Catalogs<br/>(Exact)"]

The Knowledge Graph acts as a semantic index bridging fuzzy text mentions to exact structured lookups.

See Intersections API for full documentation.

Schema = Contract¶

A schema is more than metadata. It is a contract that determines:

What queries are valid: find(subject=X) only works on Triple
How data is encoded: Structured vs search-optimized, semantic vs exact
What results contain: Schema-aware result objects

Invalid queries fail with clear errors:

q = hb.query("facts", schema=triple_schema)
q.children("Alice")  # SchemaError: 'children' requires Tree schema
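The contract idea can be sketched as a check that runs before any query executes. Everything in this block is hypothetical (the `Query` class, the `REQUIRED_MOLECULE` table, and this `SchemaError` definition); it only illustrates how a schema-aware query object might reject an unsupported operation.

```python
class SchemaError(Exception):
    """Raised when a query is not supported by the collection's schema."""

# Hypothetical mapping from query method to the molecule it requires.
REQUIRED_MOLECULE = {"find": "Triple", "children": "Tree"}

class Query:
    def __init__(self, collection, molecule):
        self.collection = collection
        self.molecule = molecule  # e.g. "Triple" or "Tree"

    def _require(self, operation):
        """Fail early if the schema does not support this operation."""
        needed = REQUIRED_MOLECULE[operation]
        if self.molecule != needed:
            raise SchemaError(f"'{operation}' requires {needed} schema")

    def children(self, node):
        self._require("children")
        return []  # a real implementation would look up child nodes here

q = Query("facts", molecule="Triple")
try:
    q.children("Alice")
except SchemaError as err:
    print(f"SchemaError: {err}")
```

Checking against the schema up front means an invalid query fails with a descriptive message instead of returning empty or misleading results.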

Next Steps¶

Molecules vs Compounds - Choosing the right schema
API Reference: Molecules - Full molecule documentation
API Reference: Fields - Field configuration options
API Reference: Intersections - Cross-collection queries