The Compose System

Compose is HyperBinder's schema system that exposes the power of Hyperdimensional Computing (HDC) through a clean, typed API.

Why Compose?

The Problem with Traditional Vector DBs

Traditional vector databases treat embeddings as black boxes:

Data → Embedding → Similarity Search

You can ask "what's similar?" but you cannot:

  • Decompose a vector to extract components
  • Query specific structural slots
  • Perform analogical reasoning
  • Traverse relationships semantically

What HDC Enables

Hyperdimensional Computing provides compositional operations on vectors that preserve structural relationships:

  • Structure encoding: Combine vectors while preserving their relationship
  • Structural decomposition: Extract components from composed structures
  • Set representation: Represent collections of items in a single vector

Key insight: Unlike traditional embeddings, HyperBinder's encoding is compositional. If you encode a triple (Subject, Predicate, Object), you can later decompose it to extract individual components.
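To make the compositional idea concrete, here is a minimal sketch of the three operations using generic bipolar hypervectors in plain Python. It illustrates textbook HDC, not HyperBinder's internal implementation: the dimensionality, the helper names (`random_hv`, `bind`, `unbind`, `bundle`, `decode`) and the role/filler layout are all assumptions made for illustration.

```python
import random

DIM = 2048  # illustrative dimensionality, not a HyperBinder setting


def random_hv(rng):
    """Random bipolar hypervector with components in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Structure encoding: element-wise product of two hypervectors."""
    return [x * y for x, y in zip(a, b)]


def unbind(bound, key):
    """Bipolar binding is its own inverse, so unbinding is binding again."""
    return bind(bound, key)


def bundle(*vectors):
    """Set representation: element-wise majority vote (use an odd count)."""
    return [1 if sum(col) > 0 else -1 for col in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
roles = {r: random_hv(rng) for r in ("subject", "predicate", "object")}
items = {i: random_hv(rng) for i in ("Einstein", "developed", "Relativity", "Darwin")}

# Encode the triple (Einstein, developed, Relativity) as one vector.
triple = bundle(
    bind(roles["subject"], items["Einstein"]),
    bind(roles["predicate"], items["developed"]),
    bind(roles["object"], items["Relativity"]),
)


def decode(structure, role):
    """Structural decomposition: unbind a role, then pick the closest known item."""
    noisy = unbind(structure, roles[role])
    return max(items, key=lambda name: similarity(noisy, items[name]))


print(decode(triple, "subject"))  # Einstein
print(decode(triple, "object"))   # Relativity
```

The unbound vector is only a noisy copy of the original filler, which is why the sketch finishes with a nearest-item lookup over the known vocabulary.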

The Layers

flowchart LR
    subgraph atoms [ATOMS - Advanced]
        a["bind, unbind,<br/>bundle, similarity"]
    end

    subgraph molecules [MOLECULES]
        m["Pair, Triple, Bundle,<br/>Sequence, Tree, Graph"]
    end

    subgraph compounds [COMPOUNDS]
        c["KnowledgeGraph, TimeSeries,<br/>Hierarchy, Document, Network"]
    end

    atoms -.-> molecules --> compounds

Atoms (bind, unbind, bundle, similarity) are the low-level HDC primitives that molecules are built from. They are an internal implementation detail, described here for completeness.

Molecules (Building Blocks)

The core structures you define schemas with. Each molecule defines:

  1. Structure: The shape of your data
  2. Slots: Named positions in the structure
  3. Queries: What operations are supported

| Molecule | Slots | Best For |
| --- | --- | --- |
| Pair | left, right | Key-value, edges |
| Triple | subject, predicate, object | Knowledge graphs |
| Bundle | User-defined fields | Tabular data |
| Sequence | item | Ordered data |
| Tree | child, parent | Hierarchies |
| Graph | source, edge, target | Networks |

from hybi.compose import Triple, Field, Encoding

schema = Triple(
    subject=Field("entity", encoding=Encoding.SEMANTIC),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Field("target", encoding=Encoding.SEMANTIC),
)

Compounds (Domain Templates)

Pre-configured molecules optimized for common use cases. They expand to molecules at definition time.

| Compound | Expands To | Use Case |
| --- | --- | --- |
| KnowledgeGraph | Triple | Entity-relation-entity facts |
| TimeSeries | Sequence | Time-ordered data |
| Hierarchy | Tree | Org charts, taxonomies |
| Document | Bundle | Document chunks with metadata |
| Network | Graph | Social graphs, citation networks |
| Catalog | Bundle | Generic tabular data |

from hybi.compose import KnowledgeGraph

# This is equivalent to defining a Triple with sensible defaults
schema = KnowledgeGraph(
    entity_field="entity",
    relation_field="relation",
)

Structured vs Search-Optimized Encoding

Molecules use different encoding strategies optimized for different use cases:

Structured Encoding

Used by: Pair, Triple, Tree, Graph, Sequence

  • Fully decomposable: Extract individual components from queries
  • High precision: Exact matches score ~1.0
  • Enables: Slot-specific queries, traversal, structural decomposition

Search-Optimized Encoding

Used by: Bundle

  • Multi-field search: Search across many fields simultaneously
  • Similarity-based: Optimized for semantic similarity matching
  • Trade-off: Lower precision for faster multi-field search

Bundle limitations

Bundle uses search-optimized encoding, which means:

  • Individual fields cannot be reliably extracted
  • Bundles cannot be nested inside other molecules
  • Similarity scores decrease with field count
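The last limitation has a simple analogue in generic HDC: the more vectors you superimpose with a majority-vote bundle, the less each member resembles the result. The sketch below measures that effect with random bipolar vectors. Whether HyperBinder's Bundle uses this exact scheme is an assumption; `member_similarity` and the field counts are hypothetical and chosen only for illustration.

```python
import random

DIM = 4096  # illustrative dimensionality


def random_hv(rng):
    """Random bipolar hypervector with components in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bundle(vectors):
    """Element-wise majority vote; call with an odd number of vectors."""
    return [1 if sum(col) > 0 else -1 for col in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def member_similarity(field_count, seed=0):
    """Similarity between a bundle of `field_count` vectors and one member."""
    rng = random.Random(seed)
    fields = [random_hv(rng) for _ in range(field_count)]
    return similarity(bundle(fields), fields[0])


# The score for a single member shrinks as more fields share the vector.
for n in (3, 9, 33):
    print(n, round(member_similarity(n), 3))
```

In this toy model the member similarity falls roughly with the square root of the field count, which is one way to read the trade-off described above.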

Encoding Types

Fields can have different encoding strategies:

| Encoding | Behavior | Use When |
| --- | --- | --- |
| SEMANTIC | Similar values → similar vectors | Names, descriptions, text |
| EXACT | Each unique value → distinct vector | IDs, categories, relations |
| NUMERIC | Numbers close in value → similar vectors | Prices, counts, ratings |

Field("description", encoding=Encoding.SEMANTIC)  # Semantic similarity
Field("category", encoding=Encoding.EXACT)        # Exact matching
Field("price", encoding=Encoding.NUMERIC, similar_within=50)  # Prices within 50 of each other count as similar
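The difference between EXACT and NUMERIC behavior can be sketched with generic HDC encoders. This is an illustration under stated assumptions, not the library's implementation: `exact_hv` derives an unrelated vector per distinct value, and `make_numeric_encoder` uses a simple level encoding in which nearby numbers flip nearly the same components. It does not model `similar_within`, and SEMANTIC encoding is omitted because it needs an embedding model.

```python
import random

DIM = 2048  # illustrative dimensionality


def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def exact_hv(value):
    """EXACT-style: each distinct value seeds its own unrelated vector."""
    rng = random.Random(repr(value))
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def make_numeric_encoder(low, high, seed=0):
    """NUMERIC-style level encoding over the range [low, high]."""
    rng = random.Random(seed)
    base = [rng.choice((-1, 1)) for _ in range(DIM)]
    order = list(range(DIM))
    rng.shuffle(order)  # fixed order in which components get flipped

    def encode(value):
        fraction = min(max((value - low) / (high - low), 0.0), 1.0)
        # Flip a share of components proportional to the value's position.
        flipped = set(order[: round(fraction * DIM / 2)])
        return [-x if i in flipped else x for i, x in enumerate(base)]

    return encode


encode_price = make_numeric_encoder(0, 1000)

print(similarity(encode_price(100), encode_price(120)))  # close to 1
print(similarity(encode_price(100), encode_price(900)))  # much lower
print(similarity(exact_hv("books"), exact_hv("books")))  # 1.0
print(similarity(exact_hv("books"), exact_hv("toys")))   # near 0
```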

Nesting Molecules

Structured molecules can be nested inside each other:

from hybi.compose import Pair, Triple, Field, Encoding

# Typed entity knowledge graph
schema = Triple(
    subject=Pair(
        left=Field("entity_type", encoding=Encoding.EXACT),
        right=Field("entity_name"),
    ),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Pair(
        left=Field("target_type", encoding=Encoding.EXACT),
        right=Field("target_name"),
    ),
)
# Nested structure is fully decomposable!

Nesting rules

  • Structured molecules (Pair, Triple, Tree, Graph, Sequence) can nest
  • Bundle cannot be nested (uses search-optimized encoding)
  • Validation happens at construction time
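Why nested structured molecules stay decomposable can be shown with generic HDC binding: unbinding the outer role and then the inner role recovers the innermost value. The sketch below assumes bipolar vectors, product binding and unthresholded sum bundling; the role and atom names are hypothetical and this is not HyperBinder's encoder.

```python
import random

DIM = 4096  # illustrative dimensionality


def random_hv(rng):
    """Random bipolar hypervector with components in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Element-wise product; self-inverse when the key is bipolar."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Element-wise sum, left unthresholded so nested parts survive."""
    return [sum(col) for col in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm


rng = random.Random(1)
role = {r: random_hv(rng) for r in ("subject", "predicate", "object", "left", "right")}
atom = {a: random_hv(rng) for a in ("person", "Einstein", "developed", "theory", "Relativity")}


def pair(left, right):
    """Pair molecule: bundle of two role-filler bindings."""
    return bundle(bind(role["left"], atom[left]), bind(role["right"], atom[right]))


# Triple whose subject and object slots each hold a nested Pair.
triple = bundle(
    bind(role["subject"], pair("person", "Einstein")),
    bind(role["predicate"], atom["developed"]),
    bind(role["object"], pair("theory", "Relativity")),
)


def decode(structure, *path):
    """Unbind each role along the path, then pick the closest atom."""
    for r in path:
        structure = bind(structure, role[r])
    return max(atom, key=lambda a: cosine(structure, atom[a]))


print(decode(triple, "subject", "right"))  # Einstein
print(decode(triple, "object", "left"))    # theory
```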

Unique Capabilities

Traditional vector databases, which treat embeddings as opaque, do not support these operations:

Analogical Reasoning

Find D where A:B :: C:D

# Einstein:Relativity :: Darwin:?
results = hb.analogy("Einstein", "Relativity", "Darwin",
                     field_name="subject", collection="facts")
# Expected to surface "Evolution": Darwin relates to it as Einstein relates to Relativity
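For intuition, analogies of this kind can be computed in generic HDC by binding two role-filler records into a mapping and applying that mapping to the known item. The sketch below follows that well-known construction. It is an assumption that `hb.analogy` works along similar lines, and the records, roles and the `analogy` helper here are hypothetical.

```python
import random

DIM = 4096  # illustrative dimensionality


def random_hv(rng):
    """Random bipolar hypervector with components in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Element-wise product; self-inverse for bipolar vectors."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Element-wise majority vote (three vectors here, so no ties)."""
    return [1 if sum(col) > 0 else -1 for col in zip(*vectors)]


def similarity(a, b):
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(2)
role = {r: random_hv(rng) for r in ("name", "theory", "field")}
atom = {
    a: random_hv(rng)
    for a in ("Einstein", "Relativity", "physics", "Darwin", "Evolution", "biology")
}


def record(name, theory, field):
    """One role-filler record per person."""
    return bundle(
        bind(role["name"], atom[name]),
        bind(role["theory"], atom[theory]),
        bind(role["field"], atom[field]),
    )


records = {
    "Einstein": record("Einstein", "Relativity", "physics"),
    "Darwin": record("Darwin", "Evolution", "biology"),
}


def analogy(a, b, c):
    """Answer a:b :: c:? by mapping record a onto record c."""
    mapping = bind(records[a], records[c])
    noisy = bind(atom[b], mapping)
    return max(atom, key=lambda name: similarity(noisy, atom[name]))


print(analogy("Einstein", "Relativity", "Darwin"))  # Evolution
```

The mapping works because both records share the same role vectors: binding them cancels the roles and leaves filler-to-filler associations plus noise.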

Slot-Specific Queries

Search within a single structural slot, or weight several slots in one query:

q = hb.query("facts", schema=schema)

# Search only in the subject slot
results = q.search("physicist", slot="subject")

# Multi-slot with different weights
results = q.search_slots({
    "subject": ("Einstein", 2.0),  # Double weight
    "object": ("physics", 1.0),
})

Fuzzy Graph Traversal

Semantic path following (not just exact matching):

results = q.traverse_fuzzy(
    start="Albert Einstein",
    start_slot="subject",
    path=["worked_at", "located_in"],
    hop_threshold=0.6,  # Minimum similarity per hop
)
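The role of `hop_threshold` can be illustrated without any vector machinery. The sketch below follows a predicate path over a small list of triples and admits a hop whenever the current node is similar enough to a triple's subject. Token overlap stands in for embedding similarity, and `token_similarity`, `follow_path` and the sample triples are hypothetical, not part of HyperBinder.

```python
def token_similarity(a, b):
    """Jaccard overlap of lowercase word sets; a stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


# Illustrative facts; note the subject of the second triple is not an exact
# match for the object of the first.
TRIPLES = [
    ("Albert Einstein", "worked_at", "Princeton University"),
    ("Princeton University USA", "located_in", "New Jersey"),
    ("Oxford University", "located_in", "England"),
    ("Albert Einstein", "born_in", "Ulm"),
]


def follow_path(triples, start, path, hop_threshold):
    """Follow predicates in order, matching subjects fuzzily at every hop."""
    frontier = {start}
    for predicate in path:
        frontier = {
            obj
            for subj, pred, obj in triples
            if pred == predicate
            and any(token_similarity(node, subj) >= hop_threshold for node in frontier)
        }
    return sorted(frontier)


print(follow_path(TRIPLES, "Albert Einstein", ["worked_at", "located_in"], 0.6))
# ['New Jersey']
print(follow_path(TRIPLES, "Albert Einstein", ["worked_at", "located_in"], 0.3))
# ['England', 'New Jersey']
```

Lowering the threshold widens each hop: at 0.3 the single shared word "University" is enough to pull in the Oxford fact.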

Prototype Search

Find items similar to any of several examples:

results = hb.search_prototype(
    examples=["Dune", "Foundation", "Neuromancer"],
    field_name="title",
    collection="books"
)
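One way to picture prototype search is to bundle the example vectors into a single prototype and rank the remaining items against it. The sketch below does this with generic HDC vectors built from made-up tags. The tags, the `rank_by_prototype` helper and the bundling scheme are illustrative assumptions, not catalogue data or the behavior of `hb.search_prototype`.

```python
import random

DIM = 4096  # illustrative dimensionality


def random_hv(rng):
    """Random bipolar hypervector with components in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bundle(vectors):
    """Element-wise sum; repeated features accumulate weight."""
    return [sum(col) for col in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm


# Made-up tags, used only to give the toy vectors some shared structure.
TAGS = {
    "Dune": ["sci-fi", "politics", "desert"],
    "Foundation": ["sci-fi", "politics", "empire"],
    "Neuromancer": ["sci-fi", "hacking", "cyberpunk"],
    "Hyperion": ["sci-fi", "empire", "pilgrimage"],
    "Emma": ["romance", "comedy", "regency"],
}

rng = random.Random(3)
all_tags = sorted({tag for tags in TAGS.values() for tag in tags})
tag_hv = {tag: random_hv(rng) for tag in all_tags}
book_hv = {title: bundle([tag_hv[t] for t in tags]) for title, tags in TAGS.items()}


def rank_by_prototype(examples, top_k=2):
    """Bundle the examples into a prototype and rank all other books by similarity."""
    prototype = bundle([book_hv[e] for e in examples])
    candidates = [title for title in book_hv if title not in examples]
    ranked = sorted(candidates, key=lambda t: cosine(prototype, book_hv[t]), reverse=True)
    return ranked[:top_k]


print(rank_by_prototype(["Dune", "Foundation", "Neuromancer"]))
# ['Hyperion', 'Emma']
```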

Intersections: The Glue Layer

Collections are isolated by default. Intersections declare relationships between collections, enabling cross-collection queries via .join().

flowchart LR
    E[Employees] <-->|identity| X[Expertise] <-->|identity| P[Projects]

employees.employee_id = expertise.subject · expertise.project_id = projects.id

Declaring Intersections

# Connect employees to their expertise
hb.intersect("employees.employee_id", "expertise.subject")

# Connect expertise to projects
hb.intersect("expertise.project_id", "projects.id")

Querying Across Collections

# Start in employees, join through expertise to projects
results = (
    hb.query("employees")
    .search("senior engineer")
    .join("expertise")
    .join("projects")
)

for r in results:
    print(f"{r.source['name']} -> {r.target['project_name']}")
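Conceptually, an "identity" intersection behaves like an equality join on the declared fields. The plain-Python sketch below walks the same employees → expertise → projects chain with in-memory rows. The sample rows, the `join_identity` helper and the substring filter standing in for `.search()` are assumptions made for illustration.

```python
from collections import defaultdict

# Illustrative rows; field names follow the intersections declared above.
employees = [
    {"employee_id": 1, "name": "Ada", "title": "senior engineer"},
    {"employee_id": 2, "name": "Lin", "title": "junior engineer"},
]
expertise = [
    {"subject": 1, "project_id": 10},
    {"subject": 1, "project_id": 11},
    {"subject": 2, "project_id": 10},
]
projects = [
    {"id": 10, "project_name": "Search"},
    {"id": 11, "project_name": "Billing"},
]


def join_identity(pairs, left_key, right_rows, right_key):
    """Join on exact equality, keeping the original source row of each pair."""
    index = defaultdict(list)
    for row in right_rows:
        index[row[right_key]].append(row)
    return [
        (source, match)
        for source, current in pairs
        for match in index.get(current[left_key], [])
    ]


# A substring filter stands in for the semantic search step.
seniors = [(e, e) for e in employees if "senior" in e["title"]]
via_expertise = join_identity(seniors, "employee_id", expertise, "subject")
results = join_identity(via_expertise, "project_id", projects, "id")

for source, target in results:
    print(f"{source['name']} -> {target['project_name']}")
# Ada -> Search
# Ada -> Billing
```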

Relation Types

| Relation | Matching | Use Case |
| --- | --- | --- |
| "identity" | Exact value equality | IDs, foreign keys |
| "semantic" | Embedding similarity | Text content, descriptions |
| "link" | Explicit value mappings | Cross-encoding fields (flexible mode) |

The Bridge Pattern

Intersections enable a powerful pattern for connecting heterogeneous data:

flowchart LR
    D["Documents<br/>(Fuzzy)"] <-->|semantic| K["Knowledge Graph<br/>(Entity Hub)"] <-->|identity| T["Catalogs<br/>(Exact)"]

The Knowledge Graph acts as a semantic index bridging fuzzy text mentions to exact structured lookups.

See Intersections API for full documentation.


Schema = Contract

A schema isn't just metadata - it's a contract that determines:

  1. What queries are valid: find(subject=X) only works on Triple
  2. How data is encoded: Structured vs search-optimized, semantic vs exact
  3. What results contain: Schema-aware result objects

Invalid queries fail with clear errors:

q = hb.query("facts", schema=triple_schema)
q.children("Alice")  # SchemaError: 'children' requires Tree schema
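The contract idea can be sketched as a check that runs before any query executes. The classes below are hypothetical stand-ins written for illustration, not HyperBinder's `SchemaError` or query object: each operation declares the schema kind it needs, and a mismatch raises immediately.

```python
class SchemaError(Exception):
    """Raised when an operation is not valid for the bound schema."""


# Hypothetical mapping from operation name to the schema kind it requires.
REQUIRED_SCHEMA = {"find": "Triple", "children": "Tree"}


class Query:
    def __init__(self, schema_kind):
        self.schema_kind = schema_kind

    def _check(self, operation):
        required = REQUIRED_SCHEMA[operation]
        if self.schema_kind != required:
            raise SchemaError(f"'{operation}' requires {required} schema")

    def find(self, **slots):
        self._check("find")
        return slots  # placeholder result: echoes the slot filter

    def children(self, node):
        self._check("children")
        return []  # placeholder result: no data is stored in this sketch


q = Query("Triple")
print(q.find(subject="Einstein"))  # {'subject': 'Einstein'}

try:
    q.children("Alice")
except SchemaError as err:
    print(f"SchemaError: {err}")  # SchemaError: 'children' requires Tree schema
```

Failing at call time, before any data is touched, is what makes the schema act as a contract instead of documentation.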

Next Steps