Compounds

Compounds are pre-configured molecules for common domain patterns.

Overview

| Compound | Based On | Use Case |
|---|---|---|
| KnowledgeGraph | Triple | Entity-relation-entity facts |
| Catalog | Bundle | Generic tabular data (read-heavy) |
| RelationalTable | Row | Mutable tables with CRUD |
| TimeSeries | Pair or Sequence | Time-ordered data |
| Hierarchy | Tree | Org charts, taxonomies |
| Document | Bundle | Document chunks with metadata |
| Network | Graph | Social graphs, citations |

Compounds expand to molecules at definition time, so they have the same capabilities once created.


KnowledgeGraph

Pre-configured Triple for knowledge graph data.

from hybi.compose import KnowledgeGraph

schema = KnowledgeGraph(
    entity_field="person",
    relation_field="relationship",
    # Defaults: SEMANTIC for entities, EXACT for relations
)

Equivalent to:

Triple(
    subject=Field("person", encoding=Encoding.SEMANTIC),
    predicate=Field("relationship", encoding=Encoding.EXACT),
    object=Field("person", encoding=Encoding.SEMANTIC),
)
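These defaults can be motivated with a small, library-independent sketch: relation labels form a closed vocabulary and are filtered by equality, while entity mentions vary in wording and are matched by similarity. Everything below (the `similarity` helper, the sample triples, the 0.3 threshold) is hypothetical illustration, not hybi API:

```python
# Hypothetical sketch: exact matching on relations, fuzzy matching on entities.
# Token overlap stands in for the semantic similarity a real encoder provides.
triples = [
    ("Ada Lovelace", "wrote", "Notes on the Analytical Engine"),
    ("Alan Turing", "wrote", "On Computable Numbers"),
    ("Ada Lovelace", "collaborated_with", "Charles Babbage"),
]

def similarity(a: str, b: str) -> float:
    """Crude stand-in for semantic similarity: Jaccard overlap of tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def find(relation: str, entity_query: str, threshold: float = 0.3):
    # EXACT on the relation (equality), SEMANTIC on the subject (similarity).
    return [t for t in triples
            if t[1] == relation and similarity(t[0], entity_query) >= threshold]

# Word order and casing differ, yet the entity still matches.
print(find("wrote", "lovelace ada"))
```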

hybi.compose.KnowledgeGraph dataclass

Bases: BaseMolecule

Knowledge graph compound: entity-relation-entity triples.

A convenience wrapper around Triple with sensible defaults for knowledge graph use cases (semantic entities, exact relations).

Expands to

Triple(
    subject=Field(entity_field, encoding=SEMANTIC),
    predicate=Field(relation_field, encoding=EXACT),
    object=Field(entity_field, encoding=SEMANTIC),
)

Example

Simple usage - defaults to entity/relation columns:

schema = KnowledgeGraph()
hb.ingest(facts_df, collection="kg", schema=schema)

Custom field names:

schema = KnowledgeGraph(
    entity_field="person",
    relation_field="relationship",
)

With custom encoding:

schema = KnowledgeGraph(
    entity_field="entity",
    relation_field="predicate",
    entity_encoding=Encoding.EXACT,  # For IDs instead of text
)

__init__(entity_field='entity', relation_field='relation', subject_field=None, object_field=None, entity_encoding=Encoding.SEMANTIC, relation_encoding=Encoding.EXACT, entity_weight=1.0, relation_weight=1.0)


Catalog

Pre-configured Bundle for tabular data.

from hybi.compose import Catalog, Field, Encoding

schema = Catalog(
    columns={
        "name": Field(encoding=Encoding.SEMANTIC, weight=1.5),
        "category": Field(encoding=Encoding.EXACT),
        "price": Field(encoding=Encoding.NUMERIC, similar_within=50),
    }
)

hybi.compose.Catalog dataclass

Bases: BaseMolecule

Catalog compound: searchable collection with SQL-like operations.

A convenience wrapper around Bundle optimized for tabular data with a familiar SQL-like query interface. Catalog provides a bridge between traditional relational thinking and hyperdimensional computing.

Expands to

Bundle(fields={
    column_name: Field(encoding=..., weight=...),
    ...
})

Unlike pure relational tables, Catalog supports:

  • Semantic search: Find rows by meaning, not just exact values
  • Fuzzy matching: Similarity-based lookups with configurable thresholds
  • Vector joins: Join collections by semantic similarity, not just key equality
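The "vector join" idea can be sketched without any hybi machinery. In the toy below, the 3-d vectors and the 0.9 threshold are made up for illustration; a real system would use learned embeddings in high dimensions:

```python
# Library-independent sketch of a vector join: rows from two collections are
# paired by cosine similarity rather than key equality. Data is mocked.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

products = {"laptop": [0.9, 0.1, 0.0], "keyboard": [0.1, 0.9, 0.0]}
queries = {"portable computer": [0.8, 0.2, 0.1]}

def semantic_join(left, right, threshold=0.9):
    # Emit a pair whenever two rows' vectors are similar enough.
    return [(lk, rk) for lk, lv in left.items()
            for rk, rv in right.items() if cosine(lv, rv) >= threshold]

print(semantic_join(queries, products))
```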

Operations Map

| Catalog Method | HDC Operation |
|---|---|
| select() | Field projection (SelectQuery) |
| where() | Exact filter + similarity search |
| join() | JoinQuery (exact or semantic) |
| aggregate() | AggregateQuery (GROUP BY) |
| search() | Vector similarity search |
Example

Define a products catalog:

schema = Catalog(
    columns={
        "name": Field(encoding=Encoding.SEMANTIC, weight=2.0),
        "description": Field(encoding=Encoding.SEMANTIC),
        "category": Field(encoding=Encoding.EXACT),
        "price": Field(encoding=Encoding.NUMERIC, similar_within=50),
    },
    primary_key="id",
)
hb.ingest(products_df, collection="products", schema=schema)

Traditional-style query:

results = hb.query("products").where(category="electronics")

Semantic query (HDC advantage):

results = hb.query("products").search("lightweight laptop for travel")

Join with another catalog:

order_schema = Catalog(
    columns={"product_id": Field(encoding=Encoding.EXACT), ...}
)
joined = hb.query("orders").join("products", on="product_id")

Aggregation:

stats = hb.query("products").aggregate(
    group_by=["category"],
    aggregations={"avg_price": ("price", "avg")},
)

Notes
  • primary_key is metadata only; HDC doesn't require explicit keys
  • For semantic joins, use Encoding.SEMANTIC on join columns
  • The underlying Bundle uses bundle encoding (lossy but searchable)

__init__(columns=dict(), primary_key=None, catalog_name=None)


RelationalTable

SQL-like table with full CRUD support.

RelationalTable provides familiar relational database semantics with atomic row-level operations. Unlike Catalog (which is optimized for search), RelationalTable uses structured encoding, which enables true field-level updates.

Catalog vs RelationalTable

| Aspect | Catalog | RelationalTable |
|---|---|---|
| Encoding | Search-optimized | Structured |
| Search | Fast | Moderate |
| UPDATE/DELETE | Not supported | Full support |
| Use case | Search catalog | Mutable tables |

Use Catalog when you primarily search and append data. Use RelationalTable when you need UPDATE/DELETE operations.

from hybi.compose import RelationalTable, Field, Encoding

schema = RelationalTable(
    columns={
        "user_id": Field(encoding=Encoding.EXACT),
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC, similar_within=10000),
    },
    primary_key="user_id",
)

CRUD Operations:

# Ingest data
hb.ingest(users_df, collection="users", schema=schema)

# Read by primary key
user = hb.query("users", schema).get(user_id="U001")

# Update fields atomically
hb.update(
    "users",
    where={"user_id": "U001"},
    set={"email": "new@example.com", "salary": 120000},
    schema=schema,
)

# Delete row
hb.delete("users", where={"user_id": "U001"}, schema=schema)

# Upsert (insert or update)
hb.upsert("users", row={"user_id": "U001", ...}, schema=schema)

Equivalent to:

Row(
    primary_key=Field("user_id", encoding=Encoding.EXACT),
    fields={
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC, similar_within=10000),
    },
)

Search & CRUD Architecture

For optimal performance, use Catalog for search and RelationalTable for CRUD:

flowchart TB
    subgraph Catalog["CATALOG (Search)"]
        C1[Search-optimized<br/>fast]
        C2[Semantic Discovery]
        C1 --> C2
    end

    subgraph RelationalTable["RELATIONAL TABLE (CRUD)"]
        R1[Structured<br/>exact]
        R2[PK Lookups]
        R1 --> R2
    end

    C2 --> Bridge
    R2 --> Bridge
    Bridge[BRIDGE<br/>Primary Keys] --> Mutations[Deterministic Mutations]

Recommended pattern:

  • Use Catalog for semantic search (optimized for similarity matching)
  • Use RelationalTable for CRUD (optimized for exact field updates)
  • Bridge between them using shared primary keys

Single-schema alternative: RelationalTable can handle both search and CRUD, but its search performance is slower than a dedicated Catalog's.
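A minimal, data-only sketch of the bridge (mock rows, no hybi calls): the search side returns hits that carry primary keys, and those keys select exactly which rows to mutate on the table side:

```python
# Mock data: "catalog_hits" plays the role of fuzzy search results, and
# "table" plays the role of a PK-indexed RelationalTable.
catalog_hits = [{"user_id": "U001", "score": 0.92},
                {"user_id": "U007", "score": 0.88}]
table = {
    "U001": {"user_id": "U001", "status": "active"},
    "U007": {"user_id": "U007", "status": "active"},
    "U042": {"user_id": "U042", "status": "active"},
}

# Bridge: primary keys from the fuzzy side drive exact mutations.
for hit in catalog_hits:
    table[hit["user_id"]]["status"] = "reviewed"

reviewed = sorted(pk for pk, row in table.items() if row["status"] == "reviewed")
print(reviewed)  # rows never surfaced by search (U042) stay untouched
```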

Fuzzy-to-Exact Bridge Pattern

When you need to combine semantic discovery with exact mutations, use the bridge pattern:

  1. Fuzzy search casts a wide net using semantic similarity
  2. Exact filters narrow to deterministic boundaries
  3. CRUD via PKs operates on the refined set
# 1. Semantic search finds candidates
candidates = hb.query("users", schema).search("machine learning expert", top_k=50)

# 2. Exact filtering narrows to deterministic set
refined = [r for r in candidates
           if r.data["department"] == "Engineering"
           and r.data["status"] == "active"]

# 3. CRUD via primary keys (safe - deterministic)
for r in refined:
    hb.update("users", where={"user_id": r.data["user_id"]}, set={...}, schema=schema)

This pattern leverages fuzzy search for discovery ("I don't know the exact term") while ensuring mutations operate on deterministic, exactly-identified rows.

See Fuzzy-to-Exact Pattern for a complete implementation.

hybi.compose.RelationalTable dataclass

Bases: BaseMolecule

SQL-like table with full CRUD support.

RelationalTable provides familiar relational database semantics:

  • Row-level UPDATE: Modify individual fields
  • Row-level DELETE: Remove rows by primary key
  • Field extraction: Read individual field values cleanly
  • ACID guarantees: Single-row atomicity

Unlike Catalog (which uses lossy Bundle encoding optimized for search), RelationalTable uses Row encoding with chain binding, which is lossless. This enables true field-level updates without re-encoding entire rows.

Trade-offs vs Catalog
| Aspect | Catalog | RelationalTable |
|---|---|---|
| Encoding | Bundle (lossy) | Row (lossless) |
| Search | Fast | Moderate |
| UPDATE/DELETE | Not supported | Full support |
| Use case | Search catalog | Mutable tables |

Use RelationalTable when you need UPDATE/DELETE operations. Use Catalog when you primarily search and append data.

Expands to

Row(
    primary_key=Field(pk_column, encoding=EXACT),
    fields={...other columns...},
)

Example

Define a users table:

schema = RelationalTable(
    columns={
        "user_id": Field(encoding=Encoding.EXACT),
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC),
    },
    primary_key="user_id",
)
hb.ingest(users_df, collection="users", schema=schema)

Read by primary key:

user = hb.query("users", schema).get(user_id="U001")

Update fields:

hb.update(
    "users",
    where={"user_id": "U001"},
    set={"email": "new@example.com"},
    schema=schema,
)

Delete row:

hb.delete("users", where={"user_id": "U001"}, schema=schema)

Notes
  • Primary key is required and must use EXACT encoding
  • Primary key cannot be updated (immutable row identity)
  • Updates are atomic at the row level

columns = dataclass_field(default_factory=dict) class-attribute instance-attribute

Column definitions mapping column names to Field configurations.

Must include the primary key column.

Example

columns={
    "id": Field(encoding=Encoding.EXACT),
    "name": Field(encoding=Encoding.SEMANTIC),
    "email": Field(encoding=Encoding.EXACT),
}

primary_key = None class-attribute instance-attribute

Name of the primary key column.

Required. The referenced column must:

  • Exist in columns
  • Use EXACT encoding

The primary key provides:

  • O(1) row lookup via PK index
  • Row identity for UPDATE/DELETE operations
  • Uniqueness constraint on ingest
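Conceptually (and independent of hybi's internals), a PK index is a dictionary keyed by the primary key, which is where the O(1) lookup and the uniqueness check come from. The `build_pk_index` helper below is a hypothetical illustration, not part of the library:

```python
# Sketch of what a PK index provides: constant-time lookup plus a
# uniqueness check at ingest time. Rows are mocked.
def build_pk_index(rows, pk):
    index = {}
    for row in rows:
        if row[pk] in index:
            # Mirrors the "uniqueness constraint on ingest" guarantee.
            raise ValueError(f"duplicate primary key: {row[pk]!r}")
        index[row[pk]] = row
    return index

rows = [{"user_id": "U001", "name": "Ada"},
        {"user_id": "U002", "name": "Alan"}]
index = build_pk_index(rows, "user_id")
print(index["U002"]["name"])  # O(1) lookup by key
```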

__init__(columns=dict(), primary_key=None, table_name=None)


TimeSeries

Pre-configured molecule for time-ordered data. Supports two modes:

Temporal Mode (with timestamp_field)

When timestamp_field is provided, expands to a Pair enabling temporal queries:

from hybi.compose import TimeSeries

schema = TimeSeries(
    value_field="measurement",
    timestamp_field="recorded_at",  # Enables at_time(), time_range(), when()
)

Supported queries: search, find, at_time, time_range, when

Positional Mode (without timestamp_field)

When timestamp_field is None, expands to a Sequence enabling position-based queries:

schema = TimeSeries(
    value_field="message",
    timestamp_field=None,  # Position-based mode
    position_encoding="random",
    max_length=512,
)

Supported queries: search, at, contains, prefix
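As a mental model, positional queries behave like list operations over the ingested order. The sketch below is plain Python over mock data, not the hybi query API:

```python
# Mental model only: positional mode addresses the sequence by position
# rather than by timestamp.
messages = ["boot", "connect", "auth ok", "transfer", "disconnect"]

def at(i):
    """at(i): the value at position i."""
    return messages[i]

def contains(x):
    """contains(x): does the value occur anywhere in the sequence?"""
    return x in messages

def prefix(items):
    """prefix(items): does the sequence start with these items?"""
    return messages[:len(items)] == list(items)

print(at(2), contains("transfer"), prefix(["boot", "connect"]))
```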

See timeseries_demo.py for a complete example using positional mode.

hybi.compose.TimeSeries dataclass

Bases: BaseMolecule

Time series compound: temporal data with timestamp-value binding.

TimeSeries encodes time-indexed data using hyperdimensional temporal binding. When a timestamp_field is provided, each row is encoded as:

timestamp ⊛ value

This enables powerful temporal queries:

  • at_time(ts): Find values at/near a specific timestamp
  • time_range(start, end): Find values within a time window
  • when(value): Find timestamps when a value occurred
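The binding itself can be demonstrated with a generic hyperdimensional-computing toy, independent of hybi's actual encoder: with random ±1 hypervectors, binding is elementwise multiplication and is its own inverse, so multiplying a stored `timestamp ⊛ value` pair by the timestamp vector recovers the value vector, while unrelated vectors stay near-orthogonal. The dimension and data here are made up:

```python
import random

random.seed(0)
DIM = 2048

def hv():
    """Random ±1 hypervector."""
    return [random.choice((-1, 1)) for _ in range(DIM)]

def bind(u, v):
    """Elementwise multiply: the ⊛ operator for ±1 vectors."""
    return [a * b for a, b in zip(u, v)]

def sim(u, v):
    """Normalized dot product (cosine similarity for ±1 vectors)."""
    return sum(a * b for a, b in zip(u, v)) / DIM

t_2pm, temp_21, temp_30 = hv(), hv(), hv()
record = bind(t_2pm, temp_21)   # store: timestamp ⊛ value

probe = bind(record, t_2pm)     # unbind with the timestamp: t ⊛ (t ⊛ v) = v
print(sim(probe, temp_21))      # 1.0: exact recovery for a single bound pair
print(sim(probe, temp_30))      # near 0: unrelated value
```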

Expands to (when timestamp_field is provided):

Pair(
    left=Field(timestamp_field, encoding=TEMPORAL),
    right=Field(value_field, encoding=value_encoding),
)

Expands to (when timestamp_field is None - legacy mode):

Sequence(
    item=Field(value_field, encoding=value_encoding),
    position_encoding="sinusoidal",
    max_length=max_length,
)

Example

schema = TimeSeries(
    value_field="temperature",
    timestamp_field="recorded_at",
    value_encoding=Encoding.NUMERIC,
)
hb.ingest(sensor_df, collection="readings", schema=schema)

Query: What was the temperature at 2pm?

results = hb.query("readings").at_time("2024-01-15 14:00:00")

Query: Temperatures between 1pm and 3pm:

results = hb.query("readings").time_range(
    start="2024-01-15 13:00:00",
    end="2024-01-15 15:00:00",
)

Legacy mode: without timestamp (uses row position):

schema = TimeSeries(value_field="price")  # timestamp_field=None

Note: This mode only supports positional queries, not temporal ones.

__init__(value_field='value', timestamp_field=None, value_encoding=Encoding.SEMANTIC, value_weight=1.0, timestamp_weight=1.0, position_encoding='sinusoidal', max_length=512)


Hierarchy

Pre-configured Tree for parent-child relationships.

from hybi.compose import Hierarchy

schema = Hierarchy(
    node_field="employee",
    parent_field="manager",
)
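What the parent-child rows encode can be seen with plain Python over mock org data: each row maps a node to its parent, and walking parent pointers yields the chain up to the root. This sketch is not a hybi query:

```python
# Mock org chart: node -> parent, mirroring node_field/parent_field columns.
reports_to = {"carol": "bob", "bob": "alice", "dave": "alice"}

def chain(employee):
    """Walk parent pointers from an employee up to the root."""
    out = [employee]
    while out[-1] in reports_to:
        out.append(reports_to[out[-1]])
    return out

print(chain("carol"))  # employee, manager, ... up to the root
```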

hybi.compose.Hierarchy dataclass

Bases: BaseMolecule

Hierarchy compound: parent-child organizational structures.

A convenience wrapper around Tree optimized for hierarchical data like org charts, file systems, taxonomies, or nested categories.

Expands to

Tree(
    child=Field(node_field, encoding=node_encoding),
    parent=Field(parent_field, encoding=node_encoding),
    level=Field(level_field) if level_field else None,
)

Example

Org chart:

schema = Hierarchy(
    node_field="employee",
    parent_field="manager",
)
hb.ingest(org_df, collection="org", schema=schema)

File system with depth tracking:

schema = Hierarchy(
    node_field="path",
    parent_field="parent_path",
    level_field="depth",
)

Taxonomy with exact matching:

schema = Hierarchy(
    node_field="category",
    parent_field="parent_category",
    node_encoding=Encoding.EXACT,
)

__init__(node_field='node', parent_field='parent', level_field=None, node_encoding=Encoding.SEMANTIC, node_weight=1.0)


Document

Pre-configured Bundle for document chunks.

from hybi.compose import Document, Field, Encoding

schema = Document(
    content_field="text",
    metadata_fields={
        "source": Field(),
        "page": Field(encoding=Encoding.NUMERIC),
        "section": Field(encoding=Encoding.EXACT),
    },
)
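How field weights shape ranking can be sketched without the library: if a row's score is the weighted sum of per-field similarities, a heavier content field dominates. The field names, weights, and similarity numbers below are invented for illustration, not hybi's actual scoring function:

```python
# Hypothetical per-field weights (mirroring Field(weight=...)) and mocked
# per-field similarity scores for two candidate documents.
weights = {"text": 2.0, "headline": 1.5, "category": 1.0}

def score(field_sims):
    """Weighted sum of per-field similarities."""
    return sum(weights[f] * s for f, s in field_sims.items())

doc_a = score({"text": 0.9, "headline": 0.1, "category": 0.0})  # content match
doc_b = score({"text": 0.1, "headline": 0.9, "category": 0.0})  # headline match

print(doc_a > doc_b)  # the weighted content match outranks the headline match
```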

hybi.compose.Document dataclass

Bases: BaseMolecule

Document compound: structured content with metadata.

A convenience wrapper around Bundle optimized for document storage with a primary content field and associated metadata fields.

Expands to

Bundle(fields={
    content_field: Field(encoding=SEMANTIC, weight=content_weight),
    **{name: field for name, field in metadata_fields.items()},
})

Example

Simple document with title and content:

schema = Document(
    content_field="body",
    metadata_fields={"title": Field(), "author": Field()},
)
hb.ingest(docs_df, collection="docs", schema=schema)

Article with categories:

schema = Document(
    content_field="text",
    content_weight=2.0,  # Boost content in search
    metadata_fields={
        "headline": Field(weight=1.5),
        "category": Field(encoding=Encoding.EXACT),
        "published_date": Field(encoding=Encoding.TEMPORAL),
    },
)

__init__(content_field='content', content_encoding=Encoding.SEMANTIC, content_weight=1.0, metadata_fields=None)


Network

Pre-configured Graph for network data.

from hybi.compose import Network

schema = Network(
    node_field="user",
    edge_field="interaction",
    directed=True,
    # Optional: use separate columns for source/target nodes
    # source_field="from_user",
    # target_field="to_user",
)

hybi.compose.Network dataclass

Bases: BaseMolecule

Network compound: node-edge-node graph structures.

A convenience wrapper around Graph optimized for social networks, citation graphs, dependency graphs, and other network structures.

Expands to

Graph(
    node=Field(node_field, encoding=node_encoding),
    edge=Field(edge_field, encoding=edge_encoding),
    directed=directed,
)

Example

Social network:

schema = Network(
    node_field="user",
    edge_field="connection_type",
)
hb.ingest(social_df, collection="social", schema=schema)

Citation network (undirected):

schema = Network(
    node_field="paper_id",
    edge_field="citation_type",
    node_encoding=Encoding.EXACT,
    directed=False,
)

Dependency graph:

schema = Network(
    node_field="package",
    edge_field="dependency_type",
    source_field="dependent",
    target_field="dependency",
)

__init__(node_field='node', edge_field='edge', source_field=None, target_field=None, node_encoding=Encoding.SEMANTIC, edge_encoding=Encoding.EXACT, node_weight=1.0, edge_weight=1.0, directed=True)


When to Use Compounds vs Molecules

Use compounds when:

  • Your data fits a common pattern
  • You want sensible encoding defaults
  • You're prototyping quickly

Use molecules when:

  • You need custom encodings per field
  • You're nesting structures
  • You need fine-grained control over weights

See Molecules vs Compounds for detailed guidance.


Example Code

Complete runnable examples for each compound type:

| Compound | Example File | Description |
|---|---|---|
| KnowledgeGraph | knowledge_graph_demo.py | Entity-relation-entity facts with traversal |
| Document | document_demo.py | Document chunks with metadata |
| Hierarchy | hierarchy_demo.py | Org charts and taxonomies |
| TimeSeries | timeseries_demo.py | Time-ordered data |
| Network | network_demo.py | Social graphs and citations |
| Catalog | product_catalog_demo.py | Product catalogs with search |
| RelationalTable | fuzzy_to_exact_demo.py | CRUD with fuzzy-to-exact pattern |

Run any example from the SDK directory:

cd sdk
python examples/compose/knowledge_graph_demo.py

See Examples README for the full example index.