Compounds¶
Compounds are pre-configured molecules for common domain patterns.
Overview¶
| Compound | Based On | Use Case |
|---|---|---|
| KnowledgeGraph | Triple | Entity-relation-entity facts |
| Catalog | Bundle | Generic tabular data (read-heavy) |
| RelationalTable | Row | Mutable tables with CRUD |
| TimeSeries | Sequence | Time-ordered data |
| Hierarchy | Tree | Org charts, taxonomies |
| Document | Bundle | Document chunks with metadata |
| Network | Graph | Social graphs, citations |
Compounds expand to molecules at definition time, so once created they have the same capabilities as the molecules they are based on.
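To make "expand at definition time" concrete, here is a minimal sketch built from stand-in dataclasses. `FieldSpec`, `TripleSpec`, and `KnowledgeGraphSketch` are hypothetical names used only for illustration; they are not hybi classes, and the fallback of the object name to `entity_field` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """Stand-in for a field definition: a column name plus an encoding label."""
    name: str
    encoding: str


@dataclass(frozen=True)
class TripleSpec:
    """Stand-in for the molecule a knowledge-graph compound expands to."""
    subject: FieldSpec
    predicate: FieldSpec
    object: FieldSpec


@dataclass
class KnowledgeGraphSketch:
    """Hypothetical compound: stores names and defaults, then expands once."""
    entity_field: str = "entity"
    relation_field: str = "relation"
    object_field: Optional[str] = None  # assumption: falls back to entity_field
    entity_encoding: str = "SEMANTIC"
    relation_encoding: str = "EXACT"
    molecule: TripleSpec = field(init=False)

    def __post_init__(self) -> None:
        # The expansion runs when the compound is defined, not at query time.
        self.molecule = TripleSpec(
            subject=FieldSpec(self.entity_field, self.entity_encoding),
            predicate=FieldSpec(self.relation_field, self.relation_encoding),
            object=FieldSpec(self.object_field or self.entity_field, self.entity_encoding),
        )


kg = KnowledgeGraphSketch(
    entity_field="person",
    relation_field="relationship",
    object_field="target",
)
```

After construction, `kg.molecule` is an ordinary `TripleSpec`, which is why code that works with the molecule would not need to know a compound produced it.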
KnowledgeGraph¶
Pre-configured Triple for knowledge graph data.
from hybi.compose import KnowledgeGraph
schema = KnowledgeGraph(
    entity_field="person",
    relation_field="relationship",
    # Defaults: SEMANTIC for entities, EXACT for relations
)
Equivalent to:
Triple(
    subject=Field("person", encoding=Encoding.SEMANTIC),
    predicate=Field("relationship", encoding=Encoding.EXACT),
    object=Field("target", encoding=Encoding.SEMANTIC),
)
hybi.compose.KnowledgeGraph (dataclass)¶
Bases: BaseMolecule
Knowledge graph compound: entity-relation-entity triples.
A convenience wrapper around Triple with sensible defaults for knowledge graph use cases (semantic entities, exact relations).
Example
Simple usage - defaults to entity/relation columns¶
schema = KnowledgeGraph()
hb.ingest(facts_df, collection="kg", schema=schema)
Custom field names¶
schema = KnowledgeGraph(
    entity_field="person",
    relation_field="relationship",
)
With custom encoding¶
schema = KnowledgeGraph(
    entity_field="entity",
    relation_field="predicate",
    entity_encoding=Encoding.EXACT,  # For IDs instead of text
)
__init__(entity_field='entity', relation_field='relation', subject_field=None, object_field=None, entity_encoding=Encoding.SEMANTIC, relation_encoding=Encoding.EXACT, entity_weight=1.0, relation_weight=1.0)¶
Catalog¶
Pre-configured Bundle for tabular data.
from hybi.compose import Catalog, Field, Encoding
schema = Catalog(
    columns={
        "name": Field(encoding=Encoding.SEMANTIC, weight=1.5),
        "category": Field(encoding=Encoding.EXACT),
        "price": Field(encoding=Encoding.NUMERIC, similar_within=50),
    }
)
hybi.compose.Catalog (dataclass)¶
Bases: BaseMolecule
Catalog compound: searchable collection with SQL-like operations.
A convenience wrapper around Bundle optimized for tabular data with a familiar SQL-like query interface. Catalog provides a bridge between traditional relational thinking and hyperdimensional computing.
Unlike pure relational tables, Catalog supports:
- Semantic search: Find rows by meaning, not just exact values
- Fuzzy matching: Similarity-based lookups with configurable thresholds
- Vector joins: Join collections by semantic similarity, not just key equality
Operations Map
| Catalog Method | HDC Operation |
|---|---|
| select() | Field projection (SelectQuery) |
| where() | Exact filter + similarity search |
| join() | JoinQuery (exact or semantic) |
| aggregate() | AggregateQuery (GROUP BY) |
| search() | Vector similarity search |
Example
Define a products catalog¶
schema = Catalog(
    columns={
        "name": Field(encoding=Encoding.SEMANTIC, weight=2.0),
        "description": Field(encoding=Encoding.SEMANTIC),
        "category": Field(encoding=Encoding.EXACT),
        "price": Field(encoding=Encoding.NUMERIC, similar_within=50),
    },
    primary_key="id",
)
hb.ingest(products_df, collection="products", schema=schema)
Traditional-style query¶
results = hb.query("products").where(category="electronics")
Semantic query (HDC advantage)¶
results = hb.query("products").search("lightweight laptop for travel")
Join with another catalog¶
order_schema = Catalog(
    columns={"product_id": Field(encoding=Encoding.EXACT), ...}
)
joined = hb.query("orders").join("products", on="product_id")
Aggregation¶
stats = hb.query("products").aggregate(
    group_by=["category"],
    aggregations={"avg_price": ("price", "avg")}
)
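For readers coming from SQL, the aggregation example above reads like a `GROUP BY category` with an `AVG(price)`. The plain-Python sketch below shows that computation on made-up rows. It does not call hybi, `aggregate_avg` is a hypothetical helper, and the actual result shape returned by `aggregate()` is not documented here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical product rows standing in for the "products" collection.
products = [
    {"name": "Laptop", "category": "electronics", "price": 1200.0},
    {"name": "Phone", "category": "electronics", "price": 800.0},
    {"name": "Desk", "category": "furniture", "price": 300.0},
]


def aggregate_avg(rows, group_by, column):
    """Group rows by one column and average another, like GROUP BY with AVG()."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[column])
    return {key: mean(values) for key, values in groups.items()}


avg_price = aggregate_avg(products, group_by="category", column="price")
```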
Notes
- primary_key is metadata only; HDC doesn't require explicit keys
- For semantic joins, use Encoding.SEMANTIC on join columns
- The underlying Bundle uses bundle encoding (lossy but searchable)
__init__(columns=dict(), primary_key=None, catalog_name=None)¶
RelationalTable¶
SQL-like table with full CRUD support.
RelationalTable provides familiar relational database semantics with atomic row-level operations. Unlike Catalog (which is optimized for search), RelationalTable uses structured encoding, which enables true field-level updates.
Catalog vs RelationalTable¶
| Aspect | Catalog | RelationalTable |
|---|---|---|
| Encoding | Search-optimized | Structured |
| Search | Fast | Moderate |
| UPDATE/DELETE | Not supported | Full support |
| Use case | Search catalog | Mutable tables |
Use Catalog when you primarily search and append data. Use RelationalTable when you need UPDATE/DELETE operations.
from hybi.compose import RelationalTable, Field, Encoding
schema = RelationalTable(
    columns={
        "user_id": Field(encoding=Encoding.EXACT),
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC, similar_within=10000),
    },
    primary_key="user_id",
)
CRUD Operations:
# Ingest data
hb.ingest(users_df, collection="users", schema=schema)
# Read by primary key
user = hb.query("users", schema).get(user_id="U001")
# Update fields atomically
hb.update(
    "users",
    where={"user_id": "U001"},
    set={"email": "new@example.com", "salary": 120000},
    schema=schema,
)
# Delete row
hb.delete("users", where={"user_id": "U001"}, schema=schema)
# Upsert (insert or update)
hb.upsert("users", row={"user_id": "U001", ...}, schema=schema)
Equivalent to:
Row(
    primary_key=Field("user_id", encoding=Encoding.EXACT),
    fields={
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC, similar_within=10000),
    },
)
Search & CRUD Architecture¶
For optimal performance, use Catalog for search and RelationalTable for CRUD:
flowchart TB
subgraph Catalog["CATALOG (Search)"]
C1[Search-optimized<br/>fast]
C2[Semantic Discovery]
C1 --> C2
end
subgraph RelationalTable["RELATIONAL TABLE (CRUD)"]
R1[Structured<br/>exact]
R2[PK Lookups]
R1 --> R2
end
C2 --> Bridge
R2 --> Bridge
Bridge[BRIDGE<br/>Primary Keys] --> Mutations[Deterministic Mutations]
Recommended pattern:
- Use Catalog for semantic search (optimized for similarity matching)
- Use RelationalTable for CRUD (optimized for exact field updates)
- Bridge between them using shared primary keys
Single-schema alternative: RelationalTable can handle both search and CRUD, but its search is slower than that of a dedicated Catalog.
Fuzzy-to-Exact Bridge Pattern¶
When you need to combine semantic discovery with exact mutations, use the bridge pattern:
- Fuzzy search casts a wide net using semantic similarity
- Exact filters narrow to deterministic boundaries
- CRUD via PKs operates on the refined set
# 1. Semantic search finds candidates
candidates = hb.query("users", schema).search("machine learning expert", top_k=50)
# 2. Exact filtering narrows to deterministic set
refined = [r for r in candidates
           if r.data["department"] == "Engineering"
           and r.data["status"] == "active"]
# 3. CRUD via primary keys (safe - deterministic)
for r in refined:
    hb.update("users", where={"user_id": r.data["user_id"]}, set={...}, schema=schema)
This pattern leverages fuzzy search for discovery ("I don't know the exact term") while ensuring mutations operate on deterministic, exactly-identified rows.
See Fuzzy-to-Exact Pattern for a complete implementation.
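The three steps of the bridge pattern can also be traced end to end without a hybi installation. The sketch below uses an in-memory dict as the table and a crude token-overlap score in place of semantic similarity; `token_overlap`, `fuzzy_search`, the `bio` and `tier` columns, and all rows are made up for illustration.

```python
# Toy in-memory table keyed by primary key (all rows are made up).
users = {
    "U001": {"user_id": "U001", "bio": "machine learning expert in vision",
             "department": "Engineering", "status": "active", "tier": "standard"},
    "U002": {"user_id": "U002", "bio": "machine learning researcher",
             "department": "Research", "status": "active", "tier": "standard"},
    "U003": {"user_id": "U003", "bio": "expert in machine learning systems",
             "department": "Engineering", "status": "inactive", "tier": "standard"},
    "U004": {"user_id": "U004", "bio": "accountant",
             "department": "Finance", "status": "active", "tier": "standard"},
}


def token_overlap(query, text):
    """Jaccard overlap of lowercase tokens; a crude stand-in for semantic similarity."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def fuzzy_search(query, top_k):
    """Step 1: rank rows by similarity and keep the best top_k with a nonzero score."""
    scored = [(token_overlap(query, row["bio"]), row) for row in users.values()]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:top_k]]


candidates = fuzzy_search("machine learning expert", top_k=50)

# Step 2: exact filters narrow the candidates to a deterministic set.
refined = [r for r in candidates
           if r["department"] == "Engineering" and r["status"] == "active"]

# Step 3: mutate by primary key only, never by similarity score.
for r in refined:
    users[r["user_id"]]["tier"] = "ml-specialist"
```

The fuzzy step returns three candidates, but only the row that also passes both exact filters is modified, so the set of mutated rows does not depend on how the similarity scores happen to rank.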
hybi.compose.RelationalTable (dataclass)¶
Bases: BaseMolecule
SQL-like table with full CRUD support.
RelationalTable provides familiar relational database semantics:
- Row-level UPDATE: Modify individual fields
- Row-level DELETE: Remove rows by primary key
- Field extraction: Read individual field values cleanly
- ACID guarantees: Single-row atomicity
Unlike Catalog (which uses lossy Bundle encoding optimized for search), RelationalTable uses Row encoding with chain binding, which is lossless. This enables true field-level updates without re-encoding entire rows.
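The lossy versus lossless distinction can be illustrated with generic hyperdimensional computing operations on bipolar vectors. This is a textbook-style sketch, not hybi's implementation: how hybi's Bundle and Row encodings work internally is not shown in this document, and all function names below are hypothetical. It only demonstrates that a single binding can be inverted exactly, while superposing several bound pairs leaves a noisy copy of each value.

```python
import random

DIM = 2048
rng = random.Random(0)  # seeded so the sketch is reproducible


def random_hv():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Element-wise multiplication; self-inverse for bipolar vectors."""
    return [x * y for x, y in zip(a, b)]


def bundle(vectors):
    """Element-wise majority vote; ties cannot occur for an odd number of inputs."""
    return [1 if sum(col) > 0 else -1 for col in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / DIM


keys = {name: random_hv() for name in ("name", "category", "price")}
values = {name: random_hv() for name in keys}

# A single bound pair can be unbound exactly.
pair = bind(keys["name"], values["name"])
recovered_exact = bind(keys["name"], pair)

# A bundle of three bound pairs yields only a noisy copy of each value.
record = bundle([bind(keys[n], values[n]) for n in keys])
recovered_noisy = bind(keys["name"], record)
```

Here `recovered_exact` equals the original value vector, whereas `recovered_noisy` is merely closer to the right value than to the others. That is enough for similarity search but not for reading back or rewriting one field exactly.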
Trade-offs vs Catalog
| Aspect | Catalog | RelationalTable |
|---|---|---|
| Encoding | Bundle (lossy) | Row (lossless) |
| Search | Fast | Moderate |
| UPDATE/DELETE | Not supported | Full support |
| Use case | Search catalog | Mutable tables |
Use RelationalTable when you need UPDATE/DELETE operations. Use Catalog when you primarily search and append data.
Example
Define a users table¶
schema = RelationalTable(
    columns={
        "user_id": Field(encoding=Encoding.EXACT),
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC),
    },
    primary_key="user_id",
)
hb.ingest(users_df, collection="users", schema=schema)
Read by primary key¶
user = hb.query("users", schema).get(user_id="U001")
Update fields¶
hb.update(
    "users",
    where={"user_id": "U001"},
    set={"email": "new@example.com"},
    schema=schema,
)
Delete row¶
hb.delete("users", where={"user_id": "U001"}, schema=schema)
Notes
- Primary key is required and must use EXACT encoding
- Primary key cannot be updated (immutable row identity)
- Updates are atomic at the row level
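The last two notes can be pictured with a small in-memory stand-in. `TinyTable` and its `validators` argument are hypothetical and exist only for this sketch; nothing here reflects how hybi stores rows. The update builds the new row on a copy and swaps it in at the end, so a failure partway through leaves the stored row unchanged.

```python
import copy


class TinyTable:
    """Hypothetical in-memory table illustrating the notes above; not hybi code."""

    def __init__(self, primary_key, validators=None):
        self.primary_key = primary_key
        self.validators = validators or {}  # column -> predicate (assumed feature)
        self.rows = {}

    def insert(self, row):
        self.rows[row[self.primary_key]] = dict(row)

    def update(self, pk_value, changes):
        if self.primary_key in changes:
            raise ValueError("primary key is immutable")
        # Work on a copy so a failing check leaves the stored row untouched.
        candidate = copy.deepcopy(self.rows[pk_value])
        for column, value in changes.items():
            check = self.validators.get(column)
            if check is not None and not check(value):
                raise ValueError(f"invalid value for {column!r}: {value!r}")
            candidate[column] = value
        self.rows[pk_value] = candidate  # single swap: all changes land together


table = TinyTable("user_id", validators={"salary": lambda v: v >= 0})
table.insert({"user_id": "U001", "email": "old@example.com", "salary": 100000})
table.update("U001", {"email": "new@example.com", "salary": 120000})

rejected = []
for changes in (
    {"user_id": "U999"},                          # tries to change the key
    {"email": "oops@example.com", "salary": -1},  # second field fails validation
):
    try:
        table.update("U001", changes)
    except ValueError as exc:
        rejected.append(str(exc))
```

Both rejected updates leave the row exactly as the last successful update wrote it, including the email from the half-valid second attempt not being applied.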
columns = dataclass_field(default_factory=dict) (class-attribute, instance-attribute)¶
Column definitions mapping column names to Field configurations.
Must include the primary key column.
Example
columns={
    "id": Field(encoding=Encoding.EXACT),
    "name": Field(encoding=Encoding.SEMANTIC),
    "email": Field(encoding=Encoding.EXACT),
}
primary_key = None (class-attribute, instance-attribute)¶
Name of the primary key column.
Required. The referenced column must:
- Exist in columns
- Use EXACT encoding
The primary key provides:
- O(1) row lookup via PK index
- Row identity for UPDATE/DELETE operations
- Uniqueness constraint on ingest
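The rules above amount to a few checks plus a key-to-row index. The following sketch shows one way to express them in plain Python; `validate_primary_key` and `build_pk_index` are hypothetical helpers, and encodings are represented as plain strings rather than Field objects.

```python
def validate_primary_key(columns, primary_key):
    """Check the stated rules: the key is set, exists in columns, and uses EXACT.

    `columns` maps column names to encoding labels (plain strings here,
    standing in for Field objects).
    """
    if primary_key is None:
        raise ValueError("primary_key is required")
    if primary_key not in columns:
        raise ValueError(f"primary key {primary_key!r} is not in columns")
    if columns[primary_key] != "EXACT":
        raise ValueError(f"primary key {primary_key!r} must use EXACT encoding")


def build_pk_index(rows, primary_key):
    """Map each key to its row position; a dict gives average O(1) lookups."""
    index = {}
    for position, row in enumerate(rows):
        key = row[primary_key]
        if key in index:
            # Uniqueness is enforced while ingesting, before any row is exposed.
            raise ValueError(f"duplicate primary key {key!r}")
        index[key] = position
    return index


columns = {"user_id": "EXACT", "name": "SEMANTIC"}
validate_primary_key(columns, "user_id")

rows = [{"user_id": "U001", "name": "Ada"}, {"user_id": "U002", "name": "Grace"}]
pk_index = build_pk_index(rows, "user_id")
```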
__init__(columns=dict(), primary_key=None, table_name=None)¶
TimeSeries¶
Pre-configured molecule for time-ordered data. Supports two modes:
Temporal Mode (with timestamp_field)¶
When timestamp_field is provided, expands to a Pair enabling temporal queries:
from hybi.compose import TimeSeries
schema = TimeSeries(
    value_field="measurement",
    timestamp_field="recorded_at",  # Enables at_time(), time_range(), when()
)
Supported queries: search, find, at_time, time_range, when
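As a rough mental model for the three temporal queries, the sketch below implements their intended meaning over a sorted list using only the standard library. It is not how hybi evaluates them (hybi works on encoded vectors and similarity), the readings are made up, and the inclusive range bounds and exact value matching in `when` are assumptions.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# Made-up readings, sorted by timestamp.
readings = [
    (datetime(2024, 1, 15, 13, 0), 20.5),
    (datetime(2024, 1, 15, 14, 0), 21.0),
    (datetime(2024, 1, 15, 15, 0), 21.5),
    (datetime(2024, 1, 15, 16, 0), 21.0),
]
timestamps = [ts for ts, _ in readings]


def at_time(ts):
    """Return the reading whose timestamp is closest to ts."""
    i = bisect_left(timestamps, ts)
    neighbors = readings[max(i - 1, 0): i + 1]
    return min(neighbors, key=lambda r: abs(r[0] - ts))


def time_range(start, end):
    """Return readings with start <= timestamp <= end (inclusive bounds assumed)."""
    return readings[bisect_left(timestamps, start): bisect_right(timestamps, end)]


def when(value):
    """Return every timestamp at which the given value was recorded."""
    return [ts for ts, v in readings if v == value]


nearest = at_time(datetime(2024, 1, 15, 14, 10))
window = time_range(datetime(2024, 1, 15, 13, 0), datetime(2024, 1, 15, 15, 0))
occurrences = when(21.0)
```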
Positional Mode (without timestamp_field)¶
When timestamp_field is None, expands to a Sequence enabling position-based queries:
schema = TimeSeries(
    value_field="message",
    timestamp_field=None,  # Position-based mode
    position_encoding="random",
    max_length=512,
)
Supported queries: search, at, contains, prefix
See timeseries_demo.py for a complete example using positional mode.
hybi.compose.TimeSeries (dataclass)¶
Bases: BaseMolecule
Time series compound: temporal data with timestamp-value binding.
TimeSeries encodes time-indexed data using hyperdimensional temporal binding. When a timestamp_field is provided, each row is encoded as:
timestamp ⊛ value
This enables powerful temporal queries:
- at_time(ts): Find values at/near a specific timestamp
- time_range(start, end): Find values within a time window
- when(value): Find timestamps when a value occurred
Expands to (when timestamp_field provided):
Pair(
    left=Field(timestamp_field, encoding=TEMPORAL),
    right=Field(value_field, encoding=value_encoding),
)
Expands to (when timestamp_field is None - legacy mode):
Sequence(
    item=Field(value_field, encoding=value_encoding),
    position_encoding="sinusoidal",
    max_length=max_length,
)
Example
Recommended: with timestamp field (enables temporal queries)¶
schema = TimeSeries(
    value_field="temperature",
    timestamp_field="recorded_at",
    value_encoding=Encoding.NUMERIC,
)
hb.ingest(sensor_df, collection="readings", schema=schema)
Query: What was the temperature at 2pm?¶
results = hb.query("readings").at_time("2024-01-15 14:00:00")
Query: Temperatures between 1pm and 3pm¶
results = hb.query("readings").time_range(
    start="2024-01-15 13:00:00",
    end="2024-01-15 15:00:00",
)
Legacy mode: without timestamp (uses row position)¶
schema = TimeSeries(value_field="price") # timestamp_field=None
Note: This mode only supports positional queries, not temporal¶
__init__(value_field='value', timestamp_field=None, value_encoding=Encoding.SEMANTIC, value_weight=1.0, timestamp_weight=1.0, position_encoding='sinusoidal', max_length=512)¶
Hierarchy¶
Pre-configured Tree for parent-child relationships.
from hybi.compose import Hierarchy
schema = Hierarchy(
    node_field="employee",
    parent_field="manager",
)
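The schema above expects one row per node, naming the node and its parent. The sketch below shows, in plain Python and on made-up rows, what that parent-child structure encodes: the chain of ancestors for a node and its depth. The `ancestors` and `depth` helpers are hypothetical and not part of hybi; treating the root's parent as `None` is an assumption about the input data.

```python
# Made-up org chart rows: each row names a node and its parent (None for the root).
org_rows = [
    {"employee": "Ada", "manager": None},
    {"employee": "Grace", "manager": "Ada"},
    {"employee": "Linus", "manager": "Grace"},
    {"employee": "Ken", "manager": "Ada"},
]

parent_of = {row["employee"]: row["manager"] for row in org_rows}


def ancestors(node):
    """Walk parent links up to the root, nearest ancestor first."""
    chain = []
    seen = {node}
    current = parent_of.get(node)
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return chain


def depth(node):
    """Depth of a node, with the root at depth 0."""
    return len(ancestors(node))
```

A value computed this way is the kind of number a depth column could hold when a schema tracks levels explicitly.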
hybi.compose.Hierarchy (dataclass)¶
Bases: BaseMolecule
Hierarchy compound: parent-child organizational structures.
A convenience wrapper around Tree optimized for hierarchical data like org charts, file systems, taxonomies, or nested categories.
Example
Org chart¶
schema = Hierarchy(
    node_field="employee",
    parent_field="manager",
)
hb.ingest(org_df, collection="org", schema=schema)
File system with depth tracking¶
schema = Hierarchy(
    node_field="path",
    parent_field="parent_path",
    level_field="depth",
)
Taxonomy with exact matching¶
schema = Hierarchy(
    node_field="category",
    parent_field="parent_category",
    node_encoding=Encoding.EXACT,
)
__init__(node_field='node', parent_field='parent', level_field=None, node_encoding=Encoding.SEMANTIC, node_weight=1.0)¶
Document¶
Pre-configured Bundle for document chunks.
from hybi.compose import Document, Field, Encoding
schema = Document(
    content_field="text",
    metadata_fields={
        "source": Field(),
        "page": Field(encoding=Encoding.NUMERIC),
        "section": Field(encoding=Encoding.EXACT),
    },
)
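A schema like this assumes the input already has one row per chunk, with `text`, `source`, `page`, and `section` columns. How documents get chunked is outside the scope of this page, so the sketch below is only one simple possibility: `chunk_pages` is a hypothetical helper, and the fixed word budget is an arbitrary choice.

```python
def chunk_pages(pages, source, section, max_words=50):
    """Split page texts into word-bounded chunks, one output row per chunk."""
    rows = []
    for page_number, text in enumerate(pages, start=1):
        words = text.split()
        for start in range(0, len(words), max_words):
            rows.append({
                "text": " ".join(words[start:start + max_words]),
                "source": source,
                "page": page_number,
                "section": section,
            })
    return rows


# Two made-up pages: 120 words and 30 words.
pages = ["alpha " * 120, "beta " * 30]
chunk_rows = chunk_pages(pages, source="manual.pdf", section="Intro")
```

Each dict in `chunk_rows` carries the content plus the metadata that lets a search hit be traced back to its page and section.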
hybi.compose.Document (dataclass)¶
Bases: BaseMolecule
Document compound: structured content with metadata.
A convenience wrapper around Bundle optimized for document storage with a primary content field and associated metadata fields.
Example
Simple document with title and content¶
schema = Document(
    content_field="body",
    metadata_fields={"title": Field(), "author": Field()},
)
hb.ingest(docs_df, collection="docs", schema=schema)
Article with categories¶
schema = Document(
    content_field="text",
    content_weight=2.0,  # Boost content in search
    metadata_fields={
        "headline": Field(weight=1.5),
        "category": Field(encoding=Encoding.EXACT),
        "published_date": Field(encoding=Encoding.TEMPORAL),
    },
)
__init__(content_field='content', content_encoding=Encoding.SEMANTIC, content_weight=1.0, metadata_fields=None)¶
Network¶
Pre-configured Graph for network data.
from hybi.compose import Network
schema = Network(
    node_field="user",
    edge_field="interaction",
    directed=True,
    # Optional: use separate columns for source/target nodes
    # source_field="from_user",
    # target_field="to_user",
)
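To see what the `directed` flag and the separate source/target columns mean for the data, the sketch below builds a plain adjacency mapping from made-up edge rows. `build_adjacency` is a hypothetical helper for illustration; hybi encodes edges as vectors rather than building a structure like this.

```python
from collections import defaultdict

# Made-up edge rows using separate source/target columns.
edge_rows = [
    {"from_user": "ada", "to_user": "grace", "interaction": "follows"},
    {"from_user": "grace", "to_user": "linus", "interaction": "mentions"},
]


def build_adjacency(rows, source_field, target_field, edge_field, directed=True):
    """Return {node: [(neighbor, edge_label), ...]} from edge rows."""
    adjacency = defaultdict(list)
    for row in rows:
        source, target, label = row[source_field], row[target_field], row[edge_field]
        adjacency[source].append((target, label))
        if not directed:
            # An undirected edge is reachable from both endpoints.
            adjacency[target].append((source, label))
    return dict(adjacency)


directed_graph = build_adjacency(
    edge_rows, "from_user", "to_user", "interaction", directed=True)
undirected_graph = build_adjacency(
    edge_rows, "from_user", "to_user", "interaction", directed=False)
```

In the directed result `linus` has no outgoing entries, while in the undirected result every edge appears under both of its endpoints.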
hybi.compose.Network (dataclass)¶
Bases: BaseMolecule
Network compound: node-edge-node graph structures.
A convenience wrapper around Graph optimized for social networks, citation graphs, dependency graphs, and other network structures.
Example
Social network¶
schema = Network(
    node_field="user",
    edge_field="connection_type",
)
hb.ingest(social_df, collection="social", schema=schema)
Citation network (undirected)¶
schema = Network(
    node_field="paper_id",
    edge_field="citation_type",
    node_encoding=Encoding.EXACT,
    directed=False,
)
Dependency graph¶
schema = Network(
    node_field="package",
    edge_field="dependency_type",
    source_field="dependent",
    target_field="dependency",
)
__init__(node_field='node', edge_field='edge', source_field=None, target_field=None, node_encoding=Encoding.SEMANTIC, edge_encoding=Encoding.EXACT, node_weight=1.0, edge_weight=1.0, directed=True)¶
When to Use Compounds vs Molecules¶
Use compounds when:
- Your data fits a common pattern
- You want sensible encoding defaults
- You're prototyping quickly
Use molecules when:
- You need custom encodings per field
- You're nesting structures
- You need fine-grained control over weights
See Molecules vs Compounds for detailed guidance.
Example Code¶
Complete runnable examples for each compound type:
| Compound | Example File | Description |
|---|---|---|
| KnowledgeGraph | knowledge_graph_demo.py | Entity-relation-entity facts with traversal |
| Document | document_demo.py | Document chunks with metadata |
| Hierarchy | hierarchy_demo.py | Org charts and taxonomies |
| TimeSeries | timeseries_demo.py | Time-ordered data |
| Network | network_demo.py | Social graphs and citations |
| Catalog | product_catalog_demo.py | Product catalogs with search |
| RelationalTable | fuzzy_to_exact_demo.py | CRUD with fuzzy-to-exact pattern |
Run any example from the SDK directory:
See Examples README for the full example index.