Molecules
Molecules are the core compositional building blocks for defining schemas.
Overview
| Molecule | Slots | Best For |
|---|---|---|
| Pair | left, right | Key-value, edges |
| Triple | subject, predicate, object | Knowledge graphs |
| Bundle | User-defined | Tabular data (read-heavy) |
| Row | pk + fields | Mutable tables (CRUD) |
| Sequence | item | Ordered data |
| Tree | child, parent | Hierarchies |
| Graph | source, edge, target | Networks |
Structured vs Search-Optimized
Molecules use two encoding strategies:
- Structured (Pair, Triple, Row, Tree, Graph, Sequence): fully decomposable; supports field extraction
- Search-optimized (Bundle): optimized for multi-field similarity search; cannot be nested unless created with composable=True
Row is particularly important because it enables SQL-like CRUD operations on individual rows. Bundle, by contrast, is optimized for search and does not support field-level updates.
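The difference can be illustrated with a toy phasor model of ⊛ (FHRR-style: every vector is a list of unit-magnitude complex numbers, and binding is elementwise multiplication). This model is an assumption chosen for illustration, not HyperBinder's internal representation. Unbinding a Pair-like binding recovers the other element exactly, whereas unbinding one role from a superposition of three role-bound fields yields only a noisy copy of its value (cosine similarity of roughly 0.58, i.e. about 1/√3).

```python
import cmath
import math
import random

DIM = 1024
rng = random.Random(0)  # fixed seed so the demo is reproducible

def phasor():
    # Random unit-magnitude complex vector (toy stand-in for an encoded value)
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(DIM)]

def bind(a, b):
    # Toy ⊛: elementwise complex multiplication
    return [x * y for x, y in zip(a, b)]

def unbind(c, a):
    # For unit-magnitude vectors the inverse is the complex conjugate
    return [x * y.conjugate() for x, y in zip(c, a)]

def cosine(a, b):
    dot = sum((x * y.conjugate()).real for x, y in zip(a, b))
    norm = math.sqrt(sum(abs(x) ** 2 for x in a) * sum(abs(y) ** 2 for y in b))
    return dot / norm

# Structured: a Pair-like binding left ⊛ right is exactly invertible
left, right = phasor(), phasor()
pair_vec = bind(left, right)
structured_sim = cosine(unbind(pair_vec, left), right)

# Search-optimized: superpose three role ⊛ value bindings, then try to extract one
roles = [phasor() for _ in range(3)]
values = [phasor() for _ in range(3)]
bundle_vec = [sum(col) for col in zip(*(bind(r, v) for r, v in zip(roles, values)))]
bundled_sim = cosine(unbind(bundle_vec, roles[0]), values[0])

print(f"structured: {structured_sim:.3f}, bundled: {bundled_sim:.3f}")
```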
Universal Query Methods
All molecule types support these common query methods:
| Method | Description | Common Parameters |
|---|---|---|
| search(query, top_k=10) | Hybrid semantic + symbolic search | query: str or dict; top_k: max results |
| search_slots(**slot_queries) | Per-slot weighted search | slot_queries: dict mapping slots to queries |
| search_prototype(examples, top_k=10) | Bundle examples as a search prototype | examples: list of values |
Common Query Parameters
- top_k: Maximum number of results to return (default: 10)
- mode: Query mode, "exact" or "fuzzy"/"semantic" (varies by method)
- similarity_threshold: Minimum similarity score for results (0.0-1.0)
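As a rough illustration of how top_k and similarity_threshold might combine, and of what "bundle examples as a search prototype" means, here is a plain-Python sketch over hypothetical pre-computed embedding vectors. The function name prototype_search, the averaging used for bundling, and the order of operations (threshold first, then top_k) are assumptions, not hybi's documented behavior.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def prototype_search(examples, candidates, top_k=10, similarity_threshold=0.0):
    # "Bundle" the example vectors by averaging them into a single prototype
    prototype = [sum(col) / len(examples) for col in zip(*examples)]
    scored = [(name, cosine(prototype, vec)) for name, vec in candidates.items()]
    # Drop anything below the threshold, then keep the best top_k by score
    kept = [item for item in scored if item[1] >= similarity_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]

# Hypothetical 3-dimensional embeddings, already computed
candidates = {
    "apple": [0.9, 0.1, 0.0],
    "pear": [0.8, 0.2, 0.1],
    "hammer": [0.0, 0.1, 0.9],
}
examples = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
results = prototype_search(examples, candidates, top_k=2, similarity_threshold=0.5)
print(results)  # apple, then pear; hammer falls below the threshold
```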
Pair¶
Two-element structure with left and right slots.
from hybi.compose import Pair, Field, Encoding
schema = Pair(
left=Field("key", encoding=Encoding.EXACT),
right=Field("value", encoding=Encoding.SEMANTIC),
)
Supported queries: find(left=X), find(right=Y), search()
hybi.compose.Pair (dataclass)
Bases: BaseMolecule
Two-element binding: A ⊛ B.
Pair encodes a relationship between two elements, like key-value pairs, directed edges, or binary relations.
Slots can be Fields or nested Molecules:
# Simple: both slots are Fields
Pair(
left=Field("key", encoding=Encoding.EXACT),
right=Field("value"),
)
# Nested: left slot contains another Pair
Pair(
left=Pair(
left=Field("namespace"),
right=Field("key"),
),
right=Field("value"),
)
# Encoding: (namespace ⊛ key) ⊛ value
Encoding
left ⊛ right
Supported Queries
- find(left=X): Find right values bound with X
- find(right=Y): Find left values bound with Y
- search(): Semantic similarity search
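To make the find(left=X) semantics concrete, the sketch below uses a toy phasor model (an assumption, not hybi's implementation): every stored row is encode(key) ⊛ encode(value), unbinding a probe key leaves the exact value only for the matching row, and a small codebook of known values (a "cleanup memory") turns that result back into a symbol. The helpers encode and find_left and the 0.5 threshold are hypothetical.

```python
import cmath
import math
import random

DIM = 512

def encode(symbol):
    # Toy EXACT-style encoding: the same symbol always maps to the same phasor vector
    rng = random.Random(symbol)
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(DIM)]

def bind(a, b):
    return [x * y for x, y in zip(a, b)]

def unbind(c, a):
    return [x * y.conjugate() for x, y in zip(c, a)]

def similarity(a, b):
    # Mean real part of a * conj(b): 1.0 for identical unit phasors, ~0 for unrelated ones
    return sum((x * y.conjugate()).real for x, y in zip(a, b)) / len(a)

rows = [("config_123", "debug"), ("config_456", "verbose"), ("config_789", "quiet")]
stored = [bind(encode(k), encode(v)) for k, v in rows]
codebook = {v: encode(v) for _, v in rows}

def find_left(key, threshold=0.5):
    # Unbind the probe key from every stored pair, then clean up the
    # result against the codebook of known right-hand values
    probe = encode(key)
    hits = []
    for vec in stored:
        candidate = unbind(vec, probe)
        best = max(codebook, key=lambda v: similarity(candidate, codebook[v]))
        if similarity(candidate, codebook[best]) >= threshold:
            hits.append(best)
    return hits

print(find_left("config_123"))  # ['debug']
```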
Example
schema = Pair(
    left=Field("key", encoding=Encoding.EXACT),
    right=Field("value", encoding=Encoding.SEMANTIC),
)
hb.ingest(df, collection="kv_store", schema=schema)

# Query: What value is associated with key "config_123"?
results = hb.query("kv_store").find(left="config_123")
Triple¶
Three-element structure with subject, predicate, and object slots.
from hybi.compose import Triple, Field, Encoding
schema = Triple(
subject=Field("entity", encoding=Encoding.SEMANTIC),
predicate=Field("relation", encoding=Encoding.EXACT),
object=Field("target", encoding=Encoding.SEMANTIC),
)
Supported queries: find(), traverse(), search(), neighbors(), path()
hybi.compose.Triple (dataclass)
Bases: BaseMolecule
Three-element binding: Subject ⊛ Predicate ⊛ Object.
Triple encodes knowledge graph facts and relationships. Optimized for queries that retrieve by any slot combination.
Slots can be Fields or nested Molecules:
# Simple: all slots are Fields
Triple(
subject=Field("entity"),
predicate=Field("relation", encoding=Encoding.EXACT),
object=Field("target"),
)
# Nested: subject is a typed entity (Pair)
Triple(
subject=Pair(
left=Field("entity_type", encoding=Encoding.EXACT),
right=Field("entity_name"),
),
predicate=Field("relation"),
object=Field("target"),
)
# Encoding: (type ⊛ name) ⊛ relation ⊛ target
Encoding
subject ⊛ predicate ⊛ object
Supported Queries
- find(subject=X): Find predicates and objects for X
- find(object=Y): Find subjects and predicates pointing to Y
- find(predicate=P): Find all facts with relation P
- find(subject=X, predicate=P): Find objects
- traverse(start, path): Multi-hop graph traversal
- search(): Semantic similarity search
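The query semantics listed above can be mirrored with ordinary Python over a hypothetical list of (subject, predicate, object) tuples. This sketch only shows which facts each call should return; it says nothing about how hybi evaluates them over encoded vectors, and the exact return shapes of the real find() and traverse() may differ.

```python
# Hypothetical facts standing in for an ingested Triple collection
facts = [
    ("Alice", "works_with", "Bob"),
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Globex"),
    ("Acme", "located_in", "Berlin"),
    ("Globex", "located_in", "Paris"),
]

def find(subject=None, predicate=None, object=None):
    # Keep every fact that agrees with all of the slots that were supplied
    return [
        (s, p, o)
        for s, p, o in facts
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (object is None or o == object)
    ]

def traverse(start, path):
    # Follow one predicate per hop, carrying the set of entities reached so far
    frontier = {start["subject"]}
    for predicate in path:
        frontier = {o for s in frontier for _, _, o in find(subject=s, predicate=predicate)}
    return frontier

print(find(subject="Alice", predicate="works_with"))
print(traverse({"subject": "Alice"}, ["works_at", "located_in"]))
```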
Example
schema = Triple(
    subject=Field("entity", encoding=Encoding.SEMANTIC),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Field("target", encoding=Encoding.SEMANTIC),
)
hb.ingest(facts_df, collection="knowledge", schema=schema)

# Query: Who does Alice work with?
results = hb.query("knowledge").find(
    subject="Alice",
    predicate="works_with",
)

# Multi-hop: Alice -> works_at -> ? -> located_in -> ?
locations = hb.query("knowledge").traverse(
    start={"subject": "Alice"},
    path=["works_at", "located_in"],
)
molecule_type (property)
subject (instance attribute)
Configuration for the subject (Field or nested Molecule).
predicate (instance attribute)
Configuration for the predicate (Field or nested Molecule).
object (instance attribute)
Configuration for the object (Field or nested Molecule).
__init__(subject, predicate, object, _skip_validation=False)
slots()
Bundle
Multi-field structure with user-defined fields.
from hybi.compose import Bundle, Field, Encoding
schema = Bundle(
fields={
"name": Field(encoding=Encoding.SEMANTIC, weight=1.5),
"category": Field(encoding=Encoding.EXACT),
"price": Field(encoding=Encoding.NUMERIC, similar_within=50),
}
)
Bundle limitations
Bundle uses search-optimized encoding. This means:
- Similarity scores decrease as the number of fields grows
- Individual fields cannot be reliably extracted
- Bundles cannot be nested inside other molecules unless created with composable=True
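The first limitation can be checked numerically in a toy phasor model (an assumption; HyperBinder's actual encoding may differ in detail). Superposing n random unit vectors and comparing the result with one of its own components gives a cosine similarity of roughly 1/√n, so a bundle's match to any single field weakens as fields are added.

```python
import cmath
import math
import random

DIM = 2048
rng = random.Random(42)  # fixed seed so the demo is reproducible

def phasor():
    # Random unit-magnitude complex vector (toy stand-in for one role ⊛ value binding)
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(DIM)]

def cosine(a, b):
    dot = sum((x * y.conjugate()).real for x, y in zip(a, b))
    norm = math.sqrt(sum(abs(x) ** 2 for x in a) * sum(abs(y) ** 2 for y in b))
    return dot / norm

def self_match_score(field_count):
    # Superpose `field_count` bound fields, then compare the bundle to its first field
    components = [phasor() for _ in range(field_count)]
    bundle = [sum(col) for col in zip(*components)]
    return cosine(bundle, components[0])

scores = {n: self_match_score(n) for n in (1, 2, 4, 8, 16)}
for n, score in scores.items():
    print(f"{n:2d} fields -> {score:.3f}  (1/sqrt(n) = {1 / math.sqrt(n):.3f})")
```

Expect values close to 1.0, 0.71, 0.50, 0.35, and 0.25; the exact figures shift slightly with the random seed.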
Supported queries: search(), filter(), select()
hybi.compose.Bundle (dataclass)
Bases: BaseMolecule
Bundled field-value bindings: Σ(role_i ⊛ value_i).
Bundle is the default encoding used by HyperBinder for structured data. Each field is bound with its role vector and all are bundled.
This molecule makes the implicit Bundle encoding explicit and configurable with per-field settings.
Composable Bundles:
By default, Bundle is terminal: it cannot be nested inside other molecules. Set composable=True to enable nesting. Composable bundles use the bundle→repair_unitarity→braid pattern:
- Encode each field and bind it with its role vector
- Weighted bundle (circular mean) of all role-bound fillers
- repair_unitarity (QR projection back to the unitary group)
- braid (bind with a deterministic level key)
The result is a unitary vector that can safely participate in further bind/unbind chains. See atoms.braided_bundle().
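The four steps can be sketched in a toy phasor model. Here repair_unitarity is approximated by resetting each element to unit magnitude, which is the phasor analogue of projecting back onto the unitary group; hybi describes a QR projection, so treat this substitution as an assumption. The helpers phasor, weighted_bundle, and braided_bundle are hypothetical and are not the atoms.braided_bundle() API.

```python
import cmath
import math
import random

DIM = 256

def phasor(seed):
    # Deterministic random unit phasor vector (toy stand-in for an encoded atom)
    rng = random.Random(seed)
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(DIM)]

def bind(a, b):
    return [x * y for x, y in zip(a, b)]

def unbind(c, a):
    return [x * y.conjugate() for x, y in zip(c, a)]

def weighted_bundle(vectors, weights):
    # Weighted elementwise sum; each element's phase is the weighted circular mean
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*vectors)]

def repair_unitarity(vec):
    # Toy projection back to unit magnitude: keep each element's phase only
    return [x / abs(x) if abs(x) > 1e-12 else 1 + 0j for x in vec]

def braided_bundle(fields, weights, level):
    # Step 1: bind each field value with its role vector
    role_bound = [bind(phasor(f"role:{name}"), value) for name, value in fields.items()]
    # Steps 2-3: weighted bundle, then restore unit magnitude
    repaired = repair_unitarity(weighted_bundle(role_bound, weights))
    # Step 4: braid by binding with a deterministic key for this nesting level
    return bind(phasor(f"level:{level}"), repaired)

fields = {"name": phasor("value:widget"), "price": phasor("value:price-19.99")}
braided = braided_bundle(fields, weights=[1.5, 1.0], level=1)
```

Because the braided vector still has unit magnitude everywhere, unbinding the level key returns the repaired bundle exactly, which is what lets it take part in further bind/unbind chains.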
Encoding
bundle(role_1 ⊛ value_1, role_2 ⊛ value_2, ...)
Supported Queries
- search(): Semantic similarity search
- filter(where=[...]): Exact/range filtering
- select(fields=[...]): Projection
Example
schema = Bundle(fields={
    "name": Field(encoding=Encoding.SEMANTIC, weight=1.5),
    "description": Field(encoding=Encoding.SEMANTIC, weight=2.0),
    "category": Field(encoding=Encoding.EXACT),
    "price": Field(encoding=Encoding.NUMERIC, similar_within=50),
})
hb.ingest(products_df, collection="products", schema=schema)

# Composable bundle nested inside a Pair:
product_info = Bundle(composable=True, fields={
    "name": Field(encoding=Encoding.SEMANTIC),
    "price": Field(encoding=Encoding.NUMERIC),
})
schema = Pair(left=product_info, right=Field(encoding=Encoding.SEMANTIC))
Sequence
Ordered items with positional encoding.
from hybi.compose import Sequence, Field, Encoding
schema = Sequence(
item=Field("token", encoding=Encoding.SEMANTIC),
max_length=512,
position_encoding="random", # or "sinusoidal"
)
Parameters
- item: Field configuration for sequence elements
- max_length: Maximum sequence length (default: 512)
- position_encoding: How positions are encoded, "random" (default) or "sinusoidal"
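The two position_encoding options are not described further here. One plausible reading, sketched below as an assumption, is that "random" gives every position an independent deterministic vector (so neighboring positions look unrelated), while "sinusoidal" advances each element's phase linearly with position at a fixed frequency (so nearby positions stay similar). The position_vec helper and the frequency range are hypothetical.

```python
import cmath
import math
import random

DIM = 1024
_freq_rng = random.Random(7)
FREQS = [_freq_rng.uniform(0.0, 0.5) for _ in range(DIM)]  # hypothetical fixed frequencies

def position_vec(pos, scheme="random"):
    if scheme == "random":
        # Independent deterministic vector per position
        rng = random.Random(f"pos:{pos}")
        return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(DIM)]
    if scheme == "sinusoidal":
        # Phase grows linearly with position, so nearby positions get similar vectors
        return [cmath.exp(1j * f * pos) for f in FREQS]
    raise ValueError(f"unknown scheme: {scheme}")

def similarity(a, b):
    # Mean real part of a * conj(b): 1.0 for identical unit phasors
    return sum((x * y.conjugate()).real for x, y in zip(a, b)) / len(a)

for scheme in ("random", "sinusoidal"):
    near = similarity(position_vec(3, scheme), position_vec(4, scheme))
    far = similarity(position_vec(3, scheme), position_vec(40, scheme))
    print(f"{scheme:>10}: pos3~pos4 = {near:.2f}, pos3~pos40 = {far:.2f}")
```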
Supported queries: search(), at(position), contains(), prefix()
hybi.compose.Sequence (dataclass)
Bases: BaseMolecule
Ordered elements with positional encoding: item ⊛ position_vec(pos).
Sequence encodes ordered data where position matters, like text tokens, time series, or ordered lists. Each row represents one item at a specific position. The item is bound with a deterministic position vector.
The item slot can contain nested molecules:
# Simple: item is a Field
Sequence(
item=Field("token"),
position=Field("position"),
)
# Nested: items are Pairs
Sequence(
item=Pair(
left=Field("word"),
right=Field("pos_tag", encoding=Encoding.EXACT),
),
position=Field("position"),
)
# Encoding: (word ⊛ pos_tag) ⊛ position_vec
Data Format (one row per item):
| position | token |
|---|---|
| 0 | apple |
| 1 | banana |
| 2 | cherry |
Encoding
item_vec ⊛ position_vec(pos)
Supported Queries
- search(): Semantic similarity search on items
- at(position=N): Find items at position N
Example
schema = Sequence(
    item=Field("token", encoding=Encoding.SEMANTIC),
    position=Field("position"),
)
hb.ingest(tokens_df, collection="corpus", schema=schema)

# Query: What items are at position 0?
results = hb.query("corpus").at(position=0)

# Semantic search on items
results = hb.query("corpus").search("fruit")
Tree
Parent-child relationships with child and parent slots.
from hybi.compose import Tree, Field
schema = Tree(
child=Field("employee"),
parent=Field("manager"),
level=Field("depth"), # optional
)
Parameters
- child: Field for the child/node identifier
- parent: Field for the parent identifier
- level: Optional Field for depth/level in the hierarchy
Supported queries: search(), find(), children(), parent(), ancestors(), descendants(), siblings()
hybi.compose.Tree (dataclass)
Bases: BaseMolecule
Hierarchical parent-child relationships: child ⊛ parent ⊛ level(n).
Tree encodes hierarchical structures like org charts, file systems, or taxonomies. Enables traversal queries up and down the hierarchy.
Slots can contain nested molecules:
# Simple: all slots are Fields
Tree(
child=Field("employee"),
parent=Field("manager"),
)
# Nested: nodes are typed entities
Tree(
child=Pair(
left=Field("dept", encoding=Encoding.EXACT),
right=Field("employee_name"),
),
parent=Pair(
left=Field("dept", encoding=Encoding.EXACT),
right=Field("manager_name"),
),
)
# Encoding: (dept ⊛ employee) ⊛ (dept ⊛ manager) ⊛ level
Encoding
child ⊛ parent ⊛ level(depth)
Supported Queries
- search(): Semantic similarity search
- find(child=X): Find parent of X
- find(parent=Y): Find children of Y
- children(node): Direct children of a node
- parent(node): Direct parent of a node
- ancestors(node, depth): N levels up
- descendants(node, depth): N levels down
- siblings(node): Nodes with same parent
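The traversal queries above can be mirrored over plain (child, parent) rows, such as an org chart. This is a hypothetical in-memory sketch of the expected results, not hybi's implementation; in particular, treating depth=None as "no limit" is an assumption.

```python
from collections import defaultdict

# Hypothetical org-chart rows: (child, parent)
rows = [
    ("Bob", "Alice"),
    ("Carol", "Alice"),
    ("Dave", "Bob"),
    ("Erin", "Bob"),
    ("Frank", "Dave"),
]

parent_of = {c: p for c, p in rows}
children_of = defaultdict(list)
for c, p in rows:
    children_of[p].append(c)

def children(node):
    return list(children_of.get(node, []))

def parent(node):
    return parent_of.get(node)

def ancestors(node, depth=None):
    # Walk up the parent chain, at most `depth` levels (None = up to the root)
    result, current = [], parent_of.get(node)
    while current is not None and (depth is None or len(result) < depth):
        result.append(current)
        current = parent_of.get(current)
    return result

def descendants(node, depth=None):
    # Breadth-first walk down the hierarchy, at most `depth` levels
    result, frontier, level = [], [node], 0
    while frontier and (depth is None or level < depth):
        frontier = [c for n in frontier for c in children_of.get(n, [])]
        result.extend(frontier)
        level += 1
    return result

def siblings(node):
    p = parent_of.get(node)
    return [c for c in children_of.get(p, []) if c != node] if p is not None else []
```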
Graph
Node-edge-node structure with source, edge, and target slots.
from hybi.compose import Graph, Field, Encoding
# Define with a single node Field (used for both source and target)
schema = Graph(
node=Field("entity"),
edge=Field("relation", encoding=Encoding.EXACT),
directed=True,
)
Querying Graph
Graph has three slots (source, edge, target) but only two Field configurations (node, edge). The node Field applies to both source and target. You can query using either:
q = hb.query("social", schema=schema)
# By column name (what appears in your DataFrame)
q.find(entity="Alice") # Matches both source and target
# By slot name (the structural role)
q.find(source="Alice") # Only source nodes
q.find(target="Bob") # Only target nodes
Both column names and slot names are valid query keys. This lets you distinguish between finding rows where a node appears as source vs target.
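The distinction between column-name and slot-name keys can be sketched with a plain filter over hypothetical (source, relation, target) rows: the column name entity matches either end of an edge, while source and target match only their own end. The find function here is illustrative only and does not reproduce hybi's query engine.

```python
# Hypothetical edge rows from a "social" DataFrame: (source, relation, target)
edges = [
    ("Alice", "friends_with", "Bob"),
    ("Bob", "friends_with", "Carol"),
    ("Carol", "follows", "Alice"),
]

def find(entity=None, source=None, target=None, relation=None):
    # `entity` (the node column name) matches either end of an edge;
    # `source` / `target` (slot names) match only that end
    hits = []
    for s, r, t in edges:
        if entity is not None and entity not in (s, t):
            continue
        if source is not None and s != source:
            continue
        if target is not None and t != target:
            continue
        if relation is not None and r != relation:
            continue
        hits.append((s, r, t))
    return hits

print(find(entity="Alice"))  # both edges that touch Alice
print(find(source="Alice"))  # only the edge that starts at Alice
```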
Supported queries: neighbors(), path(), subgraph(), traverse()
hybi.compose.Graph (dataclass)
Bases: BaseMolecule
General graph structure: node ⊛ edge ⊛ node.
Graph encodes network relationships with typed edges. Supports both directed and undirected graphs, and enables traversal, path finding, and neighborhood queries.
The node and edge slots can contain nested molecules:
# Simple: all slots are Fields
Graph(
node=Field("entity"),
edge=Field("relation", encoding=Encoding.EXACT),
)
# Nested: nodes are typed entities
Graph(
node=Pair(
left=Field("node_type", encoding=Encoding.EXACT),
right=Field("node_name"),
),
edge=Field("relation"),
)
# Encoding: (type ⊛ name) ⊛ relation ⊛ (type ⊛ name)
Encoding
source_node ⊛ edge ⊛ target_node
Supported Queries
- search(): Semantic similarity search
- find(source=X): Find edges and targets from X
- find(target=Y): Find sources and edges pointing to Y
- find(edge=E): Find all connections with edge type E
- neighbors(node, edge): Connected nodes via edge type
- path(from_node, to_node, max_hops): Path finding
- subgraph(seed, radius): Local neighborhood
- traverse(start, pattern): Pattern matching traversal
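Hop-limited path finding is commonly done with breadth-first search, which returns a shortest path first. The sketch below applies it to hypothetical edge rows and honors the directed flag by adding reverse edges only for undirected graphs. Whether hybi's path() uses BFS internally is not stated here; this is an assumed illustration of the semantics.

```python
from collections import deque

# Hypothetical edge rows: (source, relation, target)
edges = [
    ("Alice", "friends_with", "Bob"),
    ("Bob", "friends_with", "Dana"),
    ("Dana", "works_with", "Charlie"),
    ("Charlie", "follows", "Eve"),
]

def adjacency(directed=True):
    adj = {}
    for s, _, t in edges:
        adj.setdefault(s, []).append(t)
        if not directed:
            adj.setdefault(t, []).append(s)  # symmetric connection
    return adj

def path(from_node, to_node, max_hops=3, directed=True):
    # Breadth-first search: the first path that reaches to_node is a shortest one
    adj = adjacency(directed)
    queue = deque([[from_node]])
    seen = {from_node}
    while queue:
        current = queue.popleft()
        if current[-1] == to_node:
            return current
        if len(current) - 1 >= max_hops:
            continue  # hop budget used up on this branch
        for nxt in adj.get(current[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(current + [nxt])
    return None

print(path("Alice", "Charlie", max_hops=3))  # ['Alice', 'Bob', 'Dana', 'Charlie']
```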
Example
schema = Graph(
    node=Field("entity"),
    edge=Field("relation", encoding=Encoding.EXACT),
    directed=True,
)
hb.ingest(network_df, collection="social", schema=schema)

# Query: Who are Alice's friends?
friends = hb.query("social").neighbors("Alice", edge="friends_with")

# Query: Path from Alice to Charlie
path = hb.query("social").path(from_node="Alice", to_node="Charlie", max_hops=3)
molecule_type (property)
node (instance attribute)
Configuration for node values (Field or nested Molecule).
Note: In wire format, this is used for both source and target nodes.
edge (instance attribute)
Configuration for edge/relationship types (Field or nested Molecule).
directed = True (class attribute, instance attribute)
Whether edges are directed.
If True, source -> target (asymmetric). If False, connections are symmetric.
__init__(node, edge, directed=True, _skip_validation=False)
slots()
Row
Primary-key addressed fields for SQL-like CRUD operations.
Row enables SQL-like CRUD operations by using structured encoding instead of search-optimized encoding. This makes individual fields extractable and updatable without affecting other fields.
from hybi.compose import Row, Field, Encoding
schema = Row(
primary_key=Field("user_id", encoding=Encoding.EXACT),
fields={
"email": Field(encoding=Encoding.EXACT),
"name": Field(encoding=Encoding.SEMANTIC),
"salary": Field(encoding=Encoding.NUMERIC, similar_within=10000),
},
field_order=["name", "email", "salary"], # optional explicit ordering
)
Parameters
- primary_key: Field for the primary key (must use EXACT encoding)
- fields: Dict mapping field names to Field configurations
- field_order: Optional list specifying field encoding order
Use RelationalTable instead
Row is the underlying molecule, but most users should use the RelationalTable compound which provides a more intuitive column-based API.
Supported queries: get(pk=X), update(pk=X, set={...}), delete(pk=X), search(), filter()
hybi.compose.Row (dataclass)
Bases: BaseMolecule
Primary-key addressed row with named fields: pk ⊛ (f1 ⊛ v1) ⊛ (f2 ⊛ v2) ⊛ ...
Row encodes database-like records with a primary key for addressing and named fields for data. Unlike Bundle (which uses lossy superposition), Row uses chain binding, which is lossless: any field can be extracted cleanly by unbinding with its role vector.
This enables SQL-like CRUD operations:
- Read individual fields without noise from other fields
- Update specific fields without re-encoding the entire row
- Delete rows by primary key
The primary key must use EXACT encoding for deterministic lookups.
Encoding
pk_value ⊛ (field1_role ⊛ field1_value) ⊛ (field2_role ⊛ field2_value) ⊛ ...
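Chain binding is what makes in-place updates cheap: because the row vector is a product of factors, one (role ⊛ value) factor can be swapped out without touching the others. The toy phasor sketch below (an assumption, not hybi's implementation; encode_row and update_field are hypothetical helpers) shows that swapping the email factor produces the same vector as re-encoding the row from scratch.

```python
import cmath
import math
import random

DIM = 512

def encode(symbol):
    # Deterministic unit phasor vector per symbol (toy EXACT-style encoding)
    rng = random.Random(symbol)
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(DIM)]

def bind(*vectors):
    # Toy ⊛ over any number of unit phasor vectors: elementwise product
    out = [1 + 0j] * DIM
    for vec in vectors:
        out = [x * y for x, y in zip(out, vec)]
    return out

def inverse(vec):
    return [x.conjugate() for x in vec]

def encode_row(pk, fields):
    # pk ⊛ (role_1 ⊛ value_1) ⊛ (role_2 ⊛ value_2) ⊛ ...
    factors = [encode(f"pk:{pk}")]
    factors += [bind(encode(f"role:{k}"), encode(f"value:{v}")) for k, v in fields.items()]
    return bind(*factors)

def update_field(row_vec, field, old, new):
    # Cancel role ⊛ old, then bind role ⊛ new; every other factor is untouched
    old_factor = bind(encode(f"role:{field}"), encode(f"value:{old}"))
    new_factor = bind(encode(f"role:{field}"), encode(f"value:{new}"))
    return bind(row_vec, inverse(old_factor), new_factor)

def similarity(a, b):
    return sum((x * y.conjugate()).real for x, y in zip(a, b)) / len(a)

row = encode_row("u42", {"email": "a@x.io", "name": "Ada"})
updated = update_field(row, "email", "a@x.io", "ada@y.io")
rebuilt = encode_row("u42", {"email": "ada@y.io", "name": "Ada"})
print(f"{similarity(updated, rebuilt):.6f}")  # ~1.0: identical up to rounding
```

In this toy model the update needs the old value (or its factor) so that it can be cancelled.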
Supported Queries
- get(pk=X): O(1) lookup by primary key
- filter(where=[...]): Filter by field values
- search(): Semantic similarity search
- update(pk=X, set={...}): Update field values
- delete(pk=X): Delete row
Example
schema = Row(
    primary_key=Field("user_id", encoding=Encoding.EXACT),
    fields={
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC),
    },
)
Note: Row is typically used via the RelationalTable compound, which provides a more user-friendly API with column-based definition.
molecule_type (property)
primary_key (instance attribute)
Field configuration for the primary key (must use EXACT encoding).
fields = dataclass_field(default_factory=dict) (class attribute, instance attribute)
Mapping of field names to their configurations.
__init__(primary_key, fields=dict(), field_order=None)
slots()
Return PK name followed by field names in order.
Nesting Molecules
Structured molecules can be nested:
# Typed entity knowledge graph
schema = Triple(
subject=Pair(
left=Field("entity_type", encoding=Encoding.EXACT),
right=Field("entity_name"),
),
predicate=Field("relation", encoding=Encoding.EXACT),
object=Pair(
left=Field("target_type", encoding=Encoding.EXACT),
right=Field("target_name"),
),
)
Nesting rules:
- Structured molecules (Pair, Triple, Row, Tree, Graph, Sequence) can nest inside each other
- Bundle cannot be nested (it uses search-optimized encoding) unless it is created with composable=True
- Row is typically used at the top level (not nested) for table schemas
- Validation happens at construction time
Querying Nested Schemas
When querying nested schemas, you can use column names directly instead of navigating the slot hierarchy.
# For the schema above, all of these work:
q = hb.query("entities", schema=schema)
# By column name (recommended for nested schemas)
results = q.find(entity_type="Person", entity_name="Alice")
# By top-level slot (backward compatible)
results = q.find(subject="Alice")
# By dot notation (explicit path)
results = q.find(**{"subject.left": "Person"})
Field Resolution
The resolve_field() method maps query field names to their schema paths:
| Input | Resolves To | Example |
|---|---|---|
| Column name | Leaf field | "entity_type" → subject.left |
| Top-level slot | First matching field | "subject" → subject.left |
| Dot notation | Explicit path | "subject.left" → subject.left |
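The three resolution rules in the table can be sketched over a hypothetical plain-dict mirror of the typed-entity schema above. The resolve_field and valid_query_fields functions below are illustrative stand-ins for the real methods, and raising ValueError on an ambiguous column name is an assumption about how ambiguity is reported.

```python
# Hypothetical plain-dict mirror of the typed-entity Triple above:
# slot name -> column name (a leaf Field) or a nested dict (a nested molecule)
SCHEMA = {
    "subject": {"left": "entity_type", "right": "entity_name"},
    "predicate": "relation",
    "object": {"left": "target_type", "right": "target_name"},
}

def leaf_paths(node, prefix=""):
    # Yield (dot_path, column_name) for every leaf Field, in slot order
    for slot, child in node.items():
        path = f"{prefix}.{slot}" if prefix else slot
        if isinstance(child, dict):
            yield from leaf_paths(child, path)
        else:
            yield path, child

def resolve_field(name, schema=SCHEMA):
    leaves = list(leaf_paths(schema))
    # Dot notation: an explicit path that names a leaf
    if "." in name and name in {path for path, _ in leaves}:
        return name
    # Column name: must identify exactly one leaf
    matches = [path for path, column in leaves if column == name]
    if len(matches) > 1:
        raise ValueError(f"ambiguous field {name!r}; use one of {matches}")
    if matches:
        return matches[0]
    # Top-level slot: fall back to the first leaf underneath it
    for path, _ in leaves:
        if path == name or path.startswith(name + "."):
            return path
    raise KeyError(name)

def valid_query_fields(schema=SCHEMA):
    # Leaf column names first, then the top-level slot names
    return [column for _, column in leaf_paths(schema)] + list(schema)

print(resolve_field("entity_type"), resolve_field("subject"), resolve_field("relation"))
```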
Valid Query Fields
Use valid_query_fields() to see all accepted field names:
>>> schema.valid_query_fields()
['entity_type', 'entity_name', 'relation', 'target_type', 'target_name',
'subject', 'predicate', 'object']
Ambiguous Field Names
If the same column name appears in multiple nested locations, use dot notation to disambiguate: