Molecules¶

Molecules are the core compositional building blocks for defining schemas.

Overview¶

Molecule	Slots	Best For
Pair	left, right	Key-value, edges
Triple	subject, predicate, object	Knowledge graphs
Bundle	User-defined	Tabular data (read-heavy)
Row	pk + fields	Mutable tables (CRUD)
Sequence	item	Ordered data
Tree	child, parent	Hierarchies
Graph	source, edge, target	Networks

Structured vs Search-Optimized¶

Molecules use two encoding strategies:

Structured (Pair, Triple, Row, Tree, Graph, Sequence): Fully decomposable; supports field extraction
Search-optimized (Bundle): Optimized for multi-field similarity search; cannot be nested unless created with composable=True

Row is particularly important because it enables SQL-like CRUD operations on individual rows, unlike Bundle, which is optimized for search but does not support field-level updates.
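The trade-off between the two strategies can be illustrated with a toy vector-symbolic model. This is a sketch under stated assumptions, not HyperBinder's implementation: vectors are lists of unit-modulus complex numbers, binding (⊛) is element-wise multiplication, and bundling is element-wise addition. Every function name below is hypothetical.

```python
import cmath
import random

DIM = 512  # toy dimensionality, chosen for illustration only

def random_vector(rng):
    """Random phasor vector: every component has modulus 1."""
    return [cmath.exp(1j * rng.uniform(-cmath.pi, cmath.pi)) for _ in range(DIM)]

def bind(a, b):
    """Binding: element-wise product (invertible for unit-modulus vectors)."""
    return [x * y for x, y in zip(a, b)]

def unbind(bound, key):
    """Inverse of bind: multiply by the complex conjugate of the key."""
    return [x * y.conjugate() for x, y in zip(bound, key)]

def bundle(vectors):
    """Bundling: element-wise sum (superposition)."""
    return [sum(column) for column in zip(*vectors)]

def similarity(a, b):
    """Cosine similarity: real part of the normalized inner product."""
    dot = sum(x * y.conjugate() for x, y in zip(a, b)).real
    norm_a = sum(abs(x) ** 2 for x in a) ** 0.5
    norm_b = sum(abs(y) ** 2 for y in b) ** 0.5
    return dot / (norm_a * norm_b)

rng = random.Random(0)
role_name, role_price = random_vector(rng), random_vector(rng)
name, price = random_vector(rng), random_vector(rng)

# Structured: a single binding can be undone exactly.
pair = bind(role_name, name)
structured_sim = similarity(unbind(pair, role_name), name)

# Search-optimized: superposing two role-bound fields leaves crosstalk noise.
record = bundle([bind(role_name, name), bind(role_price, price)])
bundled_sim = similarity(unbind(record, role_name), name)

print(f"structured recovery: {structured_sim:.3f}")
print(f"bundled recovery:    {bundled_sim:.3f}")
```

In this model the structured recovery is exact up to floating-point error, while the bundled recovery falls noticeably below 1 and would drop further as more fields are superposed.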

Universal Query Methods¶

All molecule types support these common query methods:

Method	Description	Common Parameters
`search(query, top_k=10)`	Hybrid semantic + symbolic search	`query`: str or dict, `top_k`: max results
`search_slots(**slot_queries)`	Per-slot weighted search	`slot_queries`: dict mapping slots to queries
`search_prototype(examples, top_k=10)`	Bundle examples as search prototype	`examples`: list of values

Common Query Parameters¶

top_k: Maximum number of results to return (default: 10)
mode: Query mode - "exact" or "fuzzy"/"semantic" (varies by method)
similarity_threshold: Minimum similarity score for results (0.0-1.0)
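How top_k and similarity_threshold interact can be sketched in plain Python. The helper below is a hypothetical stand-in that assumes cosine similarity as the score and applies the threshold before truncating to top_k; the library's actual scoring and ordering may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length real vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(query_vec, candidates, top_k=10, similarity_threshold=0.0):
    """Score candidates, drop those below the threshold, keep the best top_k."""
    scored = [(cosine(query_vec, vec), key) for key, vec in candidates.items()]
    kept = [(score, key) for score, key in scored if score >= similarity_threshold]
    kept.sort(reverse=True)
    return [(key, round(score, 3)) for score, key in kept[:top_k]]

# Toy embeddings, invented for this example.
candidates = {
    "apple": [1.0, 0.1, 0.0],
    "banana": [0.9, 0.3, 0.1],
    "wrench": [0.0, 0.2, 1.0],
}
results = rank([1.0, 0.2, 0.0], candidates, top_k=2, similarity_threshold=0.5)
print(results)
```

Here the threshold removes "wrench" regardless of top_k, and top_k only caps how many of the remaining matches are returned.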

Pair¶

Two-element structure with left and right slots.

from hybi.compose import Pair, Field, Encoding

schema = Pair(
    left=Field("key", encoding=Encoding.EXACT),
    right=Field("value", encoding=Encoding.SEMANTIC),
)

Supported queries: find(left=X), find(right=Y), search()

`hybi.compose.Pair` `dataclass` ¶

Bases: BaseMolecule

Two-element binding: A ⊛ B.

Pair encodes a relationship between two elements, like key-value pairs, directed edges, or binary relations.

Slots can be Fields or nested Molecules:

# Simple: both slots are Fields
Pair(
    left=Field("key", encoding=Encoding.EXACT),
    right=Field("value"),
)

# Nested: left slot contains another Pair
Pair(
    left=Pair(
        left=Field("namespace"),
        right=Field("key"),
    ),
    right=Field("value"),
)
# Encoding: (namespace ⊛ key) ⊛ value

Encoding

left ⊛ right

Supported Queries

find(left=X): Find right values bound with X
find(right=Y): Find left values bound with Y
search(): Semantic similarity search

Example

schema = Pair(
    left=Field("key", encoding=Encoding.EXACT),
    right=Field("value", encoding=Encoding.SEMANTIC),
)
hb.ingest(df, collection="kv_store", schema=schema)

# Query: What value is associated with key "config_123"?
results = hb.query("kv_store").find(left="config_123")

`molecule_type` `property` ¶

`left` `instance-attribute` ¶

Configuration for the left element (Field or nested Molecule).

`right` `instance-attribute` ¶

Configuration for the right element (Field or nested Molecule).

`__init__(left, right, _skip_validation=False)` ¶

`slots()` ¶

Triple¶

Three-element structure with subject, predicate, and object slots.

from hybi.compose import Triple, Field, Encoding

schema = Triple(
    subject=Field("entity", encoding=Encoding.SEMANTIC),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Field("target", encoding=Encoding.SEMANTIC),
)

Supported queries: find(), traverse(), search(), neighbors(), path()

`hybi.compose.Triple` `dataclass` ¶

Bases: BaseMolecule

Three-element binding: Subject ⊛ Predicate ⊛ Object.

Triple encodes knowledge graph facts and relationships. Optimized for queries that retrieve by any slot combination.

Slots can be Fields or nested Molecules:

# Simple: all slots are Fields
Triple(
    subject=Field("entity"),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Field("target"),
)

# Nested: subject is a typed entity (Pair)
Triple(
    subject=Pair(
        left=Field("entity_type", encoding=Encoding.EXACT),
        right=Field("entity_name"),
    ),
    predicate=Field("relation"),
    object=Field("target"),
)
# Encoding: (type ⊛ name) ⊛ relation ⊛ target

Encoding

subject ⊛ predicate ⊛ object

Supported Queries

find(subject=X): Find predicates and objects for X
find(object=Y): Find subjects and predicates pointing to Y
find(predicate=P): Find all facts with relation P
find(subject=X, predicate=P): Find objects
traverse(start, path): Multi-hop graph traversal
search(): Semantic similarity search

Example

schema = Triple(
    subject=Field("entity", encoding=Encoding.SEMANTIC),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Field("target", encoding=Encoding.SEMANTIC),
)
hb.ingest(facts_df, collection="knowledge", schema=schema)

# Query: Who does Alice work with?
results = hb.query("knowledge").find(
    subject="Alice",
    predicate="works_with",
)

# Multi-hop: Alice -> works_at -> ? -> located_in -> ?
locations = hb.query("knowledge").traverse(
    start={"subject": "Alice"},
    path=["works_at", "located_in"],
)
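The semantics of slot-based find() and multi-hop traverse() can be modelled with plain tuples. This is only an exact-match sketch with invented facts; the functions are hypothetical and do not reproduce the library's vector-based matching or its return types.

```python
# Toy facts as (subject, predicate, object) tuples.
facts = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Acme", "located_in", "Berlin"),
    ("Alice", "works_with", "Bob"),
]

def find(facts, subject=None, predicate=None, obj=None):
    """Return every fact matching the slots that were specified."""
    return [
        (s, p, o)
        for s, p, o in facts
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

def traverse(facts, start, path):
    """Follow each predicate in `path` in turn, starting from `start`."""
    frontier = {start}
    for predicate in path:
        frontier = {
            o
            for node in frontier
            for _, _, o in find(facts, subject=node, predicate=predicate)
        }
    return frontier

colleagues = [o for _, _, o in find(facts, subject="Alice", predicate="works_with")]
locations = traverse(facts, "Alice", ["works_at", "located_in"])
print(colleagues, locations)
```

Each hop replaces the frontier with the objects reached through the next predicate, so an empty frontier at any step yields an empty result.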

`molecule_type` `property` ¶

`subject` `instance-attribute` ¶

Configuration for the subject (Field or nested Molecule).

`predicate` `instance-attribute` ¶

Configuration for the predicate (Field or nested Molecule).

`object` `instance-attribute` ¶

Configuration for the object (Field or nested Molecule).

`__init__(subject, predicate, object, _skip_validation=False)` ¶

`slots()` ¶

Bundle¶

Multi-field structure with user-defined fields.

from hybi.compose import Bundle, Field, Encoding

schema = Bundle(
    fields={
        "name": Field(encoding=Encoding.SEMANTIC, weight=1.5),
        "category": Field(encoding=Encoding.EXACT),
        "price": Field(encoding=Encoding.NUMERIC, similar_within=50),
    }
)

Bundle limitations

Bundle uses search-optimized encoding. This means:

Similarity scores decrease with field count
Individual fields cannot be reliably extracted
Bundles cannot be nested inside other molecules unless created with composable=True

Supported queries: search(), filter(), select()

`hybi.compose.Bundle` `dataclass` ¶

Bases: BaseMolecule

Bundled field-value bindings: Σ(role_i ⊛ value_i).

Bundle is the default encoding used by HyperBinder for structured data. Each field is bound with its role vector and all are bundled.

This molecule makes the implicit Bundle encoding explicit and configurable with per-field settings.

Composable Bundles:

By default, Bundle is terminal — it cannot be nested inside other molecules. Set composable=True to enable nesting. Composable bundles use the bundle→repair_unitarity→braid pattern:

Encode each field and bind with its role vector
Weighted bundle (circular mean) all role-bound fillers
repair_unitarity (QR projection back to unitary group)
braid (bind with deterministic level key)

The result is a unitary vector that can safely participate in further bind/unbind chains. See atoms.braided_bundle().
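The four steps above can be sketched in a toy phasor model. This is an illustration under explicit assumptions, not the code behind atoms.braided_bundle(): vectors are unit-modulus complex numbers, binding is element-wise multiplication, and the QR-based repair_unitarity step is replaced by a simpler per-component normalization onto the unit circle. The names toy_braided_bundle, repair_unit_modulus, and level_key are hypothetical.

```python
import cmath
import random

DIM = 256  # toy dimensionality

def random_phasors(rng):
    """Random vector whose components all have modulus 1."""
    return [cmath.exp(1j * rng.uniform(-cmath.pi, cmath.pi)) for _ in range(DIM)]

def bind(a, b):
    """Element-wise product; keeps unit modulus when both inputs have it."""
    return [x * y for x, y in zip(a, b)]

def weighted_bundle(vectors, weights):
    """Weighted element-wise sum of the role-bound fillers."""
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(DIM)]

def repair_unit_modulus(vec):
    """Project each component back onto the unit circle (stand-in for repair_unitarity)."""
    return [x / abs(x) if abs(x) > 0 else 1 + 0j for x in vec]

def level_key(level):
    """Deterministic key per nesting level: same level, same key."""
    return random_phasors(random.Random(f"level-{level}"))

def toy_braided_bundle(role_filler_pairs, weights, level=1):
    bound = [bind(role, filler) for role, filler in role_filler_pairs]  # step 1
    bundled = weighted_bundle(bound, weights)                           # step 2
    repaired = repair_unit_modulus(bundled)                             # step 3
    return bind(repaired, level_key(level))                             # step 4

rng = random.Random(42)
pairs = [(random_phasors(rng), random_phasors(rng)) for _ in range(2)]
vec = toy_braided_bundle(pairs, weights=[1.5, 1.0])
max_deviation = max(abs(abs(x) - 1.0) for x in vec)
print(f"largest deviation from unit modulus: {max_deviation:.2e}")
```

Because every component of the result has modulus 1 again, binding it into an outer molecule stays invertible in this model, which is the property the composable pattern is meant to restore.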

Encoding

bundle(role_1 ⊛ value_1, role_2 ⊛ value_2, ...)

Supported Queries

search(): Semantic similarity search
filter(where=[...]): Exact/range filtering
select(fields=[...]): Projection

Example

schema = Bundle(fields={
    "name": Field(encoding=Encoding.SEMANTIC, weight=1.5),
    "description": Field(encoding=Encoding.SEMANTIC, weight=2.0),
    "category": Field(encoding=Encoding.EXACT),
    "price": Field(encoding=Encoding.NUMERIC, similar_within=50),
})
hb.ingest(products_df, collection="products", schema=schema)

# Composable bundle nested inside a Pair:
product_info = Bundle(composable=True, fields={
    "name": Field(encoding=Encoding.SEMANTIC),
    "price": Field(encoding=Encoding.NUMERIC),
})
schema = Pair(left=product_info, right=Field(encoding=Encoding.SEMANTIC))

`molecule_type` `property` ¶

`fields = dataclass_field(default_factory=dict)` `class-attribute` `instance-attribute` ¶

Mapping of field names to their configurations.

`__init__(fields=dict(), composable=False)` ¶

`slots()` ¶

Sequence¶

Ordered items with positional encoding.

from hybi.compose import Sequence, Field, Encoding

schema = Sequence(
    item=Field("token", encoding=Encoding.SEMANTIC),
    max_length=512,
    position_encoding="random",  # or "sinusoidal"
)

Parameters¶

item: Field configuration for sequence elements
max_length: Maximum sequence length (default: 512)
position_encoding: How positions are encoded - "random" (default) or "sinusoidal"

Supported queries: search(), at(position), contains(), prefix()

`hybi.compose.Sequence` `dataclass` ¶

Bases: BaseMolecule

Ordered elements with positional encoding: item ⊛ position_vec(pos).

Sequence encodes ordered data where position matters, like text tokens, time series, or ordered lists. Each row represents one item at a specific position. The item is bound with a deterministic position vector.

The item slot can contain nested molecules:

# Simple: item is a Field
Sequence(
    item=Field("token"),
    position=Field("position"),
)

# Nested: items are Pairs
Sequence(
    item=Pair(
        left=Field("word"),
        right=Field("pos_tag", encoding=Encoding.EXACT),
    ),
    position=Field("position"),
)
# Encoding: (word ⊛ pos_tag) ⊛ position_vec

Data Format (one row per item):

| position | token  |
|----------|--------|
| 0        | apple  |
| 1        | banana |
| 2        | cherry |

Encoding

item_vec ⊛ position_vec(pos)
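To make the formula concrete, the sketch below binds an item vector with a deterministic position vector and recovers the item by unbinding. It is a toy model under assumptions: phasor vectors, element-wise multiplication for ⊛, a seeded generator for the "random" position encoding, and a transformer-style frequency schedule for "sinusoidal". None of these helper names or choices are confirmed to match the library.

```python
import cmath
import random

DIM = 256  # toy dimensionality

def item_vector(token):
    """Deterministic random phasor vector per token (hypothetical item encoder)."""
    rng = random.Random(f"item:{token}")
    return [cmath.exp(1j * rng.uniform(-cmath.pi, cmath.pi)) for _ in range(DIM)]

def position_vec(pos, encoding="random"):
    """Deterministic position vector: the same position always gives the same vector."""
    if encoding == "random":
        rng = random.Random(f"pos:{pos}")
        return [cmath.exp(1j * rng.uniform(-cmath.pi, cmath.pi)) for _ in range(DIM)]
    if encoding == "sinusoidal":
        # Component k rotates with frequency 1 / 10000^(k / DIM) (assumed schedule).
        return [cmath.exp(1j * pos / (10000 ** (k / DIM))) for k in range(DIM)]
    raise ValueError(f"unknown position encoding: {encoding!r}")

def encode(token, pos, encoding="random"):
    """item_vec bound with position_vec(pos) via element-wise multiplication."""
    return [i * p for i, p in zip(item_vector(token), position_vec(pos, encoding))]

def decode_at(row_vec, pos, vocabulary, encoding="random"):
    """Unbind the position, then return the vocabulary token closest to the result."""
    unbound = [x * p.conjugate() for x, p in zip(row_vec, position_vec(pos, encoding))]

    def score(token):
        return sum((u * v.conjugate()).real for u, v in zip(unbound, item_vector(token)))

    return max(vocabulary, key=score)

vocabulary = ["apple", "banana", "cherry"]
rows = {pos: encode(token, pos) for pos, token in enumerate(vocabulary)}
print(decode_at(rows[1], 1, vocabulary))
```

Decoding only needs the position number, because the position vector can be regenerated deterministically rather than stored.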

Supported Queries

search(): Semantic similarity search on items
at(position=N): Find items at position N

Example

schema = Sequence(
    item=Field("token", encoding=Encoding.SEMANTIC),
    position=Field("position"),
)
hb.ingest(tokens_df, collection="corpus", schema=schema)

# Query: What items are at position 0?
results = hb.query("corpus").at(position=0)

# Semantic search on items
results = hb.query("corpus").search("fruit")

`molecule_type` `property` ¶

`item` `instance-attribute` ¶

Configuration for the sequence elements (Field or nested Molecule).

`__init__(item, position=None, position_encoding='random', max_length=512, _skip_validation=False)` ¶

`slots()` ¶

Tree¶

Parent-child relationships with child and parent slots.

from hybi.compose import Tree, Field

schema = Tree(
    child=Field("employee"),
    parent=Field("manager"),
    level=Field("depth"),  # optional
)

Parameters¶

child: Field for child/node identifier
parent: Field for parent identifier
level: Optional Field for depth/level in hierarchy

Supported queries: search(), find(), children(), parent(), ancestors(), descendants(), siblings()

`hybi.compose.Tree` `dataclass` ¶

Bases: BaseMolecule

Hierarchical parent-child relationships: child ⊛ parent ⊛ level(n).

Tree encodes hierarchical structures like org charts, file systems, or taxonomies. Enables traversal queries up and down the hierarchy.

Slots can contain nested molecules:

# Simple: all slots are Fields
Tree(
    child=Field("employee"),
    parent=Field("manager"),
)

# Nested: nodes are typed entities
Tree(
    child=Pair(
        left=Field("dept", encoding=Encoding.EXACT),
        right=Field("employee_name"),
    ),
    parent=Pair(
        left=Field("dept", encoding=Encoding.EXACT),
        right=Field("manager_name"),
    ),
)
# Encoding: (dept ⊛ employee) ⊛ (dept ⊛ manager) ⊛ level

Encoding

child ⊛ parent ⊛ level(depth)

Supported Queries

search(): Semantic similarity search
find(child=X): Find parent of X
find(parent=Y): Find children of Y
children(node): Direct children of a node
parent(node): Direct parent of a node
ancestors(node, depth): N levels up
descendants(node, depth): N levels down
siblings(node): Nodes with same parent

Example

schema = Tree(
    child=Field("employee"),
    parent=Field("manager"),
)
hb.ingest(org_df, collection="org", schema=schema)

# Query: Who reports to Alice?
reports = hb.query("org").children("Alice")

# Query: Full reporting chain for Bob
chain = hb.query("org").ancestors("Bob")
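The hierarchy queries listed above can be modelled over plain child/parent rows. The sketch below assumes exact matching, a single parent per node, and an acyclic hierarchy; the data and function bodies are illustrative and say nothing about how the library evaluates these queries internally.

```python
# Toy org chart: one row per (employee, manager) relationship.
rows = [
    {"employee": "Bob", "manager": "Alice"},
    {"employee": "Carol", "manager": "Alice"},
    {"employee": "Dave", "manager": "Bob"},
]
parent_of = {row["employee"]: row["manager"] for row in rows}

def children(node):
    """Direct children of a node, in sorted order."""
    return sorted(c for c, p in parent_of.items() if p == node)

def ancestors(node, depth=None):
    """Walk up the hierarchy, optionally stopping after `depth` levels."""
    chain = []
    while node in parent_of and (depth is None or len(chain) < depth):
        node = parent_of[node]
        chain.append(node)
    return chain

def descendants(node, depth=None):
    """Walk down level by level, optionally limited to `depth` levels."""
    found, frontier, level = [], [node], 0
    while frontier and (depth is None or level < depth):
        frontier = [c for n in frontier for c in children(n)]
        found.extend(frontier)
        level += 1
    return found

def siblings(node):
    """Other nodes that share this node's parent."""
    parent = parent_of.get(node)
    return [c for c in children(parent) if c != node] if parent is not None else []

print(children("Alice"), ancestors("Dave"), descendants("Alice"), siblings("Bob"))
```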

`molecule_type` `property` ¶

`child` `instance-attribute` ¶

Configuration for the child node (Field or nested Molecule).

`parent` `instance-attribute` ¶

Configuration for the parent node (Field or nested Molecule).

`__init__(child, parent, level=None, _skip_validation=False)` ¶

`slots()` ¶

Graph¶

Node-edge-node structure with source, edge, and target slots.

from hybi.compose import Graph, Field, Encoding

# Define with a single node Field (used for both source and target)
schema = Graph(
    node=Field("entity"),
    edge=Field("relation", encoding=Encoding.EXACT),
    directed=True,
)

Querying Graph¶

Graph has three slots (source, edge, target) but only two Field configurations (node, edge). The node Field applies to both source and target. You can query using either:

q = hb.query("social", schema=schema)

# By column name (what appears in your DataFrame)
q.find(entity="Alice")     # Matches both source and target

# By slot name (the structural role)
q.find(source="Alice")     # Only source nodes
q.find(target="Bob")       # Only target nodes

Both column names and slot names are valid query keys. This lets you distinguish between finding rows where a node appears as source vs target.

Supported queries: neighbors(), path(), subgraph(), traverse()

`hybi.compose.Graph` `dataclass` ¶

Bases: BaseMolecule

General graph structure: node ⊛ edge ⊛ node.

Graph encodes network relationships with typed edges. Supports both directed and undirected graphs, and enables traversal, path finding, and neighborhood queries.

The node and edge slots can contain nested molecules:

# Simple: all slots are Fields
Graph(
    node=Field("entity"),
    edge=Field("relation", encoding=Encoding.EXACT),
)

# Nested: nodes are typed entities
Graph(
    node=Pair(
        left=Field("node_type", encoding=Encoding.EXACT),
        right=Field("node_name"),
    ),
    edge=Field("relation"),
)
# Encoding: (type ⊛ name) ⊛ relation ⊛ (type ⊛ name)

Encoding

source_node ⊛ edge ⊛ target_node

Supported Queries

search(): Semantic similarity search
find(source=X): Find edges and targets from X
find(target=Y): Find sources and edges pointing to Y
find(edge=E): Find all connections with edge type E
neighbors(node, edge): Connected nodes via edge type
path(from_node, to_node, max_hops): Path finding
subgraph(seed, radius): Local neighborhood
traverse(start, pattern): Pattern matching traversal

Example

schema = Graph(
    node=Field("entity"),
    edge=Field("relation", encoding=Encoding.EXACT),
    directed=True,
)
hb.ingest(network_df, collection="social", schema=schema)

# Query: Who are Alice's friends?
friends = hb.query("social").neighbors("Alice", edge="friends_with")

# Query: Path from Alice to Charlie
path = hb.query("social").path(from_node="Alice", to_node="Charlie", max_hops=3)
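The effect of directed and max_hops can be seen in a small adjacency-list model. This sketch assumes exact node matching and uses breadth-first search for path finding; whether the library uses the same search strategy is not stated in this page, and all helper names are hypothetical.

```python
from collections import deque

# Toy edge list: (source, edge, target).
edges = [
    ("Alice", "friends_with", "Bob"),
    ("Bob", "friends_with", "Charlie"),
    ("Charlie", "works_with", "Dana"),
]

def build_adjacency(edges, directed=True):
    """Map each node to its outgoing (edge, target) pairs; mirror them if undirected."""
    adjacency = {}
    for source, edge, target in edges:
        adjacency.setdefault(source, []).append((edge, target))
        if not directed:
            adjacency.setdefault(target, []).append((edge, source))
    return adjacency

def neighbors(adjacency, node, edge=None):
    """Nodes reachable in one hop, optionally restricted to one edge type."""
    return [t for e, t in adjacency.get(node, []) if edge is None or e == edge]

def path(adjacency, from_node, to_node, max_hops=3):
    """Breadth-first search; returns the shortest path or None if out of reach."""
    queue = deque([[from_node]])
    seen = {from_node}
    while queue:
        route = queue.popleft()
        if route[-1] == to_node:
            return route
        if len(route) - 1 >= max_hops:
            continue  # hop budget exhausted on this route
        for nxt in neighbors(adjacency, route[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(route + [nxt])
    return None

directed = build_adjacency(edges, directed=True)
undirected = build_adjacency(edges, directed=False)
print(path(directed, "Alice", "Charlie"), path(undirected, "Charlie", "Alice"))
```

With directed=True the reverse route from Charlie to Alice does not exist, while the undirected adjacency makes every connection usable in both directions.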

`molecule_type` `property` ¶

`node` `instance-attribute` ¶

Configuration for node values (Field or nested Molecule).

Note: In wire format, this is used for both source and target nodes.

`edge` `instance-attribute` ¶

Configuration for edge/relationship types (Field or nested Molecule).

`directed = True` `class-attribute` `instance-attribute` ¶

Whether edges are directed.

If True, source -> target (asymmetric). If False, connections are symmetric.

`__init__(node, edge, directed=True, _skip_validation=False)` ¶

`slots()` ¶

Row¶

Primary-key addressed fields for SQL-like CRUD operations.

Row enables SQL-like CRUD operations by using structured encoding instead of search-optimized encoding. This makes individual fields extractable and updatable without affecting other fields.

from hybi.compose import Row, Field, Encoding

schema = Row(
    primary_key=Field("user_id", encoding=Encoding.EXACT),
    fields={
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC, similar_within=10000),
    },
    field_order=["name", "email", "salary"],  # optional explicit ordering
)

Parameters¶

primary_key: Field for the primary key (must use EXACT encoding)
fields: Dict mapping field names to Field configurations
field_order: Optional list specifying field encoding order

Use RelationalTable instead

Row is the underlying molecule, but most users should use the RelationalTable compound which provides a more intuitive column-based API.

Supported queries: get(pk=X), update(pk=X, set={...}), delete(pk=X), search(), filter()

`hybi.compose.Row` `dataclass` ¶

Bases: BaseMolecule

Primary-key addressed row with named fields: pk ⊛ (f1 ⊛ v1) ⊛ (f2 ⊛ v2) ⊛ ...

Row encodes database-like records with a primary key for addressing and named fields for data. Unlike Bundle (which uses lossy superposition), Row uses chain binding which is lossless - any field can be extracted cleanly by unbinding with its role vector.

This enables SQL-like CRUD operations:

- Read individual fields without noise from other fields
- Update specific fields without re-encoding the entire row
- Delete rows by primary key

The primary key must use EXACT encoding for deterministic lookups.

Encoding

pk_value ⊛ (field1_role ⊛ field1_value) ⊛ (field2_role ⊛ field2_value) ⊛ ...

Supported Queries

get(pk=X): O(1) lookup by primary key
filter(where=[...]): Filter by field values
search(): Semantic similarity search
update(pk=X, set={...}): Update field values
delete(pk=X): Delete row
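The contract of these operations can be modelled with an in-memory table keyed by primary key. This is a plain-Python stand-in, not the Row encoding itself: ToyRowTable is a hypothetical class, its `changes` argument plays the role of the documented `set={...}` keyword, and it assumes field_order must list every field exactly once.

```python
class ToyRowTable:
    """In-memory stand-in for a Row collection, keyed by primary key."""

    def __init__(self, primary_key, fields, field_order=None):
        self.primary_key = primary_key
        self.field_order = list(field_order) if field_order else list(fields)
        if sorted(self.field_order) != sorted(fields):
            raise ValueError("field_order must list every field exactly once")
        self._rows = {}

    def slots(self):
        """Primary-key name followed by field names in order."""
        return [self.primary_key, *self.field_order]

    def insert(self, row):
        pk = row[self.primary_key]
        self._rows[pk] = {name: row[name] for name in self.field_order}

    def get(self, pk):
        """Dictionary lookup by primary key; returns a copy or None."""
        row = self._rows.get(pk)
        return dict(row) if row is not None else None

    def update(self, pk, changes):
        """Change only the named fields; the others are left untouched."""
        row = self._rows[pk]
        unknown = [name for name in changes if name not in row]
        if unknown:
            raise KeyError(f"unknown fields: {unknown}")
        row.update(changes)

    def delete(self, pk):
        """Remove a row; returns True if it existed."""
        return self._rows.pop(pk, None) is not None

table = ToyRowTable(
    "user_id",
    ["email", "name", "salary"],
    field_order=["name", "email", "salary"],
)
table.insert({"user_id": "u1", "email": "a@example.com", "name": "Alice", "salary": 50000})
table.update("u1", {"salary": 55000})
print(table.slots(), table.get("u1"))
```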

Example

schema = Row(
    primary_key=Field("user_id", encoding=Encoding.EXACT),
    fields={
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC),
    },
)

Note: Row is typically used via the RelationalTable compound, which provides a more user-friendly API with column-based definition.

`molecule_type` `property` ¶

`primary_key` `instance-attribute` ¶

Field configuration for the primary key (must use EXACT encoding).

`fields = dataclass_field(default_factory=dict)` `class-attribute` `instance-attribute` ¶

Mapping of field names to their configurations.

`__init__(primary_key, fields=dict(), field_order=None)` ¶

`slots()` ¶

Return PK name followed by field names in order.

Nesting Molecules¶

Structured molecules can be nested:

# Typed entity knowledge graph
schema = Triple(
    subject=Pair(
        left=Field("entity_type", encoding=Encoding.EXACT),
        right=Field("entity_name"),
    ),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Pair(
        left=Field("target_type", encoding=Encoding.EXACT),
        right=Field("target_name"),
    ),
)

Nesting rules:

Structured molecules (Pair, Triple, Row, Tree, Graph, Sequence) can nest inside each other
Bundle cannot be nested by default because it uses search-optimized encoding; set composable=True to allow nesting
Row is typically used at the top level (not nested) for table schemas
Validation happens at construction time

Querying Nested Schemas¶

When querying nested schemas, you can use column names directly instead of navigating the slot hierarchy.

# For the schema above, all of these work:
q = hb.query("entities", schema=schema)

# By column name (recommended for nested schemas)
results = q.find(entity_type="Person", entity_name="Alice")

# By top-level slot (backward compatible)
results = q.find(subject="Alice")

# By dot notation (explicit path)
results = q.find(**{"subject.left": "Person"})

Field Resolution¶

The resolve_field() method maps query field names to their schema paths:

Input	Resolves To	Example
Column name	Leaf field	`"entity_type"` → `subject.left`
Top-level slot	First matching field	`"subject"` → `subject.left`
Dot notation	Explicit path	`"subject.left"` → `subject.left`

Valid Query Fields¶

Use valid_query_fields() to see all accepted field names:

>>> schema.valid_query_fields()
['entity_type', 'entity_name', 'relation', 'target_type', 'target_name',
 'subject', 'predicate', 'object']

Ambiguous Field Names¶

If the same column name appears in multiple nested locations, use dot notation to disambiguate:

# If both subject and object have a "name" field with the same column name,
# an AmbiguousFieldError is raised. Use dot notation instead:
results = q.find(**{"subject.right": "Alice"})
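The resolution rules in this section can be sketched over a schema written as nested dictionaries. This is a hypothetical re-implementation for illustration: the dict layout, the lookup precedence (explicit path, then top-level slot, then column name), and the local AmbiguousFieldError class are assumptions, not the library's resolve_field().

```python
class AmbiguousFieldError(ValueError):
    """Local stand-in for the error named in the text above."""

# Nested schema as plain dicts: slot -> column name (leaf) or nested slots.
schema = {
    "subject": {"left": "entity_type", "right": "entity_name"},
    "predicate": "relation",
    "object": {"left": "target_type", "right": "target_name"},
}

def leaf_paths(node, prefix=""):
    """Yield (dotted_path, column_name) for every leaf, in declaration order."""
    if isinstance(node, str):
        yield prefix, node
        return
    for slot, child in node.items():
        yield from leaf_paths(child, f"{prefix}.{slot}" if prefix else slot)

def resolve_field(schema, name):
    """Map a query field name to the dotted path of a leaf field."""
    leaves = list(leaf_paths(schema))
    paths = [path for path, _ in leaves]
    if name in paths:  # dot notation, or a top-level slot that is itself a leaf
        return name
    if name in schema:  # top-level slot: first leaf beneath it
        return next(p for p in paths if p.startswith(name + "."))
    matches = [path for path, column in leaves if column == name]
    if len(matches) > 1:
        raise AmbiguousFieldError(f"{name!r} matches {matches}; use dot notation")
    if not matches:
        raise KeyError(name)
    return matches[0]

def valid_query_fields(schema):
    """Column names first, then top-level slot names."""
    return [column for _, column in leaf_paths(schema)] + list(schema)

print(resolve_field(schema, "entity_type"), resolve_field(schema, "subject"))
```

If both Pairs used the same column name, for example "name" for each right slot, the column-name branch would find two matches and raise, which is why the explicit path "subject.right" is needed in that case.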
