Molecules

Molecules are the core compositional building blocks for defining schemas.

Overview

| Molecule | Slots | Best For |
|----------|-------|----------|
| Pair | left, right | Key-value, edges |
| Triple | subject, predicate, object | Knowledge graphs |
| Bundle | User-defined | Tabular data (read-heavy) |
| Row | pk + fields | Mutable tables (CRUD) |
| Sequence | item | Ordered data |
| Tree | child, parent | Hierarchies |
| Graph | source, edge, target | Networks |

Structured vs Search-Optimized

Molecules use two encoding strategies:

  • Structured (Pair, Triple, Row, Tree, Graph, Sequence): Fully decomposable, supports field extraction
  • Search-optimized (Bundle): Optimized for multi-field similarity search; cannot be nested unless created with composable=True

Row is particularly important because it enables SQL-like CRUD operations on individual rows. Bundle, by contrast, is optimized for search and doesn't support field-level updates.
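
The difference between the two strategies can be illustrated with a toy vector model. The sketch below is not hybi's implementation: it assumes that ⊛ behaves like element-wise multiplication of unit-magnitude complex numbers and that bundling is a plain sum, and every function name in it is hypothetical. Under those assumptions, a bound pair can be unbound exactly, while a value pulled out of a bundle comes back mixed with noise from the other fields.

```python
import cmath
import math
import random

DIM = 1024
rng = random.Random(0)

def random_vector():
    """Random unit phasors, a stand-in for an encoded field value or role."""
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(DIM)]

def bind(a, b):
    """Assumed binding: element-wise product."""
    return [x * y for x, y in zip(a, b)]

def unbind(bound, key):
    """Inverse of bind for unit phasors: multiply by the conjugate."""
    return [x * k.conjugate() for x, k in zip(bound, key)]

def bundle(*vectors):
    """Assumed bundling: element-wise sum (superposition)."""
    return [sum(column) for column in zip(*vectors)]

def cosine(a, b):
    dot = sum((x * y.conjugate()).real for x, y in zip(a, b))
    norm_a = math.sqrt(sum(abs(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(abs(y) ** 2 for y in b))
    return dot / (norm_a * norm_b)

# Structured: unbinding `left` from left ⊛ right gives back `right`.
left, right = random_vector(), random_vector()
structured_score = cosine(unbind(bind(left, right), left), right)

# Search-optimized: three role ⊛ value terms superposed into one vector.
roles = [random_vector() for _ in range(3)]
values = [random_vector() for _ in range(3)]
bundled = bundle(*(bind(r, v) for r, v in zip(roles, values)))
bundled_score = cosine(unbind(bundled, roles[0]), values[0])

print(f"structured: {structured_score:.3f}, bundled: {bundled_score:.3f}")
```

In this model the structured score is 1 up to floating-point error, while the bundled score sits well below 1 because the other two fields act as noise.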


Universal Query Methods

All molecule types support these common query methods:

| Method | Description | Common Parameters |
|--------|-------------|-------------------|
| search(query, top_k=10) | Hybrid semantic + symbolic search | query: str or dict, top_k: max results |
| search_slots(**slot_queries) | Per-slot weighted search | slot_queries: dict mapping slots to queries |
| search_prototype(examples, top_k=10) | Bundle examples as search prototype | examples: list of values |

Common Query Parameters

  • top_k: Maximum number of results to return (default: 10)
  • mode: Query mode - "exact" or "fuzzy"/"semantic" (varies by method)
  • similarity_threshold: Minimum similarity score for results (0.0-1.0)
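
As a rough illustration of how top_k and similarity_threshold interact, the sketch below post-processes a list of scored candidates in plain Python. The helper select_results is hypothetical, and the order of operations (threshold first, then truncation) is an assumption rather than documented hybi behaviour.

```python
def select_results(scored, top_k=10, similarity_threshold=0.0):
    """Keep candidates at or above the threshold, best first, at most top_k."""
    kept = [(item, score) for item, score in scored if score >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]

scored = [("a", 0.91), ("b", 0.42), ("c", 0.77), ("d", 0.15)]
top = select_results(scored, top_k=2, similarity_threshold=0.4)
print(top)
```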

Pair

Two-element structure with left and right slots.

from hybi.compose import Pair, Field, Encoding

schema = Pair(
    left=Field("key", encoding=Encoding.EXACT),
    right=Field("value", encoding=Encoding.SEMANTIC),
)

Supported queries: find(left=X), find(right=Y), search()

hybi.compose.Pair dataclass

Bases: BaseMolecule

Two-element binding: A ⊛ B.

Pair encodes a relationship between two elements, like key-value pairs, directed edges, or binary relations.

Slots can be Fields or nested Molecules:

# Simple: both slots are Fields
Pair(
    left=Field("key", encoding=Encoding.EXACT),
    right=Field("value"),
)

# Nested: left slot contains another Pair
Pair(
    left=Pair(
        left=Field("namespace"),
        right=Field("key"),
    ),
    right=Field("value"),
)
# Encoding: (namespace ⊛ key) ⊛ value

Encoding

left ⊛ right

Supported Queries

  • find(left=X): Find right values bound with X
  • find(right=Y): Find left values bound with Y
  • search(): Semantic similarity search

Example

schema = Pair(
    left=Field("key", encoding=Encoding.EXACT),
    right=Field("value", encoding=Encoding.SEMANTIC),
)
hb.ingest(df, collection="kv_store", schema=schema)

# Query: What value is associated with key "config_123"?
results = hb.query("kv_store").find(left="config_123")

molecule_type property

left instance-attribute

Configuration for the left element (Field or nested Molecule).

right instance-attribute

Configuration for the right element (Field or nested Molecule).

__init__(left, right, _skip_validation=False)

slots()


Triple

Three-element structure with subject, predicate, and object slots.

from hybi.compose import Triple, Field, Encoding

schema = Triple(
    subject=Field("entity", encoding=Encoding.SEMANTIC),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Field("target", encoding=Encoding.SEMANTIC),
)

Supported queries: find(), traverse(), search(), neighbors(), path()

hybi.compose.Triple dataclass

Bases: BaseMolecule

Three-element binding: Subject ⊛ Predicate ⊛ Object.

Triple encodes knowledge graph facts and relationships. Optimized for queries that retrieve by any slot combination.

Slots can be Fields or nested Molecules:

# Simple: all slots are Fields
Triple(
    subject=Field("entity"),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Field("target"),
)

# Nested: subject is a typed entity (Pair)
Triple(
    subject=Pair(
        left=Field("entity_type", encoding=Encoding.EXACT),
        right=Field("entity_name"),
    ),
    predicate=Field("relation"),
    object=Field("target"),
)
# Encoding: (type ⊛ name) ⊛ relation ⊛ target

Encoding

subject ⊛ predicate ⊛ object

Supported Queries

  • find(subject=X): Find predicates and objects for X
  • find(object=Y): Find subjects and predicates pointing to Y
  • find(predicate=P): Find all facts with relation P
  • find(subject=X, predicate=P): Find objects
  • traverse(start, path): Multi-hop graph traversal
  • search(): Semantic similarity search

Example

schema = Triple(
    subject=Field("entity", encoding=Encoding.SEMANTIC),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Field("target", encoding=Encoding.SEMANTIC),
)
hb.ingest(facts_df, collection="knowledge", schema=schema)

# Query: Who does Alice work with?
results = hb.query("knowledge").find(
    subject="Alice",
    predicate="works_with",
)

# Multi-hop: Alice -> works_at -> ? -> located_in -> ?
locations = hb.query("knowledge").traverse(
    start={"subject": "Alice"},
    path=["works_at", "located_in"],
)
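
To make the multi-hop semantics concrete, here is a minimal plain-Python sketch over a list of (subject, predicate, object) tuples. It only illustrates what following a path of predicates means; the function traverse_facts and the sample facts are hypothetical, and it takes a plain string as the start node rather than the start dict used by the real query API.

```python
facts = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Acme", "located_in", "Berlin"),
    ("Alice", "works_with", "Bob"),
]

def traverse_facts(facts, start, path):
    """Follow each predicate in `path` in turn, starting from `start`."""
    frontier = {start}
    for predicate in path:
        # Objects of matching facts become the subjects of the next hop.
        frontier = {o for s, p, o in facts if s in frontier and p == predicate}
    return frontier

locations = traverse_facts(facts, "Alice", ["works_at", "located_in"])
print(locations)
```

Each hop replaces the frontier with the objects reached through the given predicate, so an empty result means some hop had no matching fact.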

molecule_type property

subject instance-attribute

Configuration for the subject (Field or nested Molecule).

predicate instance-attribute

Configuration for the predicate (Field or nested Molecule).

object instance-attribute

Configuration for the object (Field or nested Molecule).

__init__(subject, predicate, object, _skip_validation=False)

slots()


Bundle

Multi-field structure with user-defined fields.

from hybi.compose import Bundle, Field, Encoding

schema = Bundle(
    fields={
        "name": Field(encoding=Encoding.SEMANTIC, weight=1.5),
        "category": Field(encoding=Encoding.EXACT),
        "price": Field(encoding=Encoding.NUMERIC, similar_within=50),
    }
)

Bundle limitations

Bundle uses search-optimized encoding. This means:

  • Similarity scores decrease with field count
  • Individual fields cannot be reliably extracted
  • Bundles cannot be nested inside other molecules unless created with composable=True (see Composable Bundles below)
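
The first limitation can be seen in a toy model. The sketch below assumes that bundling is a plain sum of random unit-phasor vectors, which is an assumption made for illustration and not hybi's actual encoder; the helper names are hypothetical. It measures how similar the bundle stays to one of its own components as the number of fields grows.

```python
import cmath
import math
import random

DIM = 1024

def random_vector(rng):
    """Random unit phasors standing in for one role ⊛ value term."""
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(DIM)]

def cosine(a, b):
    dot = sum((x * y.conjugate()).real for x, y in zip(a, b))
    norm_a = math.sqrt(sum(abs(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(abs(y) ** 2 for y in b))
    return dot / (norm_a * norm_b)

def field_similarity(n_fields, seed=0):
    """Similarity between a bundle of n_fields terms and its first term."""
    rng = random.Random(seed)
    terms = [random_vector(rng) for _ in range(n_fields)]
    bundled = [sum(column) for column in zip(*terms)]
    return cosine(bundled, terms[0])

scores = {n: field_similarity(n) for n in (1, 4, 16)}
for n, score in scores.items():
    print(n, round(score, 3))
```

In this model the similarity falls off roughly as one over the square root of the field count, which is one way to read the advice to keep Bundles narrow.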

Supported queries: search(), filter(), select()

hybi.compose.Bundle dataclass

Bases: BaseMolecule

Bundled field-value bindings: Σ(role_i ⊛ value_i).

Bundle is the default encoding used by HyperBinder for structured data. Each field is bound with its role vector and all are bundled.

This molecule makes the implicit Bundle encoding explicit and configurable with per-field settings.

Composable Bundles:

By default, Bundle is terminal — it cannot be nested inside other molecules. Set composable=True to enable nesting. Composable bundles use the bundle→repair_unitarity→braid pattern:

  1. Encode each field and bind with its role vector
  2. Weighted bundle (circular mean) all role-bound fillers
  3. repair_unitarity (QR projection back to unitary group)
  4. braid (bind with deterministic level key)

The result is a unitary vector that can safely participate in further bind/unbind chains. See atoms.braided_bundle().
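
The four steps above can be sketched in a simplified setting. This is a toy model and not the code behind atoms.braided_bundle(): it assumes vectors of unit-magnitude complex numbers, where the phase of a weighted sum is the weighted circular mean and where projecting back to the unitary group reduces to rescaling each component to magnitude 1 (the QR projection mentioned above is not implemented here). All function names are hypothetical.

```python
import cmath
import math
import random

DIM = 256
rng = random.Random(1)

def random_unitary():
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(DIM)]

def bind(a, b):
    return [x * y for x, y in zip(a, b)]

def unbind(bound, key):
    return [x * k.conjugate() for x, k in zip(bound, key)]

def weighted_bundle(vectors, weights):
    return [sum(w * x for w, x in zip(weights, column)) for column in zip(*vectors)]

def repair_unitarity(vector):
    """Rescale every component to magnitude 1, keeping its phase."""
    return [z / abs(z) if abs(z) > 1e-12 else complex(1.0) for z in vector]

def similarity(a, b):
    return sum((x * y.conjugate()).real for x, y in zip(a, b)) / len(a)

roles = [random_unitary(), random_unitary()]
values = [random_unitary(), random_unitary()]
level_key = random_unitary()

role_bound = [bind(r, v) for r, v in zip(roles, values)]   # step 1
bundled = weighted_bundle(role_bound, weights=[1.5, 1.0])  # step 2
repaired = repair_unitarity(bundled)                       # step 3
composable = bind(repaired, level_key)                     # step 4

# The result can sit in a Pair-like binding and be unbound again exactly.
right = random_unitary()
recovered = unbind(bind(composable, right), composable)
magnitudes = [abs(z) for z in composable]
print(round(similarity(recovered, right), 6))
```

The raw bundle from step 2 is not unit-magnitude, so unbinding with it would distort the other operand; the repair step is what makes the later bind/unbind chain exact in this model.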

Encoding

bundle(role_1 ⊛ value_1, role_2 ⊛ value_2, ...)

Supported Queries

  • search(): Semantic similarity search
  • filter(where=[...]): Exact/range filtering
  • select(fields=[...]): Projection

Example

schema = Bundle(fields={
    "name": Field(encoding=Encoding.SEMANTIC, weight=1.5),
    "description": Field(encoding=Encoding.SEMANTIC, weight=2.0),
    "category": Field(encoding=Encoding.EXACT),
    "price": Field(encoding=Encoding.NUMERIC, similar_within=50),
})
hb.ingest(products_df, collection="products", schema=schema)

# Composable bundle nested inside a Pair:
product_info = Bundle(composable=True, fields={
    "name": Field(encoding=Encoding.SEMANTIC),
    "price": Field(encoding=Encoding.NUMERIC),
})
schema = Pair(left=product_info, right=Field(encoding=Encoding.SEMANTIC))

molecule_type property

fields = dataclass_field(default_factory=dict) class-attribute instance-attribute

Mapping of field names to their configurations.

__init__(fields=dict(), composable=False)

slots()


Sequence

Ordered items with positional encoding.

from hybi.compose import Sequence, Field, Encoding

schema = Sequence(
    item=Field("token", encoding=Encoding.SEMANTIC),
    max_length=512,
    position_encoding="random",  # or "sinusoidal"
)

Parameters

  • item: Field configuration for sequence elements
  • max_length: Maximum sequence length (default: 512)
  • position_encoding: How positions are encoded - "random" (default) or "sinusoidal"

Supported queries: search(), at(position), contains(), prefix()

hybi.compose.Sequence dataclass

Bases: BaseMolecule

Ordered elements with positional encoding: item ⊛ position_vec(pos).

Sequence encodes ordered data where position matters, like text tokens, time series, or ordered lists. Each row represents one item at a specific position. The item is bound with a deterministic position vector.

The item slot can contain nested molecules:

# Simple: item is a Field
Sequence(
    item=Field("token"),
    position=Field("position"),
)

# Nested: items are Pairs
Sequence(
    item=Pair(
        left=Field("word"),
        right=Field("pos_tag", encoding=Encoding.EXACT),
    ),
    position=Field("position"),
)
# Encoding: (word ⊛ pos_tag) ⊛ position_vec

Data Format (one row per item):

| position | token  |
|----------|--------|
| 0        | apple  |
| 1        | banana |
| 2        | cherry |

Encoding

item_vec ⊛ position_vec(pos)
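
The two position_encoding options can be contrasted in a toy model. The sketch below is an illustration under stated assumptions, not hybi's implementation: it assumes unit-phasor vectors, a seeded generator for "random" positions, and transformer-style frequencies for "sinusoidal" positions. The function names are hypothetical.

```python
import cmath
import math
import random

DIM = 256

def random_position_vec(pos, seed=0):
    """Deterministic pseudo-random phasors: same (seed, pos) gives the same vector."""
    rng = random.Random(seed * 1_000_003 + pos)
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(DIM)]

def sinusoidal_position_vec(pos):
    """Phase grows linearly with position, at a different frequency per component."""
    return [cmath.exp(1j * pos * 10000 ** (-k / DIM)) for k in range(DIM)]

def similarity(a, b):
    return sum((x * y.conjugate()).real for x, y in zip(a, b)) / len(a)

random_neighbours = similarity(random_position_vec(3), random_position_vec(4))
sinusoidal_neighbours = similarity(sinusoidal_position_vec(3), sinusoidal_position_vec(4))
print(round(random_neighbours, 3), round(sinusoidal_neighbours, 3))
```

In this model random positions are nearly orthogonal to each other, which suits exact lookups such as at(position=N), while sinusoidal positions keep adjacent positions similar. Whether hybi's encodings share these properties is not stated in the reference above.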

Supported Queries

  • search(): Semantic similarity search on items
  • at(position=N): Find items at position N

Example

schema = Sequence(
    item=Field("token", encoding=Encoding.SEMANTIC),
    position=Field("position"),
)
hb.ingest(tokens_df, collection="corpus", schema=schema)

# Query: What items are at position 0?
results = hb.query("corpus").at(position=0)

# Semantic search on items
results = hb.query("corpus").search("fruit")

molecule_type property

item instance-attribute

Configuration for the sequence elements (Field or nested Molecule).

__init__(item, position=None, position_encoding='random', max_length=512, _skip_validation=False)

slots()


Tree

Parent-child relationships with child and parent slots.

from hybi.compose import Tree, Field

schema = Tree(
    child=Field("employee"),
    parent=Field("manager"),
    level=Field("depth"),  # optional
)

Parameters

  • child: Field for child/node identifier
  • parent: Field for parent identifier
  • level: Optional Field for depth/level in hierarchy

Supported queries: search(), find(), children(), parent(), ancestors(), descendants(), siblings()

hybi.compose.Tree dataclass

Bases: BaseMolecule

Hierarchical parent-child relationships: child ⊛ parent ⊛ level(n).

Tree encodes hierarchical structures like org charts, file systems, or taxonomies. Enables traversal queries up and down the hierarchy.

Slots can contain nested molecules:

# Simple: all slots are Fields
Tree(
    child=Field("employee"),
    parent=Field("manager"),
)

# Nested: nodes are typed entities
Tree(
    child=Pair(
        left=Field("dept", encoding=Encoding.EXACT),
        right=Field("employee_name"),
    ),
    parent=Pair(
        left=Field("dept", encoding=Encoding.EXACT),
        right=Field("manager_name"),
    ),
)
# Encoding: (dept ⊛ employee) ⊛ (dept ⊛ manager) ⊛ level

Encoding

child ⊛ parent ⊛ level(depth)

Supported Queries

  • search(): Semantic similarity search
  • find(child=X): Find parent of X
  • find(parent=Y): Find children of Y
  • children(node): Direct children of a node
  • parent(node): Direct parent of a node
  • ancestors(node, depth): N levels up
  • descendants(node, depth): N levels down
  • siblings(node): Nodes with same parent

Example

schema = Tree(
    child=Field("employee"),
    parent=Field("manager"),
)
hb.ingest(org_df, collection="org", schema=schema)

# Query: Who reports to Alice?
reports = hb.query("org").children("Alice")

# Query: Full reporting chain for Bob
chain = hb.query("org").ancestors("Bob")
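
The intended meaning of the hierarchy queries can be spelled out over a plain child-to-parent mapping. The sketch below is a reference model only: the helpers are hypothetical, the sample data is invented, it assumes the hierarchy has no cycles, and it treats a missing depth as "no limit", which may differ from hybi's defaults.

```python
parent_of = {"Bob": "Alice", "Carol": "Alice", "Dave": "Bob", "Erin": "Bob"}

def children(node):
    return sorted(c for c, p in parent_of.items() if p == node)

def ancestors(node, depth=None):
    """Walk upwards, nearest ancestor first, at most `depth` levels."""
    chain = []
    while node in parent_of and (depth is None or len(chain) < depth):
        node = parent_of[node]
        chain.append(node)
    return chain

def descendants(node, depth=None):
    """Walk downwards level by level, at most `depth` levels."""
    found, frontier, level = [], [node], 0
    while frontier and (depth is None or level < depth):
        frontier = [c for n in frontier for c in children(n)]
        found.extend(frontier)
        level += 1
    return found

def siblings(node):
    parent = parent_of.get(node)
    if parent is None:
        return []  # the root has no siblings in this model
    return [c for c in children(parent) if c != node]

print(children("Alice"), ancestors("Dave"), descendants("Alice"), siblings("Dave"))
```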

molecule_type property

child instance-attribute

Configuration for the child node (Field or nested Molecule).

parent instance-attribute

Configuration for the parent node (Field or nested Molecule).

__init__(child, parent, level=None, _skip_validation=False)

slots()


Graph

Node-edge-node structure with source, edge, and target slots.

from hybi.compose import Graph, Field, Encoding

# Define with a single node Field (used for both source and target)
schema = Graph(
    node=Field("entity"),
    edge=Field("relation", encoding=Encoding.EXACT),
    directed=True,
)

Querying Graph

Graph has three slots (source, edge, target) but only two Field configurations (node, edge). The node Field applies to both source and target. You can query using either:

q = hb.query("social", schema=schema)

# By column name (what appears in your DataFrame)
q.find(entity="Alice")     # Matches both source and target

# By slot name (the structural role)
q.find(source="Alice")     # Only source nodes
q.find(target="Bob")       # Only target nodes

Both column names and slot names are valid query keys. This lets you distinguish between finding rows where a node appears as source vs target.
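
The distinction between the two kinds of query key can be modelled over a plain list of edge records. This is a sketch of the described semantics only; the function find_edges, the record layout, and the sample edges are assumptions and say nothing about how hybi stores or matches edges.

```python
edges = [
    {"source": "Alice", "edge": "friends_with", "target": "Bob"},
    {"source": "Carol", "edge": "follows", "target": "Alice"},
    {"source": "Bob", "edge": "follows", "target": "Carol"},
]

def find_edges(edges, node_column="entity", **query):
    """Match rows by slot name, or by the node Field's column name (either end)."""
    def matches(row, key, value):
        if key == node_column:
            return value in (row["source"], row["target"])
        return row[key] == value
    return [row for row in edges if all(matches(row, k, v) for k, v in query.items())]

either_end = find_edges(edges, entity="Alice")
as_source = find_edges(edges, source="Alice")
as_target = find_edges(edges, target="Alice")
print(len(either_end), len(as_source), len(as_target))
```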

Supported queries: neighbors(), path(), subgraph(), traverse()

hybi.compose.Graph dataclass

Bases: BaseMolecule

General graph structure: node ⊛ edge ⊛ node.

Graph encodes network relationships with typed edges. Supports both directed and undirected graphs, and enables traversal, path finding, and neighborhood queries.

The node and edge slots can contain nested molecules:

# Simple: all slots are Fields
Graph(
    node=Field("entity"),
    edge=Field("relation", encoding=Encoding.EXACT),
)

# Nested: nodes are typed entities
Graph(
    node=Pair(
        left=Field("node_type", encoding=Encoding.EXACT),
        right=Field("node_name"),
    ),
    edge=Field("relation"),
)
# Encoding: (type ⊛ name) ⊛ relation ⊛ (type ⊛ name)

Encoding

source_node ⊛ edge ⊛ target_node

Supported Queries

  • search(): Semantic similarity search
  • find(source=X): Find edges and targets from X
  • find(target=Y): Find sources and edges pointing to Y
  • find(edge=E): Find all connections with edge type E
  • neighbors(node, edge): Connected nodes via edge type
  • path(from_node, to_node, max_hops): Path finding
  • subgraph(seed, radius): Local neighborhood
  • traverse(start, pattern): Pattern matching traversal

Example

schema = Graph(
    node=Field("entity"),
    edge=Field("relation", encoding=Encoding.EXACT),
    directed=True,
)
hb.ingest(network_df, collection="social", schema=schema)

# Query: Who are Alice's friends?
friends = hb.query("social").neighbors("Alice", edge="friends_with")

# Query: Path from Alice to Charlie
path = hb.query("social").path(from_node="Alice", to_node="Charlie", max_hops=3)
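
For readers who want a reference model of neighbors() and path(), the sketch below implements both over a plain edge list with a breadth-first search. It is an assumption-laden illustration and not hybi's algorithm: the helper names, the sample edges, and the choice to return the shortest path (or None) are all hypothetical.

```python
from collections import deque

edges = [
    ("Alice", "friends_with", "Bob"),
    ("Bob", "friends_with", "Charlie"),
    ("Alice", "works_with", "Dana"),
]

def neighbors(node, edge=None, directed=True):
    """Nodes reachable from `node` in one hop, optionally by edge type."""
    found = []
    for source, label, target in edges:
        if edge is not None and label != edge:
            continue
        if source == node:
            found.append(target)
        elif not directed and target == node:
            found.append(source)  # undirected: edges can be followed backwards
    return found

def find_path(from_node, to_node, max_hops=3, directed=True):
    """Breadth-first search; returns the shortest route within max_hops, else None."""
    queue = deque([[from_node]])
    visited = {from_node}
    while queue:
        route = queue.popleft()
        if route[-1] == to_node:
            return route
        if len(route) - 1 >= max_hops:
            continue
        for nxt in neighbors(route[-1], directed=directed):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(route + [nxt])
    return None

print(neighbors("Alice", edge="friends_with"))
print(find_path("Alice", "Charlie"))
```

With directed=True the reverse query from Charlie to Alice finds nothing, which mirrors the asymmetric reading of the directed flag.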

molecule_type property

node instance-attribute

Configuration for node values (Field or nested Molecule).

Note: In wire format, this is used for both source and target nodes.

edge instance-attribute

Configuration for edge/relationship types (Field or nested Molecule).

directed = True class-attribute instance-attribute

Whether edges are directed.

If True, source -> target (asymmetric). If False, connections are symmetric.

__init__(node, edge, directed=True, _skip_validation=False)

slots()


Row

Primary-key addressed fields for SQL-like CRUD operations.

Row enables SQL-like CRUD operations by using structured encoding instead of search-optimized encoding. This makes individual fields extractable and updatable without affecting other fields.

from hybi.compose import Row, Field, Encoding

schema = Row(
    primary_key=Field("user_id", encoding=Encoding.EXACT),
    fields={
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC, similar_within=10000),
    },
    field_order=["name", "email", "salary"],  # optional explicit ordering
)

Parameters

  • primary_key: Field for the primary key (must use EXACT encoding)
  • fields: Dict mapping field names to Field configurations
  • field_order: Optional list specifying field encoding order

Use RelationalTable instead

Row is the underlying molecule, but most users should use the RelationalTable compound which provides a more intuitive column-based API.

Supported queries: get(pk=X), update(pk=X, set={...}), delete(pk=X), search(), filter()

hybi.compose.Row dataclass

Bases: BaseMolecule

Primary-key addressed row with named fields: pk ⊛ (f1 ⊛ v1) ⊛ (f2 ⊛ v2) ⊛ ...

Row encodes database-like records with a primary key for addressing and named fields for data. Unlike Bundle, which uses lossy superposition, Row uses chain binding, which is lossless: any field can be extracted cleanly by unbinding with its role vector.

This enables SQL-like CRUD operations:

  • Read individual fields without noise from other fields
  • Update specific fields without re-encoding the entire row
  • Delete rows by primary key

The primary key must use EXACT encoding for deterministic lookups.
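
Why chain binding allows a field to be updated without re-encoding the whole row can be shown in a toy model. The sketch assumes that ⊛ is element-wise multiplication of unit-magnitude complex numbers, so every term has an exact inverse (its conjugate). That is an assumption for illustration, not a description of hybi's internals, and all names are hypothetical.

```python
import cmath
import math
import random

DIM = 256
rng = random.Random(2)

def random_vector():
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(DIM)]

def bind(a, b):
    return [x * y for x, y in zip(a, b)]

def inverse(v):
    return [x.conjugate() for x in v]

def similarity(a, b):
    return sum((x * y.conjugate()).real for x, y in zip(a, b)) / len(a)

def encode_row(pk, roles, values):
    """pk ⊛ (role_1 ⊛ value_1) ⊛ (role_2 ⊛ value_2) ⊛ ..."""
    row = pk
    for name in roles:
        row = bind(row, bind(roles[name], values[name]))
    return row

pk = random_vector()
roles = {"email": random_vector(), "name": random_vector()}
values = {"email": random_vector(), "name": random_vector()}
row = encode_row(pk, roles, values)

# Update "email": unbind the old role ⊛ value term, then bind the new one.
new_email = random_vector()
old_term = bind(roles["email"], values["email"])
new_term = bind(roles["email"], new_email)
updated = bind(bind(row, inverse(old_term)), new_term)

# Reference: encode the modified record from scratch.
reencoded = encode_row(pk, roles, {**values, "email": new_email})
print(round(similarity(updated, reencoded), 6))
```

The in-place update matches a full re-encode in this model because the old term cancels exactly, something a summed bundle cannot offer.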

Encoding

pk_value ⊛ (field1_role ⊛ field1_value) ⊛ (field2_role ⊛ field2_value) ⊛ ...

Supported Queries

  • get(pk=X): O(1) lookup by primary key
  • filter(where=[...]): Filter by field values
  • search(): Semantic similarity search
  • update(pk=X, set={...}): Update field values
  • delete(pk=X): Delete row

Example

schema = Row(
    primary_key=Field("user_id", encoding=Encoding.EXACT),
    fields={
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC),
    },
)

Note: Row is typically used via the RelationalTable compound, which provides a more user-friendly API with column-based definition.

molecule_type property

primary_key instance-attribute

Field configuration for the primary key (must use EXACT encoding).

fields = dataclass_field(default_factory=dict) class-attribute instance-attribute

Mapping of field names to their configurations.

__init__(primary_key, fields=dict(), field_order=None)

slots()

Return PK name followed by field names in order.


Nesting Molecules

Structured molecules can be nested:

# Typed entity knowledge graph
schema = Triple(
    subject=Pair(
        left=Field("entity_type", encoding=Encoding.EXACT),
        right=Field("entity_name"),
    ),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Pair(
        left=Field("target_type", encoding=Encoding.EXACT),
        right=Field("target_name"),
    ),
)

Nesting rules:

  • Structured molecules (Pair, Triple, Row, Tree, Graph, Sequence) can nest inside each other
  • Bundle cannot be nested by default (it uses search-optimized encoding); a Bundle created with composable=True can be
  • Row is typically used at the top level (not nested) for table schemas
  • Validation happens at construction time

Querying Nested Schemas

When querying nested schemas, you can use column names directly instead of navigating the slot hierarchy.

# For the schema above, all of these work:
q = hb.query("entities", schema=schema)

# By column name (recommended for nested schemas)
results = q.find(entity_type="Person", entity_name="Alice")

# By top-level slot (backward compatible)
results = q.find(subject="Alice")

# By dot notation (explicit path)
results = q.find(**{"subject.left": "Person"})

Field Resolution

The resolve_field() method maps query field names to their schema paths:

| Input | Resolves To | Example |
|-------|-------------|---------|
| Column name | Leaf field | "entity_type" → subject.left |
| Top-level slot | First matching field | "subject" → subject.left |
| Dot notation | Explicit path | "subject.left" → subject.left |

Valid Query Fields

Use valid_query_fields() to see all accepted field names:

>>> schema.valid_query_fields()
['entity_type', 'entity_name', 'relation', 'target_type', 'target_name',
 'subject', 'predicate', 'object']

Ambiguous Field Names

If the same column name appears in multiple nested locations, use dot notation to disambiguate:

# If both subject and object have a "name" field with the same column name,
# an AmbiguousFieldError is raised. Use dot notation instead:
results = q.find(**{"subject.right": "Alice"})
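
The resolution rules in this section can be modelled over a nested dictionary. The sketch below is a hypothetical reimplementation for illustration: the schema is represented as plain dicts of slot names to column names, and resolve_field and AmbiguousFieldError here are local stand-ins rather than the library's own objects. The precedence between the three input forms is also an assumption.

```python
class AmbiguousFieldError(ValueError):
    """Local stand-in for the error named in the text."""

schema = {
    "subject": {"left": "entity_type", "right": "entity_name"},
    "predicate": "relation",
    "object": {"left": "target_type", "right": "target_name"},
}

def leaf_paths(node, prefix=""):
    """Yield (dotted path, column name) for every leaf field."""
    if isinstance(node, str):
        yield prefix, node
        return
    for slot, child in node.items():
        yield from leaf_paths(child, f"{prefix}.{slot}" if prefix else slot)

def resolve_field(schema, name):
    leaves = list(leaf_paths(schema))
    paths = [path for path, _ in leaves]
    if name in paths:  # dot notation (or a top-level leaf slot)
        return name
    by_column = [path for path, column in leaves if column == name]
    if len(by_column) > 1:
        raise AmbiguousFieldError(f"{name!r} matches {by_column}; use dot notation")
    if by_column:  # column name
        return by_column[0]
    under_slot = [path for path in paths if path.startswith(name + ".")]
    if under_slot:  # top-level slot: first matching field
        return under_slot[0]
    raise KeyError(name)

print(resolve_field(schema, "entity_type"), resolve_field(schema, "subject"))
```

A schema whose subject and object both use a column called "name" would hit the AmbiguousFieldError branch for "name", while "subject.right" would still resolve.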