Compounds

Compounds are pre-configured molecules for common domain patterns.

Overview

Compound	Based On	Use Case
KnowledgeGraph	Triple	Entity-relation-entity facts
Catalog	Bundle	Generic tabular data (read-heavy)
RelationalTable	Row	Mutable tables with CRUD
TimeSeries	Sequence	Time-ordered data
Hierarchy	Tree	Org charts, taxonomies
Document	Row	Chunked documents with tree-shaped navigation
Network	Graph	Social graphs, citations

Compounds expand to molecules at definition time, so once created they have exactly the same capabilities as the molecules they expand to.

KnowledgeGraph

Pre-configured Triple for knowledge graph data.

from hybi.compose import KnowledgeGraph

schema = KnowledgeGraph(
    entity_field="person",
    relation_field="relationship",
    object_field="target",
    # Defaults: SEMANTIC for entities, EXACT for relations
)

Equivalent to:

Triple(
    subject=Field("person", encoding=Encoding.SEMANTIC),
    predicate=Field("relationship", encoding=Encoding.EXACT),
    object=Field("target", encoding=Encoding.SEMANTIC),
)
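The sketch below builds intuition for what a Triple holds. It uses a common hyperdimensional-computing recipe and assumes random bipolar vectors. Each role (subject, predicate, object) is bound to its value by elementwise multiplication, and the three bound pairs are bundled by majority vote. This is a generic illustration in plain Python, not hybi's encoder. The helper names, the seed, and the 4096-dimension size are arbitrary choices for the sketch.

```python
import random

DIM = 4096

def random_hv(rng):
    """Random bipolar hypervector: every entry is +1 or -1."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    """Elementwise product; binding with the same bipolar vector twice undoes it."""
    return [x * y for x, y in zip(a, b)]

def bundle(*vectors):
    """Per-dimension majority vote (ties go to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]

def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / DIM

rng = random.Random(0)
roles = {r: random_hv(rng) for r in ("subject", "predicate", "object")}
symbols = {s: random_hv(rng) for s in ("Alice", "Bob", "Carol", "works_with")}

# Encode the fact (Alice, works_with, Bob) as a single vector.
fact = bundle(
    bind(roles["subject"], symbols["Alice"]),
    bind(roles["predicate"], symbols["works_with"]),
    bind(roles["object"], symbols["Bob"]),
)

# Query "Alice works_with ?": unbind the object role, then clean up.
noisy_object = bind(fact, roles["object"])
best = max(symbols, key=lambda s: similarity(noisy_object, symbols[s]))
print(best)
```

Unbinding the object role leaves a noisy copy of Bob's vector. A nearest-neighbor lookup over the known symbols then cleans it up.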

`hybi.compose.KnowledgeGraph` `dataclass`

Bases: BaseMolecule

Knowledge graph compound: entity-relation-entity triples.

A convenience wrapper around Triple with sensible defaults for knowledge graph use cases (semantic entities, exact relations).

Expands to

Triple(
    subject=Field(entity_field, encoding=SEMANTIC),
    predicate=Field(relation_field, encoding=EXACT),
    object=Field(entity_field, encoding=SEMANTIC),
)

Example

# Simple usage - defaults to entity/relation columns
schema = KnowledgeGraph()
hb.ingest(facts_df, collection="kg", schema=schema)

# Custom field names
schema = KnowledgeGraph(
    entity_field="person",
    relation_field="relationship",
)

# With custom encoding
schema = KnowledgeGraph(
    entity_field="entity",
    relation_field="predicate",
    entity_encoding=Encoding.EXACT,  # For IDs instead of text
)

`__init__(entity_field='entity', relation_field='relation', subject_field=None, object_field=None, entity_encoding=Encoding.SEMANTIC, relation_encoding=Encoding.EXACT, entity_weight=1.0, relation_weight=1.0)`

Catalog

Pre-configured Bundle for tabular data.

from hybi.compose import Catalog, Field, Encoding

schema = Catalog(
    columns={
        "name": Field(encoding=Encoding.SEMANTIC, weight=1.5),
        "category": Field(encoding=Encoding.EXACT),
        "price": Field(encoding=Encoding.NUMERIC, similar_within=50),
    }
)

`hybi.compose.Catalog` `dataclass`

Bases: BaseMolecule

Catalog compound: searchable collection with SQL-like operations.

A convenience wrapper around Bundle optimized for tabular data with a familiar SQL-like query interface. Catalog provides a bridge between traditional relational thinking and hyperdimensional computing.

Expands to

Bundle(fields={
    column_name: Field(encoding=..., weight=...),
    ...
})

Unlike pure relational tables, Catalog supports:

Semantic search: find rows by meaning, not just exact values
Fuzzy matching: similarity-based lookups with configurable thresholds
Vector joins: join collections by semantic similarity, not just key equality

Operations Map

Catalog Method	HDC Operation
select()	Field projection (SelectQuery)
where()	Exact filter + similarity search
join()	JoinQuery (exact or semantic)
aggregate()	AggregateQuery (GROUP BY)
search()	Vector similarity search

Example

# Define a products catalog
schema = Catalog(
    columns={
        "name": Field(encoding=Encoding.SEMANTIC, weight=2.0),
        "description": Field(encoding=Encoding.SEMANTIC),
        "category": Field(encoding=Encoding.EXACT),
        "price": Field(encoding=Encoding.NUMERIC, similar_within=50),
    },
    primary_key="id",
)
hb.ingest(products_df, collection="products", schema=schema)

# Traditional-style query
results = hb.query("products").where(category="electronics")

# Semantic query (HDC advantage)
results = hb.query("products").search("lightweight laptop for travel")

# Join with another catalog
order_schema = Catalog(
    columns={"product_id": Field(encoding=Encoding.EXACT), ...}
)
joined = hb.query("orders").join("products", on="product_id")

# Aggregation
stats = hb.query("products").aggregate(
    group_by=["category"],
    aggregations={"avg_price": ("price", "avg")},
)

Notes

primary_key is metadata only; HDC doesn't require explicit keys
For semantic joins, use Encoding.SEMANTIC on join columns
The underlying Bundle uses bundle encoding (lossy but searchable)
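The sketch below makes the "lossy but searchable" note concrete in plain Python. It assumes a generic HDC bundle, in which role-bound column values are superposed by majority vote. That may differ from hybi's actual Bundle. The columns, values, and seed are made up.

```python
import random

DIM = 4096
rng = random.Random(1)

def hv():
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    return [x * y for x, y in zip(a, b)]

def bundle(vectors):
    # Majority vote; with an even count, ties are broken toward +1.
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]

def sim(a, b):
    return sum(x * y for x, y in zip(a, b)) / DIM

columns = ("name", "category", "brand", "color")
roles = {c: hv() for c in columns}
values = {v: hv() for v in ("laptop", "phone", "electronics", "toys",
                            "acme", "globex", "red", "blue")}

def encode_row(row):
    """Bundle every column's role-bound value into one row vector."""
    return bundle([bind(roles[c], values[v]) for c, v in row.items()])

rows = {
    "r1": encode_row({"name": "laptop", "category": "electronics", "brand": "acme", "color": "red"}),
    "r2": encode_row({"name": "phone", "category": "toys", "brand": "globex", "color": "blue"}),
}

# Searchable: a probe for category=electronics ranks r1 above r2.
probe = bind(roles["category"], values["electronics"])
ranked = sorted(rows, key=lambda r: sim(rows[r], probe), reverse=True)

# Lossy: unbinding "brand" from r1 gives only a noisy copy of "acme".
recovered = bind(rows["r1"], roles["brand"])
print(ranked, round(sim(recovered, values["acme"]), 2))
```

The probe scores the matching row clearly higher. Even so, the value recovered from a four-column bundle only partially resembles the original. This is the trade-off that makes exact field updates a better fit for RelationalTable.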

`__init__(columns=dict(), primary_key=None, catalog_name=None)`

RelationalTable

SQL-like table with full CRUD support.

RelationalTable provides familiar relational database semantics with atomic row-level operations. Unlike Catalog (which is optimized for search), RelationalTable uses structured encoding, which enables true field-level updates.

Catalog vs RelationalTable

Aspect	Catalog	RelationalTable
Encoding	Search-optimized	Structured
Search	Fast	Moderate
UPDATE/DELETE	Not supported	Full support
Use case	Search catalog	Mutable tables

Use Catalog when you primarily search and append data. Use RelationalTable when you need UPDATE/DELETE operations.

from hybi.compose import RelationalTable, Field, Encoding

schema = RelationalTable(
    columns={
        "user_id": Field(encoding=Encoding.EXACT),
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC, similar_within=10000),
    },
    primary_key="user_id",
)

CRUD Operations:

# Ingest data
hb.ingest(users_df, collection="users", schema=schema)

# Read by primary key
user = hb.query("users", schema).get(user_id="U001")

# Update fields atomically
hb.update(
    "users",
    where={"user_id": "U001"},
    set={"email": "new@example.com", "salary": 120000},
    schema=schema,
)

# Delete row
hb.delete("users", where={"user_id": "U001"}, schema=schema)

# Upsert (insert or update)
hb.upsert("users", row={"user_id": "U001", ...}, schema=schema)

Equivalent to:

Row(
    primary_key=Field("user_id", encoding=Encoding.EXACT),
    fields={
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC, similar_within=10000),
    },
)

Search & CRUD Architecture

For optimal performance, use Catalog for search and RelationalTable for CRUD:

flowchart TB
    subgraph Catalog["CATALOG (Search)"]
        C1[Search-optimized<br/>fast]
        C2[Semantic Discovery]
        C1 --> C2
    end

    subgraph RelationalTable["RELATIONAL TABLE (CRUD)"]
        R1[Structured<br/>exact]
        R2[PK Lookups]
        R1 --> R2
    end

    C2 --> Bridge
    R2 --> Bridge
    Bridge[BRIDGE<br/>Primary Keys] --> Mutations[Deterministic Mutations]

Recommended pattern:

Use Catalog for semantic search (optimized for similarity matching)
Use RelationalTable for CRUD (optimized for exact field updates)
Bridge between them using shared primary keys

Single-schema alternative: RelationalTable can handle both search and CRUD, but its search is slower than a dedicated Catalog's.

Fuzzy-to-Exact Bridge Pattern

When you need to combine semantic discovery with exact mutations, use the bridge pattern:

Fuzzy search casts a wide net using semantic similarity
Exact filters narrow to deterministic boundaries
CRUD via PKs operates on the refined set

# 1. Semantic search finds candidates
candidates = hb.query("users", schema).search("machine learning expert", top_k=50)

# 2. Exact filtering narrows to deterministic set
refined = [r for r in candidates
           if r.data["department"] == "Engineering"
           and r.data["status"] == "active"]

# 3. CRUD via primary keys (safe - deterministic)
for r in refined:
    hb.update("users", where={"user_id": r.data["user_id"]}, set={...}, schema=schema)

This pattern leverages fuzzy search for discovery ("I don't know the exact term") while ensuring mutations operate on deterministic, exactly-identified rows.
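The same three steps can be run end to end without a server. In this hypothetical sketch, `users` is an in-memory dict, and a word-overlap score stands in for semantic search (real semantic search would compare embeddings). Only the shape of the pattern carries over to `hb.query(...).search` and `hb.update`.

```python
users = {
    "U001": {"user_id": "U001", "bio": "machine learning engineer",
             "department": "Engineering", "status": "active"},
    "U002": {"user_id": "U002", "bio": "deep learning researcher",
             "department": "Research", "status": "active"},
    "U003": {"user_id": "U003", "bio": "machine learning expert",
             "department": "Engineering", "status": "inactive"},
    "U004": {"user_id": "U004", "bio": "payroll specialist",
             "department": "Finance", "status": "active"},
}

def fuzzy_search(query, top_k):
    """Stand-in for semantic search: rank rows by word overlap with the query."""
    q = set(query.lower().split())
    scored = [(len(q & set(row["bio"].split())), pk) for pk, row in users.items()]
    return [pk for score, pk in sorted(scored, reverse=True)[:top_k] if score > 0]

# 1. Fuzzy search casts a wide net.
candidates = fuzzy_search("machine learning expert", top_k=50)

# 2. Exact filters narrow to a deterministic set.
refined = [pk for pk in candidates
           if users[pk]["department"] == "Engineering"
           and users[pk]["status"] == "active"]

# 3. Mutations address rows only by primary key.
for pk in refined:
    users[pk]["status"] = "reviewed"

print(candidates, refined)
```

U003 is the best fuzzy match, but the exact status filter excludes it, so it is never mutated.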

See Fuzzy-to-Exact Pattern for a complete implementation.

`hybi.compose.RelationalTable` `dataclass`

Bases: BaseMolecule

SQL-like table with full CRUD support.

RelationalTable provides familiar relational database semantics:

Row-level UPDATE: modify individual fields
Row-level DELETE: remove rows by primary key
Field extraction: read individual field values cleanly
ACID guarantees: single-row atomicity

Unlike Catalog (which uses lossy Bundle encoding optimized for search), RelationalTable uses Row encoding with chain binding, which is lossless. This enables true field-level updates without re-encoding entire rows.

Trade-offs vs Catalog

Aspect	Catalog	RelationalTable
Encoding	Bundle (lossy)	Row (lossless)
Search	Fast	Moderate
UPDATE/DELETE	Not supported	Full support
Use case	Search catalog	Mutable tables

Use RelationalTable when you need UPDATE/DELETE operations. Use Catalog when you primarily search and append data.

Expands to

Row(
    primary_key=Field(pk_column, encoding=EXACT),
    fields={...other columns...},
)

Example

# Define a users table
schema = RelationalTable(
    columns={
        "user_id": Field(encoding=Encoding.EXACT),
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC),
    },
    primary_key="user_id",
)
hb.ingest(users_df, collection="users", schema=schema)

# Read by primary key
user = hb.query("users", schema).get(user_id="U001")

# Update fields
hb.update(
    "users",
    where={"user_id": "U001"},
    set={"email": "new@example.com"},
    schema=schema,
)

# Delete row
hb.delete("users", where={"user_id": "U001"}, schema=schema)

Notes

Primary key is required and must use EXACT encoding
Primary key cannot be updated (immutable row identity)
Updates are atomic at the row level
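The Notes above and the primary-key description can be modeled with a plain dictionary. `ToyTable` below is a hypothetical in-memory stand-in, not hybi's implementation. It rejects duplicate keys on insert and refuses to change the primary key. It also applies a multi-field update only after validating every column, so a bad update leaves the row untouched.

```python
class ToyTable:
    """Hypothetical in-memory model of RelationalTable's row semantics."""

    def __init__(self, columns, primary_key):
        if primary_key not in columns:
            raise ValueError("primary key must be one of the columns")
        self.columns = set(columns)
        self.pk = primary_key
        self.rows = {}

    def insert(self, row):
        key = row[self.pk]
        if key in self.rows:
            raise ValueError(f"duplicate primary key {key!r}")
        self.rows[key] = dict(row)

    def get(self, key):
        return self.rows.get(key)

    def update(self, key, changes):
        if self.pk in changes:
            raise ValueError("the primary key is immutable")
        unknown = set(changes) - self.columns
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        # Everything is validated first, so the row is replaced in one step.
        self.rows[key] = {**self.rows[key], **changes}

    def delete(self, key):
        self.rows.pop(key, None)

    def upsert(self, row):
        key = row[self.pk]
        if key in self.rows:
            self.update(key, {c: v for c, v in row.items() if c != self.pk})
        else:
            self.insert(row)


users = ToyTable(["user_id", "email", "name", "salary"], primary_key="user_id")
users.insert({"user_id": "U001", "email": "old@example.com", "name": "Ada", "salary": 100000})
users.update("U001", {"email": "new@example.com", "salary": 120000})
users.upsert({"user_id": "U002", "email": "bob@example.com", "name": "Bob", "salary": 90000})
print(users.get("U001"))
```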

`columns = dataclass_field(default_factory=dict)` `class-attribute` `instance-attribute`

Column definitions mapping column names to Field configurations.

Must include the primary key column.

Example

columns={
    "id": Field(encoding=Encoding.EXACT),
    "name": Field(encoding=Encoding.SEMANTIC),
    "email": Field(encoding=Encoding.EXACT),
}

`primary_key = None` `class-attribute` `instance-attribute`

Name of the primary key column.

Required. The referenced column must:

Exist in columns
Use EXACT encoding

The primary key provides:

O(1) row lookup via the PK index
Row identity for UPDATE/DELETE operations
A uniqueness constraint on ingest

`__init__(columns=dict(), primary_key=None, table_name=None)`

TimeSeries

Pre-configured molecule for time-ordered data. Supports two modes:

Temporal Mode (with timestamp_field)

When timestamp_field is provided, expands to a Pair enabling temporal queries:

from hybi.compose import TimeSeries

schema = TimeSeries(
    value_field="measurement",
    timestamp_field="recorded_at",  # Enables at_time(), time_range(), when()
)

Supported queries: search, find, at_time, time_range, when

Positional Mode (without timestamp_field)

When timestamp_field is None, expands to a Sequence enabling position-based queries:

schema = TimeSeries(
    value_field="message",
    timestamp_field=None,  # Position-based mode
    position_encoding="random",
    max_length=512,
)

Supported queries: search, at, contains, prefix

See timeseries_demo.py for a complete example using positional mode.
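The sketch below shows why a "sinusoidal" position encoding suits ordered data, using the standard Transformer-style sinusoidal formula in plain Python. Two things are assumptions of the sketch: that hybi's `position_encoding="sinusoidal"` uses exactly this formula, and the 64-dimension size.

```python
import math

def sinusoidal_position(pos, dim=64):
    """Transformer-style encoding: (sin, cos) pairs at geometrically spaced frequencies."""
    vec = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        vec.extend((math.sin(pos * freq), math.cos(pos * freq)))
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

p0, p1, p50 = (sinusoidal_position(p) for p in (0, 1, 50))
print(round(cosine(p0, p1), 3), round(cosine(p0, p50), 3))
```

With this formula, the similarity between two positions depends only on their offset, and a neighboring position scores higher than one 50 steps away. Independent random position vectors, by contrast, would be roughly orthogonal regardless of distance.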

`hybi.compose.TimeSeries` `dataclass`

Bases: BaseMolecule

Time series compound: temporal data with timestamp-value binding.

TimeSeries encodes time-indexed data using hyperdimensional temporal binding. When a timestamp_field is provided, each row is encoded as:

timestamp ⊛ value

This enables temporal queries:

at_time(ts): find values at or near a specific timestamp
time_range(start, end): find values within a time window
when(value): find timestamps when a value occurred
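Here is a minimal sketch of the timestamp ⊛ value idea. It assumes bipolar vectors, elementwise multiplication as the binding operator, and one independent random vector per timestamp. A real TEMPORAL encoding would presumably make nearby timestamps similar, which this sketch does not attempt. `when` and `at_time` here are hypothetical helpers, not the query methods.

```python
import random

DIM = 4096
rng = random.Random(7)

def hv():
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    return [x * y for x, y in zip(a, b)]

def sim(a, b):
    return sum(x * y for x, y in zip(a, b)) / DIM

# Hypothetical codebooks: one random vector per timestamp and per value.
timestamps = {t: hv() for t in ("13:00", "14:00", "15:00")}
values = {v: hv() for v in ("18C", "21C", "19C")}
readings = [("13:00", "18C"), ("14:00", "21C"), ("15:00", "19C")]

# Store each reading as bind(timestamp, value) and superpose them in one memory.
memory = [sum(column) for column in
          zip(*(bind(timestamps[t], values[v]) for t, v in readings))]

def when(value):
    """Unbind a value and return the closest known timestamp."""
    probe = bind(memory, values[value])
    return max(timestamps, key=lambda t: sim(probe, timestamps[t]))

def at_time(ts):
    """Unbind a timestamp and return the closest known value."""
    probe = bind(memory, timestamps[ts])
    return max(values, key=lambda v: sim(probe, values[v]))

print(when("21C"), at_time("15:00"))
```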

Expands to (when timestamp_field is provided):

Pair(
    left=Field(timestamp_field, encoding=TEMPORAL),
    right=Field(value_field, encoding=value_encoding),
)

Expands to (when timestamp_field is None, legacy mode):

Sequence(
    item=Field(value_field, encoding=value_encoding),
    position_encoding=position_encoding,
    max_length=max_length,
)

Example

# Recommended: with timestamp field (enables temporal queries)
schema = TimeSeries(
    value_field="temperature",
    timestamp_field="recorded_at",
    value_encoding=Encoding.NUMERIC,
)
hb.ingest(sensor_df, collection="readings", schema=schema)

# Query: What was the temperature at 2pm?
results = hb.query("readings").at_time("2024-01-15 14:00:00")

# Query: Temperatures between 1pm and 3pm
results = hb.query("readings").time_range(
    start="2024-01-15 13:00:00",
    end="2024-01-15 15:00:00",
)

# Legacy mode: without timestamp (uses row position)
schema = TimeSeries(value_field="price")  # timestamp_field=None
# Note: This mode only supports positional queries, not temporal

`__init__(value_field='value', timestamp_field=None, value_encoding=Encoding.SEMANTIC, value_weight=1.0, timestamp_weight=1.0, position_encoding='sinusoidal', max_length=512)`

Hierarchy

Pre-configured Tree for parent-child relationships.

from hybi.compose import Hierarchy

schema = Hierarchy(
    node_field="employee",
    parent_field="manager",
)
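Each row in the Hierarchy example above is a (node, parent) edge. The plain-Python sketch below uses made-up org data to show what those edges let you derive: ancestor chains, direct reports, and a depth that could populate an optional level_field. It does not touch hybi or show how Tree encodes the links.

```python
org = [
    {"employee": "Ada", "manager": None},
    {"employee": "Grace", "manager": "Ada"},
    {"employee": "Linus", "manager": "Grace"},
    {"employee": "Ken", "manager": "Grace"},
]

parent_of = {row["employee"]: row["manager"] for row in org}

def ancestors(node):
    """Follow parent links upward, nearest first; raise on a cycle."""
    chain, seen = [], {node}
    while parent_of.get(node) is not None:
        node = parent_of[node]
        if node in seen:
            raise ValueError(f"cycle detected at {node!r}")
        seen.add(node)
        chain.append(node)
    return chain

def children(node):
    """Direct reports, in input order."""
    return [n for n, p in parent_of.items() if p == node]

depth = {n: len(ancestors(n)) for n in parent_of}
print(ancestors("Linus"), children("Grace"), depth)
```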

`hybi.compose.Hierarchy` `dataclass`

Bases: BaseMolecule

Hierarchy compound: parent-child organizational structures.

A convenience wrapper around Tree optimized for hierarchical data like org charts, file systems, taxonomies, or nested categories.

Expands to

Tree(
    child=Field(node_field, encoding=node_encoding),
    parent=Field(parent_field, encoding=node_encoding),
    level=Field(level_field) if level_field else None,
)

Example

# Org chart
schema = Hierarchy(
    node_field="employee",
    parent_field="manager",
)
hb.ingest(org_df, collection="org", schema=schema)

# File system with depth tracking
schema = Hierarchy(
    node_field="path",
    parent_field="parent_path",
    level_field="depth",
)

# Taxonomy with exact matching
schema = Hierarchy(
    node_field="category",
    parent_field="parent_category",
    node_encoding=Encoding.EXACT,
)

`__init__(node_field='node', parent_field='parent', level_field=None, node_encoding=Encoding.SEMANTIC, node_weight=1.0)`

Document

Use this compound when you have long-form content and need to search inside it (not just retrieve whole documents), navigate its structure (e.g. "every paragraph under chapter 3"), and pinpoint exact sections (e.g. /ch3/sec2/p4) — all against the same collection.

Alternatives fall short in different ways: collapsing each document to one vector loses chunk identity; a flat chunk store loses the structure the chunker gave you; a plain tree loses sibling order. Document keeps all of it — every chunk is a first-class row with a path, a parent, and a position.

A pluggable Chunker decides how the source splits — paragraphs, Markdown headings, or your own strategy. Swapping chunkers changes only row contents, so the same queries work across formats.

Three ways to address a chunk

Semantic — find chunks whose content matches a query.
Structural — walk the tree (ancestors / descendants / siblings) without a separate index.
Path — O(1) lookup of the exact chunk at /ch1/sec2/p3.

Defining a Document schema

from hybi.compose import Document, Field, Encoding
from hybi.compose.chunkers import MarkdownChunker

schema = Document(
    content_field="body",
    chunker=MarkdownChunker(),
    metadata_fields={
        "author": Field(encoding=Encoding.EXACT),
        "published": Field(encoding=Encoding.TEMPORAL),
    },
)

Ingest shape

The ingest DataFrame is document-level (one row per source document) and must include a document_id column plus the configured content_field. The chunker expands each row into per-chunk Rows at ingest time.

import pandas as pd

df = pd.DataFrame([
    {"document_id": "doc-1", "body": "# Intro\n\nFirst paragraph.\n\n## Details\n\nMore.",
     "author": "Alice", "published": "2024-01-15"},
    # ...
])

hb.ingest(df, collection="articles", schema=schema)

Reserved structural fields written into every chunk Row: chunk_id (primary key), document_id, path, parent_id, sibling_index, depth. User-supplied metadata_fields cannot collide with these.
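To make the ingest shape concrete, here is a hypothetical toy chunker in plain Python. It expands one document into chunk rows that carry the reserved structural fields. It is not MarkdownChunker. The slug-based path scheme, the p0/p1 paragraph names, and the `document_id:path` chunk_id format are all assumptions, chosen to resemble the /intro/p0 style paths used on this page.

```python
import re

def slug(title):
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def toy_chunk(document_id, text):
    """Expand one document into chunk rows with the reserved structural fields."""
    rows, child_count, para_count = [], {}, {}

    def add(path, parent_id, content):
        chunk_id = f"{document_id}:{path}"
        sibling_index = child_count.get(parent_id, 0)
        if parent_id is not None:
            child_count[parent_id] = sibling_index + 1
        rows.append({
            "chunk_id": chunk_id, "document_id": document_id, "path": path,
            "parent_id": parent_id, "sibling_index": sibling_index,
            "depth": 0 if path == "/" else path.count("/"), "body": content,
        })
        return chunk_id

    def child_path(parent_path, name):
        return f"/{name}" if parent_path == "/" else f"{parent_path}/{name}"

    # Stack of open sections as (heading level, path, chunk_id); the root is level 0.
    stack = [(0, "/", add("/", None, ""))]
    for block in (b.strip() for b in text.split("\n\n") if b.strip()):
        heading = re.match(r"^(#+)\s+(.*)$", block)
        if heading:
            level = len(heading.group(1))
            while stack[-1][0] >= level:
                stack.pop()
            path = child_path(stack[-1][1], slug(heading.group(2)))
            stack.append((level, path, add(path, stack[-1][2], heading.group(2))))
        else:
            _, parent_path, parent_id = stack[-1]
            n = para_count.get(parent_id, 0)
            para_count[parent_id] = n + 1
            add(child_path(parent_path, f"p{n}"), parent_id, block)
    return rows

rows = toy_chunk("doc-1", "# Intro\n\nFirst paragraph.\n\n## Details\n\nMore.")
for r in rows:
    print(r["path"], r["parent_id"], r["sibling_index"], r["depth"])
```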

Querying

Use Document.attach to get a client/collection-bound view:

pdoc = schema.attach(hb, "articles")

# Rank whole documents
docs = pdoc.search_documents("attention mechanisms", top_k=5)

# Rank individual chunks (composable ChunkHandles)
handles = pdoc.find_and_bind("async Python", top_k=3).materialize()

# Structural navigation
sec = pdoc.descendants("/intro")        # every chunk under /intro
siblings = pdoc.siblings(parent_id)     # peers under the same parent
chunk = pdoc.at("/intro/p0", document_id="doc-1")  # direct lookup

# Subtree as a first-class algebraic citizen
subtree = pdoc.subtree("/intro", document_id="doc-1")
related = subtree.intersect(hb.query("concepts"))  # cross-compound join

Use search_documents when the answer is "which document?" and find_and_bind when the answer is "which passage?". find_and_bind excludes root chunks by default; pass include_root=True for a mixed candidate set.

When multiple documents share a path (e.g. every doc has an /intro), at() and subtree() require document_id= to disambiguate. Unscoped calls on ambiguous paths raise ValueError.

structural_index="tree" opts into the Rust-accelerated descendant walk (BFS over the chunk namespace's parent_id index); the default "path" backend emits path LIKE /root/% filters and works out of the box.
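The two backends should agree on which chunks count as descendants. The plain-Python sketch below runs both ideas on hypothetical chunk rows and compares them. One is a path-prefix filter, the idea behind `path LIKE /root/%`. The other is a breadth-first walk over parent_id links.

```python
from collections import deque

chunks = [
    {"chunk_id": "c0", "path": "/", "parent_id": None},
    {"chunk_id": "c1", "path": "/intro", "parent_id": "c0"},
    {"chunk_id": "c2", "path": "/intro/p0", "parent_id": "c1"},
    {"chunk_id": "c3", "path": "/intro/details", "parent_id": "c1"},
    {"chunk_id": "c4", "path": "/intro/details/p0", "parent_id": "c3"},
    {"chunk_id": "c5", "path": "/introduction-notes", "parent_id": "c0"},
]

def descendants_by_path(root_path):
    """Prefix filter: every chunk strictly under root_path."""
    prefix = "/" if root_path == "/" else root_path + "/"
    return {c["chunk_id"] for c in chunks
            if c["path"].startswith(prefix) and c["path"] != root_path}

def descendants_by_tree(root_path):
    """Breadth-first walk over parent_id links from the chunk at root_path."""
    children = {}
    for c in chunks:
        children.setdefault(c["parent_id"], []).append(c["chunk_id"])
    start = next(c["chunk_id"] for c in chunks if c["path"] == root_path)
    found, queue = set(), deque([start])
    while queue:
        for child in children.get(queue.popleft(), []):
            found.add(child)
            queue.append(child)
    return found

print(sorted(descendants_by_path("/intro")), sorted(descendants_by_tree("/intro")))
```

The /introduction-notes row shows why the prefix must end in a slash: matching on /intro alone would wrongly include it.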

Document's tree-navigation methods (descendants / ancestors / siblings / subtree) delegate to an internal OrderedTree. The structural_index kwarg forwards to it; you can also construct an OrderedTree directly for the same navigation surface without the chunker / Row-schema apparatus.

`hybi.compose.Document` `dataclass`

Bases: BaseMolecule

Document compound: a queryable subtree of first-class chunks.

Each source document is decomposed by a pluggable Chunker into N chunks; every chunk becomes a lossless Row with three addressing modes:

Semantic: search hits the SEMANTIC content field, returning individual chunks (not whole documents).
Structural: per-row parent_id, sibling_index, and depth carry the tree shape, so ancestors / descendants / siblings navigate without a separate Tree index in v1.
Path: the EXACT path field (e.g. /ch1/sec2/p3) gives O(1) direct lookup.

The compound is fixed — swapping Chunker strategies changes only row contents, not the wire schema, so a FlatChunker and a MarkdownChunker ingest into the same shape.

Expands to

Row(
    primary_key=Field("chunk_id", EXACT),
    fields={
        "document_id": Field(EXACT),
        "path": Field(EXACT),
        "parent_id": Field(EXACT),
        "sibling_index": Field(NUMERIC),
        "depth": Field(NUMERIC),
        content_field: Field(SEMANTIC, weight=content_weight),
        **metadata_fields,
    },
)

Example

# Phase 1+ will wire a Chunker in; Phase 0 exposes the schema shape.
schema = Document(
    content_field="body",
    metadata_fields={"section_title": Field(encoding=Encoding.EXACT)},
)

`__init__(content_field='content', content_encoding=Encoding.SEMANTIC, content_weight=1.0, metadata_fields=None, chunker=None, structural_index='path')`

`attach(client, collection)`

Return a client/collection-bound view of this Document.

The bound view mirrors every Document method that needs a client and collection (subtree, find_and_bind, descendants, ancestors, siblings, at) without the _client= / _collection= keyword noise. Prefer this for application code; the raw Document methods remain available as an escape hatch.

Example

pdoc = doc.attach(hb, "papers")
pdoc.subtree("/abstract", document_id="1706.03762").materialize()
pdoc.rollup("/abstract", document_id="1706.03762")  # handle-free shortcut
pdoc.find_and_bind("transformers", top_k=3)
pdoc.descendants("/")

`find_and_bind(query, *, top_k, include_root=False, _client=None, _collection=None)`

Semantic search within the document, returning a handle-shaped spec that can be composed with other collections or atoms.

top_k is a required keyword-only argument with no default: the choice between "best match" and "candidate set" is consequential and should be explicit at the call site.

By default roots (path == "/") are excluded from the search because their content_vec carries the per-document subtree rollup — appropriate for search_documents, not chunk-level retrieval. Pass include_root=True to opt back in when you want a mixed chunk + document-level candidate set.

The returned object is lazy — it does no server I/O until the caller invokes .materialize() or a method that requires concrete values (e.g. .intersect). For top_k == 1 the spec resolves to a single ChunkHandle on materialization; for top_k > 1 it resolves to a list of ChunkHandles.
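The lazy behavior described above can be sketched as follows. Constructing the spec records the query but performs no fetch. `materialize()` then resolves to a single handle when `top_k == 1` and to a list otherwise. `LazyChunkSpec` and `fake_fetch` are hypothetical names, and plain strings stand in for ChunkHandles.

```python
class LazyChunkSpec:
    """Hypothetical lazy search spec: nothing is fetched until materialize()."""

    def __init__(self, fetch, query, top_k):
        self._fetch = fetch
        self.query = query
        self.top_k = top_k

    def materialize(self):
        hits = self._fetch(self.query, self.top_k)
        if self.top_k == 1:
            # A single best match resolves to one handle (or None if no hit).
            return hits[0] if hits else None
        return hits  # A candidate set resolves to a list.

calls = []

def fake_fetch(query, top_k):
    """Stand-in for a server round trip; records each call."""
    calls.append((query, top_k))
    return [f"chunk-{i}" for i in range(top_k)]

spec = LazyChunkSpec(fake_fetch, "async Python", top_k=3)
single = LazyChunkSpec(fake_fetch, "async Python", top_k=1)
print(len(calls))          # 0: building the specs did no I/O
print(spec.materialize())  # the first fetch happens here
```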

`subtree(path, *, document_id=None, union=False, _client=None, _collection=None)`

Return a virtual ChunkHandle whose content_vec bundles the chunk at path plus every descendant.

Per-document by default: an ambiguous path (present in multiple documents) raises. Pass document_id=<id> to scope the call, or union=True to opt into bundling across all matching documents.

Delegates to the internal OrderedTree.
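A subtree rollup can be sketched in plain Python if "bundles" is read as majority-vote superposition of bipolar vectors, as is common in HDC. The paths, seed, and dimension are made up. hybi's actual rollup may weight or normalize differently.

```python
import random

DIM = 2048
rng = random.Random(3)

def hv():
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bundle(vectors):
    """Per-dimension majority vote; ties go to +1."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]

def sim(a, b):
    return sum(x * y for x, y in zip(a, b)) / DIM

content = {p: hv() for p in ("/intro", "/intro/p0", "/intro/details",
                             "/intro/details/p0", "/methods")}

def rollup(path):
    """Bundle the chunk at path with every chunk strictly under it."""
    members = [p for p in content if p == path or p.startswith(path + "/")]
    return bundle([content[p] for p in members])

intro = rollup("/intro")
print(round(sim(intro, content["/intro/p0"]), 2), round(sim(intro, content["/methods"]), 2))
```

The rollup stays measurably similar to every chunk it contains, while staying close to orthogonal to a chunk outside the subtree. Document-level search over root rollups relies on that property.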

`descendants(query, root_path, top_k=100)`

Return every chunk whose path sits strictly under root_path.

`ancestors(query, path, top_k=100)`

Return chunks at every ancestor path of path, parent-first.

`siblings(query, parent_id, top_k=100)`

Return every chunk sharing parent_id. Callers should sort by sibling_index for ordered traversal.

`at(path, *, document_id=None, _client=None, _collection=None)`

O(1) direct lookup by path.

Returns the chunk Row at path (or None if no chunk matches). EXACT encoding on the path field makes this a primary-key-style hit.

Per-document by default: if multiple documents in the collection share the same path (e.g. every Markdown doc has an /intro section), pass document_id to disambiguate. Omitting it when the path is ambiguous raises ValueError rather than returning a nondeterministic row — silent cross-document leakage is the kind of bug that only surfaces once the corpus grows past toy size.
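The disambiguation rule can be sketched over in-memory rows. The `at` function below is a hypothetical stand-in, not Document.at itself. It returns None for a missing path and returns the unique match when only one document has the path. When several documents share the path, it raises ValueError instead of guessing.

```python
rows = [
    {"document_id": "doc-1", "path": "/intro", "body": "Doc one intro"},
    {"document_id": "doc-2", "path": "/intro", "body": "Doc two intro"},
    {"document_id": "doc-2", "path": "/methods", "body": "Doc two methods"},
]

def at(path, document_id=None):
    """Return the unique row at path, refusing to guess across documents."""
    matches = [r for r in rows
               if r["path"] == path
               and (document_id is None or r["document_id"] == document_id)]
    if not matches:
        return None
    owners = {r["document_id"] for r in matches}
    if len(owners) > 1:
        raise ValueError(f"{path!r} exists in {sorted(owners)}; pass document_id")
    return matches[0]

print(at("/methods")["body"])
print(at("/intro", document_id="doc-1")["body"])
```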

`reingest(client, collection, source_df)`

Replace every chunk belonging to the given document_id(s).

For each unique document_id in source_df, looks up that document's existing chunks by document_id and deletes them by primary key, then ingests the new rows. Leaves other documents in the collection untouched.

Fail-before-delete. The new DataFrame is run through expand_dataframe (chunker + orphan check + required-column validation) BEFORE any deletion happens. If the new data is malformed — missing columns, chunker bugs, duplicate paths — reingest raises without touching storage. This closes the "deleted old, new ingest failed, collection in half-state" class of failure for the common malformed-input case.

Ingest-phase failures that surface AFTER the pre-validation (e.g. transient storage errors) can still leave the collection in a half-state; behavior under those conditions is best-effort. Per-row delete errors are swallowed so that a row already removed by a concurrent caller doesn't block the rest.

Raises:

Type	Description
`SchemaError`	if `source_df` lacks the `document_id` column or the chunker produces malformed output.

PK-based deletion avoids row-id-counter issues that a bulk delete-by-filter can induce when mixed with re-ingest.
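Here is a plain-Python sketch of the fail-before-delete ordering. It uses a dict keyed by a hypothetical `document_id:path` primary key. `validate` stands in for the expand_dataframe checks and raises ValueError, whereas the real method raises SchemaError.

```python
def validate(rows):
    """Stand-in for pre-validation: required columns and unique paths per document."""
    for row in rows:
        if "document_id" not in row or "path" not in row:
            raise ValueError("every row needs document_id and path")
    keys = [(row["document_id"], row["path"]) for row in rows]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate path within a document")

def reingest(store, new_rows):
    """Replace all chunks of the documents in new_rows; leave other documents alone."""
    validate(new_rows)  # raises before anything is deleted
    doc_ids = {row["document_id"] for row in new_rows}
    stale = [pk for pk, row in store.items() if row["document_id"] in doc_ids]
    for pk in stale:
        store.pop(pk, None)  # tolerate rows a concurrent caller already removed
    for row in new_rows:
        store[f'{row["document_id"]}:{row["path"]}'] = row

store = {
    "d1:/": {"document_id": "d1", "path": "/"},
    "d1:/old": {"document_id": "d1", "path": "/old"},
    "d2:/": {"document_id": "d2", "path": "/"},
}
reingest(store, [{"document_id": "d1", "path": "/"}, {"document_id": "d1", "path": "/new"}])
print(sorted(store))
```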

BoundDocument

A Document bound to a specific (client, collection) pair, produced by Document.attach. Mirrors every Document method that needs a client and collection (subtree, find_and_bind, descendants, etc.) without the _client= / _collection= keyword noise.

`hybi.compose.BoundDocument`

A Document bound to a specific (client, collection) pair.

Produced by Document.attach. Exposes every structural / rollup / retrieval operation on Document with the client and collection implicit, so call sites read as pdoc.subtree("/abstract", ...) instead of schema.subtree("/abstract", ..., _client=hb, _collection="papers").

The Document schema itself stays stateless; BoundDocument is a thin adapter. Safe to construct, discard, and re-bind the same Document to different collections.
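Because the schema stays stateless, the adapter only needs to remember the pair and forward it. This hypothetical sketch shows one schema bound to two collections at once. `Schema` and `Bound` are illustrative names, and `at` simply echoes its arguments.

```python
class Schema:
    """Stateless stand-in for a Document: each call names its client and collection."""

    def at(self, path, *, document_id=None, _client=None, _collection=None):
        # Echo the arguments so the forwarding is easy to inspect.
        return (_client, _collection, path, document_id)


class Bound:
    """Thin adapter that remembers (client, collection) and forwards every call."""

    def __init__(self, schema, client, collection):
        self._schema = schema
        self._client = client
        self._collection = collection

    def at(self, path, *, document_id=None):
        return self._schema.at(path, document_id=document_id,
                               _client=self._client, _collection=self._collection)


schema = Schema()
papers = Bound(schema, "hb", "papers")
notes = Bound(schema, "hb", "notes")  # same schema, different collection
print(papers.at("/abstract", document_id="1706.03762"))
```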

`subtree(path, *, document_id=None, union=False)`

Return a virtual ChunkHandle for the subtree rooted at path. See Document.subtree for scoping semantics.

`rollup(path, *, document_id=None, union=False)`

Shortcut: return the subtree's rollup vector directly (np.ndarray).

Equivalent to self.subtree(path, ...).materialize(); provided because the handle is immediately materialized in most call sites.

`find_and_bind(query_text, *, top_k, include_root=False)`

Semantic search that returns composable ChunkHandle objects. Excludes root chunks by default, because those carry document-level rollups and belong to search_documents. Pass include_root=True to search across chunks and root rollups together. See Document.find_and_bind.

`search_documents(query_text, *, top_k)`

Document-level semantic search: rank whole documents by the similarity of their persisted subtree rollup to query_text.

Implementation: a path-scoped semantic search over root chunks (path == "/"), whose content_vec is the per-document subtree rollup written by Document._finalize_rollups at ingest time. Works out of the box for Documents ingested through HyperBinder.ingest on a client that exposes set_row_atom; falls back to empty results when rollups haven't been persisted.

top_k is required (no default): document-scale search is expensive to paginate and the choice between "one best match" and "candidate set" should be explicit.

`descendants(root_path, top_k=100)`

Every chunk whose path sits strictly under root_path.

`ancestors(path, top_k=100)`

Every chunk on the path from root to path, parent-first.

`siblings(parent_id, top_k=100)`

Every chunk sharing parent_id.

`at(path, *, document_id=None)`

O(1) direct lookup of the chunk at path. Pass document_id when the path exists in more than one document. See Document.at.

Network

Pre-configured Graph for network data.

from hybi.compose import Network

schema = Network(
    node_field="user",
    edge_field="interaction",
    directed=True,
    # Optional: use separate columns for source/target nodes
    # source_field="from_user",
    # target_field="to_user",
)
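The plain-Python sketch below shows what the directed flag changes, assuming directed=False means every edge can be traversed in both directions. The (source, edge, target) tuples and the `adjacency` helper are hypothetical. Graph's real encoding is not shown.

```python
from collections import defaultdict

edges = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("carol", "mentions", "alice"),
]

def adjacency(edges, directed=True):
    """Map each node to its set of (edge, neighbor) pairs."""
    adj = defaultdict(set)
    for source, edge, target in edges:
        adj[source].add((edge, target))
        if not directed:
            # Undirected: the same edge is also reachable from the target.
            adj[target].add((edge, source))
    return dict(adj)

directed = adjacency(edges, directed=True)
undirected = adjacency(edges, directed=False)
print(sorted(directed["bob"]), sorted(undirected["bob"]))
```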

`hybi.compose.Network` `dataclass`

Bases: BaseMolecule

Network compound: node-edge-node graph structures.

A convenience wrapper around Graph optimized for social networks, citation graphs, dependency graphs, and other network structures.

Expands to

Graph(
    node=Field(node_field, encoding=node_encoding),
    edge=Field(edge_field, encoding=edge_encoding),
    directed=directed,
)

Example

# Social network
schema = Network(
    node_field="user",
    edge_field="connection_type",
)
hb.ingest(social_df, collection="social", schema=schema)

# Citation network (undirected)
schema = Network(
    node_field="paper_id",
    edge_field="citation_type",
    node_encoding=Encoding.EXACT,
    directed=False,
)

# Dependency graph
schema = Network(
    node_field="package",
    edge_field="dependency_type",
    source_field="dependent",
    target_field="dependency",
)

`__init__(node_field='node', edge_field='edge', source_field=None, target_field=None, node_encoding=Encoding.SEMANTIC, edge_encoding=Encoding.EXACT, node_weight=1.0, edge_weight=1.0, directed=True)`

When to Use Compounds vs Molecules

Use compounds when:

Your data fits a common pattern
You want sensible encoding defaults
You're prototyping quickly

Use molecules when:

You need custom encodings per field
You're nesting structures
You need fine-grained control over weights

See Molecules vs Compounds for detailed guidance.

Example Code

Complete runnable examples for each compound type:

Compound	Example File	Description
KnowledgeGraph	`knowledge_graph_demo.py`	Entity-relation-entity facts with traversal
Document	`document_demo.py`	Chunked documents with tree-shaped navigation
Document (arXiv)	`document_arxiv_demo.py`	Real arXiv papers + semantic search + cross-compound hyperedges
Hierarchy	`hierarchy_demo.py`	Org charts and taxonomies
TimeSeries	`timeseries_demo.py`	Time-ordered data
Network	`network_demo.py`	Social graphs and citations
Catalog	`product_catalog_demo.py`	Product catalogs with search
RelationalTable	`fuzzy_to_exact_demo.py`	CRUD with fuzzy-to-exact pattern

Run any example from the SDK directory:

cd sdk
python examples/compose/knowledge_graph_demo.py

See Examples README for the full example index.
