Compounds

Compounds are pre-configured molecules for common domain patterns.

Overview

| Compound | Based On | Use Case |
|---|---|---|
| KnowledgeGraph | Triple | Entity-relation-entity facts |
| Catalog | Bundle | Generic tabular data (read-heavy) |
| RelationalTable | Row | Mutable tables with CRUD |
| TimeSeries | Pair or Sequence | Time-ordered data |
| Hierarchy | Tree | Org charts, taxonomies |
| Document | Row | Chunked documents with tree-shaped navigation |
| Network | Graph | Social graphs, citations |

Compounds expand to molecules at definition time, so they have the same capabilities once created.
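To make "expands at definition time" concrete, here is a minimal, library-independent sketch. `ToyField`, `ToyTriple`, and `ToyKnowledgeGraph` are hypothetical stand-ins, not hybi classes; the only idea taken from this page is that a compound resolves its defaults into a plain molecule once, up front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyField:
    """Hypothetical stand-in for a field definition."""
    name: str
    encoding: str


@dataclass(frozen=True)
class ToyTriple:
    """Hypothetical stand-in for a molecule."""
    subject: ToyField
    predicate: ToyField
    object: ToyField


@dataclass(frozen=True)
class ToyKnowledgeGraph:
    """Hypothetical compound: named defaults that resolve to a ToyTriple."""
    entity_field: str = "entity"
    relation_field: str = "relation"

    def expand(self) -> ToyTriple:
        # Defaults are resolved here, so downstream code only ever sees a molecule.
        entity = ToyField(self.entity_field, "SEMANTIC")
        relation = ToyField(self.relation_field, "EXACT")
        return ToyTriple(subject=entity, predicate=relation, object=entity)


molecule = ToyKnowledgeGraph(entity_field="person").expand()
print(molecule.predicate)
```

Because the expansion yields an ordinary molecule, anything that accepts the molecule accepts the expanded compound, which is the sense in which the two have the same capabilities.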


KnowledgeGraph

Pre-configured Triple for knowledge graph data.

from hybi.compose import KnowledgeGraph

schema = KnowledgeGraph(
    entity_field="person",
    relation_field="relationship",
    # Defaults: SEMANTIC for entities, EXACT for relations
)

Equivalent to:

Triple(
    subject=Field("person", encoding=Encoding.SEMANTIC),
    predicate=Field("relationship", encoding=Encoding.EXACT),
    object=Field("target", encoding=Encoding.SEMANTIC),
)

hybi.compose.KnowledgeGraph dataclass

Bases: BaseMolecule

Knowledge graph compound: entity-relation-entity triples.

A convenience wrapper around Triple with sensible defaults for knowledge graph use cases (semantic entities, exact relations).

Expands to

Triple(
    subject=Field(entity_field, encoding=SEMANTIC),
    predicate=Field(relation_field, encoding=EXACT),
    object=Field(entity_field, encoding=SEMANTIC),
)

Example

# Simple usage: defaults to entity/relation columns
schema = KnowledgeGraph()
hb.ingest(facts_df, collection="kg", schema=schema)

# Custom field names
schema = KnowledgeGraph(
    entity_field="person",
    relation_field="relationship",
)

# With custom encoding
schema = KnowledgeGraph(
    entity_field="entity",
    relation_field="predicate",
    entity_encoding=Encoding.EXACT,  # For IDs instead of text
)

__init__(entity_field='entity', relation_field='relation', subject_field=None, object_field=None, entity_encoding=Encoding.SEMANTIC, relation_encoding=Encoding.EXACT, entity_weight=1.0, relation_weight=1.0)


Catalog

Pre-configured Bundle for tabular data.

from hybi.compose import Catalog, Field, Encoding

schema = Catalog(
    columns={
        "name": Field(encoding=Encoding.SEMANTIC, weight=1.5),
        "category": Field(encoding=Encoding.EXACT),
        "price": Field(encoding=Encoding.NUMERIC, similar_within=50),
    }
)

hybi.compose.Catalog dataclass

Bases: BaseMolecule

Catalog compound: searchable collection with SQL-like operations.

A convenience wrapper around Bundle optimized for tabular data with a familiar SQL-like query interface. Catalog provides a bridge between traditional relational thinking and hyperdimensional computing.

Expands to

Bundle(fields={ column_name: Field(encoding=..., weight=...), ... })

Unlike pure relational tables, Catalog supports:

  • Semantic search: Find rows by meaning, not just exact values
  • Fuzzy matching: Similarity-based lookups with configurable thresholds
  • Vector joins: Join collections by semantic similarity, not just key equality

Operations Map

| Catalog Method | HDC Operation |
|---|---|
| select() | Field projection (SelectQuery) |
| where() | Exact filter + similarity search |
| join() | JoinQuery (exact or semantic) |
| aggregate() | AggregateQuery (GROUP BY) |
| search() | Vector similarity search |

Example

# Define a products catalog
schema = Catalog(
    columns={
        "name": Field(encoding=Encoding.SEMANTIC, weight=2.0),
        "description": Field(encoding=Encoding.SEMANTIC),
        "category": Field(encoding=Encoding.EXACT),
        "price": Field(encoding=Encoding.NUMERIC, similar_within=50),
    },
    primary_key="id",
)
hb.ingest(products_df, collection="products", schema=schema)

# Traditional-style query
results = hb.query("products").where(category="electronics")

# Semantic query (HDC advantage)
results = hb.query("products").search("lightweight laptop for travel")

# Join with another catalog
order_schema = Catalog(
    columns={"product_id": Field(encoding=Encoding.EXACT), ...}
)
joined = hb.query("orders").join("products", on="product_id")

# Aggregation
stats = hb.query("products").aggregate(
    group_by=["category"],
    aggregations={"avg_price": ("price", "avg")}
)

Notes
  • primary_key is metadata only; HDC doesn't require explicit keys
  • For semantic joins, use Encoding.SEMANTIC on join columns
  • The underlying Bundle uses bundle encoding (lossy but searchable)
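The phrase "lossy but searchable" can be illustrated with a generic hyperdimensional-computing sketch. This is not hybi's implementation: the vector size, the majority-vote bundling rule, and every name below are assumptions chosen for illustration.

```python
import random

DIM = 2048
rng = random.Random(0)


def random_vector():
    """Random bipolar (+1/-1) hypervector."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bundle(vectors):
    """Superpose vectors by elementwise majority vote (ties resolve to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / DIM


# Three field vectors superposed into one row vector.
name, category, price = random_vector(), random_vector(), random_vector()
unrelated = random_vector()
row = bundle([name, category, price])

print("member similarity:   ", round(similarity(row, name), 2))
print("unrelated similarity:", round(similarity(row, unrelated), 2))
```

The row vector stays measurably similar to each field that went into it, which is what makes it searchable, but it no longer equals any of them, so a single field cannot be read back or replaced exactly. That is the trade-off the note describes.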

__init__(columns=dict(), primary_key=None, catalog_name=None)


RelationalTable

SQL-like table with full CRUD support.

RelationalTable provides familiar relational database semantics with atomic row-level operations. Unlike Catalog (which is optimized for search), RelationalTable uses structured encoding which enables true field-level updates.

Catalog vs RelationalTable

| Aspect | Catalog | RelationalTable |
|---|---|---|
| Encoding | Search-optimized | Structured |
| Search | Fast | Moderate |
| UPDATE/DELETE | Not supported | Full support |
| Use case | Search catalog | Mutable tables |

Use Catalog when you primarily search and append data. Use RelationalTable when you need UPDATE/DELETE operations.

from hybi.compose import RelationalTable, Field, Encoding

schema = RelationalTable(
    columns={
        "user_id": Field(encoding=Encoding.EXACT),
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC, similar_within=10000),
    },
    primary_key="user_id",
)

CRUD Operations:

# Ingest data
hb.ingest(users_df, collection="users", schema=schema)

# Read by primary key
user = hb.query("users", schema).get(user_id="U001")

# Update fields atomically
hb.update(
    "users",
    where={"user_id": "U001"},
    set={"email": "new@example.com", "salary": 120000},
    schema=schema,
)

# Delete row
hb.delete("users", where={"user_id": "U001"}, schema=schema)

# Upsert (insert or update)
hb.upsert("users", row={"user_id": "U001", ...}, schema=schema)

Equivalent to:

Row(
    primary_key=Field("user_id", encoding=Encoding.EXACT),
    fields={
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC, similar_within=10000),
    },
)

Search & CRUD Architecture

For optimal performance, use Catalog for search and RelationalTable for CRUD:

flowchart TB
    subgraph Catalog["CATALOG (Search)"]
        C1[Search-optimized<br/>fast]
        C2[Semantic Discovery]
        C1 --> C2
    end

    subgraph RelationalTable["RELATIONAL TABLE (CRUD)"]
        R1[Structured<br/>exact]
        R2[PK Lookups]
        R1 --> R2
    end

    C2 --> Bridge
    R2 --> Bridge
    Bridge[BRIDGE<br/>Primary Keys] --> Mutations[Deterministic Mutations]

Recommended pattern:

  • Use Catalog for semantic search (optimized for similarity matching)
  • Use RelationalTable for CRUD (optimized for exact field updates)
  • Bridge between them using shared primary keys

Single-schema alternative: RelationalTable can handle both search and CRUD, but its search is slower than a dedicated Catalog's.

Fuzzy-to-Exact Bridge Pattern

When you need to combine semantic discovery with exact mutations, use the bridge pattern:

  1. Fuzzy search casts a wide net using semantic similarity
  2. Exact filters narrow to deterministic boundaries
  3. CRUD via PKs operates on the refined set

# 1. Semantic search finds candidates
candidates = hb.query("users", schema).search("machine learning expert", top_k=50)

# 2. Exact filtering narrows to deterministic set
refined = [r for r in candidates
           if r.data["department"] == "Engineering"
           and r.data["status"] == "active"]

# 3. CRUD via primary keys (safe - deterministic)
for r in refined:
    hb.update("users", where={"user_id": r.data["user_id"]}, set={...}, schema=schema)

This pattern leverages fuzzy search for discovery ("I don't know the exact term") while ensuring mutations operate on deterministic, exactly-identified rows.
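For readers who want to run the three steps without a server, the following is a self-contained sketch over an in-memory dict. The word-overlap score is a crude stand-in for semantic similarity, and `users`, `fuzzy_score`, and the `flagged` column are hypothetical; only the search, filter, mutate-by-PK shape comes from the pattern above.

```python
def tokens(text):
    return set(text.lower().split())


def fuzzy_score(query, text):
    """Jaccard overlap of word sets; a crude stand-in for semantic similarity."""
    q, t = tokens(query), tokens(text)
    return len(q & t) / len(q | t) if q | t else 0.0


# Rows keyed by primary key, so each mutation addresses exactly one row.
users = {
    "U001": {"bio": "machine learning expert", "department": "Engineering",
             "status": "active", "flagged": False},
    "U002": {"bio": "deep learning and machine vision", "department": "Research",
             "status": "active", "flagged": False},
    "U003": {"bio": "machine learning platform lead", "department": "Engineering",
             "status": "inactive", "flagged": False},
    "U004": {"bio": "payroll specialist", "department": "Finance",
             "status": "active", "flagged": False},
}

# 1. Fuzzy search: keep every row with some overlap, best match first.
query = "machine learning expert"
candidates = sorted(
    (pk for pk, row in users.items() if fuzzy_score(query, row["bio"]) > 0),
    key=lambda pk: fuzzy_score(query, users[pk]["bio"]),
    reverse=True,
)

# 2. Exact filters: deterministic conditions on EXACT-style columns.
refined = [
    pk for pk in candidates
    if users[pk]["department"] == "Engineering" and users[pk]["status"] == "active"
]

# 3. Mutate by primary key only.
for pk in refined:
    users[pk]["flagged"] = True

print(candidates, refined)
```

The fuzzy step decides which rows are worth looking at; it never decides which rows get written. Only rows that pass the exact filters are mutated, and each write is addressed by primary key.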

See Fuzzy-to-Exact Pattern for a complete implementation.

hybi.compose.RelationalTable dataclass

Bases: BaseMolecule

SQL-like table with full CRUD support.

RelationalTable provides familiar relational database semantics:

  • Row-level UPDATE: Modify individual fields
  • Row-level DELETE: Remove rows by primary key
  • Field extraction: Read individual field values cleanly
  • ACID guarantees: Single-row atomicity

Unlike Catalog (which uses lossy Bundle encoding optimized for search), RelationalTable uses Row encoding with chain binding, which is lossless. This enables true field-level updates without re-encoding entire rows.
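This page does not spell out what chain binding does internally. As a generic hyperdimensional-computing illustration (an assumption, not hybi's implementation), the sketch below shows why a binding-based layout can be lossless: with bipolar vectors, elementwise multiplication is its own inverse, so a bound field can be recovered exactly and rebound on its own. All names are hypothetical.

```python
import random

DIM = 1024
rng = random.Random(1)


def random_vector():
    """Random bipolar (+1/-1) hypervector."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Elementwise multiply; for bipolar vectors, bind(bind(a, b), a) == b."""
    return [x * y for x, y in zip(a, b)]


email_key = random_vector()   # role vector for the "email" column
old_email = random_vector()   # value vector for the old address
new_email = random_vector()   # value vector for the new address

# Field extraction: unbinding with the key returns the value exactly.
bound = bind(email_key, old_email)
recovered = bind(bound, email_key)

# Field-level update: rebind this one field; other fields are untouched.
bound = bind(email_key, new_email)
print(recovered == old_email, bind(bound, email_key) == new_email)
```

Compare this with a majority-vote bundle, where the component vectors cannot be recovered exactly once they are superposed.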

Trade-offs vs Catalog

| Aspect | Catalog | RelationalTable |
|---|---|---|
| Encoding | Bundle (lossy) | Row (lossless) |
| Search | Fast | Moderate |
| UPDATE/DELETE | Not supported | Full support |
| Use case | Search catalog | Mutable tables |

Use RelationalTable when you need UPDATE/DELETE operations. Use Catalog when you primarily search and append data.

Expands to

Row(
    primary_key=Field(pk_column, encoding=EXACT),
    fields={...other columns...},
)

Example

# Define a users table
schema = RelationalTable(
    columns={
        "user_id": Field(encoding=Encoding.EXACT),
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "salary": Field(encoding=Encoding.NUMERIC),
    },
    primary_key="user_id",
)
hb.ingest(users_df, collection="users", schema=schema)

# Read by primary key
user = hb.query("users", schema).get(user_id="U001")

# Update fields
hb.update(
    "users",
    where={"user_id": "U001"},
    set={"email": "new@example.com"},
    schema=schema,
)

# Delete row
hb.delete("users", where={"user_id": "U001"}, schema=schema)

Notes
  • Primary key is required and must use EXACT encoding
  • Primary key cannot be updated (immutable row identity)
  • Updates are atomic at the row level

columns = dataclass_field(default_factory=dict) class-attribute instance-attribute

Column definitions mapping column names to Field configurations.

Must include the primary key column.

Example

columns={
    "id": Field(encoding=Encoding.EXACT),
    "name": Field(encoding=Encoding.SEMANTIC),
    "email": Field(encoding=Encoding.EXACT),
}

primary_key = None class-attribute instance-attribute

Name of the primary key column.

Required. The referenced column must:

  • Exist in columns
  • Use EXACT encoding

The primary key provides:

  • O(1) row lookup via PK index
  • Row identity for UPDATE/DELETE operations
  • Uniqueness constraint on ingest
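The three roles of the primary key can be sketched with a plain dict acting as the PK index. `ToyTable` is a hypothetical in-memory model written for this page, not part of hybi; it also enforces the rule from the notes above that the primary key cannot be updated.

```python
class ToyTable:
    """Hypothetical in-memory table illustrating the primary-key roles."""

    def __init__(self, primary_key):
        self.primary_key = primary_key
        self._rows = {}  # PK value -> row dict (the "PK index")

    def ingest(self, rows):
        rows = [dict(row) for row in rows]
        keys = [row[self.primary_key] for row in rows]
        # Uniqueness constraint: reject the whole batch before writing anything.
        if len(set(keys)) != len(keys) or any(k in self._rows for k in keys):
            raise ValueError("duplicate primary key in ingest")
        for key, row in zip(keys, rows):
            self._rows[key] = row

    def get(self, pk):
        # Average-case O(1) lookup through the dict.
        return self._rows.get(pk)

    def update(self, pk, changes):
        # Row identity: the PK selects the row and may not be changed.
        if self.primary_key in changes:
            raise ValueError("primary key is immutable")
        self._rows[pk].update(changes)

    def delete(self, pk):
        self._rows.pop(pk, None)


table = ToyTable(primary_key="user_id")
table.ingest([{"user_id": "U001", "email": "old@example.com"}])
table.update("U001", {"email": "new@example.com"})
print(table.get("U001"))
```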

__init__(columns=dict(), primary_key=None, table_name=None)


TimeSeries

Pre-configured molecule for time-ordered data. Supports two modes:

Temporal Mode (with timestamp_field)

When timestamp_field is provided, expands to a Pair enabling temporal queries:

from hybi.compose import TimeSeries

schema = TimeSeries(
    value_field="measurement",
    timestamp_field="recorded_at",  # Enables at_time(), time_range(), when()
)

Supported queries: search, find, at_time, time_range, when

Positional Mode (without timestamp_field)

When timestamp_field is None, expands to a Sequence enabling position-based queries:

schema = TimeSeries(
    value_field="message",
    timestamp_field=None,  # Position-based mode
    position_encoding="random",
    max_length=512,
)

Supported queries: search, at, contains, prefix

See timeseries_demo.py for a complete example using positional mode.

hybi.compose.TimeSeries dataclass

Bases: BaseMolecule

Time series compound: temporal data with timestamp-value binding.

TimeSeries encodes time-indexed data using hyperdimensional temporal binding. When a timestamp_field is provided, each row is encoded as:

timestamp ⊛ value

This enables powerful temporal queries:

  • at_time(ts): Find values at/near a specific timestamp
  • time_range(start, end): Find values within a time window
  • when(value): Find timestamps when a value occurred
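The intended result of each query can be sketched over plain (timestamp, value) tuples. This models only the query semantics, not the hyperdimensional encoding; the nearest-match rule for `at_time` and the inclusive bounds for `time_range` are assumptions, and the functions are hypothetical stand-ins that happen to share the method names.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

readings = sorted([
    (datetime(2024, 1, 15, 13, 0), 20.5),
    (datetime(2024, 1, 15, 14, 0), 21.0),
    (datetime(2024, 1, 15, 15, 0), 19.5),
    (datetime(2024, 1, 15, 16, 0), 21.0),
])
timestamps = [ts for ts, _ in readings]


def at_time(ts):
    """Reading whose timestamp is nearest to ts (assumed meaning of at/near)."""
    return min(readings, key=lambda reading: abs(reading[0] - ts))


def time_range(start, end):
    """Readings with start <= timestamp <= end (inclusive bounds assumed)."""
    return readings[bisect_left(timestamps, start):bisect_right(timestamps, end)]


def when(value):
    """Timestamps at which the value occurred."""
    return [ts for ts, v in readings if v == value]


print(at_time(datetime(2024, 1, 15, 14, 10)))
print(len(time_range(datetime(2024, 1, 15, 13, 0), datetime(2024, 1, 15, 15, 0))))
print(when(21.0))
```

Note that `when` runs in the opposite direction from the other two: it starts from a value and returns timestamps, which is the kind of lookup a symmetric timestamp-value binding is meant to support.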

Expands to (when timestamp_field is provided):

Pair(
    left=Field(timestamp_field, encoding=TEMPORAL),
    right=Field(value_field, encoding=value_encoding),
)

Expands to (when timestamp_field is None, legacy mode):

Sequence(
    item=Field(value_field, encoding=value_encoding),
    position_encoding="sinusoidal",
    max_length=max_length,
)

Example

schema = TimeSeries(
    value_field="temperature",
    timestamp_field="recorded_at",
    value_encoding=Encoding.NUMERIC,
)
hb.ingest(sensor_df, collection="readings", schema=schema)

# Query: What was the temperature at 2pm?
results = hb.query("readings").at_time("2024-01-15 14:00:00")

# Query: Temperatures between 1pm and 3pm
results = hb.query("readings").time_range(
    start="2024-01-15 13:00:00",
    end="2024-01-15 15:00:00",
)

# Legacy mode: without timestamp (uses row position)
schema = TimeSeries(value_field="price")  # timestamp_field=None
# Note: This mode only supports positional queries, not temporal

__init__(value_field='value', timestamp_field=None, value_encoding=Encoding.SEMANTIC, value_weight=1.0, timestamp_weight=1.0, position_encoding='sinusoidal', max_length=512)


Hierarchy

Pre-configured Tree for parent-child relationships.

from hybi.compose import Hierarchy

schema = Hierarchy(
    node_field="employee",
    parent_field="manager",
)

hybi.compose.Hierarchy dataclass

Bases: BaseMolecule

Hierarchy compound: parent-child organizational structures.

A convenience wrapper around Tree optimized for hierarchical data like org charts, file systems, taxonomies, or nested categories.

Expands to

Tree(
    child=Field(node_field, encoding=node_encoding),
    parent=Field(parent_field, encoding=node_encoding),
    level=Field(level_field) if level_field else None,
)

Example

# Org chart
schema = Hierarchy(
    node_field="employee",
    parent_field="manager",
)
hb.ingest(org_df, collection="org", schema=schema)

# File system with depth tracking
schema = Hierarchy(
    node_field="path",
    parent_field="parent_path",
    level_field="depth",
)

# Taxonomy with exact matching
schema = Hierarchy(
    node_field="category",
    parent_field="parent_category",
    node_encoding=Encoding.EXACT,
)

__init__(node_field='node', parent_field='parent', level_field=None, node_encoding=Encoding.SEMANTIC, node_weight=1.0)
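This section does not list Hierarchy's query methods, so the sketch below only shows the row shape the schema expects (one row per node, with a reference to its parent) and what tree traversals over such rows compute. The data and the `ancestors` / `children` helpers are hypothetical and run entirely in memory.

```python
org_rows = [
    {"employee": "Dana", "manager": None},
    {"employee": "Lee", "manager": "Dana"},
    {"employee": "Sam", "manager": "Lee"},
    {"employee": "Kim", "manager": "Dana"},
]

# node_field="employee", parent_field="manager"
parent_of = {row["employee"]: row["manager"] for row in org_rows}


def ancestors(node):
    """Walk parent links upward, nearest ancestor first."""
    chain, seen = [], {node}
    current = parent_of.get(node)
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return chain


def children(node):
    """Direct reports of node, in ingest order."""
    return [child for child, parent in parent_of.items() if parent == node]


print(ancestors("Sam"), children("Dana"))
```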


Document

Use this compound when you have long-form content and need to search inside it (not just retrieve whole documents), navigate its structure (e.g. "every paragraph under chapter 3"), and pinpoint exact sections (e.g. /ch3/sec2/p4) — all against the same collection.

Alternatives fall short in different ways: collapsing each document to one vector loses chunk identity; a flat chunk store loses the structure the chunker gave you; a plain tree loses sibling order. Document keeps all of it — every chunk is a first-class row with a path, a parent, and a position.

A pluggable Chunker decides how the source splits — paragraphs, Markdown headings, or your own strategy. Swapping chunkers changes only row contents, so the same queries work across formats.

Three ways to address a chunk

  • Semantic — find chunks whose content matches a query.
  • Structural — walk the tree (ancestors / descendants / siblings) without a separate index.
  • Path — O(1) lookup of the exact chunk at /ch1/sec2/p3.

Defining a Document schema

from hybi.compose import Document, Field, Encoding
from hybi.compose.chunkers import MarkdownChunker

schema = Document(
    content_field="body",
    chunker=MarkdownChunker(),
    metadata_fields={
        "author": Field(encoding=Encoding.EXACT),
        "published": Field(encoding=Encoding.TEMPORAL),
    },
)

Ingest shape

The ingest DataFrame is document-level (one row per source document) and must include a document_id column plus the configured content_field. The chunker expands each row into per-chunk Rows at ingest time.

import pandas as pd

df = pd.DataFrame([
    {"document_id": "doc-1", "body": "# Intro\n\nFirst paragraph.\n\n## Details\n\nMore.",
     "author": "Alice", "published": "2024-01-15"},
    # ...
])

hb.ingest(df, collection="articles", schema=schema)

Reserved structural fields written into every chunk Row: chunk_id (primary key), document_id, path, parent_id, sibling_index, depth. User-supplied metadata_fields cannot collide with these.
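To show what "expands each row into per-chunk Rows" can look like, here is a toy chunker that fills in the reserved structural fields for one document. It is a sketch under stated assumptions, not hybi's MarkdownChunker: the `chunk_id` format, the slug rule, the `p0`, `p1`, ... paragraph names, and the function itself are all hypothetical, and it expects headings and paragraphs to be separated by blank lines.

```python
import re


def toy_markdown_chunks(document_id, body, content_field="body"):
    """Split Markdown into chunk rows carrying the reserved structural fields."""
    child_counts = {}      # parent chunk_id -> number of children so far
    paragraph_counts = {}  # parent chunk_id -> number of paragraphs so far

    def make(path, parent_id, depth, text):
        index = child_counts.get(parent_id, 0)
        child_counts[parent_id] = index + 1
        return {
            "chunk_id": f"{document_id}:{path}",
            "document_id": document_id,
            "path": path,
            "parent_id": parent_id,
            "sibling_index": index,
            "depth": depth,
            content_field: text,
        }

    root = make("/", None, 0, "")
    rows = [root]
    stack = [(0, root)]  # (heading level, section row); level 0 is the root

    for block in re.split(r"\n\s*\n", body.strip()):
        block = block.strip()
        if not block:
            continue
        heading = re.match(r"(#{1,6})\s+(.*)", block)
        if heading:
            level, title = len(heading.group(1)), heading.group(2)
            while stack[-1][0] >= level:  # close deeper or equal sections
                stack.pop()
            parent = stack[-1][1]
            slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
            path = parent["path"].rstrip("/") + "/" + slug
            row = make(path, parent["chunk_id"], parent["depth"] + 1, title)
            stack.append((level, row))
        else:
            parent = stack[-1][1]
            n = paragraph_counts.get(parent["chunk_id"], 0)
            paragraph_counts[parent["chunk_id"]] = n + 1
            path = parent["path"].rstrip("/") + f"/p{n}"
            row = make(path, parent["chunk_id"], parent["depth"] + 1, block)
        rows.append(row)
    return rows


rows = toy_markdown_chunks(
    "doc-1", "# Intro\n\nFirst paragraph.\n\n## Details\n\nMore."
)
for row in rows:
    print(row["path"], row["depth"], row["sibling_index"])
```

One document-level row becomes five chunk rows here: the root, two sections, and two paragraphs. Each row carries enough structure (path, parent_id, sibling_index, depth) to rebuild the tree without a separate index.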

Querying

Use Document.attach to get a client/collection-bound view:

pdoc = schema.attach(hb, "articles")

# Rank whole documents
docs = pdoc.search_documents("attention mechanisms", top_k=5)

# Rank individual chunks (composable ChunkHandles)
handles = pdoc.find_and_bind("async Python", top_k=3).materialize()

# Structural navigation
sec = pdoc.descendants("/intro")        # every chunk under /intro
siblings = pdoc.siblings(parent_id)     # peers under the same parent
chunk = pdoc.at("/intro/p0", document_id="doc-1")  # direct lookup

# Subtree as a first-class algebraic citizen
subtree = pdoc.subtree("/intro", document_id="doc-1")
related = subtree.intersect(hb.query("concepts"))  # cross-compound join

Use search_documents when the answer is "which document?" and find_and_bind when the answer is "which passage?". find_and_bind excludes root chunks by default; pass include_root=True for a mixed candidate set.

When multiple documents share a path (e.g. every doc has an /intro), at() and subtree() require document_id= to disambiguate. Unscoped calls on ambiguous paths raise ValueError.
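The disambiguation rule can be modelled in a few lines. This `at` is a hypothetical in-memory stand-in that mirrors the behaviour described above (return the chunk, return None when nothing matches, raise ValueError when the path is ambiguous); it uses a linear scan for brevity rather than an O(1) index.

```python
chunks = [
    {"document_id": "doc-1", "path": "/intro", "body": "Intro of doc 1"},
    {"document_id": "doc-2", "path": "/intro", "body": "Intro of doc 2"},
    {"document_id": "doc-2", "path": "/methods", "body": "Methods of doc 2"},
]


def at(path, document_id=None):
    """Return the single chunk at path, optionally scoped to one document."""
    matches = [
        chunk for chunk in chunks
        if chunk["path"] == path
        and (document_id is None or chunk["document_id"] == document_id)
    ]
    if not matches:
        return None
    if len(matches) > 1:
        raise ValueError(
            f"path {path!r} exists in {len(matches)} documents; pass document_id"
        )
    return matches[0]


print(at("/methods")["body"])                     # unique path: no scope needed
print(at("/intro", document_id="doc-1")["body"])  # shared path: scope required
```

Raising on ambiguity, instead of returning whichever row happens to come first, keeps an unscoped call from silently reading another document's chunk.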

structural_index="tree" opts into the Rust-accelerated descendant walk (BFS over the chunk namespace's parent_id index); the default "path" backend emits path LIKE /root/% filters and works out of the box.
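The two backends answer the same question in different ways. The sketch below is an in-memory analogue with hypothetical data and function names: one function does a prefix match on path, as a LIKE filter would, and the other does a breadth-first walk over a parent_id index. It is meant to show that the results agree, not how hybi implements either backend.

```python
from collections import deque

chunks = [
    {"chunk_id": "c0", "path": "/", "parent_id": None},
    {"chunk_id": "c1", "path": "/intro", "parent_id": "c0"},
    {"chunk_id": "c2", "path": "/intro/p0", "parent_id": "c1"},
    {"chunk_id": "c3", "path": "/intro/details", "parent_id": "c1"},
    {"chunk_id": "c4", "path": "/intro/details/p0", "parent_id": "c3"},
    {"chunk_id": "c5", "path": "/introduction", "parent_id": "c0"},
]


def descendants_by_path(root_path):
    """Prefix filter: the in-memory analogue of a path LIKE '<root>/%' filter."""
    prefix = root_path.rstrip("/") + "/"
    return {
        chunk["chunk_id"] for chunk in chunks
        if chunk["path"].startswith(prefix) and chunk["path"] != root_path
    }


def descendants_by_tree(root_path):
    """Breadth-first walk over a parent_id -> children index."""
    children = {}
    for chunk in chunks:
        children.setdefault(chunk["parent_id"], []).append(chunk["chunk_id"])
    found = set()
    queue = deque(c["chunk_id"] for c in chunks if c["path"] == root_path)
    while queue:
        for child in children.get(queue.popleft(), []):
            found.add(child)
            queue.append(child)
    return found


print(sorted(descendants_by_path("/intro")))
print(sorted(descendants_by_tree("/intro")))
```

The trailing slash in the prefix matters: without it, `/introduction` would be counted as a descendant of `/intro`.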

Document's tree-navigation methods (descendants / ancestors / siblings / subtree) delegate to an internal OrderedTree. The structural_index kwarg forwards to it; you can also construct an OrderedTree directly for the same navigation surface without the chunker / Row-schema apparatus.

hybi.compose.Document dataclass

Bases: BaseMolecule

Document compound: a queryable subtree of first-class chunks.

Each source document is decomposed by a pluggable Chunker into N chunks; every chunk becomes a lossless Row with three addressing modes:

  • Semantic: search hits the SEMANTIC content field, returning individual chunks (not whole documents).
  • Structural: per-row parent_id, sibling_index, and depth carry the tree shape, so ancestors / descendants / siblings navigate without a separate Tree index in v1.
  • Path: the EXACT path field (e.g. /ch1/sec2/p3) gives O(1) direct lookup.

The compound is fixed — swapping Chunker strategies changes only row contents, not the wire schema, so a FlatChunker and a MarkdownChunker ingest into the same shape.

Expands to

Row(
    primary_key=Field("chunk_id", EXACT),
    fields={
        "document_id": Field(EXACT),
        "path": Field(EXACT),
        "parent_id": Field(EXACT),
        "sibling_index": Field(NUMERIC),
        "depth": Field(NUMERIC),
        content_field: Field(SEMANTIC, weight=content_weight),
        **metadata_fields,
    },
)

Example

Phase 1+ will wire a Chunker in; Phase 0 exposes the schema shape.

schema = Document(
    content_field="body",
    metadata_fields={"section_title": Field(encoding=Encoding.EXACT)},
)

__init__(content_field='content', content_encoding=Encoding.SEMANTIC, content_weight=1.0, metadata_fields=None, chunker=None, structural_index='path')

attach(client, collection)

Return a client/collection-bound view of this Document.

The bound view mirrors every Document method that needs a client and collection (subtree, find_and_bind, descendants, ancestors, siblings, at) without the _client= / _collection= keyword noise. Prefer this for application code; the raw Document methods remain available as an escape hatch.

Example

pdoc = doc.attach(hb, "papers")
pdoc.subtree("/abstract", document_id="1706.03762").materialize()
pdoc.rollup("/abstract", document_id="1706.03762")  # handle-free shortcut
pdoc.find_and_bind("transformers", top_k=3)
pdoc.descendants("/")

find_and_bind(query, *, top_k, include_root=False, _client=None, _collection=None)

Semantic search within the document, returning a handle-shaped spec that can be composed with other collections or atoms.

top_k is a required keyword-only argument with no default: the choice between "best match" and "candidate set" is consequential and should be explicit at the call site.

By default roots (path == "/") are excluded from the search because their content_vec carries the per-document subtree rollup — appropriate for search_documents, not chunk-level retrieval. Pass include_root=True to opt back in when you want a mixed chunk + document-level candidate set.

The returned object is lazy — it does no server I/O until the caller invokes .materialize() or a method that requires concrete values (e.g. .intersect). For top_k == 1 the spec resolves to a single ChunkHandle on materialization; for top_k > 1 it resolves to a list of ChunkHandles.

subtree(path, *, document_id=None, union=False, _client=None, _collection=None)

Return a virtual ChunkHandle whose content_vec bundles the chunk at path plus every descendant.

Per-document by default: an ambiguous path (multiple documents) raises. Pass document_id=<id> to scope, or union=True to opt into bundling across matching documents.

Delegates to the internal OrderedTree.

descendants(query, root_path, top_k=100)

Return every chunk whose path sits strictly under root_path.

ancestors(query, path, top_k=100)

Return chunks at every ancestor path of path, parent-first.

siblings(query, parent_id, top_k=100)

Return every chunk sharing parent_id. Caller sorts by sibling_index for ordered traversal.

at(path, *, document_id=None, _client=None, _collection=None)

O(1) direct lookup by path.

Returns the chunk Row at path (or None if no chunk matches). EXACT encoding on the path field makes this a primary-key-style hit.

Per-document by default: if multiple documents in the collection share the same path (e.g. every Markdown doc has an /intro section), pass document_id to disambiguate. Omitting it when the path is ambiguous raises ValueError rather than returning a nondeterministic row — silent cross-document leakage is the kind of bug that only surfaces once the corpus grows past toy size.

reingest(client, collection, source_df)

Replace every chunk belonging to the given document_id(s).

For each unique document_id in source_df, looks up that document's existing chunks by document_id and deletes them by primary key, then ingests the new rows. Leaves other documents in the collection untouched.

Fail-before-delete. The new DataFrame is run through expand_dataframe (chunker + orphan check + required-column validation) BEFORE any deletion happens. If the new data is malformed — missing columns, chunker bugs, duplicate paths — reingest raises without touching storage. This closes the "deleted old, new ingest failed, collection in half-state" class of failure for the common malformed-input case.

Ingest-phase failures that surface AFTER the pre-validation (e.g. transient storage errors) can still leave the collection in a half-state; handling of those conditions remains best-effort. Per-row delete errors are swallowed so that a row already removed by a concurrent caller doesn't block the rest.
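The ordering that makes this fail-before-delete can be sketched over a plain dict. Everything here is a hypothetical simplification: `validate` stands in for the pre-validation step, the store maps chunk_id to row, and the input is already chunk-level rows rather than a document-level DataFrame.

```python
def validate(rows):
    """Raise before any mutation if the new rows are malformed."""
    seen = set()
    for row in rows:
        missing = {"chunk_id", "document_id", "path"} - set(row)
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        key = (row["document_id"], row["path"])
        if key in seen:
            raise ValueError(f"duplicate path: {key}")
        seen.add(key)


def reingest(store, new_rows):
    """Replace every chunk of the affected documents in store."""
    validate(new_rows)  # 1. if this raises, storage is untouched
    doc_ids = {row["document_id"] for row in new_rows}
    stale = [cid for cid, row in store.items() if row["document_id"] in doc_ids]
    for cid in stale:  # 2. delete by primary key
        store.pop(cid, None)  # a row that is already gone is not an error
    for row in new_rows:  # 3. ingest the replacements
        store[row["chunk_id"]] = row


store = {
    "a": {"chunk_id": "a", "document_id": "doc-1", "path": "/"},
    "b": {"chunk_id": "b", "document_id": "doc-1", "path": "/p0"},
    "c": {"chunk_id": "c", "document_id": "doc-2", "path": "/"},
}

try:
    reingest(store, [{"chunk_id": "x", "document_id": "doc-1"}])  # no path
except ValueError:
    pass  # rejected during validation; store still holds a, b, c

reingest(store, [{"chunk_id": "d", "document_id": "doc-1", "path": "/"}])
print(sorted(store))
```

As in the description above, a failure in step 3 would still leave the affected document without its old chunks; the ordering only protects against malformed input.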

Raises:

| Type | Description |
|---|---|
| SchemaError | If source_df lacks the document_id column or the chunker produces malformed output. |

PK-based deletion avoids row-id-counter issues that a bulk delete-by-filter can induce when mixed with re-ingest.


BoundDocument

A Document bound to a specific (client, collection) pair, produced by Document.attach. Mirrors every Document method that needs a client and collection (subtree, find_and_bind, descendants, etc.) without the _client= / _collection= keyword noise.

hybi.compose.BoundDocument

A Document bound to a specific (client, collection) pair.

Produced by Document.attach. Exposes every structural / rollup / retrieval operation on Document with client/collection implicit, so call sites read as pdoc.subtree("/abstract", ...) instead of schema.subtree("/abstract", ..., _client=hb, _collection="papers").

The Document schema itself stays stateless; BoundDocument is a thin adapter. Safe to construct, discard, and re-bind the same Document to different collections.
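The "thin adapter" relationship can be sketched in miniature. `ToyDocument` and `ToyBoundDocument` are hypothetical stand-ins and the client is just a nested dict; the point is only that the schema holds no client state, while the bound view remembers the (client, collection) pair and forwards calls.

```python
class ToyDocument:
    """Hypothetical stateless schema: every call needs client and collection."""

    def at(self, path, *, _client, _collection):
        return _client[_collection].get(path)

    def attach(self, client, collection):
        return ToyBoundDocument(self, client, collection)


class ToyBoundDocument:
    """Thin adapter: remembers (client, collection) and forwards calls."""

    def __init__(self, document, client, collection):
        self._document = document
        self._client = client
        self._collection = collection

    def at(self, path):
        return self._document.at(
            path, _client=self._client, _collection=self._collection
        )


client = {
    "papers": {"/abstract": "paper abstract"},
    "articles": {"/abstract": "article abstract"},
}
schema = ToyDocument()

# The same schema object can be bound to different collections.
papers = schema.attach(client, "papers")
articles = schema.attach(client, "articles")
print(papers.at("/abstract"), "|", articles.at("/abstract"))
```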

subtree(path, *, document_id=None, union=False)

Return a virtual ChunkHandle for the subtree rooted at path. See Document.subtree for scoping semantics.

rollup(path, *, document_id=None, union=False)

Shortcut: return the subtree's rollup vector directly (np.ndarray).

Equivalent to self.subtree(path, ...).materialize(); provided because the handle is immediately materialized in most call sites.

find_and_bind(query_text, *, top_k, include_root=False)

Semantic search that returns composable ChunkHandle objects. Excludes root chunks by default; those carry document-level rollups and belong to search_documents. Pass include_root=True to search across chunks and root rollups together. See Document.find_and_bind.

search_documents(query_text, *, top_k)

Document-level semantic search: rank whole documents by the similarity of their persisted subtree rollup to query_text.

Implementation: a path-scoped semantic search over root chunks (path == "/"), whose content_vec is the per-document subtree rollup written by Document._finalize_rollups at ingest time. Works out of the box for Documents ingested through HyperBinder.ingest on a client that exposes set_row_atom; falls back to empty results when rollups haven't been persisted.

top_k is required (no default): document-scale search is expensive to paginate and the choice between "one best match" and "candidate set" should be explicit.

descendants(root_path, top_k=100)

Every chunk whose path sits strictly under root_path.

ancestors(path, top_k=100)

Every chunk on the path from root to path, parent-first.

siblings(parent_id, top_k=100)

Every chunk sharing parent_id.

at(path, *, document_id=None)

O(1) direct lookup of the chunk at path. Pass document_id when the path exists in more than one document. See Document.at.


Network

Pre-configured Graph for network data.

from hybi.compose import Network

schema = Network(
    node_field="user",
    edge_field="interaction",
    directed=True,
    # Optional: use separate columns for source/target nodes
    # source_field="from_user",
    # target_field="to_user",
)

hybi.compose.Network dataclass

Bases: BaseMolecule

Network compound: node-edge-node graph structures.

A convenience wrapper around Graph optimized for social networks, citation graphs, dependency graphs, and other network structures.

Expands to

Graph(
    node=Field(node_field, encoding=node_encoding),
    edge=Field(edge_field, encoding=edge_encoding),
    directed=directed,
)

Example

# Social network
schema = Network(
    node_field="user",
    edge_field="connection_type",
)
hb.ingest(social_df, collection="social", schema=schema)

# Citation network (undirected)
schema = Network(
    node_field="paper_id",
    edge_field="citation_type",
    node_encoding=Encoding.EXACT,
    directed=False,
)

# Dependency graph
schema = Network(
    node_field="package",
    edge_field="dependency_type",
    source_field="dependent",
    target_field="dependency",
)

__init__(node_field='node', edge_field='edge', source_field=None, target_field=None, node_encoding=Encoding.SEMANTIC, edge_encoding=Encoding.EXACT, node_weight=1.0, edge_weight=1.0, directed=True)


When to Use Compounds vs Molecules

Use compounds when:

  • Your data fits a common pattern
  • You want sensible encoding defaults
  • You're prototyping quickly

Use molecules when:

  • You need custom encodings per field
  • You're nesting structures
  • You need fine-grained control over weights

See Molecules vs Compounds for detailed guidance.


Example Code

Complete runnable examples for each compound type:

| Compound | Example File | Description |
|---|---|---|
| KnowledgeGraph | knowledge_graph_demo.py | Entity-relation-entity facts with traversal |
| Document | document_demo.py | Chunked documents with tree-shaped navigation |
| Document (arXiv) | document_arxiv_demo.py | Real arXiv papers + semantic search + cross-compound hyperedges |
| Hierarchy | hierarchy_demo.py | Org charts and taxonomies |
| TimeSeries | timeseries_demo.py | Time-ordered data |
| Network | network_demo.py | Social graphs and citations |
| Catalog | product_catalog_demo.py | Product catalogs with search |
| RelationalTable | fuzzy_to_exact_demo.py | CRUD with fuzzy-to-exact pattern |

Run any example from the SDK directory:

cd sdk
python examples/compose/knowledge_graph_demo.py

See Examples README for the full example index.