Collection API¶

Fluent API for collection-focused operations.

Collection¶

The Collection class provides a fluent interface for working with a specific collection.

employees = hb.collection("employees")

# Check status
print(employees.stats())

# Query methods
results = employees.search("senior engineers")

`hybi.Collection` ¶

Fluent API for collection operations.

Example

customers = hb.collection("customers")
customers.ingest("data.csv")
results = customers.search("enterprise AI")

`__init__(client, name)` ¶

`stats()` ¶

Get detailed statistics about this collection.

Returns comprehensive information including row count, columns, vector configuration (dimension, seed), and metadata.

Example

stats = hb.collection("customers").stats()
print(stats)  # "customers: 1,000 rows, 5 columns (structured)"
print(f"Dimension: {stats.dimension}, Seed: {stats.seed}")

`count()` ¶

Get number of rows in collection.

`exists()` ¶

Check if collection exists.

Returns:

Type	Description
`bool`	True if collection exists, False if not found.

Raises:

Type	Description
`ConnectionError`	If server is unreachable.

Note

This method raises `ConnectionError` instead of returning False when the server is unreachable, so you can distinguish between "collection doesn't exist" and "server unreachable".

`delete()` ¶

Delete this collection.

Raises:

Type	Description
`HyperBinderError`	If deletion fails.
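`exists()` and `delete()` can be combined so that deleting an absent collection becomes a no-op. This is a minimal sketch, not a library feature: `delete_if_exists` and `_FakeCollection` are hypothetical names, and the reference does not say what `delete()` does when the collection is missing, which is why the check comes first. Any `HyperBinderError` from `delete()` is left to propagate to the caller.

```python
def delete_if_exists(collection) -> bool:
    """Delete the collection when it exists; return True if delete() ran.

    `collection` is any object exposing the documented exists() and
    delete() methods. Exceptions from either call are not caught here.
    """
    if not collection.exists():
        return False
    collection.delete()
    return True


class _FakeCollection:
    """Hypothetical in-memory stand-in so the sketch runs without a server."""

    def __init__(self, present: bool):
        self.present = present

    def exists(self) -> bool:
        return self.present

    def delete(self) -> None:
        self.present = False


existing = _FakeCollection(present=True)
absent = _FakeCollection(present=False)
outcomes = [delete_if_exists(existing), delete_if_exists(absent)]
print(outcomes, existing.exists())
```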

`ingest(source, **kwargs)` ¶

Ingest data into this collection.

`search(query, top_k=None, *, mode=None, filters=None, role=None, slot_filters=None)` ¶

Universal search on this collection.

Automatically detects collection type and uses the best search method, or use mode to explicitly choose.

Parameters:

Name	Type	Description	Default
`query`	`Union[str, Dict[str, Any]]`	Search query (string for text search, dict for field matching)	required
`top_k`	`Optional[int]`	Number of results to return	`None`
`mode`	`Optional[str]`	Search mode - "auto" (default), "structured", "semantic", or "hybrid"	`None`
`filters`	`Optional[List[tuple]]`	Hard filters for structured mode	`None`
`role`	`Optional[str]`	Filter by document role (semantic mode)	`None`
`slot_filters`	`Optional[Dict[str, Any]]`	Slot value filters (hybrid mode)	`None`

Examples:

# Auto-detect (recommended)
results = collection.search("machine learning")

# Explicit mode
results = collection.search("AI papers", mode="semantic")
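The parameter table above documents each optional filter argument for one mode: `filters` for structured, `role` for semantic, and `slot_filters` for hybrid. The sketch below collects keyword arguments for `search()` and rejects pairings the table does not describe. `build_search_kwargs` is a hypothetical helper, the role value `"abstract"` is made up for the example, and whether the library itself performs a similar check is not stated in this reference.

```python
from typing import Any, Dict, List, Optional

# Pairing taken from the parameter table: each filter argument is
# documented for exactly one search mode.
_MODE_FOR_ARGUMENT = {
    "filters": "structured",
    "role": "semantic",
    "slot_filters": "hybrid",
}


def build_search_kwargs(
    mode: Optional[str] = None,
    top_k: Optional[int] = None,
    filters: Optional[List[tuple]] = None,
    role: Optional[str] = None,
    slot_filters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for search().

    Raises ValueError when an argument is combined with an explicit mode
    other than the one the table documents it for.
    """
    supplied = {"filters": filters, "role": role, "slot_filters": slot_filters}
    kwargs: Dict[str, Any] = {}
    if mode is not None:
        kwargs["mode"] = mode
    if top_k is not None:
        kwargs["top_k"] = top_k
    for name, value in supplied.items():
        if value is None:
            continue
        expected = _MODE_FOR_ARGUMENT[name]
        # None and "auto" leave the choice of mode to the library.
        if mode not in (None, "auto", expected):
            raise ValueError(
                f"{name} is documented for mode={expected!r}, got {mode!r}"
            )
        kwargs[name] = value
    return kwargs


semantic_kwargs = build_search_kwargs(mode="semantic", top_k=3, role="abstract")
print(semantic_kwargs)
```

The result can then be unpacked into a call such as `collection.search("AI papers", **semantic_kwargs)`.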

`select(columns=None, where=None, order_by=None, limit=None, offset=0, distinct=False)` ¶

SQL-like SELECT on this collection.

`aggregate(group_by=None, aggregations=None, where=None, having=None, order_by=None, limit=None)` ¶

SQL-like AGGREGATE on this collection.

`join(right, on, join_type='inner', columns=None, limit=None)` ¶

JOIN this collection with another.

`multihop(start=None, hops=None, top_k=None)` ¶

Multi-hop reasoning on this collection.

`get_context(query, max_chunks=5, max_tokens=2000, auto_detect=True)` ¶

Get LLM-ready context from this collection.

Automatically routes to the appropriate search:

- String query + document collection → semantic search
- Dict query → structured field search

Parameters:

Name	Type	Description	Default
`query`	`Union[str, Dict[str, Any]]`	Question or query (string for text, dict for structured)	required
`max_chunks`	`int`	Maximum chunks to retrieve	`5`
`max_tokens`	`int`	Approximate maximum tokens	`2000`
`auto_detect`	`bool`	If True, detect collection type and use appropriate search	`True`
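The two routing rules for `get_context()` can be written out as a small decision function. This is an illustration of the documented rules only, not the library's implementation: `describe_context_route` is a hypothetical name, the example queries are invented, and combinations this reference does not describe return `None` instead of a guess.

```python
from typing import Any, Dict, Optional, Union


def describe_context_route(
    query: Union[str, Dict[str, Any]],
    is_document_collection: bool,
) -> Optional[str]:
    """Return the search kind implied by the two documented routing rules.

    Returns None for combinations this reference does not describe
    (for example, a string query against a non-document collection).
    """
    if isinstance(query, dict):
        # Dict query -> structured field search
        return "structured"
    if isinstance(query, str) and is_document_collection:
        # String query + document collection -> semantic search
        return "semantic"
    return None


routes = [
    describe_context_route("What changed in Q3?", is_document_collection=True),
    describe_context_route({"region": "EMEA"}, is_document_collection=False),
    describe_context_route("What changed in Q3?", is_document_collection=False),
]
print(routes)
```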

`ask(question, top_k=5, role_filter=None)` ¶

End-to-end RAG on this collection.

`query(schema=None)` ¶

Get a schema-aware query builder for this collection.

ComposeQuery provides a fluent interface for building queries that leverage the collection's schema (if available) for type-safe, slot-based operations.

If no schema is provided, attempts to load the schema from the collection's stored metadata (set during ingest).

Parameters:

Name	Type	Description	Default
`schema`	`Optional[BaseMolecule]`	Optional molecule schema for validation. If not provided, will try to load from collection stats.	`None`

Returns:

Type	Description
`ComposeQuery`	ComposeQuery builder for this collection

Example

# Auto-loads schema if collection was ingested with one
q = hb.collection("facts").query()
results = q.search("enterprise software")

# Explicit schema for type-safe slot access
from hyperbinder.compose import Triple, Field
schema = Triple(subject=Field("entity"), ...)
results = hb.collection("facts").query(schema).find(subject="Alice")

`get_schema()` ¶

Get the Compose schema stored with this collection.

Returns the schema that was passed during ingest, deserialized back into a molecule object (Triple, Record, etc.).

Returns:

Type	Description
`Optional[BaseMolecule]`	BaseMolecule instance or None if no schema was stored.

Example

schema = hb.collection("facts").get_schema()
if schema:
    print(f"Schema: {schema.molecule_type}")
    print(f"Slots: {schema.slots()}")

AsyncCollection¶

Async version of the Collection API.

`hybi.AsyncCollection` ¶

Async fluent API for collection operations.

Example

async with AsyncHyperBinder() as hb:
    customers = hb.collection("customers")
    await customers.ingest("data.csv")
    results = await customers.search("enterprise AI")
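One reason to prefer the async API is that several searches can run concurrently. The sketch below uses the standard library's `asyncio.gather` for this. `search_many` and `_FakeAsyncCollection` are hypothetical names, and the `"orders"` collection is invented for the example; the stand-in only mimics an awaitable `search()` so the snippet runs without a server.

```python
import asyncio
from typing import Any, Dict, List


async def search_many(collections: Dict[str, Any], query: str) -> Dict[str, List[Any]]:
    """Run the same query against several collections concurrently.

    `collections` maps a label to any object with an awaitable search().
    """
    names = list(collections)
    # gather() returns results in the same order as the awaitables passed in.
    results = await asyncio.gather(*(collections[n].search(query) for n in names))
    return dict(zip(names, results))


class _FakeAsyncCollection:
    """Hypothetical stand-in for an AsyncCollection; no server involved."""

    def __init__(self, name: str):
        self._name = name

    async def search(self, query, top_k=None, **kwargs):
        await asyncio.sleep(0)  # yield control, as a network call would
        return [f"{self._name}:{query}"]


fakes = {name: _FakeAsyncCollection(name) for name in ("customers", "orders")}
combined = asyncio.run(search_many(fakes, "enterprise AI"))
print(combined)
```

With a real client, the dictionary would hold objects returned by `hb.collection(...)` inside the `async with AsyncHyperBinder() as hb:` block.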

`info()` `async` ¶

Get collection information.

`stats()` `async` ¶

Get detailed statistics about this collection.

Returns comprehensive information including row count, columns, vector configuration (dimension, seed), and metadata.

Example

stats = await hb.collection("customers").stats()
print(stats)  # "customers: 1,000 rows, 5 columns (structured)"
print(f"Dimension: {stats.dimension}, Seed: {stats.seed}")

`get_schema()` `async` ¶

Get the Compose schema stored with this collection.

Returns the schema that was passed during ingest, deserialized back into a molecule object (Triple, Record, etc.).

Returns:

Type	Description
`Optional[BaseMolecule]`	BaseMolecule instance or None if no schema was stored.

Example

schema = await hb.collection("facts").get_schema()
if schema:
    print(f"Schema: {schema.molecule_type}")
    print(f"Slots: {schema.slots()}")

`count()` `async` ¶

Get number of rows in collection.

`exists()` `async` ¶

Check if collection exists.

Returns:

Type	Description
`bool`	True if collection exists, False if not found.

Raises:

Type	Description
`ConnectionError`	If server is unreachable.

Note

This method raises `ConnectionError` instead of returning False when the server is unreachable, so you can distinguish between "collection doesn't exist" and "server unreachable".

`delete()` `async` ¶

Delete this collection.

Raises:

Type	Description
`HyperBinderError`	If deletion fails.

`ingest(source, **kwargs)` `async` ¶

Ingest data into this collection.

`search(query, top_k=None, *, mode=None, filters=None, role=None, slot_filters=None)` `async` ¶

Universal async search on this collection.

Automatically detects collection type and uses the best search method, or use mode to explicitly choose.

Parameters:

Name	Type	Description	Default
`query`	`Union[str, Dict[str, Any]]`	Search query (string for text search, dict for field matching)	required
`top_k`	`Optional[int]`	Number of results to return	`None`
`mode`	`Optional[str]`	Search mode - "auto" (default), "structured", "semantic", or "hybrid"	`None`
`filters`	`Optional[List[tuple]]`	Hard filters for structured mode	`None`
`role`	`Optional[str]`	Filter by document role (semantic mode)	`None`
`slot_filters`	`Optional[Dict[str, Any]]`	Slot value filters (hybrid mode)	`None`

`select(columns=None, where=None, order_by=None, limit=None, offset=0, distinct=False)` `async` ¶

SQL-like SELECT on this collection.

`aggregate(group_by=None, aggregations=None, where=None, having=None, order_by=None, limit=None)` `async` ¶

SQL-like AGGREGATE on this collection.

`multihop(start=None, hops=None, top_k=None)` `async` ¶

Multi-hop reasoning on this collection.

`join(right, on, join_type='inner', columns=None, limit=None)` `async` ¶

JOIN this collection with another.

`get_context(query, max_chunks=5, max_tokens=2000, auto_detect=True)` `async` ¶

Get LLM-ready context from this collection.

Automatically routes to the appropriate search:

- String query + document collection → semantic search
- Dict query → structured field search

Parameters:

Name	Type	Description	Default
`query`	`Union[str, Dict[str, Any]]`	Question or query (string for text, dict for structured)	required
`max_chunks`	`int`	Maximum chunks to retrieve	`5`
`max_tokens`	`int`	Approximate maximum tokens	`2000`
`auto_detect`	`bool`	If True, detect collection type and use appropriate search	`True`

`ask(question, top_k=5, role_filter=None)` `async` ¶

End-to-end RAG on this collection.

`query(schema=None)` `async` ¶

Get a schema-aware async query builder for this collection.

AsyncComposeQuery provides a fluent interface for building queries that leverage the collection's schema (if available) for type-safe, slot-based operations.

If no schema is provided, attempts to load the schema from the collection's stored metadata (set during ingest).

Parameters:

Name	Type	Description	Default
`schema`	`Optional[BaseMolecule]`	Optional molecule schema for validation. If not provided, will try to load from collection stats.	`None`

Returns:

Type	Description
`AsyncComposeQuery`	AsyncComposeQuery builder for this collection

Example

# Auto-loads schema if collection was ingested with one
q = await hb.collection("facts").query()
results = await q.search("enterprise software")

# Explicit schema for type-safe slot access
from hyperbinder.compose import Triple, Field
schema = Triple(subject=Field("entity"), ...)
q = await hb.collection("facts").query(schema)
results = await q.find(subject="Alice")
