
Collection API

Fluent API for collection-focused operations.

Collection

The Collection class provides a fluent interface for working with a specific collection.

employees = hb.collection("employees")

# Check status
print(employees.stats())

# Query methods
results = employees.search("senior engineers")

hybi.Collection

Fluent API for collection operations.

Example

customers = hb.collection("customers")
customers.ingest("data.csv")
results = customers.search("enterprise AI")

__init__(client, name)

stats()

Get detailed statistics about this collection.

Returns comprehensive information including row count, columns, vector configuration (dimension, seed), and metadata.

Example

stats = hb.collection("customers").stats()
print(stats)  # "customers: 1,000 rows, 5 columns (structured)"
print(f"Dimension: {stats.dimension}, Seed: {stats.seed}")

count()

Get number of rows in collection.

exists()

Check if collection exists.

Returns:

| Type | Description |
| --- | --- |
| bool | True if collection exists, False if not found. |

Raises:

| Type | Description |
| --- | --- |
| ConnectionError | If server is unreachable. |

Note

Rather than returning False on connection errors, this method raises ConnectionError, so you can distinguish between "collection doesn't exist" and "server unreachable".
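The three outcomes described in the note can be handled as in the sketch below. Since no live server is assumed here, `FakeCollection` and `describe` are hypothetical stand-ins that only mimic the documented contract of exists() (returns a bool, raises ConnectionError); they are not part of hybi.

```python
class FakeCollection:
    """Hypothetical stand-in that mimics the documented exists() contract."""

    def __init__(self, name, known=(), reachable=True):
        self.name = name
        self._known = set(known)
        self._reachable = reachable

    def exists(self):
        if not self._reachable:
            # Documented behaviour: an unreachable server raises instead of returning False.
            raise ConnectionError("server unreachable")
        return self.name in self._known


def describe(collection):
    """Map the three possible outcomes of exists() to a status string."""
    try:
        return "present" if collection.exists() else "missing"
    except ConnectionError:
        return "unreachable"


print(describe(FakeCollection("customers", known={"customers"})))
print(describe(FakeCollection("orders", known={"customers"})))
print(describe(FakeCollection("customers", reachable=False)))
```

With a real client, the same try/except shape would wrap `hb.collection("customers").exists()`.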

delete()

Delete this collection.

Raises:

| Type | Description |
| --- | --- |
| HyperBinderError | If deletion fails. |

ingest(source, **kwargs)

Ingest data into this collection.

search(query, top_k=None, *, mode=None, filters=None, role=None, slot_filters=None)

Universal search on this collection.

Automatically detects the collection type and uses the best search method; pass mode to choose one explicitly.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| query | Union[str, Dict[str, Any]] | Search query (string for text search, dict for field matching) | required |
| top_k | Optional[int] | Number of results to return | None |
| mode | Optional[str] | Search mode: "auto" (default), "structured", "semantic", or "hybrid" | None |
| filters | Optional[List[tuple]] | Hard filters for structured mode | None |
| role | Optional[str] | Filter by document role (semantic mode) | None |
| slot_filters | Optional[Dict[str, Any]] | Slot value filters (hybrid mode) | None |

Examples:

results = collection.search("machine learning")

# Explicit mode
results = collection.search("AI papers", mode="semantic")
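Because mode is a free-form string, it can help to validate it before calling search(). The sketch below is a hypothetical helper, not part of hybi: `VALID_MODES` simply repeats the four values documented for the mode parameter, and treating None as "auto" is an assumption based on the description labelling "auto" as the default.

```python
from typing import Optional

# The four mode values documented for search().
VALID_MODES = ("auto", "structured", "semantic", "hybrid")


def normalize_mode(mode: Optional[str]) -> str:
    """Return a validated mode string.

    Assumption: None is treated as "auto", since the parameter
    description names "auto" as the default behaviour.
    """
    if mode is None:
        return "auto"
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    return mode


print(normalize_mode(None))
print(normalize_mode("semantic"))
```

A caller could then pass `mode=normalize_mode(user_choice)` to search() and surface typos as a ValueError before any request is made.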

select(columns=None, where=None, order_by=None, limit=None, offset=0, distinct=False)

SQL-like SELECT on this collection.

aggregate(group_by=None, aggregations=None, where=None, having=None, order_by=None, limit=None)

SQL-like AGGREGATE on this collection.

join(right, on, join_type='inner', columns=None, limit=None)

JOIN this collection with another.

multihop(start=None, hops=None, top_k=None)

Multi-hop reasoning on this collection.

get_context(query, max_chunks=5, max_tokens=2000, auto_detect=True)

Get LLM-ready context from this collection.

Automatically routes to the appropriate search:

- String query + document collection → semantic search
- Dict query → structured field search
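The routing rule above can be sketched as a small pure function. This is an illustration of the two documented cases only; `route_query` and the `is_document_collection` flag are hypothetical names, and the fallback for a string query on a non-document collection is not specified in this reference, so the sketch marks it as such.

```python
from typing import Any, Dict, Union


def route_query(query: Union[str, Dict[str, Any]], is_document_collection: bool) -> str:
    """Pick a search kind following the two documented routing cases."""
    if isinstance(query, dict):
        # Dict query -> structured field search.
        return "structured"
    if isinstance(query, str) and is_document_collection:
        # String query + document collection -> semantic search.
        return "semantic"
    # Not covered by the documented rules; the real behaviour is unknown here.
    return "unspecified"


print(route_query("What is our refund policy?", is_document_collection=True))
print(route_query({"department": "Engineering"}, is_document_collection=False))
```

The field name `department` in the dict query is only an example value.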

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| query | Union[str, Dict[str, Any]] | Question or query (string for text, dict for structured) | required |
| max_chunks | int | Maximum chunks to retrieve | 5 |
| max_tokens | int | Approximate maximum tokens | 2000 |
| auto_detect | bool | If True, detect collection type and use appropriate search | True |
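To illustrate how a chunk limit and an approximate token budget might interact, here is a minimal sketch. How the library actually estimates tokens or truncates context is not documented here; the word-count heuristic and the helper names `estimate_tokens` and `fit_to_budget` are assumptions made for the example.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word (assumption)."""
    return len(text.split())


def fit_to_budget(chunks: List[str], max_chunks: int = 5, max_tokens: int = 2000) -> List[str]:
    """Keep leading chunks until either the chunk limit or the token budget is hit."""
    selected: List[str] = []
    used = 0
    for chunk in chunks[:max_chunks]:
        cost = estimate_tokens(chunk)
        if used + cost > max_tokens:
            break  # adding this chunk would exceed the budget
        selected.append(chunk)
        used += cost
    return selected


ranked = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
print(fit_to_budget(ranked, max_chunks=2, max_tokens=10))
print(fit_to_budget(ranked, max_chunks=5, max_tokens=4))
```

The defaults mirror the documented defaults of get_context (5 chunks, 2000 tokens); everything else is illustrative.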

ask(question, top_k=5, role_filter=None)

End-to-end RAG on this collection.

query(schema=None)

Get a schema-aware query builder for this collection.

ComposeQuery provides a fluent interface for building queries that leverage the collection's schema (if available) for type-safe, slot-based operations.

If no schema is provided, attempts to load the schema from the collection's stored metadata (set during ingest).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| schema | Optional[BaseMolecule] | Optional molecule schema for validation. If not provided, will try to load from collection stats. | None |

Returns:

| Type | Description |
| --- | --- |
| ComposeQuery | ComposeQuery builder for this collection |

Example

# Auto-loads schema if collection was ingested with one
q = hb.collection("facts").query()
results = q.search("enterprise software")

# Explicit schema for type-safe slot access
from hyperbinder.compose import Triple, Field
schema = Triple(subject=Field("entity"), ...)
results = hb.collection("facts").query(schema).find(subject="Alice")

get_schema()

Get the Compose schema stored with this collection.

Returns the schema that was passed during ingest, deserialized back into a molecule object (Triple, Record, etc.).

Returns:

| Type | Description |
| --- | --- |
| Optional[BaseMolecule] | BaseMolecule instance or None if no schema was stored. |

Example

schema = hb.collection("facts").get_schema()
if schema:
    print(f"Schema: {schema.molecule_type}")
    print(f"Slots: {schema.slots()}")

AsyncCollection

Async version of the Collection API.

hybi.AsyncCollection

Async fluent API for collection operations.

Example

async with AsyncHyperBinder() as hb:
    customers = hb.collection("customers")
    await customers.ingest("data.csv")
    results = await customers.search("enterprise AI")
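One common reason to use the async API is to run several searches concurrently. The sketch below shows that pattern with `asyncio.gather`. `FakeAsyncCollection` and `search_all` are hypothetical stand-ins so the example runs without a server; only the awaitable `search(query, top_k=None)` shape is taken from this reference.

```python
import asyncio


class FakeAsyncCollection:
    """Hypothetical stand-in exposing an awaitable search(), like AsyncCollection."""

    def __init__(self, name):
        self.name = name

    async def search(self, query, top_k=None):
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return [f"{self.name}:{query}"]


async def search_all(collections, query):
    """Run the same query against every collection concurrently."""
    # gather() returns results in the same order as the awaitables passed in.
    results = await asyncio.gather(*(c.search(query) for c in collections))
    return {c.name: hits for c, hits in zip(collections, results)}


collections = [FakeAsyncCollection("customers"), FakeAsyncCollection("orders")]
print(asyncio.run(search_all(collections, "enterprise AI")))
```

With a real client, the collections would come from `hb.collection(...)` inside the `async with AsyncHyperBinder() as hb:` block.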

info() async

Get collection information.

stats() async

Get detailed statistics about this collection.

Returns comprehensive information including row count, columns, vector configuration (dimension, seed), and metadata.

Example

stats = await hb.collection("customers").stats()
print(stats)  # "customers: 1,000 rows, 5 columns (structured)"
print(f"Dimension: {stats.dimension}, Seed: {stats.seed}")

get_schema() async

Get the Compose schema stored with this collection.

Returns the schema that was passed during ingest, deserialized back into a molecule object (Triple, Record, etc.).

Returns:

| Type | Description |
| --- | --- |
| Optional[BaseMolecule] | BaseMolecule instance or None if no schema was stored. |

Example

schema = await hb.collection("facts").get_schema()
if schema:
    print(f"Schema: {schema.molecule_type}")
    print(f"Slots: {schema.slots()}")

count() async

Get number of rows in collection.

exists() async

Check if collection exists.

Returns:

| Type | Description |
| --- | --- |
| bool | True if collection exists, False if not found. |

Raises:

| Type | Description |
| --- | --- |
| ConnectionError | If server is unreachable. |

Note

Rather than returning False on connection errors, this method raises ConnectionError, so you can distinguish between "collection doesn't exist" and "server unreachable".

delete() async

Delete this collection.

Raises:

| Type | Description |
| --- | --- |
| HyperBinderError | If deletion fails. |

ingest(source, **kwargs) async

Ingest data into this collection.

search(query, top_k=None, *, mode=None, filters=None, role=None, slot_filters=None) async

Universal async search on this collection.

Automatically detects the collection type and uses the best search method; pass mode to choose one explicitly.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| query | Union[str, Dict[str, Any]] | Search query (string for text search, dict for field matching) | required |
| top_k | Optional[int] | Number of results to return | None |
| mode | Optional[str] | Search mode: "auto" (default), "structured", "semantic", or "hybrid" | None |
| filters | Optional[List[tuple]] | Hard filters for structured mode | None |
| role | Optional[str] | Filter by document role (semantic mode) | None |
| slot_filters | Optional[Dict[str, Any]] | Slot value filters (hybrid mode) | None |

select(columns=None, where=None, order_by=None, limit=None, offset=0, distinct=False) async

SQL-like SELECT on this collection.

aggregate(group_by=None, aggregations=None, where=None, having=None, order_by=None, limit=None) async

SQL-like AGGREGATE on this collection.

multihop(start=None, hops=None, top_k=None) async

Multi-hop reasoning on this collection.

join(right, on, join_type='inner', columns=None, limit=None) async

JOIN this collection with another.

get_context(query, max_chunks=5, max_tokens=2000, auto_detect=True) async

Get LLM-ready context from this collection.

Automatically routes to the appropriate search:

- String query + document collection → semantic search
- Dict query → structured field search

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| query | Union[str, Dict[str, Any]] | Question or query (string for text, dict for structured) | required |
| max_chunks | int | Maximum chunks to retrieve | 5 |
| max_tokens | int | Approximate maximum tokens | 2000 |
| auto_detect | bool | If True, detect collection type and use appropriate search | True |

ask(question, top_k=5, role_filter=None) async

End-to-end RAG on this collection.

query(schema=None) async

Get a schema-aware async query builder for this collection.

AsyncComposeQuery provides a fluent interface for building queries that leverage the collection's schema (if available) for type-safe, slot-based operations.

If no schema is provided, attempts to load the schema from the collection's stored metadata (set during ingest).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| schema | Optional[BaseMolecule] | Optional molecule schema for validation. If not provided, will try to load from collection stats. | None |

Returns:

| Type | Description |
| --- | --- |
| AsyncComposeQuery | AsyncComposeQuery builder for this collection |

Example

# Auto-loads schema if collection was ingested with one
q = await hb.collection("facts").query()
results = await q.search("enterprise software")

# Explicit schema for type-safe slot access
from hyperbinder.compose import Triple, Field
schema = Triple(subject=Field("entity"), ...)
q = await hb.collection("facts").query(schema)
results = await q.find(subject="Alice")