Collection API
Fluent API for collection-focused operations.
Collection
The Collection class provides a fluent interface for working with a specific collection.
employees = hb.collection("employees")
# Check status
print(employees.stats())
# Query methods
results = employees.search("senior engineers")
hybi.Collection
Fluent API for collection operations.
Example
customers = hb.collection("customers")
customers.ingest("data.csv")
results = customers.search("enterprise AI")
__init__(client, name)
stats()
Get detailed statistics about this collection.
Returns comprehensive information including row count, columns, vector configuration (dimension, seed), and metadata.
Example
stats = hb.collection("customers").stats()
print(stats)  # "customers: 1,000 rows, 5 columns (structured)"
print(f"Dimension: {stats.dimension}, Seed: {stats.seed}")
count()
Get number of rows in collection.
exists()
Check if collection exists.
Returns:
| Type | Description |
|---|---|
| bool | True if collection exists, False if not found. |
Raises:
| Type | Description |
|---|---|
| ConnectionError | If server is unreachable. |
Note
Rather than returning False when the server is unreachable, this method raises ConnectionError, so you can distinguish "collection doesn't exist" (False) from "server unreachable" (exception).
delete()
ingest(source, **kwargs)
Ingest data into this collection.
search(query, top_k=None, *, mode=None, filters=None, role=None, slot_filters=None)
Universal search on this collection.
Automatically detects the collection type and uses the best search method; pass mode to choose one explicitly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | Union[str, Dict[str, Any]] | Search query (string for text search, dict for field matching) | required |
| top_k | Optional[int] | Number of results to return | None |
| mode | Optional[str] | Search mode: "auto" (default), "structured", "semantic", or "hybrid" | None |
| filters | Optional[List[tuple]] | Hard filters for structured mode | None |
| role | Optional[str] | Filter by document role (semantic mode) | None |
| slot_filters | Optional[Dict[str, Any]] | Slot value filters (hybrid mode) | None |
Examples:
# Auto-detect (recommended)
results = collection.search("machine learning")
# Explicit mode
results = collection.search("AI papers", mode="semantic")
select(columns=None, where=None, order_by=None, limit=None, offset=0, distinct=False)
SQL-like SELECT on this collection.
aggregate(group_by=None, aggregations=None, where=None, having=None, order_by=None, limit=None)
SQL-like AGGREGATE on this collection.
join(right, on, join_type='inner', columns=None, limit=None)
JOIN this collection with another.
multihop(start=None, hops=None, top_k=None)
Multi-hop reasoning on this collection.
get_context(query, max_chunks=5, max_tokens=2000, auto_detect=True)
Get LLM-ready context from this collection.
Automatically routes to the appropriate search:
- String query + document collection → semantic search
- Dict query → structured field search
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | Union[str, Dict[str, Any]] | Question or query (string for text, dict for structured) | required |
| max_chunks | int | Maximum chunks to retrieve | 5 |
| max_tokens | int | Approximate maximum tokens | 2000 |
| auto_detect | bool | If True, detect collection type and use appropriate search | True |
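The routing rules above can be restated as a small decision function. This is a hypothetical illustration of the documented behavior, not the library's code: is_document_collection stands in for however the library detects collection type when auto_detect is True, and combinations the rules do not cover return "unspecified" instead of a guess.

```python
from typing import Any, Dict, Union


def route_context_query(query: Union[str, Dict[str, Any]], is_document_collection: bool) -> str:
    """Restate get_context()'s documented routing as a decision function."""
    if isinstance(query, dict):
        # Dict query -> structured field search.
        return "structured"
    if isinstance(query, str) and is_document_collection:
        # String query + document collection -> semantic search.
        return "semantic"
    # Other combinations (e.g. a string query on a structured collection)
    # are not described by the routing rules, so no guess is made here.
    return "unspecified"


print(route_context_query("What did Q3 revenue look like?", is_document_collection=True))  # semantic
print(route_context_query({"name": "Alice"}, is_document_collection=False))  # structured
```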
ask(question, top_k=5, role_filter=None)
End-to-end RAG on this collection.
query(schema=None)
Get a schema-aware query builder for this collection.
ComposeQuery provides a fluent interface for building queries that leverage the collection's schema (if available) for type-safe, slot-based operations.
If no schema is provided, attempts to load the schema from the collection's stored metadata (set during ingest).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| schema | Optional[BaseMolecule] | Optional molecule schema for validation. If not provided, will try to load from collection stats. | None |
Returns:
| Type | Description |
|---|---|
| ComposeQuery | ComposeQuery builder for this collection |
Example
# Auto-loads schema if collection was ingested with one
q = hb.collection("facts").query()
results = q.search("enterprise software")
# Explicit schema for type-safe slot access
from hyperbinder.compose import Triple, Field
schema = Triple(subject=Field("entity"), ...)
results = hb.collection("facts").query(schema).find(subject="Alice")
get_schema()
Get the Compose schema stored with this collection.
Returns the schema that was passed during ingest, deserialized back into a molecule object (Triple, Record, etc.).
Returns:
| Type | Description |
|---|---|
| Optional[BaseMolecule] | BaseMolecule instance or None if no schema was stored. |
Example
schema = hb.collection("facts").get_schema()
if schema:
    print(f"Schema: {schema.molecule_type}")
    print(f"Slots: {schema.slots()}")
AsyncCollection
Async version of the Collection API.
hybi.AsyncCollection
Async fluent API for collection operations.
Example
async with AsyncHyperBinder() as hb:
    customers = hb.collection("customers")
    await customers.ingest("data.csv")
    results = await customers.search("enterprise AI")
info()
async
Get collection information.
stats()
async
Get detailed statistics about this collection.
Returns comprehensive information including row count, columns, vector configuration (dimension, seed), and metadata.
Example
stats = await hb.collection("customers").stats()
print(stats)  # "customers: 1,000 rows, 5 columns (structured)"
print(f"Dimension: {stats.dimension}, Seed: {stats.seed}")
get_schema()
async
Get the Compose schema stored with this collection.
Returns the schema that was passed during ingest, deserialized back into a molecule object (Triple, Record, etc.).
Returns:
| Type | Description |
|---|---|
| Optional[BaseMolecule] | BaseMolecule instance or None if no schema was stored. |
Example
schema = await hb.collection("facts").get_schema()
if schema:
    print(f"Schema: {schema.molecule_type}")
    print(f"Slots: {schema.slots()}")
count()
async
Get number of rows in collection.
exists()
async
Check if collection exists.
Returns:
| Type | Description |
|---|---|
| bool | True if collection exists, False if not found. |
Raises:
| Type | Description |
|---|---|
| ConnectionError | If server is unreachable. |
Note
Rather than returning False when the server is unreachable, this method raises ConnectionError, so you can distinguish "collection doesn't exist" (False) from "server unreachable" (exception).
delete()
async
ingest(source, **kwargs)
async
Ingest data into this collection.
search(query, top_k=None, *, mode=None, filters=None, role=None, slot_filters=None)
async
Universal async search on this collection.
Automatically detects the collection type and uses the best search method; pass mode to choose one explicitly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | Union[str, Dict[str, Any]] | Search query (string for text search, dict for field matching) | required |
| top_k | Optional[int] | Number of results to return | None |
| mode | Optional[str] | Search mode: "auto" (default), "structured", "semantic", or "hybrid" | None |
| filters | Optional[List[tuple]] | Hard filters for structured mode | None |
| role | Optional[str] | Filter by document role (semantic mode) | None |
| slot_filters | Optional[Dict[str, Any]] | Slot value filters (hybrid mode) | None |
select(columns=None, where=None, order_by=None, limit=None, offset=0, distinct=False)
async
SQL-like SELECT on this collection.
aggregate(group_by=None, aggregations=None, where=None, having=None, order_by=None, limit=None)
async
SQL-like AGGREGATE on this collection.
multihop(start=None, hops=None, top_k=None)
async
Multi-hop reasoning on this collection.
join(right, on, join_type='inner', columns=None, limit=None)
async
JOIN this collection with another.
get_context(query, max_chunks=5, max_tokens=2000, auto_detect=True)
async
Get LLM-ready context from this collection.
Automatically routes to the appropriate search:
- String query + document collection → semantic search
- Dict query → structured field search
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | Union[str, Dict[str, Any]] | Question or query (string for text, dict for structured) | required |
| max_chunks | int | Maximum chunks to retrieve | 5 |
| max_tokens | int | Approximate maximum tokens | 2000 |
| auto_detect | bool | If True, detect collection type and use appropriate search | True |
ask(question, top_k=5, role_filter=None)
async
End-to-end RAG on this collection.
query(schema=None)
async
Get a schema-aware async query builder for this collection.
AsyncComposeQuery provides a fluent interface for building queries that leverage the collection's schema (if available) for type-safe, slot-based operations.
If no schema is provided, attempts to load the schema from the collection's stored metadata (set during ingest).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| schema | Optional[BaseMolecule] | Optional molecule schema for validation. If not provided, will try to load from collection stats. | None |
Returns:
| Type | Description |
|---|---|
| AsyncComposeQuery | AsyncComposeQuery builder for this collection |
Example
# Auto-loads schema if collection was ingested with one
q = await hb.collection("facts").query()
results = await q.search("enterprise software")
# Explicit schema for type-safe slot access
from hyperbinder.compose import Triple, Field
schema = Triple(subject=Field("entity"), ...)
q = await hb.collection("facts").query(schema)
results = await q.find(subject="Alice")