Client API¶
The main entry points for interacting with HyperBinder.
HyperBinder¶
Factory function that returns either a RemoteHyperBinder (client mode) or LocalHyperBinder (local mode).
```python
from hybi import HyperBinder

# Client mode: connect to a HyperBinder server
hb = HyperBinder(
    url="http://localhost:8000",
    api_key="your-api-key",  # Or set HYPERBINDER_API_KEY
    timeout=30.0,
)

# Local mode: embedded client (no Docker required)
hb = HyperBinder(
    local=True,
    db_path="./my_db",  # Optional: custom database path
)

# Use as a context manager
with HyperBinder() as hb:
    results = hb.search("query", collection="data")
```
Parameters:
- url (str, optional): URL of the server (e.g., "http://localhost:8000"). Defaults to http://localhost:8000 if not local.
- api_key (str, optional): API key for server authentication.
- local (bool): If True, uses the embedded LocalHyperBinder (requires the hyperbinder pip package).
- **kwargs: Additional arguments passed to the specific client (e.g., db_path for local mode).
Returns: Either a LocalHyperBinder or RemoteHyperBinder instance.
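The dispatch rule described above can be sketched in plain Python. This is an illustrative sketch only, not the real implementation; the function name hyper_binder_dispatch and the tuple return values are hypothetical stand-ins for constructing the actual client classes.

```python
# Illustrative sketch of the factory's dispatch rule (NOT the real code):
# local=True selects the embedded client and forwards kwargs (e.g. db_path);
# otherwise a remote client is built against url, defaulting to localhost.
def hyper_binder_dispatch(url=None, api_key=None, local=False, **kwargs):
    if local:
        # Embedded client; extra kwargs such as db_path are forwarded here
        return ("LocalHyperBinder", kwargs)
    return ("RemoteHyperBinder", url or "http://localhost:8000", api_key)
```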
RemoteHyperBinder¶
Direct access to the HTTP client (same as HyperBinder(url=...)).
```python
from hybi import RemoteHyperBinder

hb = RemoteHyperBinder(
    url="http://localhost:8000",
    api_key="your-api-key",
)
```
LocalHyperBinder¶
Direct access to the embedded client (same as HyperBinder(local=True)).
Requirements: Requires the hyperbinder pip package to be installed.
hybi.HyperBinder(url=None, api_key=None, local=False, **kwargs) ¶
Initialize a HyperBinder client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | Optional[str] | URL of the server (e.g., "http://localhost:8000"). Defaults to http://localhost:8000 if not local. | None |
| api_key | Optional[str] | Optional API key for server authentication. | None |
| local | bool | If True, uses the embedded LocalHyperBinder (no Docker required). Requires the hyperbinder pip package. | False |
| **kwargs | Any | Additional arguments passed to the specific client (e.g., db_path for local mode). | {} |
Returns:
| Type | Description |
|---|---|
| Union[LocalHyperBinder, RemoteHyperBinder] | Either a LocalHyperBinder or a RemoteHyperBinder instance. |
AsyncHyperBinder¶
Asynchronous client for high-throughput applications.
```python
import asyncio

from hybi import AsyncHyperBinder

async def main():
    async with AsyncHyperBinder() as hb:
        results = await hb.search("query", collection="data")
        for r in results:
            print(r['name'])

asyncio.run(main())
```
hybi.AsyncHyperBinder ¶
Bases: AsyncObserveMixin, AsyncComposeMixin, BaseHyperBinder
Async HyperBinder client for high-throughput applications.
Compose operations (unbind, extract, bundle_search, etc.) are provided by AsyncComposeMixin.
Observability operations (traced versions of compose methods, recovery points) are provided by AsyncObserveMixin.
Example
```python
async with AsyncHyperBinder("http://localhost:8000") as hb:
    await hb.ingest("data.csv", collection="customers")
    results = await hb.search("enterprise AI", collection="customers")
```
__init__(url='http://localhost:8000', api_key=None, timeout=30.0, default_collection=None, default_top_k=10, verify_ssl=True, warn_insecure=True, max_retries=3, retry_delay=0.5, join_config=None) ¶
Initialize async HyperBinder client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | str | Server URL (use https:// in production) | 'http://localhost:8000' |
| api_key | Optional[str] | API key for authentication. Can also be set via the HYPERBINDER_API_KEY environment variable. | None |
| timeout | float | Request timeout in seconds | 30.0 |
| default_collection | Optional[str] | Default collection for operations | None |
| default_top_k | int | Default number of results to return | 10 |
| verify_ssl | bool | Whether to verify SSL certificates | True |
| warn_insecure | bool | Warn if using HTTP instead of HTTPS | True |
| max_retries | int | Maximum retry attempts for transient errors | 3 |
| retry_delay | float | Initial delay between retries in seconds | 0.5 |
| join_config | Optional[JoinConfig] | Configuration for join operations (cycle limits, dedup). Defaults to JoinConfig() with sensible production defaults. | None |
close() async ¶
Close the async HTTP client.
ping() async ¶
Check server health.
is_ready() async ¶
Check if server is ready to handle requests.
Returns:
| Type | Description |
|---|---|
| bool | True if server responds with ready status, False otherwise. |
Note
Connection errors return False (server unreachable). Other errors (auth, server errors) are re-raised.
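Because connection errors surface as False rather than an exception, is_ready() lends itself to a simple startup-wait loop. A minimal sketch: wait_until_ready is not part of the API, just a helper you might write yourself against any object exposing an async is_ready().

```python
import asyncio

async def wait_until_ready(hb, attempts=10, delay=1.0):
    """Poll is_ready() until the server reports ready or attempts run out."""
    for _ in range(attempts):
        if await hb.is_ready():
            return True
        await asyncio.sleep(delay)  # back off before the next probe
    return False
```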
collection(name) ¶
Get an async collection object for fluent API access.
list_collections() async ¶
List all collections.
get_collection_info(collection=None) async ¶
Get detailed information about a collection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection | Optional[str] | Collection name (uses default if not specified) | None |
Returns:
| Type | Description |
|---|---|
| CollectionInfo | CollectionInfo with type, columns, and capabilities |
get_collection_stats(collection=None, *, use_cache=True) async ¶
Get detailed statistics about a collection.
Returns more detailed information than get_collection_info(), including vector configuration (dimension, seed) and source file metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection | Optional[str] | Collection name (uses default if not specified) | None |
| use_cache | bool | Whether to use cached stats if available | True |
Returns:
| Type | Description |
|---|---|
| CollectionStats | CollectionStats with full collection details |
Example
```python
stats = await hb.get_collection_stats("customers")
print(stats)  # "customers: 1,000 rows, 5 columns (structured)"
print(f"Vector dimension: {stats.dimension}")
```
delete_collection(collection=None) async ¶
Delete a collection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection | Optional[str] | Collection name (uses default if not specified) | None |
Raises:
| Type | Description |
|---|---|
| CollectionNotFoundError | If collection doesn't exist |
| HyperBinderError | If deletion fails |
ingest(source, *, collection=None, dim=1024, seed=42, depth=3, schema=None, vector_col=None, warn_schema_evolution=True) async ¶
Ingest data into a collection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | Union[str, Path, DataFrame, List[str]] | File path, list of paths, or pandas DataFrame | required |
| collection | Optional[str] | Target collection name | None |
| dim | int | Vector dimension for embeddings | 1024 |
| seed | int | Random seed for reproducibility | 42 |
| depth | int | Hierarchy depth | 3 |
| schema | Optional[BaseMolecule] | Optional Compose schema (Pair, Triple, Record) defining how data should be encoded. If provided, validates that the data matches the schema and stores the schema with the collection for schema-aware queries. | None |
| warn_schema_evolution | bool | Whether to emit SchemaEvolutionWarning warnings during ingest. Set False to suppress adaptive-mode schema evolution warning noise. | True |
Returns:
| Type | Description |
|---|---|
| IngestResult | IngestResult with ingestion details |
Example
Ingest with explicit Triple schema¶
```python
from hyperbinder.compose import Triple, Field, Encoding

schema = Triple(
    subject=Field("entity"),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Field("target"),
)
await hb.ingest(df, collection="knowledge", schema=schema)
```
search(query, *, collection=None, top_k=None, mode=None, filters=None, role=None, slot_filters=None) async ¶
Universal async search across any collection type.
Automatically detects the collection type and uses the best search method, or use mode to choose explicitly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | Union[str, Dict[str, Any]] | Search query (string for text search, dict for field matching) | required |
| collection | Optional[str] | Collection to search | None |
| top_k | Optional[int] | Number of results to return | None |
| mode | Optional[str] | Search mode: "auto" (default), "structured", "semantic", or "hybrid" | None |
| filters | Optional[List[tuple]] | Hard filters as a list of (field, op, value) tuples (structured mode) | None |
| role | Optional[str] | Filter by document role, e.g. "paragraph" (semantic mode) | None |
| slot_filters | Optional[Dict[str, Any]] | Slot value filters, e.g. {"category": "Electronics"} (hybrid mode) | None |
Returns:
| Type | Description |
|---|---|
| List[SearchResult] | List of SearchResult objects |
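The tuple and dict shapes search() expects are easy to get wrong, so here is a minimal sketch of the arguments for a structured-mode call. Field names such as status, revenue, and industry are illustrative placeholders, not part of the API.

```python
# (field, op, value) triples for the `filters` parameter (structured mode)
filters = [
    ("status", "==", "active"),
    ("revenue", ">", 1_000_000),
]

# Dict query for field matching; keys are column names in the collection
query = {"industry": "software"}

# The call itself needs a running client, e.g.:
# results = await hb.search(query, collection="customers",
#                           mode="structured", filters=filters, top_k=20)
```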
select(collection=None, columns=None, where=None, order_by=None, limit=None, offset=0, distinct=False) async ¶
Async SQL-like SELECT query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection | Optional[str] | Collection to query | None |
| columns | Optional[List[str]] | Columns to select (None = all) | None |
| where | Optional[List[tuple]] | Filter conditions as a list of (field, operator, value) tuples | None |
| order_by | Optional[List[tuple]] | Sort order as a list of (field, descending) tuples | None |
| limit | Optional[int] | Maximum rows to return | None |
| offset | int | Number of rows to skip | 0 |
| distinct | bool | Return distinct rows only | False |
Returns:
| Type | Description |
|---|---|
| SelectResult | SelectResult with rows |
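A sketch of the where and order_by tuple shapes described above. Field names (country, plan, signup_date) are illustrative, not part of the API.

```python
# Filter conditions: (field, operator, value) tuples
where = [
    ("country", "==", "DE"),
    ("plan", "!=", "free"),
]

# Sort order: (field, descending) tuples; True sorts descending
order_by = [("signup_date", True)]

# With a running client this would be:
# result = await hb.select("users", columns=["name", "plan"],
#                          where=where, order_by=order_by, limit=50)
```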
aggregate(collection=None, group_by=None, aggregations=None, where=None, having=None, order_by=None, limit=None) async ¶
Async SQL-like AGGREGATE query with GROUP BY.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection | Optional[str] | Collection to query | None |
| group_by | Optional[List[str]] | Fields to group by | None |
| aggregations | Optional[List[tuple]] | List of (field, operation, alias) tuples. Operations: "sum", "avg", "count", "min", "max" | None |
| where | Optional[List[tuple]] | Filter conditions before grouping | None |
| having | Optional[List[tuple]] | Filter conditions after grouping | None |
| order_by | Optional[List[tuple]] | Sort order as a list of (field, descending) tuples | None |
| limit | Optional[int] | Maximum groups to return | None |
Returns:
| Type | Description |
|---|---|
| AggregateResult | AggregateResult with groups |
Example
```python
results = await hb.aggregate(
    collection="orders",
    group_by=["category"],
    aggregations=[("amount", "sum", "total")],
    order_by=[("total", True)],  # Sort by total descending
)
```
join(left, right, on, join_type='inner', columns=None, limit=None) async ¶
Async SQL-like JOIN across collections.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| left | str | Left collection name | required |
| right | str | Right collection name | required |
| on | List[tuple] | Join conditions as a list of (left_field, operator, right_field) tuples | required |
| join_type | str | Type of join ("inner", "left", "right", "outer") | 'inner' |
| columns | Optional[List[str]] | Columns to select from the result | None |
| limit | Optional[int] | Maximum rows to return | None |
Returns:
| Type | Description |
|---|---|
| JoinResult | JoinResult with joined rows |
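A sketch of the on condition shape. The collection names (orders, customers) and field names (customer_id, id) are illustrative assumptions.

```python
# Join conditions: (left_field, operator, right_field) tuples
on = [("customer_id", "==", "id")]

# With a running client:
# result = await hb.join("orders", "customers", on=on,
#                        join_type="left", limit=100)
```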
multihop(collection=None, start=None, hops=None, top_k=None) async ¶
Async multi-hop reasoning query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection | Optional[str] | Collection to query | None |
| start | Optional[Dict[str, Any]] | Starting query as a field:value dict | None |
| hops | Optional[List[tuple]] | List of (field, value) tuples defining the path | None |
| top_k | Optional[int] | Number of results to return | None |
Returns:
| Type | Description |
|---|---|
| List[MultihopResult] | List of MultihopResult with reasoning paths |
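A sketch of the start dict and the hops path tuples. The field and value names (name, reports_to, department, and their values) are illustrative placeholders.

```python
# Starting point: a field -> value dict identifying where traversal begins
start = {"name": "Alice"}

# Path: (field, value) tuples, one per hop
hops = [("reports_to", "Bob"), ("department", "Engineering")]

# With a running client:
# results = await hb.multihop(collection="org", start=start,
#                             hops=hops, top_k=5)
```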
get_context(query, collection=None, max_chunks=5, max_tokens=2000, auto_detect=True, expand=None, *, expand_reasoning=False, reasoning_hops=2, include_proofs=False) async ¶
Async get relevant context for LLM consumption.
Automatically routes to the appropriate search method:
- String query + document collection → semantic search
- String query + structured collection → structured search
- Dict query → structured field search
When expand is provided, the context is enriched with related data from other collections using declared intersections (via hb.intersect()).
When expand_reasoning is enabled, the context is enriched with inferred relationships discovered via MHV compositional reasoning (requires paid tier).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | Union[str, Dict[str, Any]] | The question or query (string for text search, dict for structured queries) | required |
| collection | Optional[str] | Collection to search | None |
| max_chunks | int | Maximum number of chunks to retrieve | 5 |
| max_tokens | int | Approximate maximum tokens in context | 2000 |
| auto_detect | bool | If True, detect collection type and use the appropriate search | True |
| expand | Optional[List[Union[str, Dict[str, Any]]]] | Optional list of collections to expand into. Each element can be a collection name string (e.g., "expertise") or a dict with options (e.g., {"collection": "expertise", "fields": ["skill"]}) | None |
| expand_reasoning | bool | If True, expand context with MHV reasoning inferences | False |
| reasoning_hops | int | Maximum reasoning chain length | 2 |
| include_proofs | bool | Include proof traces for inferred relationships | False |
Returns:
| Type | Description |
|---|---|
| Context | Context object with formatted text and source chunks. If expand_reasoning=True, the context may include inferred relationships. |
Example
Basic context retrieval¶
```python
context = await hb.get_context("Alice's team", collection="org")
```
With reasoning expansion¶
```python
context = await hb.get_context(
    "Alice's team",
    collection="org",
    expand_reasoning=True,
    reasoning_hops=3,
)
# Context now includes inferred relationships like:
# "Alice reports_to→reports_to Bob" (2-hop inference)
```
ask(question, collection=None, top_k=5, role_filter=None) async ¶
Async end-to-end RAG: retrieve context and generate an answer.
Note: The LLM model is configured server-side.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| question | str | The question to answer | required |
| collection | Optional[str] | Collection to search | None |
| top_k | int | Number of chunks to retrieve | 5 |
| role_filter | Optional[str] | Only search in a specific role (e.g., "paragraph") | None |
Returns:
| Type | Description |
|---|---|
| Answer | Answer object with generated text and sources |
query(collection=None, schema=None) ¶
Get a schema-aware async query builder for a collection.
AsyncComposeQuery provides a fluent interface for building queries that leverage the collection's schema (if available) for type-safe, slot-based operations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection | Optional[str] | Collection name (uses default if not specified) | None |
| schema | Optional[BaseMolecule] | Optional molecule schema for validation. If not provided, queries work but without schema validation. | None |
Returns:
| Type | Description |
|---|---|
| AsyncComposeQuery | AsyncComposeQuery builder for the collection |
populate_links(intersection, df, source_column, target_column, *, weight_column=None) async ¶
Populate a flexible intersection with link data.
Links map source field values to target field values, enabling cross-encoding joins. This method replaces any existing links for the intersection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| intersection | Intersection | The flexible intersection to populate (from intersect_flexible()) | required |
| df | DataFrame | DataFrame containing link pairs | required |
| source_column | str | Column name for source values | required |
| target_column | str | Column name for target values | required |
| weight_column | Optional[str] | Optional column for link weights (default weight 1.0) | None |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dict with ingestion stats: {"links_created": int, "link_collection": str} |
Raises:
| Type | Description |
|---|---|
| ValueError | If the intersection is not in FLEXIBLE mode |
Example
```python
# Create link data
links_df = pd.DataFrame({
    "emp_id": ["EMP001", "EMP002", "EMP003"],
    "topic": ["machine learning", "databases", "cloud computing"],
})

# Populate the intersection
result = await hb.populate_links(ix, links_df, "emp_id", "topic")
print(f"Created {result['links_created']} links")
```
get_link_bindings(link_collection, source_values) async ¶
Retrieve link mappings for join operations.
Internal method used by join() to look up target values for source values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| link_collection | str | Name of the link collection | required |
| source_values | List[str] | List of source values to look up | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, List[str]] | Dict mapping source_value -> [target_values] |
insert_row(collection, *, row, schema) async ¶
Insert a single row using Row molecule encoding (chain binding).
This method uses the dedicated /row/insert/ endpoint, which provides:
- Chain binding encoding for lossless field extraction
- Primary key duplicate detection (raises DuplicateKeyError if the key exists)
- Proper indexing for O(1) PK lookups
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection | str | Collection name | required |
| row | Dict[str, Any] | Row data including the primary key | required |
| schema | BaseMolecule | RelationalTable schema (required) | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dict with insert status and PK info |
Raises:
| Type | Description |
|---|---|
| ValueError | If schema is not a RelationalTable or the row is missing its PK |
| DuplicateKeyError | If a row with the same PK already exists |
Example
```python
await hb.insert_row(
    "users",
    row={"user_id": "U001", "email": "alice@test.com", "name": "Alice"},
    schema=users_schema,
)
```
get_row(collection, *, pk_field, pk_value) async ¶
Get a row by primary key (O(1) lookup).
Used by RelationalTable for deterministic PK lookups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection | str | Collection name | required |
| pk_field | str | Name of the primary key field | required |
| pk_value | Any | Value to look up | required |
Returns:
| Type | Description |
|---|---|
| Optional[Dict[str, Any]] | Row data dict if found, None if not found |
Raises:
| Type | Description |
|---|---|
| ValueError | If multiple rows are found (the PK must be unique) |
| CollectionNotFoundError | If the collection doesn't exist |
Example
```python
row = await hb.get_row("users", pk_field="user_id", pk_value="U001")
if row:
    print(row["email"])
```
update(collection, *, where, set, schema=None) async ¶
Atomically update a row matching the where clause.
This method performs an atomic update: all fields are updated together in a single operation. For RelationalTable schemas, the row is re-encoded with chain binding to preserve encoding integrity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection | str | Collection name | required |
| where | Dict[str, Any] | Primary key condition, e.g. {"user_id": "U001"} | required |
| set | Dict[str, Any] | Fields to update, e.g. {"email": "new@test.com"} | required |
| schema | Optional[BaseMolecule] | Optional RelationalTable schema for validation and re-encoding | None |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dict with update status and info, including old/new values |
Raises:
| Type | Description |
|---|---|
| ValueError | If trying to update the primary key, or the PK is missing from where |
| CollectionNotFoundError | If the row is not found |
Example
```python
await hb.update(
    "users",
    where={"user_id": "U001"},
    set={"email": "new@example.com", "name": "Alice Smith"},
    schema=users_schema,
)
```
delete(collection, *, where, schema=None) async ¶
Delete a row matching the where clause.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection | str | Collection name | required |
| where | Dict[str, Any] | Primary key condition, e.g. {"user_id": "U001"} | required |
| schema | Optional[BaseMolecule] | Optional RelationalTable schema for validation | None |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dict with delete status and info |
Raises:
| Type | Description |
|---|---|
| ValueError | If the PK is missing from where |
| CollectionNotFoundError | If the row is not found |
Example
```python
await hb.delete("users", where={"user_id": "U001"})
```
upsert(collection, *, row, schema) async ¶
Insert or update a row.
If a row with a matching primary key exists, update it. Otherwise, insert a new row.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection | str | Collection name | required |
| row | Dict[str, Any] | Row data including the primary key | required |
| schema | BaseMolecule | RelationalTable schema (required for PK info) | required |
Returns:
| Type | Description |
|---|---|
| UpsertResult | UpsertResult indicating whether the row was inserted or updated |
Example
```python
await hb.upsert(
    "users",
    row={"user_id": "U001", "email": "new@example.com", "name": "Alice"},
    schema=users_schema,
)
```
Method Categories¶
Health & Status¶
| Method | Description |
|---|---|
| ping() | Check server connectivity |
| is_ready() | Check if server is ready |
| auth_status() | Get authentication status |
Collection Management¶
| Method | Description |
|---|---|
| collection(name) | Get Collection fluent API |
| list_collections() | List all collections with metadata |
| list_collection_names() | List collection names only (convenience) |
| get_collection_info(name) | Get collection metadata |
| get_collection_stats(name) | Get detailed statistics |
| delete_collection(name) | Delete a collection |
Data Ingestion¶
| Method | Description |
|---|---|
| ingest(source, collection, schema) | Ingest CSV, DataFrame, or documents |
Search Operations¶
| Method | Description |
|---|---|
| search(query, collection) | Semantic similarity search |
| bundle_search(values, field, collection) | Search similar to any example |
| search_prototype(examples, collection) | Search using examples as a prototype |
| analogy(a, b, c, field, collection) | Analogical reasoning (A:B :: C:?) |
SQL-like Operations¶
| Method | Description |
|---|---|
| select(conditions, collection) | Filter with conditions |
| aggregate(group_by, field, function) | GROUP BY aggregation |
| join(collection, target, on) | Cross-collection JOIN |
Graph Operations¶
| Method | Description |
|---|---|
| multihop(start, path, collection) | Multi-hop traversal |
RAG Operations¶
| Method | Description |
|---|---|
| get_context(query, collection) | Retrieve context for LLM |
| ask(query, collection) | End-to-end RAG query |
Compose Operations¶
| Method | Description |
|---|---|
| query(collection, schema) | Get ComposeQuery builder |
| intersect(source, target) | Define cross-collection relationship (strict mode) |
| intersect_flexible(source, target) | Define cross-encoding relationship (flexible mode) |
| populate_links(intersection, df, ...) | Populate links for a flexible intersection |
CRUD Operations (RelationalTable)¶
These methods provide SQL-like row operations for RelationalTable schemas.
| Method | Description |
|---|---|
| insert_row(collection, row, schema) | Insert a single row |
| get_row(collection, pk_field, pk_value) | Get row by primary key |
| update(collection, where, set, schema) | Update a row atomically |
| delete(collection, where, schema) | Delete a row by primary key |
| upsert(collection, row, schema) | Insert or update a row |
Example:
```python
from hybi.compose import RelationalTable, Field, Encoding

schema = RelationalTable(
    columns={
        "user_id": Field(encoding=Encoding.EXACT),
        "email": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
    },
    primary_key="user_id",
)

# Insert
hb.insert_row("users", row={"user_id": "U001", "email": "a@test.com", "name": "Alice"}, schema=schema)

# Get
row = hb.get_row("users", pk_field="user_id", pk_value="U001")

# Update
hb.update("users", where={"user_id": "U001"}, set={"email": "new@test.com"}, schema=schema)

# Delete
hb.delete("users", where={"user_id": "U001"}, schema=schema)

# Upsert (insert or update)
hb.upsert("users", row={"user_id": "U001", "email": "upsert@test.com", "name": "Alice"}, schema=schema)
```
See RelationalTable for more details on schema definition.