# Getting Started
This guide introduces HyperBinder's core concepts and gets you running queries in minutes.
## Installation
## Core Concepts
Before diving into code, understand these key ideas:
### 1. Neurosymbolic = Semantic + Symbolic
HyperBinder combines three query paradigms:
| Paradigm | What it does | Example |
|---|---|---|
| Semantic | Similarity-based matching | "Find companies like Apple" |
| Symbolic | Exact/structured matching | WHERE revenue > 1000000 |
| Traversal | Relationship discovery | "x -> y -> z" |
Most queries blend all three: semantic search with symbolic filters, connected by relationship traversal.

While HyperBinder works just fine on its own, it is tailor-made for AI applications that require a clearly specified knowledge layer, such as a RAG architecture or an LLM agent.
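The blend can be pictured in plain Python. This is a toy illustration, not the HyperBinder API: a symbolic filter narrows candidates exactly, then a semantic score ranks the survivors (the similarity values here are made up).

```python
# Toy illustration of blending paradigms (not the HyperBinder API).
companies = [
    {"name": "Apple",   "revenue": 3_000_000, "similarity": 0.95},
    {"name": "Orchard", "revenue":   500_000, "similarity": 0.90},
    {"name": "Banana",  "revenue": 2_000_000, "similarity": 0.40},
]

# Symbolic: an exact condition, like WHERE revenue > 1000000
candidates = [c for c in companies if c["revenue"] > 1_000_000]

# Semantic: rank the survivors by similarity to the query
ranked = sorted(candidates, key=lambda c: c["similarity"], reverse=True)
print([c["name"] for c in ranked])  # ['Apple', 'Banana']
```

Note that the order of operations matters: filtering first keeps the semantic ranking from surfacing items that can never satisfy the exact condition.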
### 2. Collections Hold Your Data

A collection is like a database table. You ingest data into collections, then query them:

```python
hb.ingest("customers.csv", collection="customers")
results = hb.search("enterprise AI", collection="customers")
```
### 3. Compositional Schemas Define Structure (Recommended Usage Pattern)
For simple use cases, HyperBinder works without schemas. But schemas unlock:
- Slot-specific queries: Search only in "subject" field
- Structural decomposition: Extract components from composed data
- Relationship traversal: Follow paths through knowledge graphs
```python
from hybi.compose import Triple, Field, Encoding

# This schema encodes: Subject -[Predicate]-> Object
schema = Triple(
    subject=Field("entity"),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Field("target"),
)
```
## Your First Queries

### Connect to HyperBinder

```python
from hybi import HyperBinder

# Client mode: connect to a server (uses the HYPERBINDER_URL env var if not specified)
hb = HyperBinder("http://localhost:8000")

# Local mode: embedded client (no Docker required, needs the 'hyperbinder' package)
hb = HyperBinder(local=True)

# Optional: specify a database path
hb = HyperBinder(local=True, db_path="./my_db")

# Check the connection
print(hb.ping())  # {'status': 'ok'}
```
### Ingest Data

```python
# From CSV
hb.ingest("customers.csv", collection="customers")

# From a pandas DataFrame
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "role": ["Engineer", "Manager", "Designer"],
    "department": ["Engineering", "Engineering", "Design"],
})
hb.ingest(df, collection="employees")
```
### Semantic Search

Find similar items using natural language:

```python
results = hb.search("machine learning experts", collection="employees")
for r in results:
    print(f"{r['name']}: {r.score:.2f}")
# Alice: 0.87
# Bob: 0.72
```
### SQL-like Queries

Filter with exact conditions:

```python
# SELECT with conditions
result = hb.select(
    collection="employees",
    where=[("department", "=", "Engineering")],
)

# Aggregation
result = hb.aggregate(
    collection="employees",
    group_by=["department"],
    aggregations=[("name", "count", "count")],
)
```
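For intuition, the aggregation above maps onto a plain pandas `groupby`. This is a sketch on the toy employees data from earlier, not HyperBinder code:

```python
import pandas as pd

# Count names per department, mirroring group_by=["department"]
# with a count aggregation over "name".
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "department": ["Engineering", "Engineering", "Design"],
})
counts = df.groupby("department")["name"].count()
print(counts.to_dict())  # {'Design': 1, 'Engineering': 2}
```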
### Multi-hop Queries

Traverse relationships:

```python
# Find: Alice -> manager -> department
results = hb.multihop(
    start_value="Alice",
    path=["manager", "department"],
    collection="employees",
)
```
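Conceptually, a multi-hop query just chains lookups along the path. Here is a toy dict-based traversal (not the HyperBinder API) that follows `Alice -> manager -> department`:

```python
# Toy multi-hop traversal over a dict "graph" of (node, relation) -> node.
edges = {
    ("Alice", "manager"): "Bob",
    ("Bob", "department"): "Engineering",
}

node = "Alice"
for relation in ["manager", "department"]:
    node = edges[(node, relation)]
print(node)  # Engineering
```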
## Schema-Aware Queries (Compose)

In HyperBinder, schemas specify how vectors compose together; they define a pattern for queries to follow. With HyperBinder, the developer's primary task is knowledge representation: describing the mental model of the problem domain using template units called molecules and compounds.

Schemas transform raw data into structured knowledge. By defining a schema, you tell HyperBinder how your data relates and unlock powerful queries:
```python
from hybi.compose import Triple, Field, Encoding

# Define a relationship schema
schema = Triple(
    subject=Field("person"),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Field("company"),
)

# Ingest with the schema
facts = pd.DataFrame({
    "person": ["Einstein", "Curie", "Darwin"],
    "relation": ["worked_at", "worked_at", "studied"],
    "company": ["Princeton", "Sorbonne", "Cambridge"],
})
hb.ingest(facts, collection="facts", schema=schema)

# Query by slot
q = hb.query("facts", schema=schema)
results = q.find(person="Einstein")      # Find all facts about Einstein
results = q.find(relation="worked_at")   # Find all "worked_at" relations
```
## Cross-Collection Queries (Intersections)

Collections are isolated by default. Intersections declare relationships between them, enabling cross-collection joins:

```python
# Two collections: employees and expertise
hb.ingest(employees_df, collection="employees")
hb.ingest(expertise_df, collection="expertise")

# Declare how they connect
hb.intersect("employees.employee_id", "expertise.subject")

# Now query across both
results = (
    hb.query("employees")
    .search("senior engineer")
    .join("expertise")
)
for r in results:
    if r.is_matched:
        print(f"{r.source['name']} knows {r.target['skill']}")
```
### Relation Types

| Type | Matching | Use Case |
|---|---|---|
| `identity` | Exact equality | IDs, foreign keys |
| `semantic` | Embedding similarity | Text content, fuzzy matching |
| `link` | Explicit mappings | Cross-encoding fields (flexible mode) |
```python
# Exact matching (default for EXACT-encoded fields)
hb.intersect("orders.customer_id", "customers.id")

# Semantic matching for text fields
hb.intersect(
    "emails.content",
    "projects.description",
    relation="semantic",
    threshold=0.7,
)

# Flexible mode: cross-encoding intersections via explicit links
ix = hb.intersect_flexible("employees.employee_id", "expertise.topic")
hb.populate_links(ix, links_df, "emp_id", "topic")
```
The idea is to compose the behavior you need, whether structured, unstructured, or semi-structured, and then chain those compositions to traverse multiple collections in one query. Need a knowledge graph? Specify a KnowledgeGraph compound. Need rows validated against a structured schema? Use Table. Vector search can be brought in wherever vagueness and semantics need handling.
### Chaining Joins

Traverse multiple collections in one query:

```python
results = (
    hb.query("employees")
    .search("ML engineer")
    .join("expertise")  # employees → expertise
    .join("projects")   # expertise → projects
    .join("budgets")    # projects → budgets
)
```
See Intersections API for full documentation.
## RAG (Retrieval-Augmented Generation)

Build context for LLMs:

```python
# Get relevant context
context = hb.get_context(
    "What were the Q3 results?",
    collection="reports",
    max_chunks=5,
    max_tokens=2000,
)
print(f"Retrieved {context.token_count} tokens")

# Or get a complete answer
answer = hb.ask(
    "Summarize the quarterly performance",
    collection="reports",
)
print(answer.text)
```
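Once you have retrieved context, a common pattern is to format it into a prompt for your LLM. A sketch using a stand-in `chunks` list of strings (the real chunk texts would come from the `hb.get_context` result; the contents here are invented):

```python
# Sketch: turn retrieved chunks into an LLM prompt.
# `chunks` is a stand-in list of chunk texts for illustration.
chunks = [
    "Q3 revenue grew 12% year over year.",
    "Operating margin held steady at 21%.",
]
question = "What were the Q3 results?"

prompt = "Answer using only the context below.\n\n"
prompt += "\n".join(f"- {c}" for c in chunks)
prompt += f"\n\nQuestion: {question}"
print(prompt)
```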
## Using the Collection API

For fluent, collection-focused operations:

```python
employees = hb.collection("employees")

# Check the collection
print(employees.stats())  # employees: 3 rows, 3 columns

# Query methods chain from the collection
results = employees.search("senior engineers", top_k=10)
```
## Async Client

For high-throughput applications:

```python
import asyncio

from hybi import AsyncHyperBinder

async def main():
    async with AsyncHyperBinder() as hb:
        results = await hb.search("query", collection="data")
        for r in results:
            print(r['name'])

asyncio.run(main())
```
## Error Handling

```python
from hybi import (
    HyperBinderError,
    CollectionNotFoundError,
    AuthenticationError,
)

try:
    results = hb.search("query", collection="missing")
except CollectionNotFoundError as e:
    print(f"Collection not found: {e}")
    print(f"Suggestion: {e.suggestion}")
except HyperBinderError as e:
    print(f"Error [{e.error_code}]: {e}")
```
## Running the Quickstart Example

A complete working example is available in the SDK:

```bash
# Navigate to the SDK directory
cd sdk

# Run the quickstart example
python examples/01_quickstart.py
```
The quickstart example demonstrates:
- Initializing a HyperBinder client
- Ingesting data from CSV and DataFrames
- Semantic search queries
- SQL-like filtering and aggregation
- Multi-hop traversals
See examples/01_quickstart.py for the full code.
## Next Steps
- Compose Concepts - Deep dive into the schema system
- Intersections - Cross-collection query patterns
- Examples - Complete runnable examples
- API Reference - Full method documentation