Unified Intelligence Demo¶
This demo showcases HyperBinder's ability to traverse semantic (fuzzy) and symbolic (exact) data seamlessly in a single query chain, something that traditionally requires orchestrating multiple systems.
The Knowledge Design Layer¶
Modern AI stacks have a gap. LLMs need structured knowledge, but data lives in fragmented systems. HyperBinder fills this gap with a Knowledge Design Layer:
```mermaid
flowchart TB
    A["LLMs · Agents · RAG"]
    K["<b>KNOWLEDGE DESIGN LAYER</b><br/><i>Compounds · Intersections · Queries</i>"]
    D["Your Data Sources"]
    A <-->|"Declarative queries"| K
    K <-->|"Ingestion"| D
    style K fill:#3182ce,stroke:#63b3ed,stroke-width:2px,color:#fff
```
Without HyperBinder: 200+ lines of orchestration across 3+ systems per query type.
With HyperBinder: ~10 lines, one system, declarative.
The Scenario¶
A product manager asks:
"Who are our ML experts in Engineering, what projects are they on, and what's the budget allocation?"
Answering this simple question requires five chained operations:
- Semantic search on employee bios
- Exact filtering on department
- Cross-encoding join from employee IDs to expertise topics
- Semantic join from topics to projects
- Exact join from projects to budgets
Traditional Approach¶
Without HyperBinder, you'd need to orchestrate multiple systems:
```mermaid
flowchart TB
    ES["Elasticsearch"]
    APP1["App Code"]
    PG1["PostgreSQL"]
    APP2["App Code"]
    NEO["Neo4j"]
    APP3["App Code"]
    PG2["PostgreSQL"]
    APP4["App Code"]
    ES --> APP1 --> PG1 --> APP2 --> NEO --> APP3 --> PG2 --> APP4
    style ES fill:#c05621,stroke:#ed8936,color:#fff
    style PG1 fill:#276749,stroke:#48bb78,color:#fff
    style PG2 fill:#276749,stroke:#48bb78,color:#fff
    style NEO fill:#553c9a,stroke:#9f7aea,color:#fff
    style APP1 fill:#9b2c2c,stroke:#fc8181,color:#fff
    style APP2 fill:#9b2c2c,stroke:#fc8181,color:#fff
    style APP3 fill:#9b2c2c,stroke:#fc8181,color:#fff
    style APP4 fill:#9b2c2c,stroke:#fc8181,color:#fff
```
Problems with this approach:
- Multiple systems to maintain (Elasticsearch, PostgreSQL, Neo4j)
- Complex orchestration code for each query type
- No type safety across system boundaries
- Difficult to modify query logic
- Performance overhead from multiple round-trips
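To make the glue-code burden concrete, here is a hedged sketch of just the first two hops in plain Python. The Elasticsearch and PostgreSQL clients are stubbed out as hardcoded stand-ins; all function names here are hypothetical, not part of any real client API:

```python
# Sketch of the orchestration code being replaced (first 2 of 5 hops).
# Each function stands in for a round-trip to a separate system.

def search_bios(query):
    # Stand-in for an Elasticsearch round-trip: pretend the vector
    # search ranked these employee IDs for "machine learning".
    return ["EMP001", "EMP004", "EMP003"]

def filter_department(ids, dept):
    # Stand-in for a PostgreSQL round-trip: IDs are marshalled by hand
    # between systems, with no shared schema or type checking.
    departments = {"EMP001": "Engineering", "EMP003": "Research",
                   "EMP004": "Engineering"}
    return [i for i in ids if departments.get(i) == dept]

hits = search_bios("machine learning")         # system 1
hits = filter_department(hits, "Engineering")  # system 2
print(hits)
# ['EMP001', 'EMP004']; three more hops (links, projects, budgets)
# still to go, each with its own client, retries, and error handling.
```

Multiply this by every query shape your application needs, and the 200+ line estimate is conservative.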
HyperBinder Approach¶
```python
from hybi import HyperBinder
import pandas as pd

hb = HyperBinder()

# === DATA SETUP ===

# Employees (semantic search on bios)
employees = pd.DataFrame([
    {"employee_id": "EMP001", "name": "Alice Chen",
     "bio": "Senior ML engineer specializing in NLP", "department": "Engineering"},
    {"employee_id": "EMP002", "name": "Bob Smith",
     "bio": "Backend developer with distributed systems expertise", "department": "Engineering"},
    {"employee_id": "EMP003", "name": "Carol White",
     "bio": "ML researcher focusing on computer vision", "department": "Research"},
    {"employee_id": "EMP004", "name": "David Lee",
     "bio": "Data scientist with machine learning background", "department": "Engineering"},
])

# Expertise topics (semantic encoding)
expertise = pd.DataFrame([
    {"topic": "natural language processing", "domain": "AI", "maturity": "Production"},
    {"topic": "computer vision models", "domain": "AI", "maturity": "Research"},
    {"topic": "distributed systems design", "domain": "Infrastructure", "maturity": "Production"},
    {"topic": "deep learning frameworks", "domain": "AI", "maturity": "Production"},
])

# Projects (joined via expertise)
projects = pd.DataFrame([
    {"project_id": "PROJ001", "name": "ChatBot v2", "focus_area": "natural language processing"},
    {"project_id": "PROJ002", "name": "Visual Search", "focus_area": "computer vision models"},
    {"project_id": "PROJ003", "name": "Model Training Platform", "focus_area": "deep learning frameworks"},
])

# Budgets (exact lookups)
budgets = pd.DataFrame([
    {"project_id": "PROJ001", "allocated": 500000, "spent": 320000},
    {"project_id": "PROJ002", "allocated": 750000, "spent": 180000},
    {"project_id": "PROJ003", "allocated": 1000000, "spent": 50000},
])

# Link mappings (employee IDs → expertise topics)
employee_expertise = pd.DataFrame([
    {"employee_id": "EMP001", "topic": "natural language processing"},
    {"employee_id": "EMP001", "topic": "deep learning frameworks"},
    {"employee_id": "EMP003", "topic": "computer vision models"},
    {"employee_id": "EMP004", "topic": "deep learning frameworks"},
    {"employee_id": "EMP004", "topic": "natural language processing"},
])

# Ingest data
hb.ingest(employees, "employees")
hb.ingest(expertise, "expertise")
hb.ingest(projects, "projects")
hb.ingest(budgets, "budgets")

# === DECLARE INTERSECTIONS ===

# Cross-encoding: EXACT employee_id → SEMANTIC topic
ix = hb.intersect_flexible("employees.employee_id", "expertise.topic")
hb.populate_links(ix, employee_expertise, "employee_id", "topic")

# Same-encoding joins
hb.intersect("expertise.topic", "projects.focus_area")
hb.intersect("projects.project_id", "budgets.project_id")

# === THE POWER QUERY ===
results = (
    hb.query("employees")
    .search("machine learning")        # SEMANTIC: fuzzy bio search
    .filter(department="Engineering")  # EXACT: department filter
    .join("expertise")                 # CROSS-ENCODING: via links!
    .join("projects")                  # SEMANTIC: topic → focus area
    .join("budgets")                   # EXACT: project_id match
)

# That's it. ~10 lines instead of 200+.
for r in results:
    emp = r["employees"]
    exp = r["expertise"]
    proj = r["projects"]
    budget = r["budgets"]
    print(f"{emp['name']} → {exp['topic']} → {proj['name']} (${budget['allocated']:,})")
```
Output:
```
Alice Chen → natural language processing → ChatBot v2 ($500,000)
Alice Chen → deep learning frameworks → Model Training Platform ($1,000,000)
David Lee → natural language processing → ChatBot v2 ($500,000)
David Lee → deep learning frameworks → Model Training Platform ($1,000,000)
```
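As a mental model (not HyperBinder's actual execution plan), the chain amounts to a search, a filter, and three joins over the link and lookup tables. This illustrative plain-Python re-derivation reproduces the same four rows from the demo data, with the semantic step stubbed as a keyword match on the bio:

```python
# Illustrative re-derivation of the query chain in plain Python.
employees = [
    {"employee_id": "EMP001", "name": "Alice Chen",
     "bio": "Senior ML engineer specializing in NLP", "department": "Engineering"},
    {"employee_id": "EMP002", "name": "Bob Smith",
     "bio": "Backend developer with distributed systems expertise", "department": "Engineering"},
    {"employee_id": "EMP003", "name": "Carol White",
     "bio": "ML researcher focusing on computer vision", "department": "Research"},
    {"employee_id": "EMP004", "name": "David Lee",
     "bio": "Data scientist with machine learning background", "department": "Engineering"},
]
links = [("EMP001", "natural language processing"),
         ("EMP001", "deep learning frameworks"),
         ("EMP003", "computer vision models"),
         ("EMP004", "deep learning frameworks"),
         ("EMP004", "natural language processing")]
# topic -> (project name, allocated budget), folding the projects and budgets hops
topic_to_project = {"natural language processing": ("ChatBot v2", 500000),
                    "deep learning frameworks": ("Model Training Platform", 1000000),
                    "computer vision models": ("Visual Search", 750000)}

# .search("machine learning"): stubbed as a keyword match on the bio
hits = [e for e in employees
        if "ml" in e["bio"].lower().split() or "machine learning" in e["bio"].lower()]
# .filter(department="Engineering"): exact
hits = [e for e in hits if e["department"] == "Engineering"]
# .join("expertise") via the link table, then .join("projects").join("budgets")
rows = [(e["name"], topic, *topic_to_project[topic])
        for e in hits
        for emp_id, topic in links if emp_id == e["employee_id"]]
for name, topic, project, allocated in rows:
    print(f"{name} → {topic} → {project} (${allocated:,})")
```

The structure is identical to the declarative chain; HyperBinder's contribution is that the search uses embeddings rather than keywords and the joins are declared once, not hand-coded per query.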
What Makes This Unique¶
1. Unified Query Language¶
One syntax for semantic, exact, and graph operations:
```python
.search("query")        # Semantic similarity
.filter(field="value")  # Exact equality
.join("collection")     # Graph traversal
```
2. Cross-Encoding Joins¶
Connect fields with different encoding types via explicit links:
```python
# EXACT employee IDs ↔ SEMANTIC topic descriptions
ix = hb.intersect_flexible("employees.employee_id", "expertise.topic")
hb.populate_links(ix, links_df, "employee_id", "topic")
```
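One way to picture what the link table provides (a hypothetical rendering, not HyperBinder internals): it is a many-to-many bridge that can be indexed from either side, so traversal works in both directions:

```python
from collections import defaultdict

# A few link rows from the demo: employee ID on the exact side,
# topic description on the semantic side.
links = [("EMP001", "natural language processing"),
         ("EMP001", "deep learning frameworks"),
         ("EMP003", "computer vision models")]

topics_of = defaultdict(list)     # index from the exact side
employees_of = defaultdict(list)  # index from the semantic side
for emp_id, topic in links:
    topics_of[emp_id].append(topic)
    employees_of[topic].append(emp_id)

print(topics_of["EMP001"])
# ['natural language processing', 'deep learning frameworks']
```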
3. Declarative, Not Procedural¶
Describe what you want, not how to orchestrate it.
4. Type-Safe¶
Schema validation catches errors at definition time, not runtime.
5. Composable¶
Each operation builds on the previous, creating a fluent chain.
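The fluent style can be pictured as a builder that records steps and applies them in order when the chain is consumed. A toy sketch of the pattern (illustrative only, not HyperBinder's implementation):

```python
class Chain:
    """Toy fluent query: each method records a step and returns self."""
    def __init__(self, rows):
        self.rows, self.steps = rows, []

    def filter(self, **kwargs):
        self.steps.append(lambda rs: [r for r in rs
                                      if all(r.get(k) == v for k, v in kwargs.items())])
        return self  # returning self is what makes the calls chainable

    def run(self):
        rs = self.rows
        for step in self.steps:  # apply steps in the order they were declared
            rs = step(rs)
        return rs

rows = [{"name": "Alice", "department": "Engineering"},
        {"name": "Carol", "department": "Research"}]
print(Chain(rows).filter(department="Engineering").run())
# [{'name': 'Alice', 'department': 'Engineering'}]
```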
Additional Capabilities¶
Analogical Reasoning¶
```python
# "Alice is our NLP expert. Who plays a similar role for computer vision?"
results = hb.analogy("Alice Chen", "NLP", "computer vision",
                     field="name", collection="employees")
```
Bidirectional Traversal¶
```python
# Start from budgets, trace back to employees
results = (
    hb.query("budgets")
    .filter(allocated__gt=500000)
    .join("projects")
    .join("expertise")
    .join("employees")
)
```
Prototype Search¶
```python
# Find employees similar to ANY of these examples
results = hb.search_prototype(
    examples=["Alice Chen", "David Lee"],
    field="name",
    collection="employees"
)
```
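Conceptually, prototype search scores each candidate against every example and keeps the best score, so matching ANY one example is enough. A toy version using difflib string similarity as a stand-in for embedding similarity (illustrative only; the function and threshold here are hypothetical):

```python
from difflib import SequenceMatcher

def prototype_search(examples, candidates, threshold=0.5):
    """Rank candidates by their best similarity to ANY example."""
    def score(candidate):
        return max(SequenceMatcher(None, candidate.lower(), ex.lower()).ratio()
                   for ex in examples)
    return sorted((c for c in candidates if score(c) >= threshold),
                  key=score, reverse=True)

names = ["Alice Chen", "Alicia Chang", "Bob Smith", "David Lee"]
print(prototype_search(["Alice Chen"], names))
```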
Compound Combinations: Model Any Domain¶
The real power is combining compounds to model any domain. Each compound is optimized for a specific pattern, and intersections wire them together.
| Compound | Optimized For |
|---|---|
| Catalog | Searchable records with weighted fields |
| RelationalTable | CRUD operations with primary keys |
| KnowledgeGraph | Entity-relation-entity facts |
| Hierarchy | Parent-child trees (org charts, taxonomies) |
| Document | Text chunks with metadata |
| Network | Graph relationships (social, citations) |
| TimeSeries | Time-ordered sequences |
Example: Healthcare System¶
```mermaid
flowchart LR
    subgraph row1 [" "]
        direction LR
        P["Patients<br/><i>Catalog</i>"]
        D["Diagnoses<br/><i>KnowledgeGraph</i>"]
        T["Treatments<br/><i>TimeSeries</i>"]
    end
    subgraph row2 [" "]
        direction LR
        R["Records<br/><i>RelationalTable</i>"]
        M["Medical Ontology<br/><i>Hierarchy</i>"]
        DR["Drug Database<br/><i>RelationalTable</i>"]
    end
    P --> D --> T
    P --> R
    D --> M
    T --> DR
```
```python
results = (
    hb.query("patients").search("diabetes symptoms")
    .join("diagnoses")    # Semantic: symptoms → condition
    .join("treatments")   # Identity: condition → treatment plan
    .join("drugs")        # Identity: drug_id → drug record
    .filter(has_contraindications=True)
)
```
Example: Legal Research Platform¶
```mermaid
flowchart LR
    subgraph row1 [" "]
        direction LR
        C["Cases<br/><i>Document</i>"]
        CT["Citations<br/><i>Network</i>"]
        S["Statutes<br/><i>Hierarchy</i>"]
    end
    subgraph row2 [" "]
        direction LR
        J["Judges<br/><i>Catalog</i>"]
        L["Legal Concepts<br/><i>KnowledgeGraph</i>"]
        JR["Jurisdictions<br/><i>Hierarchy</i>"]
    end
    C --> CT --> S
    C --> J
    CT --> L
    S --> JR
```
```python
results = (
    hb.query("cases").search("digital privacy")
    .join("citations")    # Network: case → cited cases
    .join("statutes")     # Identity: statute_id → statute
    .filter(amendment="4th")
    .join("concepts")     # Semantic: case text → legal concept
)
```
Example: Research Knowledge Base¶
```mermaid
flowchart LR
    subgraph row1 [" "]
        direction LR
        PA["Papers<br/><i>Document</i>"]
        AU["Authors<br/><i>Network</i>"]
        IN["Institutions<br/><i>Hierarchy</i>"]
    end
    subgraph row2 [" "]
        direction LR
        CO["Concepts<br/><i>KnowledgeGraph</i>"]
        GR["Grants<br/><i>RelationalTable</i>"]
        VE["Venues<br/><i>Catalog</i>"]
    end
    PA --> AU --> IN
    PA --> CO
    AU --> GR
    IN --> VE
```
```python
results = (
    hb.query("papers").search("transformer attention mechanism")
    .join("concepts")      # Semantic: abstract → research concept
    .join("authors")       # Network: paper → author collaborations
    .join("institutions")  # Identity: affiliation_id → institution
    .filter(name="Stanford")
    .join("grants")        # Flexible: author_id → grant_id
    .filter(agency="NSF")
)
```
The Pattern¶
- Choose compounds that match your data shapes
- Connect with intersections (strict for same-type, flexible for cross-type)
- Query declaratively across the entire graph
Each compound handles its specialty. Mix them freely—the intersections handle the glue.
Key Takeaways¶
- One System: Replace Elasticsearch + PostgreSQL + Neo4j with one unified interface
- Cross-Encoding: Connect any field types via explicit link mappings
- Declarative: Focus on the question, not the plumbing
- Composable: Build complex queries from simple, chainable operations
- Type-Safe: Catch errors early with schema validation
- Domain Flexible: Combine compounds to model any knowledge architecture
This is the power of neurosymbolic computing: the best of semantic and symbolic approaches, seamlessly integrated.