Intersections¶
Intersections connect collections so that a single query can span them. Each collection can stay specialized for its own case, such as semantic search, fuzzy matching, or exact lookups, and intersections pipe a query from one collection to the next as specified by composition schemas.
Logic that would otherwise require custom code across several database types becomes a declared relationship followed by a join.
The Problem¶
By default, collections are isolated islands:
flowchart LR
E[Employees] ~~~ X[Expertise] ~~~ P[Projects]
You can query each independently, but you can't ask questions that span them—like "What skills does the ML team have?" or "Which projects need Python experts?"
The Solution¶
Intersections declare relationships between collections:
# Declare: employees.employee_id links to expertise.subject
hb.intersect("employees.employee_id", "expertise.subject")
Now the collections are connected:
flowchart LR
E[Employees] <-->|employee_id = subject| X[Expertise]
And you can query across them:
results = (
    hb.query("employees")
    .search("ML engineer")
    .join("expertise")
)
for r in results:
    print(f"{r.source['name']} knows {r.target['skill']}")
Two Types of Matching¶
| Relation | How it matches | Use for |
|---|---|---|
| identity | Exact value equality | IDs, foreign keys, categories |
| semantic | Embedding similarity | Text content, descriptions |
# Identity: exact match on IDs
hb.intersect("orders.customer_id", "customers.id")
# Semantic: fuzzy match on text
hb.intersect("emails.content", "projects.description", relation="semantic")
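To make the two relations concrete, here is a minimal sketch in plain Python. It does not use HyperBinder and is not how the library implements matching: the helper names, the toy 3-dimensional vectors, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import math

def identity_match(a, b):
    """Identity relation: values match only when they are exactly equal."""
    return a == b

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def semantic_match(u, v, threshold=0.8):
    """Semantic relation: vectors match when their similarity reaches the threshold.

    The threshold value is a hypothetical default, not a HyperBinder setting.
    """
    return cosine_similarity(u, v) >= threshold

# Toy "embeddings" invented for this example; real embeddings come from a model.
email_vec = [0.9, 0.1, 0.0]
project_vec = [0.8, 0.2, 0.1]
unrelated_vec = [0.0, 0.1, 0.9]

print(identity_match("C-1001", "C-1001"))         # same ID
print(identity_match("C-1001", "c-1001"))         # differs by case, so no match
print(semantic_match(email_vec, project_vec))     # vectors point the same way
print(semantic_match(email_vec, unrelated_vec))   # vectors are nearly orthogonal
```

Identity matching is all-or-nothing, which suits IDs and categories. Semantic matching depends on a similarity cutoff, which is why joined results carry a match status.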
Strict vs Flexible Mode¶
By default, intersections use strict mode, which only allows connections between fields of the same encoding type (EXACT↔EXACT, SEMANTIC↔SEMANTIC).
Flexible mode enables cross-encoding intersections using explicit links—declared mappings that tell HyperBinder exactly which values correspond.
| Mode | Allowed Pairs | Use When |
|---|---|---|
| STRICT (default) | Same encoding types | Fields share natural equality |
| FLEXIBLE | Any encoding types | Need explicit value mappings |
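The rule in the table can be sketched as a small validation function. This is an illustration only: the `Encoding` and `Mode` enums and the `check_pair` helper are hypothetical names, not part of the HyperBinder API.

```python
from enum import Enum

class Encoding(Enum):
    """Hypothetical stand-in for a field's encoding type."""
    EXACT = "exact"
    SEMANTIC = "semantic"

class Mode(Enum):
    """Hypothetical stand-in for the intersection mode."""
    STRICT = "strict"
    FLEXIBLE = "flexible"

def check_pair(left, right, mode=Mode.STRICT):
    """Return True if two field encodings may be intersected under the given mode."""
    if mode is Mode.FLEXIBLE:
        # Any pair is allowed, but the caller must then supply explicit links.
        return True
    # Strict mode: only same-encoding pairs are allowed.
    return left is right

print(check_pair(Encoding.EXACT, Encoding.EXACT))
print(check_pair(Encoding.EXACT, Encoding.SEMANTIC))
print(check_pair(Encoding.EXACT, Encoding.SEMANTIC, mode=Mode.FLEXIBLE))
```

Under this sketch, only the mixed pair in strict mode is rejected, which is the case flexible mode exists to handle.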
When to Use Flexible Mode¶
Flexible mode solves the cross-encoding problem:
# Problem: EXACT employee IDs don't match SEMANTIC topic descriptions
#   employees.employee_id -> "EMP001"            (EXACT encoding)
#   expertise.topic       -> "machine learning"  (SEMANTIC encoding)
# "EMP001" and "machine learning" are semantically unrelated,
# but we need to connect them for queries!
The solution: Explicitly declare which values link together.
import pandas as pd

# 1. Declare flexible intersection
ix = hb.intersect_flexible("employees.employee_id", "expertise.topic")
# 2. Provide the link mappings
links_df = pd.DataFrame({
    "emp_id": ["EMP001", "EMP002", "EMP003"],
    "topic": ["machine learning", "databases", "cloud computing"],
})
hb.populate_links(ix, links_df, "emp_id", "topic")
# 3. Now cross-type joins work!
results = hb.query("employees").filter(employee_id="EMP001").join("expertise")
# Returns: EMP001 → machine learning
Links as First-Class Citizens¶
Links are bidirectional by default—you can join in either direction:
# Forward: employees → expertise
results = hb.query("employees").search("Alice").join("expertise")
# Reverse: expertise → employees
results = hb.query("expertise").search("machine learning").join("employees")
See Intersections API for full reference on Link, LinkSet, and populate_links().
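One way to picture bidirectional links is as a pair of lookup tables, one per direction. The sketch below is plain Python under that assumption. `LinkIndex` is a hypothetical class written for this example, not the library's `Link` or `LinkSet`, and it says nothing about how HyperBinder stores links internally.

```python
from collections import defaultdict

class LinkIndex:
    """Hypothetical two-way index over explicit (left, right) link pairs."""

    def __init__(self):
        self._forward = defaultdict(set)   # left value  -> right values
        self._reverse = defaultdict(set)   # right value -> left values

    def add(self, left, right):
        """Record one link so it can be followed from either side."""
        self._forward[left].add(right)
        self._reverse[right].add(left)

    def targets(self, left):
        """Follow links forward; returns a sorted list, empty if none exist."""
        return sorted(self._forward.get(left, ()))

    def sources(self, right):
        """Follow links in reverse; returns a sorted list, empty if none exist."""
        return sorted(self._reverse.get(right, ()))

# Same mappings as the links_df example above.
links = LinkIndex()
for emp_id, topic in [
    ("EMP001", "machine learning"),
    ("EMP002", "databases"),
    ("EMP003", "cloud computing"),
]:
    links.add(emp_id, topic)

print(links.targets("EMP001"))     # forward: employee ID to topics
print(links.sources("databases"))  # reverse: topic to employee IDs
```

Storing both directions at insert time means a reverse lookup costs the same as a forward one, at the price of keeping two maps in sync.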
Chaining Joins¶
Connect multiple collections in one query:
results = (
    hb.query("employees")
    .search("senior engineer")
    .join("expertise")  # employees → expertise
    .join("projects")   # expertise → projects
    .join("budgets")    # projects → budgets
)
flowchart LR
E[Employees] --> X[Expertise] --> P[Projects] --> B[Budgets]
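Conceptually, a chained join feeds the rows produced by one hop into the next. The following sketch shows that idea with plain lists of dicts and identity hops only. The `join` helper, the sample rows, and the `skill` / `required_skill` link between expertise and projects are assumptions made for the example. Only `employee_id`/`subject` and `project_id` come from the intersections shown on this page.

```python
def join(rows, target, source_field, target_field):
    """Follow one identity hop.

    For each row in `rows`, collect the rows of `target` whose `target_field`
    equals that row's `source_field`. A target row reached from several
    source rows appears once per source row.
    """
    index = {}
    for t in target:
        index.setdefault(t[target_field], []).append(t)
    matched = []
    for r in rows:
        matched.extend(index.get(r[source_field], []))
    return matched

# Sample data invented for illustration.
employees = [
    {"employee_id": "EMP001", "name": "Alice"},
    {"employee_id": "EMP002", "name": "Bob"},
]
expertise = [
    {"subject": "EMP001", "skill": "python"},
    {"subject": "EMP002", "skill": "sql"},
]
projects = [
    {"project_id": "P1", "required_skill": "python"},
    {"project_id": "P2", "required_skill": "rust"},
]
budgets = [
    {"project_id": "P1", "amount": 50000},
    {"project_id": "P2", "amount": 20000},
]

# Pretend a search already narrowed employees down to Alice.
hits = [employees[0]]
hop1 = join(hits, expertise, "employee_id", "subject")    # employees → expertise
hop2 = join(hop1, projects, "skill", "required_skill")    # expertise → projects
hop3 = join(hop2, budgets, "project_id", "project_id")    # projects → budgets
print(hop3)
```

Each hop can only narrow or fan out the rows it receives, so an empty result at any hop leaves every later hop empty as well.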
The Bridge Pattern¶
The bridge pattern connects heterogeneous data by routing fuzzy text through an entity layer to exact records:
flowchart LR
D["Documents<br/>(fuzzy text)"] <-->|semantic| K["Knowledge Graph<br/>(entities)"] <-->|identity| T["Tables<br/>(exact data)"]
The Knowledge Graph acts as an index—semantic search finds relevant entities, which link to exact structured records.
Example: Find budget info for projects mentioned in emails:
hb.intersect("emails.content", "projects.description", relation="semantic")
hb.intersect("projects.project_id", "budgets.project_id", relation="identity")
results = (
    hb.query("emails")
    .search("Q2 budget concerns")
    .join("projects")  # semantic: email text → project
    .join("budgets")   # identity: project ID → budget
)
Match Quality¶
Each joined result has a status:
| Status | Meaning |
|---|---|
| MATCHED | Confident match found |
| NULL | Ambiguous (multiple close candidates) |
| NO_MATCH | No match above threshold |
for r in results:
    if r.is_matched:
        # Safe to use r.target
        print(f"{r.source['name']} → {r.target['skill']}")
    elif r.is_no_match:
        print(f"{r.source['name']} has no matching expertise")
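The three statuses can be read as a decision over candidate similarity scores. The sketch below is one plausible rule, not HyperBinder's actual logic: the `classify` function, the 0.75 threshold, and the 0.05 ambiguity margin are assumptions for illustration.

```python
def classify(scores, threshold=0.75, margin=0.05):
    """Map candidate similarity scores to a match status.

    NO_MATCH: no candidates, or the best score is below `threshold`.
    NULL:     the runner-up is within `margin` of the best score (ambiguous).
    MATCHED:  otherwise, a single clear winner.
    Both `threshold` and `margin` are hypothetical values.
    """
    ranked = sorted(scores, reverse=True)
    if not ranked or ranked[0] < threshold:
        return "NO_MATCH"
    if len(ranked) > 1 and ranked[0] - ranked[1] < margin:
        return "NULL"
    return "MATCHED"

print(classify([0.92, 0.40]))  # one clear winner
print(classify([0.90, 0.88]))  # two close candidates
print(classify([0.30, 0.20]))  # nothing reaches the threshold
```

Treating near-ties as a separate status lets the caller skip or review ambiguous rows instead of silently picking one candidate.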
When to Use Intersections¶
Use intersections when:
- Data naturally lives in separate collections (different schemas)
- You need to answer questions spanning multiple data types
- You want to connect fuzzy (semantic) and exact (symbolic) data
Don't use intersections when:
- All data fits in one collection
- Relationships are within the same collection (use multihop instead)
Next Steps¶
- Intersections API - Full reference
- Intersections Tutorial - Step-by-step guide
- Enterprise Knowledge Example - Complex multi-collection patterns