Intersections Tutorial

A step-by-step guide to understanding how cross-collection joins work in HyperBinder.

What you'll learn:

Declaring intersections between collections
How the join mechanics work internally
Working with JoinedResult and match status
Output formats and filtering

The Problem

You have two collections:

employees: {employee_id, name, dept}
expertise: {subject, skill, level}

They're connected: employees.employee_id links to expertise.subject.

How do you query employees and get their skills in one operation?

Step 1: Declare the Intersection

An intersection declares the relationship between two collections:

from hybi import HyperBinder

hb = HyperBinder("http://localhost:8000")

# Declare: employees.employee_id links to expertise.subject
intersection = hb.intersect("employees.employee_id", "expertise.subject")

This tells HyperBinder:

"When I query 'employees' and join to 'expertise', match rows where employees.employee_id = expertise.subject"

The intersection is stored in a registry and can be reused.
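To make the registry idea concrete, here is a minimal plain-Python sketch. It is not HyperBinder's implementation: the `Intersection` record, the `IntersectionRegistry` class, its `declare` and `lookup` methods, and the `LookupError` it raises are all assumptions made for illustration. It only shows how a declaration keyed by a pair of collection names could be stored once and recovered by a later join.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intersection:
    """Hypothetical record of one declared link between two fields."""
    source_collection: str
    source_field: str
    target_collection: str
    target_field: str


class IntersectionRegistry:
    """Hypothetical registry keyed by (source collection, target collection)."""

    def __init__(self):
        self._by_pair = {}

    def declare(self, source: str, target: str) -> Intersection:
        # "employees.employee_id" -> ("employees", "employee_id")
        src_coll, src_field = source.split(".", 1)
        tgt_coll, tgt_field = target.split(".", 1)
        ix = Intersection(src_coll, src_field, tgt_coll, tgt_field)
        self._by_pair[(src_coll, tgt_coll)] = ix
        return ix

    def lookup(self, source_collection: str, target_collection: str) -> Intersection:
        try:
            return self._by_pair[(source_collection, target_collection)]
        except KeyError:
            raise LookupError(
                f"no intersection declared between "
                f"{source_collection} and {target_collection}"
            ) from None


registry = IntersectionRegistry()
registry.declare("employees.employee_id", "expertise.subject")

# A later join only needs the two collection names to recover the field pairing.
ix = registry.lookup("employees", "expertise")
print(ix.source_field, "=", ix.target_field)
```

Keying on the collection pair is what lets a join name only the target collection: the field pairing comes from the earlier declaration.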

Step 2: Sample Data

# Employees
employees = [
    {"employee_id": "EMP001", "name": "Alice", "dept": "Engineering"},
    {"employee_id": "EMP002", "name": "Bob", "dept": "Engineering"},
    {"employee_id": "EMP003", "name": "Charlie", "dept": "Sales"},
]

# Expertise (skills held by employees)
expertise = [
    {"subject": "EMP001", "skill": "Python", "level": "Expert"},
    {"subject": "EMP001", "skill": "Rust", "level": "Intermediate"},
    {"subject": "EMP002", "skill": "JavaScript", "level": "Expert"},
    {"subject": "EMP002", "skill": "Python", "level": "Beginner"},
    # Note: EMP003 has no expertise records
]

Step 3: Understanding the Join Mechanics

When you call .join(), HyperBinder matches each source row against the target collection using the declared intersection. For the sample data above, the result is:

Employee	Matching Expertise	Status
Alice (EMP001)	Python (Expert), Rust (Intermediate)	MATCHED
Bob (EMP002)	JavaScript (Expert), Python (Beginner)	MATCHED
Charlie (EMP003)	(none)	NO_MATCH

Alice and Bob each match two expertise rows, so each appears twice in the results. Charlie appears once, with no target row.
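The fan-out described above can be reproduced with a plain-Python equality join over the sample data. This is only a sketch of the matching rule for exact keys: the `equality_join` helper is hypothetical, and it does not model scoring, thresholds, or the ambiguous NULL status.

```python
employees = [
    {"employee_id": "EMP001", "name": "Alice", "dept": "Engineering"},
    {"employee_id": "EMP002", "name": "Bob", "dept": "Engineering"},
    {"employee_id": "EMP003", "name": "Charlie", "dept": "Sales"},
]
expertise = [
    {"subject": "EMP001", "skill": "Python", "level": "Expert"},
    {"subject": "EMP001", "skill": "Rust", "level": "Intermediate"},
    {"subject": "EMP002", "skill": "JavaScript", "level": "Expert"},
    {"subject": "EMP002", "skill": "Python", "level": "Beginner"},
]


def equality_join(sources, targets, source_field, target_field):
    """Pair each source row with every target row whose key is equal.

    Hypothetical helper: a source row with no partner is kept once,
    with target None and status NO_MATCH.
    """
    rows = []
    for src in sources:
        hits = [t for t in targets if t[target_field] == src[source_field]]
        if hits:
            rows.extend((src, hit, "MATCHED") for hit in hits)
        else:
            rows.append((src, None, "NO_MATCH"))
    return rows


rows = equality_join(employees, expertise, "employee_id", "subject")
for src, tgt, status in rows:
    skill = tgt["skill"] if tgt else "(none)"
    print(f"{src['name']:<8} {skill:<11} {status}")
```

Three source rows produce five result rows: two for Alice, two for Bob, and one unmatched row for Charlie.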

Step 4: Working with JoinedResult

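Each JoinedResult exposes its source row, its target row, and its match status. In the loop below, joined_results is the list of JoinedResult objects produced by the join: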

for result in joined_results:
    # Check match status
    if result.is_matched:
        # Confident match - safe to access target
        print(f"MATCHED: {result.source['name']} knows {result.target['skill']}")

    elif result.is_null:
        # Ambiguous match (multiple close candidates with similar scores)
        print(f"AMBIGUOUS: {result.source['name']} - unclear match")

    elif result.is_no_match:
        # No matching row found
        print(f"NO MATCH: {result.source['name']} - no expertise on file")

Output:

MATCHED: Alice knows Python
MATCHED: Alice knows Rust
MATCHED: Bob knows JavaScript
MATCHED: Bob knows Python
NO MATCH: Charlie - no expertise on file
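To try the status checks without a running server, the loop can be exercised against a stand-in object. The `FakeJoinedResult` class below is hypothetical: it mirrors only the attributes used in this step (`source`, `target`, `is_matched`, `is_null`, `is_no_match`), and the `status` field is an assumption about how those flags might be derived.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeJoinedResult:
    """Stand-in for JoinedResult, limited to what this step uses."""
    source: dict
    target: Optional[dict]
    status: str  # "MATCHED", "NULL", or "NO_MATCH"

    @property
    def is_matched(self) -> bool:
        return self.status == "MATCHED"

    @property
    def is_null(self) -> bool:
        return self.status == "NULL"

    @property
    def is_no_match(self) -> bool:
        return self.status == "NO_MATCH"


joined_results = [
    FakeJoinedResult({"name": "Alice"}, {"skill": "Python"}, "MATCHED"),
    FakeJoinedResult({"name": "Alice"}, {"skill": "Rust"}, "MATCHED"),
    FakeJoinedResult({"name": "Bob"}, {"skill": "JavaScript"}, "MATCHED"),
    FakeJoinedResult({"name": "Bob"}, {"skill": "Python"}, "MATCHED"),
    FakeJoinedResult({"name": "Charlie"}, None, "NO_MATCH"),
]


def describe(result) -> str:
    """Only read result.target after confirming the row is matched."""
    if result.is_matched:
        return f"MATCHED: {result.source['name']} knows {result.target['skill']}"
    if result.is_null:
        return f"AMBIGUOUS: {result.source['name']} - unclear match"
    return f"NO MATCH: {result.source['name']} - no expertise on file"


lines = [describe(r) for r in joined_results]
print("\n".join(lines))
```

Checking the status before touching `target` matters because, in this stand-in at least, an unmatched row carries no target at all.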

Match Status Values

Status	Meaning	When it happens
`MATCHED`	Confident match found	Clear best match above threshold
`NULL`	Ambiguous match	Multiple candidates with similar scores
`NO_MATCH`	No match found	No candidates above threshold

Step 5: Filtering Results

Wrap the results in a JoinedResultSet to get filtering helpers and summary statistics:

from hybi.compose.intersections import JoinedResultSet

result_set = JoinedResultSet(
    results=joined_results,
    intersection=intersection,
    source_count=len(employees),
    target_count=len(expertise),
)

# Get only matched results
matched = result_set.filter_matched()
print(f"Matched: {len(matched)} of {len(result_set)}")

# Statistics
print(f"Matched count: {result_set.matched_count}")
print(f"Null count: {result_set.null_count}")
print(f"No match count: {result_set.no_match_count}")
print(f"Expansion ratio: {result_set.expansion_ratio:.2f}x")

Output:

Matched: 4 of 5
Matched count: 4
Null count: 0
No match count: 1
Expansion ratio: 1.67x

The expansion ratio measures fan-out: 3 employees produced 5 results (5 / 3 ≈ 1.67) because Alice and Bob each matched two expertise rows.
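The numbers in the output above can be checked by hand. This sketch assumes the expansion ratio is the number of result rows divided by the number of source rows, which is consistent with the 1.67x shown but is not taken from the library's source.

```python
from collections import Counter

# One status per result row, as in the Step 4 output.
statuses = ["MATCHED", "MATCHED", "MATCHED", "MATCHED", "NO_MATCH"]
source_count = 3  # Alice, Bob, Charlie

counts = Counter(statuses)
expansion_ratio = len(statuses) / source_count  # assumed definition

print(f"Matched: {counts['MATCHED']} of {len(statuses)}")
print(f"Null count: {counts['NULL']}")
print(f"No match count: {counts['NO_MATCH']}")
print(f"Expansion ratio: {expansion_ratio:.2f}x")
```

A ratio of 1.00x would mean every source row produced exactly one result row; anything higher signals one-to-many matches.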

Step 6: Output Formats

JoinedResult supports multiple access patterns:

Direct Access (Recommended)

result.source['name']     # → "Alice"
result.target['skill']    # → "Python"

Flat Dictionary

Keys are prefixed with collection names:

result.to_flat()
# {
#   'employees.employee_id': 'EMP001',
#   'employees.name': 'Alice',
#   'employees.dept': 'Engineering',
#   'expertise.subject': 'EMP001',
#   'expertise.skill': 'Python',
#   'expertise.level': 'Expert',
#   '_score': 1.0,
#   '_status': 'MATCHED'
# }

Nested Dictionary

Grouped by collection:

result.to_nested()
# {
#   'employees': {'employee_id': 'EMP001', 'name': 'Alice', 'dept': 'Engineering'},
#   'expertise': {'subject': 'EMP001', 'skill': 'Python', 'level': 'Expert'}
# }
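The two dictionary shapes differ only in how keys are organized. The sketch below rebuilds both from a source row and a target row. The free functions `to_flat` and `to_nested` are hypothetical stand-ins for the methods of the same name, written only to show the key layout.

```python
def to_flat(source_name, source, target_name, target, score, status):
    """Prefix every key with its collection name, then add the metadata keys."""
    flat = {f"{source_name}.{key}": value for key, value in source.items()}
    flat.update({f"{target_name}.{key}": value for key, value in target.items()})
    flat["_score"] = score
    flat["_status"] = status
    return flat


def to_nested(source_name, source, target_name, target):
    """Group each row under its collection name."""
    return {source_name: dict(source), target_name: dict(target)}


employee = {"employee_id": "EMP001", "name": "Alice", "dept": "Engineering"}
skill = {"subject": "EMP001", "skill": "Python", "level": "Expert"}

flat = to_flat("employees", employee, "expertise", skill, 1.0, "MATCHED")
nested = to_nested("employees", employee, "expertise", skill)

print(flat["employees.name"], flat["expertise.skill"], flat["_status"])
print(nested["expertise"]["level"])
```

The flat form suits tabular export, since every key is unique and self-describing; the nested form keeps each row intact for code that handles the two collections separately.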

Step 7: Using .join() in Practice

With the intersection declared, use .join() in queries:

# Query employees, join to expertise
# (employee_schema is defined in the Complete Example below)
results = (
    hb.query("employees", schema=employee_schema)
    .search("engineering")
    .join("expertise")
)

for r in results:
    if r.is_matched:
        print(f"{r.source['name']} knows {r.target['skill']}")

Chaining Joins

Join through multiple collections:

results = (
    hb.query("employees")
    .search("senior engineer")
    .join("expertise")       # employees → expertise
    .join("projects")        # expertise → projects
    .join("budgets")         # projects → budgets
)

Error Handling

No Intersection Declared

from hybi.compose.intersections import NoIntersectionError

try:
    results = hb.query("employees").search("...").join("unknown_collection")
except NoIntersectionError as e:
    # Include the exception so the underlying message is not lost
    print(f"No intersection defined between employees and unknown_collection: {e}")

Circular Joins

from hybi.compose.intersections import CircularJoinError

try:
    results = query.join("A").join("B").join("A")  # Cycle!
except CircularJoinError as e:
    print(f"Detected cycle: {e.path}")
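One way to picture cycle detection is to track the collections already visited along the join path. The sketch below assumes a cycle means revisiting a collection that is already on the path. The `check_join_path` helper and its `ValueError` are hypothetical and are not how `CircularJoinError` is necessarily raised.

```python
def check_join_path(start, joins):
    """Return the full join path, or raise if a collection is visited twice."""
    path = [start]
    for target in joins:
        if target in path:
            cycle = " -> ".join(path + [target])
            raise ValueError(f"circular join: {cycle}")
        path.append(target)
    return path


# A linear chain like the one under "Chaining Joins" passes.
print(check_join_path("employees", ["expertise", "projects", "budgets"]))

# Returning to a collection already on the path is rejected.
try:
    check_join_path("employees", ["A", "B", "A"])
except ValueError as err:
    message = str(err)
    print(message)
```

Validating the whole path before executing any hop means a bad chain fails fast instead of after several expensive joins.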

Complete Example

#!/usr/bin/env python3
"""Intersections Tutorial: Complete Example"""

from hybi import HyperBinder
from hybi.compose import Triple, Field, Encoding

# Connect
hb = HyperBinder("http://localhost:8000")

# Define schemas
employee_schema = Triple(
    subject=Field("employee_id", encoding=Encoding.EXACT),
    predicate=Field("role"),
    object=Field("department"),
)

expertise_schema = Triple(
    subject=Field("employee_id", encoding=Encoding.EXACT),
    predicate=Field("skill"),
    object=Field("level"),
)

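# Ingest data
# (employees_df and expertise_df are not defined here; they are assumed to be
# pandas DataFrames whose columns match the schemas above)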
hb.ingest(employees_df, collection="employees", schema=employee_schema)
hb.ingest(expertise_df, collection="expertise", schema=expertise_schema)

# Declare intersection
hb.intersect("employees.employee_id", "expertise.employee_id")

# Query with join
results = (
    hb.query("employees", schema=employee_schema)
    .find(department="Engineering")
    .join("expertise")
)

# Process results
for r in results.filter_matched():
    print(f"{r.source['employee_id']}: {r.target['skill']} ({r.target['level']})")

Step 8: Cross-Encoding Joins (Flexible Mode)

What if your fields have different encoding types? For example:

employees.employee_id uses EXACT encoding
expertise.topic uses SEMANTIC encoding

By default, these fields can't intersect because their encodings are incompatible. Flexible mode solves this with explicit link bindings.

The Problem

# This won't work in strict mode:
# EXACT (employee_id) ↔ SEMANTIC (topic) = Encoding mismatch!
hb.intersect("employees.employee_id", "expertise.topic")  # Error!

The Solution: Flexible Intersections

import pandas as pd

# 1. Declare a FLEXIBLE intersection
ix = hb.intersect_flexible("employees.employee_id", "expertise.topic")

# 2. Provide explicit link mappings
links_df = pd.DataFrame({
    "emp_id": ["EMP001", "EMP002", "EMP003"],
    "topic": ["machine learning", "databases", "cloud computing"]
})
hb.populate_links(ix, links_df, "emp_id", "topic")

# 3. Now the join works!
results = (
    hb.query("employees")
    .filter(employee_id="EMP001")
    .join("expertise")
)

for r in results:
    if r.is_matched:
        print(f"{r.source['employee_id']} → {r.target['topic']}")
        # EMP001 → machine learning

How Links Work

Links are bidirectional value mappings:

Source (employee_id)	Target (topic)
EMP001	machine learning
EMP002	databases
EMP003	cloud computing

The join uses these mappings instead of encoding-based matching:

1. Query employees and collect the source values (EMP001, EMP002)
2. Look up the link mappings → ["machine learning", "databases"]
3. Match those values against the target results
4. Return the joined rows
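The lookup steps above can be traced in plain Python. This is a sketch of the idea only: the `links` dictionary and the row lists are illustrative stand-ins, and the target rows carry just the `topic` field.

```python
# Explicit link table from the tutorial: source value -> linked target values.
links = {
    "EMP001": ["machine learning"],
    "EMP002": ["databases"],
    "EMP003": ["cloud computing"],
}

# Rows returned by the source query, and rows available in the target collection.
employee_rows = [{"employee_id": "EMP001"}, {"employee_id": "EMP002"}]
expertise_rows = [
    {"topic": "machine learning"},
    {"topic": "databases"},
    {"topic": "cloud computing"},
]

# Collect the source values.
source_values = [row["employee_id"] for row in employee_rows]

# Look up the linked target values for each source value.
linked_topics = {value: links.get(value, []) for value in source_values}

# Match target rows against the linked values and emit joined pairs.
joined = [
    (src, tgt)
    for src in employee_rows
    for tgt in expertise_rows
    if tgt["topic"] in linked_topics[src["employee_id"]]
]

for src, tgt in joined:
    print(f"{src['employee_id']} -> {tgt['topic']}")
```

No comparison between an employee ID and a topic string ever happens: the link table alone decides which rows pair up, which is why the differing encodings stop mattering.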

One-to-Many Links

A single source can link to multiple targets:

links_df = pd.DataFrame({
    "emp_id": ["EMP001", "EMP001", "EMP002"],  # EMP001 appears twice!
    "topic": ["machine learning", "AI", "databases"]
})
hb.populate_links(ix, links_df, "emp_id", "topic")

# EMP001 now matches BOTH "machine learning" AND "AI"
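A one-to-many, bidirectional link table can be pictured as two dictionaries of lists built from the same pairs. The sketch below uses plain Python rather than HyperBinder's storage; the `forward` and `reverse` names are illustrative assumptions.

```python
from collections import defaultdict

# The same (emp_id, topic) pairs as in the DataFrame above.
pairs = [
    ("EMP001", "machine learning"),
    ("EMP001", "AI"),
    ("EMP002", "databases"),
]

forward = defaultdict(list)  # emp_id -> topics
reverse = defaultdict(list)  # topic -> emp_ids

for emp_id, topic in pairs:
    forward[emp_id].append(topic)
    reverse[topic].append(emp_id)

print(forward["EMP001"])
print(reverse["databases"])
```

Keeping both directions means the same link table can serve a join that starts from either collection.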

When to Use Flexible Mode

Use Flexible Mode	Use Strict Mode
EXACT↔SEMANTIC fields	Same encoding types
Explicit value mappings needed	Natural equality works
Foreign-key-like relationships	Self-joining collections

Key Takeaways

DECLARE: hb.intersect("source.field", "target.field")
Tells HyperBinder how collections relate
JOIN: hb.query("source").search("...").join("target")
Executes the cross-collection query
ACCESS: result.source["field"], result.target["field"]
Direct access to matched data
CHECK: result.is_matched, result.is_null, result.is_no_match
Know the quality of each match
FILTER: result_set.filter_matched()
Get only confident matches
FLEXIBLE MODE: hb.intersect_flexible() + hb.populate_links()
Enable cross-encoding joins with explicit mappings

Next Steps

Intersections API Reference - Full API documentation
Enterprise Knowledge Example - Complex multi-collection queries
The Compose System - Understanding the full architecture