Fictional Universes
A fun introduction to HyperBinder using characters from Marvel, DC, Star Wars, Lord of the Rings, and Harry Potter.
What you'll learn:
- Nested molecule composition (typed entities)
- Cross-universe analogical reasoning
- Bundle search for prototype matching
- Why structure matters in vector encoding
The Key Insight
Traditional vector databases encode "Thor" as a single vector. But which Thor?
- Marvel's Thor (Avengers)
- Norse mythology Thor
- God of War's Thor
With nested molecules, we encode characters as typed entities: a (universe, name) pair such as Marvel:Thor or Norse:Thor.
These are different vectors, but they're related: more similar to each other than Marvel:Thor is to Marvel:Hulk.
The Data
We'll model character relationships across five fictional universes:
import pandas as pd

# Format: (subject_universe, subject_name, relation, object_universe, object_name)
facts = [
    # Star Wars mentorship chain
    ("StarWars", "Rey", "mentored_by", "StarWars", "Luke Skywalker"),
    ("StarWars", "Luke Skywalker", "mentored_by", "StarWars", "Yoda"),
    ("StarWars", "Luke Skywalker", "mentored_by", "StarWars", "Obi-Wan Kenobi"),
    ("StarWars", "Anakin Skywalker", "mentored_by", "StarWars", "Obi-Wan Kenobi"),
    # Harry Potter mentorship
    ("HarryPotter", "Harry Potter", "mentored_by", "HarryPotter", "Albus Dumbledore"),
    ("HarryPotter", "Harry Potter", "mentored_by", "HarryPotter", "Sirius Black"),
    # Lord of the Rings
    ("LotR", "Frodo Baggins", "mentored_by", "LotR", "Gandalf"),
    ("LotR", "Bilbo Baggins", "mentored_by", "LotR", "Gandalf"),
    # Marvel
    ("Marvel", "Peter Parker", "mentored_by", "Marvel", "Tony Stark"),
    ("Marvel", "Wanda Maximoff", "mentored_by", "Marvel", "Doctor Strange"),
    # DC
    ("DC", "Dick Grayson", "mentored_by", "DC", "Bruce Wayne"),
    ("DC", "Tim Drake", "mentored_by", "DC", "Bruce Wayne"),
    # Team leadership
    ("Marvel", "Tony Stark", "leads", "Marvel", "Avengers"),
    ("DC", "Superman", "leads", "DC", "Justice League"),
    ("Marvel", "Charles Xavier", "leads", "Marvel", "X-Men"),
    # Rivalries
    ("Marvel", "Thor", "enemy_of", "Marvel", "Loki"),
    ("DC", "Batman", "enemy_of", "DC", "Joker"),
    ("StarWars", "Luke Skywalker", "enemy_of", "StarWars", "Darth Vader"),
    ("HarryPotter", "Harry Potter", "enemy_of", "HarryPotter", "Voldemort"),
    ("LotR", "Gandalf", "enemy_of", "LotR", "Sauron"),
]

df = pd.DataFrame(facts, columns=[
    "subject_universe", "subject_name",
    "relation",
    "object_universe", "object_name",
])
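Before ingesting, it can be worth checking that the table has the shape the schema will expect. The sketch below uses a four-row sample of the facts above (so it runs on its own) and only standard pandas calls; the variable names are illustrative.

```python
import pandas as pd

columns = ["subject_universe", "subject_name", "relation",
           "object_universe", "object_name"]

# A small sample of the facts above, so this snippet is self-contained.
sample = pd.DataFrame(
    [
        ("StarWars", "Luke Skywalker", "mentored_by", "StarWars", "Yoda"),
        ("HarryPotter", "Harry Potter", "mentored_by", "HarryPotter", "Albus Dumbledore"),
        ("Marvel", "Tony Stark", "leads", "Marvel", "Avengers"),
        ("DC", "Batman", "enemy_of", "DC", "Joker"),
    ],
    columns=columns,
)

# No missing cells: every fact needs all five fields.
missing_cells = int(sample.isna().sum().sum())

# In this dataset every relationship stays inside one universe.
cross_universe = sample[sample["subject_universe"] != sample["object_universe"]]

# How many facts there are per relation type.
per_relation = sample["relation"].value_counts()

print(missing_cells, len(cross_universe))
print(per_relation)
```

Running the same three checks on the full `df` is a quick way to catch a misplaced column before it is encoded.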
The Schema: Nested Molecules
Here's where it gets interesting. We define characters as Pair(universe, name):
from hybi import HyperBinder
from hybi.compose import Triple, Pair, Field, Encoding

schema = Triple(
    subject=Pair(
        left=Field("subject_universe", encoding=Encoding.EXACT),
        right=Field("subject_name", encoding=Encoding.SEMANTIC),
    ),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Pair(
        left=Field("object_universe", encoding=Encoding.EXACT),
        right=Field("object_name", encoding=Encoding.SEMANTIC),
    ),
)
What this encodes is a composite structure where:
- Marvel:Thor and Norse:Thor are different vectors
- But they share the "Thor" component, so they're similar
- You can query by just the universe or just the name
Ingest and Query
hb = HyperBinder("http://localhost:8000")
# Ingest with our nested schema
hb.ingest(df, collection="fictional_universe", schema=schema)
# Get a query builder
q = hb.query("fictional_universe", schema=schema)
Find by Typed Entity
# Find all relationships for Marvel's Tony Stark
results = q.find(
    subject_universe="Marvel",
    subject_name="Tony Stark",
    top_k=10,
)
for r in results:
    print(f"{r['subject_universe']}:{r['subject_name']} "
          f"--[{r['relation']}]--> "
          f"{r['object_universe']}:{r['object_name']}")
Column names work directly
Even though subject_universe is nested inside subject.left in the schema,
you can query using the column name directly. HyperBinder automatically
resolves field names to their schema paths. You can also use:
- Dot notation: q.find(**{"subject.left": "Marvel"})
- Top-level slot: q.find(subject="Tony Stark") (matches any nested field)
Output:
Marvel:Tony Stark --[leads]--> Marvel:Avengers
Marvel:Peter Parker --[mentored_by]--> Marvel:Tony Stark
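One way to picture the column-name resolution described in the note above is a lookup table from leaf column names to dotted schema paths. The sketch below uses plain dictionaries as a stand-in for the schema; `schema_shape` and `field_paths` are hypothetical names for illustration and are not part of the hybi API, whose internals are not shown in this tutorial.

```python
def field_paths(node, prefix=""):
    """Map each leaf column name to its dotted path in a nested schema."""
    paths = {}
    for slot, child in node.items():
        path = f"{prefix}.{slot}" if prefix else slot
        if isinstance(child, dict):
            # Nested molecule: recurse and keep extending the path.
            paths.update(field_paths(child, path))
        else:
            # Leaf: the value is the DataFrame column name.
            paths[child] = path
    return paths


# Hypothetical dictionary mirror of the Triple/Pair schema above.
schema_shape = {
    "subject": {"left": "subject_universe", "right": "subject_name"},
    "predicate": "relation",
    "object": {"left": "object_universe", "right": "object_name"},
}

paths = field_paths(schema_shape)
print(paths["subject_universe"])  # subject.left
print(paths["relation"])          # predicate
```

With a table like this, a keyword such as `subject_universe="Marvel"` and the dot-notation form `"subject.left"` name the same slot.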
Find All Mentorships
results = q.find(relation="mentored_by", top_k=20)
for r in results:
    print(f"{r['subject_name']} was mentored by {r['object_name']} ({r['object_universe']})")
Output:
Rey was mentored by Luke Skywalker (StarWars)
Luke Skywalker was mentored by Yoda (StarWars)
Harry Potter was mentored by Albus Dumbledore (HarryPotter)
Frodo Baggins was mentored by Gandalf (LotR)
Peter Parker was mentored by Tony Stark (Marvel)
Dick Grayson was mentored by Bruce Wayne (DC)
...
Cross-Universe Analogies
This kind of query is hard to express in a traditional database, which matches exact values and has no notion of analogous roles.
Question: "Luke was mentored by Yoda. Who plays a similar role for Harry?"
results = hb.analogy(
    a="Luke Skywalker",
    b="Yoda",
    c="Harry Potter",
    field_name="subject_name",
    collection="fictional_universe",
    top_k=3,
)
for r in results:
    print(f"→ {r['object_name']} ({r['object_universe']})")
The output points to Dumbledore, the wise elder mentor archetype that matches Yoda's role for Luke.
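To make the question itself concrete, here is an exact-match reading of "a is to b as c is to ?" in plain pandas: find the relation that links a to b, then follow the same relation from c. This is a baseline for contrast only, not how `hb.analogy` works (its internals are not described here); `exact_analogy` is a hypothetical helper and the table is a small sample of the facts above.

```python
import pandas as pd

facts = pd.DataFrame(
    [
        ("Luke Skywalker", "mentored_by", "Yoda"),
        ("Harry Potter", "mentored_by", "Albus Dumbledore"),
        ("Harry Potter", "mentored_by", "Sirius Black"),
        ("Harry Potter", "enemy_of", "Voldemort"),
    ],
    columns=["subject_name", "relation", "object_name"],
)


def exact_analogy(df, a, b, c):
    """Objects reachable from c via any relation that is stored between a and b."""
    linking = df[(df["subject_name"] == a) & (df["object_name"] == b)]["relation"].unique()
    matches = df[(df["subject_name"] == c) & (df["relation"].isin(linking))]
    return matches["object_name"].tolist()


answers = exact_analogy(facts, "Luke Skywalker", "Yoda", "Harry Potter")
print(answers)  # ['Albus Dumbledore', 'Sirius Black']
```

The exact version returns every stored mentor with no ranking, and it returns nothing if a name is spelled differently from the stored value. A similarity-based query can instead rank candidates by how closely they match the role.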
Another Analogy
Question: "Batman mentored Robin. Who did Obi-Wan mentor similarly?"
results = hb.analogy(
    a="Bruce Wayne",
    b="Dick Grayson",
    c="Obi-Wan Kenobi",
    field_name="subject_name",
    collection="fictional_universe",
    top_k=3,
)
The output points to Anakin Skywalker, who was Obi-Wan's primary apprentice, just as Robin was Batman's.
Bundle Search: Find Similar Characters
Find characters that match ANY of multiple examples:
# Find characters like Superman, Thor, Wonder Woman (powerful hero archetypes)
results = hb.bundle_search(
    values=["Superman", "Thor", "Wonder Woman"],
    field_name="subject_name",
    collection="fictional_universe",
    top_k=10,
)
Or find mentor figures:
# Find wise elders like Yoda, Gandalf, Dumbledore
results = hb.bundle_search(
    values=["Yoda", "Gandalf", "Albus Dumbledore"],
    field_name="object_name",
    collection="fictional_universe",
    top_k=10,
)
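For intuition about why one query can match ANY of several examples, here is a toy NumPy sketch of bundling as it is commonly done in hyperdimensional computing: the example vectors are summed into one prototype, and every member stays similar to that sum. It assumes random bipolar vectors in place of real encodings and is not taken from HyperBinder's implementation; all names in it are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
DIM = 10_000  # high dimension keeps unrelated random vectors nearly orthogonal

names = ["Yoda", "Gandalf", "Albus Dumbledore", "Joker"]
# Random +1/-1 vectors stand in for encoded names (no real semantics here).
atoms = {name: rng.choice([-1.0, 1.0], size=DIM) for name in names}


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Bundle: element-wise sum of the example vectors.
examples = ["Yoda", "Gandalf", "Albus Dumbledore"]
prototype = np.sum([atoms[name] for name in examples], axis=0)

scores = {name: cosine(atoms[name], prototype) for name in names}
for name, score in scores.items():
    print(f"{name:18s} {score:+.2f}")
```

Each bundled example scores close to 1/√3 against the prototype, while the unrelated vector scores near zero. Because the atoms here are random, only the bundled names match; with semantic encodings, names resembling the examples would presumably score high as well.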
Why Nesting Matters
Without nesting (flat encoding), "Thor" is a single vector, so every Thor collapses into the same point.
With nesting (typed entities), Marvel:Thor and Norse:Thor are DIFFERENT but RELATED vectors:
- similarity(Marvel:Thor, Norse:Thor) > 0.5 (same name)
- similarity(Marvel:Thor, Marvel:Hulk) < 0.3 (different character)
This enables:
- Disambiguation - Which Thor do you mean?
- Cross-domain analogies - Similar characters across universes
- Component extraction - Query by universe or name independently
- Typed queries - "Find all Marvel mentors"
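The similarity pattern above can be reproduced with a toy role-filler encoding in NumPy. This is a sketch under stated assumptions, not HyperBinder's actual encoding: each atom is a random bipolar vector, a pair is built by binding each component to a slot vector (element-wise product) and summing, and `name_weight` is a hypothetical knob chosen so the numbers fall in the ranges quoted above. With equal weights, both similarities would come out near 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 10_000


def random_hv():
    """Random +1/-1 hypervector."""
    return rng.choice([-1.0, 1.0], size=DIM)


# Slot vectors for the two positions of a pair (hypothetical names).
LEFT_ROLE, RIGHT_ROLE = random_hv(), random_hv()
atoms = {key: random_hv() for key in ["Marvel", "Norse", "Thor", "Hulk"]}


def pair(universe, name, name_weight=2.0):
    """Bind each component to its slot, then sum; the name counts more."""
    return LEFT_ROLE * atoms[universe] + name_weight * RIGHT_ROLE * atoms[name]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


same_name = cosine(pair("Marvel", "Thor"), pair("Norse", "Thor"))
same_universe = cosine(pair("Marvel", "Thor"), pair("Marvel", "Hulk"))
print(f"Marvel:Thor vs Norse:Thor  {same_name:.2f}")
print(f"Marvel:Thor vs Marvel:Hulk {same_universe:.2f}")
```

The two Thors differ only in the lower-weight universe slot, so they stay close; Thor and Hulk share only the universe slot, so they end up far apart while still being more alike than two unrelated entities.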
Complete Code
#!/usr/bin/env python3
"""Fictional Universe Knowledge Graph Demo"""
import pandas as pd
from hybi import HyperBinder
from hybi.compose import Triple, Pair, Field, Encoding


def create_data():
    facts = [
        # Star Wars
        ("StarWars", "Rey", "mentored_by", "StarWars", "Luke Skywalker"),
        ("StarWars", "Luke Skywalker", "mentored_by", "StarWars", "Yoda"),
        ("StarWars", "Luke Skywalker", "mentored_by", "StarWars", "Obi-Wan Kenobi"),
        ("StarWars", "Anakin Skywalker", "mentored_by", "StarWars", "Obi-Wan Kenobi"),
        ("StarWars", "Luke Skywalker", "enemy_of", "StarWars", "Darth Vader"),
        # Harry Potter
        ("HarryPotter", "Harry Potter", "mentored_by", "HarryPotter", "Albus Dumbledore"),
        ("HarryPotter", "Harry Potter", "mentored_by", "HarryPotter", "Sirius Black"),
        ("HarryPotter", "Harry Potter", "enemy_of", "HarryPotter", "Voldemort"),
        # Lord of the Rings
        ("LotR", "Frodo Baggins", "mentored_by", "LotR", "Gandalf"),
        ("LotR", "Gandalf", "enemy_of", "LotR", "Sauron"),
        # Marvel
        ("Marvel", "Peter Parker", "mentored_by", "Marvel", "Tony Stark"),
        ("Marvel", "Tony Stark", "leads", "Marvel", "Avengers"),
        ("Marvel", "Thor", "enemy_of", "Marvel", "Loki"),
        # DC
        ("DC", "Dick Grayson", "mentored_by", "DC", "Bruce Wayne"),
        ("DC", "Batman", "enemy_of", "DC", "Joker"),
        ("DC", "Superman", "leads", "DC", "Justice League"),
    ]
    return pd.DataFrame(facts, columns=[
        "subject_universe", "subject_name", "relation",
        "object_universe", "object_name",
    ])


def main():
    hb = HyperBinder("http://localhost:8000")

    # Nested schema: Character = Pair(Universe, Name)
    schema = Triple(
        subject=Pair(
            left=Field("subject_universe", encoding=Encoding.EXACT),
            right=Field("subject_name", encoding=Encoding.SEMANTIC),
        ),
        predicate=Field("relation", encoding=Encoding.EXACT),
        object=Pair(
            left=Field("object_universe", encoding=Encoding.EXACT),
            right=Field("object_name", encoding=Encoding.SEMANTIC),
        ),
    )

    # Ingest
    df = create_data()
    hb.ingest(df, collection="fictional_universe", schema=schema)

    # Query
    q = hb.query("fictional_universe", schema=schema)

    # Find all mentorships
    print("Mentorship relationships:")
    for r in q.find(relation="mentored_by", top_k=10):
        print(f" {r['subject_name']} ← {r['object_name']}")

    # Cross-universe analogy
    print("\nAnalogy: Luke:Yoda :: Harry:?")
    results = hb.analogy("Luke Skywalker", "Yoda", "Harry Potter",
                         field_name="subject_name",
                         collection="fictional_universe")
    for r in results:
        print(f" → {r['object_name']}")


if __name__ == "__main__":
    main()
Key Takeaways
- Nested molecules encode structure - Characters as (Universe, Name) pairs
- Typed entities enable disambiguation - Marvel:Thor ≠ Norse:Thor
- Cross-universe analogies find similar roles - Yoda↔Dumbledore as mentor archetypes
- Bundle search matches prototypes - Find characters like ANY of multiple examples
- The structure IS the semantics - Nesting encodes meaning into vector space