Fictional Universes

A fun introduction to HyperBinder using characters from Marvel, DC, Star Wars, Lord of the Rings, and Harry Potter.

What you'll learn:

  • Nested molecule composition (typed entities)
  • Cross-universe analogical reasoning
  • Bundle search for prototype matching
  • Why structure matters in vector encoding

The Key Insight

Traditional vector databases encode "Thor" as a single vector. But which Thor?

  • Marvel's Thor (Avengers)
  • Norse mythology Thor
  • God of War's Thor

With nested molecules, we encode characters as typed entities:

Character = (Universe, Name)
Marvel:Thor ≠ Norse:Thor

These are different vectors, but they're related—more similar to each other than Marvel:Thor is to Marvel:Hulk.
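To make "different but related" concrete, here is a toy sketch in plain NumPy. It is not HyperBinder's actual encoding: the random bipolar vectors, the role vectors `ROLE_UNIVERSE` and `ROLE_NAME`, and the weights `W_UNIVERSE` and `W_NAME` are all assumptions chosen for illustration. Each component is bound to its role by element-wise multiplication, and the two bindings are summed, with the name weighted more heavily than the universe tag.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 10_000  # high dimension keeps unrelated random vectors near-orthogonal


def random_hv():
    """Return a random bipolar (+1/-1) hypervector."""
    return rng.choice([-1.0, 1.0], size=DIM)


# Hypothetical role and symbol vectors (illustrative, not HyperBinder internals).
ROLE_UNIVERSE, ROLE_NAME = random_hv(), random_hv()
symbols = {s: random_hv() for s in ["Marvel", "Norse", "Thor", "Hulk"]}

# Assumed weights: the name contributes more than the universe tag.
W_UNIVERSE, W_NAME = 0.5, 1.0


def encode(universe, name):
    """Bind each filler to its role, then sum the two weighted bindings."""
    return (W_UNIVERSE * ROLE_UNIVERSE * symbols[universe]
            + W_NAME * ROLE_NAME * symbols[name])


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


marvel_thor = encode("Marvel", "Thor")
norse_thor = encode("Norse", "Thor")
marvel_hulk = encode("Marvel", "Hulk")

same_name = cosine(marvel_thor, norse_thor)       # shares the name component
same_universe = cosine(marvel_thor, marvel_hulk)  # shares only the universe tag
print(f"Marvel:Thor vs Norse:Thor  = {same_name:.2f}")
print(f"Marvel:Thor vs Marvel:Hulk = {same_universe:.2f}")
```

With these particular weights the first similarity comes out well above the second, because the shared name carries most of the vector's energy. Different weights would change the ordering, so treat the numbers as a property of this toy model only.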


The Data

We'll model character relationships across five fictional universes:

import pandas as pd

# Format: (subject_universe, subject_name, relation, object_universe, object_name)
facts = [
    # Star Wars mentorship chain
    ("StarWars", "Rey", "mentored_by", "StarWars", "Luke Skywalker"),
    ("StarWars", "Luke Skywalker", "mentored_by", "StarWars", "Yoda"),
    ("StarWars", "Luke Skywalker", "mentored_by", "StarWars", "Obi-Wan Kenobi"),
    ("StarWars", "Anakin Skywalker", "mentored_by", "StarWars", "Obi-Wan Kenobi"),

    # Harry Potter mentorship
    ("HarryPotter", "Harry Potter", "mentored_by", "HarryPotter", "Albus Dumbledore"),
    ("HarryPotter", "Harry Potter", "mentored_by", "HarryPotter", "Sirius Black"),

    # Lord of the Rings
    ("LotR", "Frodo Baggins", "mentored_by", "LotR", "Gandalf"),
    ("LotR", "Bilbo Baggins", "mentored_by", "LotR", "Gandalf"),

    # Marvel
    ("Marvel", "Peter Parker", "mentored_by", "Marvel", "Tony Stark"),
    ("Marvel", "Wanda Maximoff", "mentored_by", "Marvel", "Doctor Strange"),

    # DC
    ("DC", "Dick Grayson", "mentored_by", "DC", "Bruce Wayne"),
    ("DC", "Tim Drake", "mentored_by", "DC", "Bruce Wayne"),

    # Team leadership
    ("Marvel", "Tony Stark", "leads", "Marvel", "Avengers"),
    ("DC", "Superman", "leads", "DC", "Justice League"),
    ("Marvel", "Charles Xavier", "leads", "Marvel", "X-Men"),

    # Rivalries
    ("Marvel", "Thor", "enemy_of", "Marvel", "Loki"),
    ("DC", "Batman", "enemy_of", "DC", "Joker"),
    ("StarWars", "Luke Skywalker", "enemy_of", "StarWars", "Darth Vader"),
    ("HarryPotter", "Harry Potter", "enemy_of", "HarryPotter", "Voldemort"),
    ("LotR", "Gandalf", "enemy_of", "LotR", "Sauron"),
]

df = pd.DataFrame(facts, columns=[
    "subject_universe", "subject_name",
    "relation",
    "object_universe", "object_name"
])

The Schema: Nested Molecules

Here's where it gets interesting. We define characters as Pair(universe, name):

from hybi import HyperBinder
from hybi.compose import Triple, Pair, Field, Encoding

schema = Triple(
    subject=Pair(
        left=Field("subject_universe", encoding=Encoding.EXACT),
        right=Field("subject_name", encoding=Encoding.SEMANTIC),
    ),
    predicate=Field("relation", encoding=Encoding.EXACT),
    object=Pair(
        left=Field("object_universe", encoding=Encoding.EXACT),
        right=Field("object_name", encoding=Encoding.SEMANTIC),
    ),
)

What this encodes:

((universe, name), relation, (universe, name))

This creates a composite structure where:

  • Marvel:Thor and Norse:Thor are different vectors
  • But they share the "Thor" component, so they're similar
  • You can query by just the universe or just the name
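The shape ((universe, name), relation, (universe, name)) can be pictured with ordinary Python types. The sketch below uses no HyperBinder code; `Entity`, `Fact`, and `row_to_fact` are hypothetical names introduced only to show how one flat five-column row maps onto the nested structure.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """A typed entity: a name qualified by its universe."""
    universe: str
    name: str


class Fact(NamedTuple):
    """One relationship between two typed entities."""
    subject: Entity
    relation: str
    object: Entity


def row_to_fact(row):
    """Map a flat 5-column row onto ((universe, name), relation, (universe, name))."""
    s_univ, s_name, relation, o_univ, o_name = row
    return Fact(Entity(s_univ, s_name), relation, Entity(o_univ, o_name))


fact = row_to_fact(("StarWars", "Luke Skywalker", "mentored_by", "StarWars", "Yoda"))
print(fact.subject)      # the typed subject entity
print(fact.object.name)  # one component of the nested object

# Typed entities compare on both components, so the same name in two
# universes gives two distinct entities that still share a name.
marvel_thor = Entity("Marvel", "Thor")
norse_thor = Entity("Norse", "Thor")
print(marvel_thor == norse_thor)
print(marvel_thor.name == norse_thor.name)
```

Tuples only capture the equal/not-equal part of the story. The vector encoding adds graded similarity on top, which is what lets two entities be distinct and still close.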

Ingest and Query

hb = HyperBinder("http://localhost:8000")

# Ingest with our nested schema
hb.ingest(df, collection="fictional_universe", schema=schema)

# Get a query builder
q = hb.query("fictional_universe", schema=schema)

Find by Typed Entity

# Find all relationships for Marvel's Tony Stark
results = q.find(
    subject_universe="Marvel",
    subject_name="Tony Stark",
    top_k=10
)

for r in results:
    print(f"{r['subject_universe']}:{r['subject_name']} "
          f"--[{r['relation']}]--> "
          f"{r['object_universe']}:{r['object_name']}")

Column names work directly

Even though subject_universe is nested inside subject.left in the schema, you can query using the column name directly. HyperBinder automatically resolves field names to their schema paths. You can also use:

  • Dot notation: q.find(**{"subject.left": "Marvel"})
  • Top-level slot: q.find(subject="Tony Stark") (matches any nested field)

Output:

Marvel:Tony Stark --[leads]--> Marvel:Avengers
Marvel:Peter Parker --[mentored_by]--> Marvel:Tony Stark
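The column-name resolution described in the note above could work roughly as follows. This is a guess at the general idea, not HyperBinder's implementation: `SCHEMA_LAYOUT`, `column_paths`, and `resolve` are hypothetical, and the layout is a plain dict that mirrors the schema rather than the real `Triple`/`Pair` objects.

```python
# Hypothetical nested layout mirroring the schema: slot -> side -> column name.
SCHEMA_LAYOUT = {
    "subject": {"left": "subject_universe", "right": "subject_name"},
    "predicate": "relation",
    "object": {"left": "object_universe", "right": "object_name"},
}


def column_paths(layout, prefix=""):
    """Walk the nested layout and return {column_name: dotted_path}."""
    paths = {}
    for key, value in layout.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths.update(column_paths(value, path))
        else:
            paths[value] = path
    return paths


PATHS = column_paths(SCHEMA_LAYOUT)


def resolve(field):
    """Accept a column name or a dotted path and return the dotted path."""
    if field in PATHS:
        return PATHS[field]
    if field in PATHS.values():
        return field
    raise KeyError(f"unknown field: {field}")


print(resolve("subject_universe"))  # subject.left
print(resolve("subject.left"))      # subject.left
print(resolve("relation"))          # predicate
```

Building the name-to-path table once keeps every later lookup a single dictionary access, and both spellings end up at the same schema path.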

Find All Mentorships

results = q.find(relation="mentored_by", top_k=20)

for r in results:
    print(f"{r['subject_name']} was mentored by {r['object_name']} ({r['object_universe']})")

Output:

Rey was mentored by Luke Skywalker (StarWars)
Luke Skywalker was mentored by Yoda (StarWars)
Harry Potter was mentored by Albus Dumbledore (HarryPotter)
Frodo Baggins was mentored by Gandalf (LotR)
Peter Parker was mentored by Tony Stark (Marvel)
Dick Grayson was mentored by Bruce Wayne (DC)
...


Cross-Universe Analogies

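This kind of query is hard to express in a traditional database, which matches stored values exactly and has no built-in notion of one relationship resembling another.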

Question: "Luke was mentored by Yoda. Who plays a similar role for Harry?"

results = hb.analogy(
    a="Luke Skywalker",
    b="Yoda",
    c="Harry Potter",
    field_name="subject_name",
    collection="fictional_universe",
    top_k=3,
)

for r in results:
    print(f"→ {r['object_name']} ({r['object_universe']})")

Output:

→ Albus Dumbledore (HarryPotter)
→ Sirius Black (HarryPotter)

The system found Dumbledore—the wise elder mentor archetype that matches Yoda's role for Luke.
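For comparison, here is an exact-match baseline for the same question, written over plain tuples. It is not how `hb.analogy` works internally (that is not shown here); `symbolic_analogy` is a hypothetical helper that reads "a is to b as c is to ?" as "find the relation linking a to b, then follow it from c".

```python
# A small subset of the facts, in the same 5-tuple format as the data above.
facts = [
    ("StarWars", "Luke Skywalker", "mentored_by", "StarWars", "Yoda"),
    ("StarWars", "Luke Skywalker", "enemy_of", "StarWars", "Darth Vader"),
    ("HarryPotter", "Harry Potter", "mentored_by", "HarryPotter", "Albus Dumbledore"),
    ("HarryPotter", "Harry Potter", "mentored_by", "HarryPotter", "Sirius Black"),
    ("HarryPotter", "Harry Potter", "enemy_of", "HarryPotter", "Voldemort"),
]


def symbolic_analogy(facts, a, b, c):
    """a is to b as c is to ?  (exact string matching, no vectors involved)."""
    # Step 1: which relation(s) link a to b?
    relations = {rel for _, s, rel, _, o in facts if s == a and o == b}
    # Step 2: follow the same relation(s) starting from c.
    return [(o_univ, o) for _, s, rel, o_univ, o in facts
            if s == c and rel in relations]


answers = symbolic_analogy(facts, "Luke Skywalker", "Yoda", "Harry Potter")
for universe, name in answers:
    print(f"→ {name} ({universe})")
```

The baseline only succeeds when every name matches a stored string exactly and the relation label is identical on both sides. A vector-based analogy is meant to relax both requirements, at the cost of returning ranked, approximate answers.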

Another Analogy

Question: "Batman mentored Robin. Who did Obi-Wan mentor similarly?"

results = hb.analogy(
    a="Bruce Wayne",
    b="Dick Grayson",
    c="Obi-Wan Kenobi",
    field_name="subject_name",
    collection="fictional_universe",
    top_k=3,
)

Output:

→ Anakin Skywalker

Anakin was Obi-Wan's primary apprentice, just as Robin was Batman's.


Bundle Search: Find Similar Characters

Find characters that match ANY of multiple examples:

# Find characters like Superman, Thor, Wonder Woman (powerful hero archetypes)
results = hb.bundle_search(
    values=["Superman", "Thor", "Wonder Woman"],
    field_name="subject_name",
    collection="fictional_universe",
    top_k=10,
)

Or find mentor figures:

# Find wise elders like Yoda, Gandalf, Dumbledore
results = hb.bundle_search(
    values=["Yoda", "Gandalf", "Albus Dumbledore"],
    field_name="object_name",
    collection="fictional_universe",
    top_k=10,
)
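The idea behind a bundle can be sketched with NumPy alone. This is an illustrative toy, not the `bundle_search` implementation: it assumes a bundle is the element-wise sum of the example vectors, and it uses random bipolar vectors as stand-ins for real name encodings.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 10_000

names = ["Yoda", "Gandalf", "Albus Dumbledore", "Darth Vader", "Joker"]
# Assumption: random bipolar vectors stand in for real name encodings.
vectors = {n: rng.choice([-1.0, 1.0], size=DIM) for n in names}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Bundle: element-wise sum of the example vectors, acting as a prototype.
examples = ["Yoda", "Gandalf", "Albus Dumbledore"]
prototype = np.sum([vectors[n] for n in examples], axis=0)

# Score every known name against the prototype, best match first.
scores = {n: cosine(prototype, v) for n, v in vectors.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} {score:+.2f}")
```

The three bundled names score clearly above the two that were left out, which is the "match ANY of the examples" behavior. Because the stand-in vectors are random, nothing outside the bundle can score well here. With semantic encodings, names that merely resemble the examples would presumably land somewhere in between.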

Why Nesting Matters

Without nesting (flat encoding):

encode("Thor") = same vector everywhere
No distinction between universes

With nesting (typed entities):

encode("Marvel", "Thor") = different from encode("Norse", "Thor")

These are DIFFERENT but RELATED vectors:

  • similarity(Marvel:Thor, Norse:Thor) > 0.5 (same name)
  • similarity(Marvel:Thor, Marvel:Hulk) < 0.3 (different character)

This enables:

  1. Disambiguation - Which Thor do you mean?
  2. Cross-domain analogies - Similar characters across universes
  3. Component extraction - Query by universe or name independently
  4. Typed queries - "Find all Marvel mentors"
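As a reference point for item 4, "Find all Marvel mentors" can be written as an exact filter in pandas over the same column layout. This is a plain-pandas baseline on a small subset of the facts, not a HyperBinder call. It reads "mentor" as the object of a mentored_by fact.

```python
import pandas as pd

facts = [
    ("Marvel", "Peter Parker", "mentored_by", "Marvel", "Tony Stark"),
    ("Marvel", "Wanda Maximoff", "mentored_by", "Marvel", "Doctor Strange"),
    ("Marvel", "Tony Stark", "leads", "Marvel", "Avengers"),
    ("DC", "Dick Grayson", "mentored_by", "DC", "Bruce Wayne"),
    ("StarWars", "Luke Skywalker", "mentored_by", "StarWars", "Yoda"),
]
df = pd.DataFrame(facts, columns=[
    "subject_universe", "subject_name", "relation",
    "object_universe", "object_name",
])

# The mentor sits in the object slot, so filter on the object's universe.
is_mentorship = df["relation"] == "mentored_by"
in_marvel = df["object_universe"] == "Marvel"
marvel_mentors = sorted(df.loc[is_mentorship & in_marvel, "object_name"].unique())
print(marvel_mentors)  # ['Doctor Strange', 'Tony Stark']
```

Keeping the universe as its own field is what makes this filter possible. If the universe were folded into a single name string, the query would need string parsing instead.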

Complete Code

#!/usr/bin/env python3
"""Fictional Universe Knowledge Graph Demo"""

import pandas as pd
from hybi import HyperBinder
from hybi.compose import Triple, Pair, Field, Encoding


def create_data():
    facts = [
        # Star Wars
        ("StarWars", "Rey", "mentored_by", "StarWars", "Luke Skywalker"),
        ("StarWars", "Luke Skywalker", "mentored_by", "StarWars", "Yoda"),
        ("StarWars", "Luke Skywalker", "mentored_by", "StarWars", "Obi-Wan Kenobi"),
        ("StarWars", "Anakin Skywalker", "mentored_by", "StarWars", "Obi-Wan Kenobi"),
        ("StarWars", "Luke Skywalker", "enemy_of", "StarWars", "Darth Vader"),

        # Harry Potter
        ("HarryPotter", "Harry Potter", "mentored_by", "HarryPotter", "Albus Dumbledore"),
        ("HarryPotter", "Harry Potter", "mentored_by", "HarryPotter", "Sirius Black"),
        ("HarryPotter", "Harry Potter", "enemy_of", "HarryPotter", "Voldemort"),

        # Lord of the Rings
        ("LotR", "Frodo Baggins", "mentored_by", "LotR", "Gandalf"),
        ("LotR", "Gandalf", "enemy_of", "LotR", "Sauron"),

        # Marvel
        ("Marvel", "Peter Parker", "mentored_by", "Marvel", "Tony Stark"),
        ("Marvel", "Tony Stark", "leads", "Marvel", "Avengers"),
        ("Marvel", "Thor", "enemy_of", "Marvel", "Loki"),

        # DC
        ("DC", "Dick Grayson", "mentored_by", "DC", "Bruce Wayne"),
        ("DC", "Batman", "enemy_of", "DC", "Joker"),
        ("DC", "Superman", "leads", "DC", "Justice League"),
    ]
    return pd.DataFrame(facts, columns=[
        "subject_universe", "subject_name", "relation",
        "object_universe", "object_name"
    ])


def main():
    hb = HyperBinder("http://localhost:8000")

    # Nested schema: Character = Pair(Universe, Name)
    schema = Triple(
        subject=Pair(
            left=Field("subject_universe", encoding=Encoding.EXACT),
            right=Field("subject_name", encoding=Encoding.SEMANTIC),
        ),
        predicate=Field("relation", encoding=Encoding.EXACT),
        object=Pair(
            left=Field("object_universe", encoding=Encoding.EXACT),
            right=Field("object_name", encoding=Encoding.SEMANTIC),
        ),
    )

    # Ingest
    df = create_data()
    hb.ingest(df, collection="fictional_universe", schema=schema)

    # Query
    q = hb.query("fictional_universe", schema=schema)

    # Find all mentorships
    print("Mentorship relationships:")
    for r in q.find(relation="mentored_by", top_k=10):
        print(f"  {r['subject_name']} → {r['object_name']}")

    # Cross-universe analogy
    print("\nAnalogy: Luke:Yoda :: Harry:?")
    results = hb.analogy("Luke Skywalker", "Yoda", "Harry Potter",
                         field_name="subject_name",
                         collection="fictional_universe")
    for r in results:
        print(f"  → {r['object_name']}")


if __name__ == "__main__":
    main()

Key Takeaways

  1. Nested molecules encode structure - Characters as (Universe, Name) pairs
  2. Typed entities enable disambiguation - Marvel:Thor ≠ Norse:Thor
  3. Cross-universe analogies find similar roles - Yoda↔Dumbledore as mentor archetypes
  4. Bundle search matches prototypes - Find characters like ANY of multiple examples
  5. The structure IS the semantics - Nesting encodes meaning into vector space