Enterprise Knowledge Management¶
A comprehensive example modeling an entire enterprise with six interconnected collections.
What you'll learn:
- Multiple compound types orchestrated together
- HDC-specific capabilities (analogy, bundle search)
- Knowledge graph traversal
- Cross-collection intelligence patterns
Architecture¶
flowchart TB
E["Employees<br/>(Catalog)"] -->|has_skill| S["Skills<br/>(KnowledgeGraph)"]
E -->|belongs_to| D["Departments<br/>(Hierarchy)"]
P["Projects<br/>(Catalog)"] -->|requires| S
D -->|owns| P
P -->|produces| DOC["Documents<br/>(Document)"]
E -->|collaborates_with| C["Collaboration<br/>(Network)"]
The Six Collections¶
| Collection | Compound | Purpose |
|---|---|---|
| Employees | Catalog | Employee records with semantic bios |
| Organization | Hierarchy | Department tree structure |
| Expertise | KnowledgeGraph | Skills, certifications, requirements |
| Projects | Catalog | Project portfolio with metadata |
| Documents | Document | Policies, guides, procedures |
| Collaboration | Network | Reporting and working relationships |
Schema Definitions¶
from hybi.compose import (
Catalog, Hierarchy, KnowledgeGraph, Document, Network,
Field, Encoding
)
# Employees: structured catalog with semantic bio
employees_schema = Catalog(
columns={
"employee_id": Field(encoding=Encoding.EXACT),
"name": Field(encoding=Encoding.SEMANTIC),
"title": Field(encoding=Encoding.SEMANTIC),
"department": Field(encoding=Encoding.EXACT),
"location": Field(encoding=Encoding.EXACT),
"bio": Field(encoding=Encoding.SEMANTIC, weight=2.0),
},
primary_key="employee_id",
)
# Organization: department hierarchy
org_schema = Hierarchy(
node_field="department",
parent_field="parent",
level_field="level",
node_encoding=Encoding.EXACT,
)
# Expertise: knowledge graph of skills
expertise_schema = KnowledgeGraph(
entity_field="subject",
relation_field="relation",
subject_field="subject",
object_field="object",
entity_encoding=Encoding.SEMANTIC, # Fuzzy skill matching
relation_encoding=Encoding.EXACT, # Precise relation types
)
# Projects: project portfolio
projects_schema = Catalog(
columns={
"project_id": Field(encoding=Encoding.EXACT),
"name": Field(encoding=Encoding.SEMANTIC),
"description": Field(encoding=Encoding.SEMANTIC, weight=2.0),
"status": Field(encoding=Encoding.EXACT),
"department": Field(encoding=Encoding.EXACT),
"lead_id": Field(encoding=Encoding.EXACT),
"budget": Field(encoding=Encoding.NUMERIC),
},
primary_key="project_id",
)
# Documents: searchable content
documents_schema = Document(
content_field="content",
content_encoding=Encoding.SEMANTIC,
content_weight=2.0,
metadata_fields={
"doc_id": Field(encoding=Encoding.EXACT),
"title": Field(encoding=Encoding.SEMANTIC, weight=1.5),
"doc_type": Field(encoding=Encoding.EXACT),
"author_id": Field(encoding=Encoding.EXACT),
"department": Field(encoding=Encoding.EXACT),
},
)
# Collaboration: relationship network
collaboration_schema = Network(
source_field="source",
edge_field="relation",
target_field="target",
node_encoding=Encoding.EXACT,
edge_encoding=Encoding.EXACT,
)
Sample Data¶
Employees¶
import pandas as pd

employees = pd.DataFrame([
{
"employee_id": "EMP001",
"name": "Sarah Chen",
"title": "VP of Engineering",
"department": "Engineering",
"location": "San Francisco",
"bio": "Engineering leader with 20+ years building distributed systems. "
"Expert in Kubernetes, cloud-native, and platform engineering."
},
{
"employee_id": "EMP002",
"name": "Alice Zhang",
"title": "Senior ML Engineer",
"department": "Data Science",
"location": "San Francisco",
"bio": "ML engineer specializing in NLP and transformers. "
"Expert in PyTorch, vector databases, and semantic search."
},
# ... more employees
])
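Organization¶
The department tree itself is not shown above; a minimal table matching org_schema might look like this (hypothetical structure; the columns correspond to node_field, parent_field, and level_field):

```python
import pandas as pd

# Root has no parent; level counts depth from the root.
organization = pd.DataFrame([
    {"department": "Company",      "parent": None,          "level": 0},
    {"department": "Engineering",  "parent": "Company",     "level": 1},
    {"department": "Platform",     "parent": "Engineering", "level": 2},
    {"department": "Data Science", "parent": "Company",     "level": 1},
])
```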
Expertise Graph¶
expertise = pd.DataFrame([
# Employee skills
("EMP002", "expert_in", "Natural Language Processing"),
("EMP002", "expert_in", "PyTorch"),
("EMP002", "expert_in", "Vector Databases"),
# Skill relationships
("Machine Learning", "related_to", "Deep Learning"),
("NLP", "related_to", "Transformers"),
# Project requirements
("PROJ001", "requires", "Machine Learning"),
("PROJ001", "requires", "Vector Databases"),
], columns=["subject", "relation", "object"])
Collaboration Network¶
collaboration = pd.DataFrame([
# Reporting
("EMP002", "reports_to", "EMP001"),
("EMP003", "reports_to", "EMP001"),
# Mentorship
("EMP004", "mentored_by", "EMP002"),
# Working together
("EMP002", "collaborates_with", "EMP003"),
], columns=["source", "relation", "target"])
Query Examples¶
1. Semantic Employee Search¶
Find employees with relevant expertise:
# hb is the hybi client handle (assumed connected earlier in the guide)
employees_coll = hb.collection("ekm_employees")
results = employees_coll.query(employees_schema).search(
"machine learning infrastructure and MLOps",
top_k=5
)
for r in results:
print(f"{r['name']} - {r['title']}")
print(f" Department: {r['department']}")
2. Find Experts for a Project¶
expertise_coll = hb.collection("ekm_expertise")
# Get project requirements
reqs = expertise_coll.query(expertise_schema).find(
subject="PROJ001", predicate="requires", top_k=10
)
required_skills = [r["object"] for r in reqs]
print(f"Required: {required_skills}")
# Find employees with those skills
for skill in required_skills:
experts = expertise_coll.query(expertise_schema).find(
predicate="expert_in", object=skill, top_k=5
)
for e in experts:
if e["subject"].startswith("EMP"):
print(f" {skill}: {e['subject']}")
3. Department Expertise Profile¶
# Get department members
dept_members = employees_coll.query(employees_schema).filter(
where=[("department", "==", "Data Science")], top_k=20
)
member_ids = [m["employee_id"] for m in dept_members]
# Aggregate skills
skill_counts = {}
for emp_id in member_ids:
skills = expertise_coll.query(expertise_schema).find(
subject=emp_id, predicate="expert_in", top_k=20
)
for s in skills:
skill = s["object"]
skill_counts[skill] = skill_counts.get(skill, 0) + 1
print("Team expertise:")
for skill, count in sorted(skill_counts.items(), key=lambda x: -x[1]):
print(f" {skill}: {count} experts")
HDC-Specific Capabilities¶
These queries are difficult or outright impossible to express in traditional databases.
Analogical Reasoning¶
"Alice is our ML expert in Data Science. Who plays a similar role in Platform?"
results = hb.analogy(
a="Alice Zhang",
b="Data Science",
c="Platform",
field_name="name",
collection="ekm_employees",
top_k=3,
)
for r in results:
print(f"→ {r['name']} - {r['title']}")
Bundle Search¶
Find employees with ANY of multiple skills:
results = hb.bundle_search(
values=["Python", "Kubernetes", "Machine Learning"],
field_name="object",
collection="ekm_expertise",
top_k=15,
)
employees_found = set()
for r in results:
emp_id = r["subject"]
if emp_id.startswith("EMP"):
employees_found.add(emp_id)
print(f"Found {len(employees_found)} employees with any of those skills")
Unified Semantic Search¶
Search across all entity types with one concept:
query = "machine learning production deployment"
print("Employees:")
for r in employees_coll.query(employees_schema).search(query, top_k=3):
print(f" {r['name']}")
print("Documents:")
for r in hb.collection("ekm_documents").query(documents_schema).search(query, top_k=3):
print(f" {r['title']}")
print("Projects:")
for r in hb.collection("ekm_projects").query(projects_schema).search(query, top_k=2):
print(f" {r['name']}")
Advanced Scenarios¶
Knowledge Risk Assessment¶
Find skills concentrated in a single person:
# Get all expertise facts
all_expertise = expertise_coll.query(expertise_schema).find(
predicate="expert_in", top_k=100
)
# Count experts per skill
skill_experts = {}
for e in all_expertise:
skill = e["object"]
emp = e["subject"]
if emp.startswith("EMP"):
skill_experts.setdefault(skill, []).append(emp)
# Find single-expert skills
print("Knowledge risk (single expert):")
for skill, experts in skill_experts.items():
if len(experts) == 1:
emp_name = get_employee_name(experts[0])  # helper (defined elsewhere): looks up the name in ekm_employees
print(f" {skill} → only {emp_name}")
Cross-Team Collaboration¶
Find bridges between departments:
# Get team members
ds_team = {e["employee_id"] for e in employees_coll.query(employees_schema).filter(
where=[("department", "==", "Data Science")], top_k=20
)}
platform_team = {e["employee_id"] for e in employees_coll.query(employees_schema).filter(
where=[("department", "==", "Platform")], top_k=20
)}
# Find cross-team collaborations
collaboration_coll = hb.collection("ekm_collaboration")
collabs = collaboration_coll.query(collaboration_schema).find(
edge="collaborates_with", top_k=50
)
print("Cross-team bridges:")
for c in collabs:
src, tgt = c["source"], c["target"]
if (src in ds_team and tgt in platform_team) or \
(src in platform_team and tgt in ds_team):
print(f" {src} ↔ {tgt}")
Key Takeaways¶
- Six compounds, one unified system - Each optimized for its data shape
- Cross-collection intelligence - Navigate relationships across collections
- HDC enables the impossible - Analogies, bundle search, semantic traversal
- Production patterns - Knowledge risk, staffing gaps, collaboration analysis
- Semantic + symbolic - Natural language queries with structured filters