
Enterprise Knowledge Management

A comprehensive example modeling an entire enterprise with six interconnected collections.

What you'll learn:

  • Multiple compound types orchestrated together
  • HDC-specific capabilities (analogy, bundle search)
  • Knowledge graph traversal
  • Cross-collection intelligence patterns

Architecture

flowchart TB
    E["Employees<br/>(Catalog)"] -->|has_skill| S["Skills<br/>(KnowledgeGraph)"]
    E -->|belongs_to| D["Departments<br/>(Hierarchy)"]
    P["Projects<br/>(Catalog)"] -->|requires| S
    D -->|owns| P
    P -->|produces| DOC["Documents<br/>(Document)"]
    DOC -->|references| C["Collaboration<br/>(Network)"]

The Six Collections

Collection     Compound        Purpose
Employees      Catalog         Employee records with semantic bios
Organization   Hierarchy       Department tree structure
Expertise      KnowledgeGraph  Skills, certifications, requirements
Projects       Catalog         Project portfolio with metadata
Documents      Document        Policies, guides, procedures
Collaboration  Network         Reporting and working relationships

Schema Definitions

from hybi.compose import (
    Catalog, Hierarchy, KnowledgeGraph, Document, Network,
    Field, Encoding
)

# Employees: structured catalog with semantic bio
employees_schema = Catalog(
    columns={
        "employee_id": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "title": Field(encoding=Encoding.SEMANTIC),
        "department": Field(encoding=Encoding.EXACT),
        "location": Field(encoding=Encoding.EXACT),
        "bio": Field(encoding=Encoding.SEMANTIC, weight=2.0),
    },
    primary_key="employee_id",
)
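
The weight=2.0 on bio means that field contributes proportionally more to a record's combined vector. Conceptually (a sketch of the general weighted-superposition idea, not hybi's actual internals), a record vector is a weighted elementwise sum of its field vectors:

```python
import math
import random

random.seed(1)
DIM = 1024

def hv():
    """Random bipolar vector standing in for one field's encoding."""
    return [random.choice((-1, 1)) for _ in range(DIM)]

# (weight, vector) per field; bio carries weight 2.0 as in the schema
fields = {"name": (1.0, hv()), "title": (1.0, hv()), "bio": (2.0, hv())}

# Record vector: weighted elementwise sum of its field vectors
record = [sum(w * v[i] for w, v in fields.values()) for i in range(DIM)]

def sim(u, v):
    """Cosine similarity."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

# The double-weighted bio field dominates the record's similarity profile
print(round(sim(record, fields["bio"][1]), 2))   # ~0.82
print(round(sim(record, fields["name"][1]), 2))  # ~0.41
```

In other words, a query that matches the bio text pulls the whole record closer than an equally good match on name or title.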

# Organization: department hierarchy
org_schema = Hierarchy(
    node_field="department",
    parent_field="parent",
    level_field="level",
    node_encoding=Encoding.EXACT,
)

# Expertise: knowledge graph of skills
expertise_schema = KnowledgeGraph(
    entity_field="subject",
    relation_field="relation",
    subject_field="subject",
    object_field="object",
    entity_encoding=Encoding.SEMANTIC,  # Fuzzy skill matching
    relation_encoding=Encoding.EXACT,   # Precise relation types
)

# Projects: project portfolio
projects_schema = Catalog(
    columns={
        "project_id": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "description": Field(encoding=Encoding.SEMANTIC, weight=2.0),
        "status": Field(encoding=Encoding.EXACT),
        "department": Field(encoding=Encoding.EXACT),
        "lead_id": Field(encoding=Encoding.EXACT),
        "budget": Field(encoding=Encoding.NUMERIC),
    },
    primary_key="project_id",
)

# Documents: searchable content
documents_schema = Document(
    content_field="content",
    content_encoding=Encoding.SEMANTIC,
    content_weight=2.0,
    metadata_fields={
        "doc_id": Field(encoding=Encoding.EXACT),
        "title": Field(encoding=Encoding.SEMANTIC, weight=1.5),
        "doc_type": Field(encoding=Encoding.EXACT),
        "author_id": Field(encoding=Encoding.EXACT),
        "department": Field(encoding=Encoding.EXACT),
    },
)

# Collaboration: relationship network
collaboration_schema = Network(
    source_field="source",
    edge_field="relation",
    target_field="target",
    node_encoding=Encoding.EXACT,
    edge_encoding=Encoding.EXACT,
)

Sample Data

Employees

import pandas as pd

employees = pd.DataFrame([
    {
        "employee_id": "EMP001",
        "name": "Sarah Chen",
        "title": "VP of Engineering",
        "department": "Engineering",
        "location": "San Francisco",
        "bio": "Engineering leader with 20+ years building distributed systems. "
               "Expert in Kubernetes, cloud-native, and platform engineering."
    },
    {
        "employee_id": "EMP002",
        "name": "Alice Zhang",
        "title": "Senior ML Engineer",
        "department": "Data Science",
        "location": "San Francisco",
        "bio": "ML engineer specializing in NLP and transformers. "
               "Expert in PyTorch, vector databases, and semantic search."
    },
    # ... more employees
])

Expertise Graph

expertise = pd.DataFrame([
    # Employee skills
    ("EMP002", "expert_in", "Natural Language Processing"),
    ("EMP002", "expert_in", "PyTorch"),
    ("EMP002", "expert_in", "Vector Databases"),

    # Skill relationships
    ("Machine Learning", "related_to", "Deep Learning"),
    ("NLP", "related_to", "Transformers"),

    # Project requirements
    ("PROJ001", "requires", "Machine Learning"),
    ("PROJ001", "requires", "Vector Databases"),
], columns=["subject", "relation", "object"])
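
Because employee skills, skill relationships, and project requirements all share the one (subject, relation, object) shape, the same triples can be sliced locally with plain pandas before any index is involved. A self-contained sketch over rows like the ones above:

```python
import pandas as pd

expertise = pd.DataFrame([
    ("EMP002", "expert_in", "Natural Language Processing"),
    ("EMP002", "expert_in", "PyTorch"),
    ("EMP002", "expert_in", "Vector Databases"),
    ("PROJ001", "requires", "Machine Learning"),
    ("PROJ001", "requires", "Vector Databases"),
], columns=["subject", "relation", "object"])

# Everything PROJ001 requires
required = expertise.loc[
    (expertise["subject"] == "PROJ001") & (expertise["relation"] == "requires"),
    "object",
].tolist()
print(required)  # ['Machine Learning', 'Vector Databases']

# Everyone holding one of those skills
experts = expertise[
    expertise["relation"].eq("expert_in") & expertise["object"].isin(required)
]["subject"].unique().tolist()
print(experts)  # ['EMP002']
```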

Collaboration Network

collaboration = pd.DataFrame([
    # Reporting
    ("EMP002", "reports_to", "EMP001"),
    ("EMP003", "reports_to", "EMP001"),

    # Mentorship
    ("EMP004", "mentored_by", "EMP002"),

    # Working together
    ("EMP002", "collaborates_with", "EMP003"),
], columns=["source", "relation", "target"])
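
The flat edge list also supports quick local traversal. For example, collecting everyone in a manager's reporting subtree is a short breadth-first walk (plain Python over the rows above, no hybi calls):

```python
from collections import defaultdict, deque

edges = [
    ("EMP002", "reports_to", "EMP001"),
    ("EMP003", "reports_to", "EMP001"),
    ("EMP004", "mentored_by", "EMP002"),
    ("EMP002", "collaborates_with", "EMP003"),
]

# Invert reports_to: manager -> direct reports
reports = defaultdict(list)
for src, rel, tgt in edges:
    if rel == "reports_to":
        reports[tgt].append(src)

def org_under(manager):
    """Breadth-first walk collecting the full reporting subtree."""
    seen, queue = [], deque(reports[manager])
    while queue:
        emp = queue.popleft()
        seen.append(emp)
        queue.extend(reports[emp])
    return seen

print(org_under("EMP001"))  # ['EMP002', 'EMP003']
```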

Query Examples

1. Find Employees with Relevant Expertise

# hb: an initialized hybi client, assumed created during ingestion/setup
employees_coll = hb.collection("ekm_employees")

results = employees_coll.query(employees_schema).search(
    "machine learning infrastructure and MLOps",
    top_k=5
)

for r in results:
    print(f"{r['name']} - {r['title']}")
    print(f"  Department: {r['department']}")

2. Find Experts for a Project

expertise_coll = hb.collection("ekm_expertise")

# Get project requirements
reqs = expertise_coll.query(expertise_schema).find(
    subject="PROJ001", predicate="requires", top_k=10
)
required_skills = [r["object"] for r in reqs]
print(f"Required: {required_skills}")

# Find employees with those skills
for skill in required_skills:
    experts = expertise_coll.query(expertise_schema).find(
        predicate="expert_in", object=skill, top_k=5
    )
    for e in experts:
        if e["subject"].startswith("EMP"):
            print(f"  {skill}: {e['subject']}")

3. Department Expertise Profile

# Get department members
dept_members = employees_coll.query(employees_schema).filter(
    where=[("department", "==", "Data Science")], top_k=20
)
member_ids = [m["employee_id"] for m in dept_members]

# Aggregate skills
skill_counts = {}
for emp_id in member_ids:
    skills = expertise_coll.query(expertise_schema).find(
        subject=emp_id, predicate="expert_in", top_k=20
    )
    for s in skills:
        skill = s["object"]
        skill_counts[skill] = skill_counts.get(skill, 0) + 1

print("Team expertise:")
for skill, count in sorted(skill_counts.items(), key=lambda x: -x[1]):
    print(f"  {skill}: {count} experts")

HDC-Specific Capabilities

These queries are hard or impossible to express in a traditional relational or vector database.

Analogical Reasoning

"Alice is our ML expert in Data Science. Who plays a similar role in Platform?"

results = hb.analogy(
    a="Alice Zhang",
    b="Data Science",
    c="Platform",
    field_name="name",
    collection="ekm_employees",
    top_k=3,
)

for r in results:
    print(f"→ {r['name']} - {r['title']}")
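
A common way to picture analogy queries is as vector arithmetic in the embedding space: the offset from "Data Science" to "Alice Zhang" is carried over to "Platform", and the nearest stored vector wins. A deliberately tiny sketch (the 3-D vectors, and every name other than Alice Zhang, are made up for illustration):

```python
import math

# Toy hand-assigned vectors; real hypervectors are high-dimensional
emb = {
    "Alice Zhang":  [1.0, 0.9, 0.1],  # ML-heavy, Data Science axis
    "Data Science": [0.0, 1.0, 0.0],
    "Platform":     [0.0, 0.0, 1.0],
    "Bob Lee":      [1.0, 0.1, 0.9],  # hypothetical: ML-heavy, Platform axis
    "Carol Diaz":   [0.1, 0.1, 0.9],  # hypothetical: Platform, but not ML
}

def cos(u, v):
    """Cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# a : b :: ? : c  ->  look near  emb[a] - emb[b] + emb[c]
target = [a - b + c for a, b, c in
          zip(emb["Alice Zhang"], emb["Data Science"], emb["Platform"])]
best = max(("Bob Lee", "Carol Diaz"), key=lambda n: cos(emb[n], target))
print(best)  # Bob Lee
```

The ML component survives the offset, so the ML-heavy Platform candidate outranks the otherwise-similar non-ML one.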

Bundle Search

Find employees with ANY of multiple skills:

results = hb.bundle_search(
    values=["Python", "Kubernetes", "Machine Learning"],
    field_name="object",
    collection="ekm_expertise",
    top_k=15,
)

employees_found = set()
for r in results:
    emp_id = r["subject"]
    if emp_id.startswith("EMP"):
        employees_found.add(emp_id)

print(f"Found {len(employees_found)} employees with any of those skills")
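
What makes this work is that an HDC bundle superimposes several hypervectors into a single query vector that stays measurably similar to each member. A minimal, hybi-independent sketch with random bipolar vectors and majority-vote bundling:

```python
import random

random.seed(0)
DIM = 2048

def hv():
    """Random bipolar hypervector."""
    return [random.choice((-1, 1)) for _ in range(DIM)]

def bundle(vectors):
    """Elementwise majority vote: the result stays similar to every member."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]

def sim(u, v):
    """Normalized dot product in [-1, 1]."""
    return sum(a * b for a, b in zip(u, v)) / DIM

skills = {name: hv() for name in ("Python", "Kubernetes", "Machine Learning")}
other = hv()

q = bundle(skills.values())
for name, vec in skills.items():
    print(f"{name}: {sim(q, vec):.2f}")   # each member lands near 0.5
print(f"unrelated: {sim(q, other):.2f}")  # near 0.0
```

One nearest-neighbor scan with q therefore surfaces matches for any of the bundled skills, which is what bundle_search exploits.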

Cross-Collection Search

Search across all entity types with one concept:

query = "machine learning production deployment"

print("Employees:")
for r in employees_coll.query(employees_schema).search(query, top_k=3):
    print(f"  {r['name']}")

print("Documents:")
for r in hb.collection("ekm_documents").query(documents_schema).search(query, top_k=3):
    print(f"  {r['title']}")

print("Projects:")
for r in hb.collection("ekm_projects").query(projects_schema).search(query, top_k=2):
    print(f"  {r['name']}")

Advanced Scenarios

Knowledge Risk Assessment

Find skills concentrated in a single person:

# Get all expertise facts
all_expertise = expertise_coll.query(expertise_schema).find(
    predicate="expert_in", top_k=100
)

# Count experts per skill
skill_experts = {}
for e in all_expertise:
    skill = e["object"]
    emp = e["subject"]
    if emp.startswith("EMP"):
        skill_experts.setdefault(skill, []).append(emp)

# Find single-expert skills
print("Knowledge risk (single expert):")
for skill, experts in skill_experts.items():
    if len(experts) == 1:
        # get_employee_name: small helper (not shown) that resolves an
        # employee_id to a display name via the ekm_employees collection
        emp_name = get_employee_name(experts[0])
        print(f"  {skill} → only {emp_name}")

Cross-Team Collaboration

Find bridges between departments:

# Get team members
ds_team = {e["employee_id"] for e in employees_coll.query(employees_schema).filter(
    where=[("department", "==", "Data Science")], top_k=20
)}
platform_team = {e["employee_id"] for e in employees_coll.query(employees_schema).filter(
    where=[("department", "==", "Platform")], top_k=20
)}

# Find cross-team collaborations
# Collaboration collection, following the ekm_ naming used above
collaboration_coll = hb.collection("ekm_collaboration")

collabs = collaboration_coll.query(collaboration_schema).find(
    edge="collaborates_with", top_k=50
)

print("Cross-team bridges:")
for c in collabs:
    src, tgt = c["source"], c["target"]
    if (src in ds_team and tgt in platform_team) or \
       (src in platform_team and tgt in ds_team):
        print(f"  {src} ↔ {tgt}")

Key Takeaways

  1. Six compounds, one unified system - Each optimized for its data shape
  2. Cross-collection intelligence - Navigate relationships across collections
  3. HDC enables the impossible - Analogies, bundle search, semantic traversal
  4. Production patterns - Knowledge risk, staffing gaps, collaboration analysis
  5. Semantic + symbolic - Natural language queries with structured filters