Enterprise Knowledge Management¶
A comprehensive example modeling an entire enterprise with six interconnected collections.
What you'll learn:
- Multiple compound types orchestrated together
- HDC-specific capabilities (analogy, bundle search)
- Knowledge graph traversal
- Cross-collection intelligence patterns
Architecture¶
flowchart TB
E["Employees<br/>(Catalog)"] -->|has_skill| S["Skills<br/>(KnowledgeGraph)"]
E -->|belongs_to| D["Departments<br/>(Hierarchy)"]
P["Projects<br/>(Catalog)"] -->|requires| S
D -->|owns| P
P -->|produces| DOC["Documents<br/>(Document)"]
E -->|collaborates_with| C["Collaboration<br/>(Network)"]
The Six Collections¶
| Collection | Compound | Purpose |
|---|---|---|
| Employees | Catalog | Employee records with semantic bios |
| Organization | Hierarchy | Department tree structure |
| Expertise | KnowledgeGraph | Skills, certifications, requirements |
| Projects | Catalog | Project portfolio with metadata |
| Documents | Document | Policies, guides, procedures |
| Collaboration | Network | Reporting and working relationships |
Schema Definitions¶
from hybi.compose import (
Catalog, Hierarchy, KnowledgeGraph, Document, Network,
Field, Encoding
)
# Employees: structured catalog with semantic bio
employees_schema = Catalog(
columns={
"employee_id": Field(encoding=Encoding.EXACT),
"name": Field(encoding=Encoding.SEMANTIC),
"title": Field(encoding=Encoding.SEMANTIC),
"department": Field(encoding=Encoding.EXACT),
"location": Field(encoding=Encoding.EXACT),
"bio": Field(encoding=Encoding.SEMANTIC, weight=2.0),
},
primary_key="employee_id",
)
# Organization: department hierarchy
org_schema = Hierarchy(
node_field="department",
parent_field="parent",
level_field="level",
node_encoding=Encoding.EXACT,
)
# Expertise: knowledge graph of skills
expertise_schema = KnowledgeGraph(
entity_field="subject",
relation_field="relation",
subject_field="subject",
object_field="object",
entity_encoding=Encoding.SEMANTIC, # Fuzzy skill matching
relation_encoding=Encoding.EXACT, # Precise relation types
)
# Projects: project portfolio
projects_schema = Catalog(
columns={
"project_id": Field(encoding=Encoding.EXACT),
"name": Field(encoding=Encoding.SEMANTIC),
"description": Field(encoding=Encoding.SEMANTIC, weight=2.0),
"status": Field(encoding=Encoding.EXACT),
"department": Field(encoding=Encoding.EXACT),
"lead_id": Field(encoding=Encoding.EXACT),
"budget": Field(encoding=Encoding.NUMERIC),
},
primary_key="project_id",
)
# Documents: searchable content
documents_schema = Document(
content_field="content",
content_encoding=Encoding.SEMANTIC,
content_weight=2.0,
metadata_fields={
"doc_id": Field(encoding=Encoding.EXACT),
"title": Field(encoding=Encoding.SEMANTIC, weight=1.5),
"doc_type": Field(encoding=Encoding.EXACT),
"author_id": Field(encoding=Encoding.EXACT),
"department": Field(encoding=Encoding.EXACT),
},
)
# Collaboration: relationship network
collaboration_schema = Network(
source_field="source",
edge_field="relation",
target_field="target",
node_encoding=Encoding.EXACT,
edge_encoding=Encoding.EXACT,
)
Sample Data¶
Employees¶
import pandas as pd

employees = pd.DataFrame([
{
"employee_id": "EMP001",
"name": "Sarah Chen",
"title": "VP of Engineering",
"department": "Engineering",
"location": "San Francisco",
"bio": "Engineering leader with 20+ years building distributed systems. "
"Expert in Kubernetes, cloud-native, and platform engineering."
},
{
"employee_id": "EMP002",
"name": "Alice Zhang",
"title": "Senior ML Engineer",
"department": "Data Science",
"location": "San Francisco",
"bio": "ML engineer specializing in NLP and transformers. "
"Expert in PyTorch, vector databases, and semantic search."
},
# ... more employees
])
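Organization¶
The department tree itself is not shown above; a minimal table matching org_schema might look like this (hypothetical structure; the columns correspond to node_field, parent_field, and level_field):

```python
import pandas as pd

# Root has no parent; level counts depth from the root.
organization = pd.DataFrame([
    {"department": "Company",      "parent": None,          "level": 0},
    {"department": "Engineering",  "parent": "Company",     "level": 1},
    {"department": "Platform",     "parent": "Engineering", "level": 2},
    {"department": "Data Science", "parent": "Company",     "level": 1},
])
```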
Expertise Graph¶
expertise = pd.DataFrame([
# Employee skills
("EMP002", "expert_in", "Natural Language Processing"),
("EMP002", "expert_in", "PyTorch"),
("EMP002", "expert_in", "Vector Databases"),
# Skill relationships
("Machine Learning", "related_to", "Deep Learning"),
("NLP", "related_to", "Transformers"),
# Project requirements
("PROJ001", "requires", "Machine Learning"),
("PROJ001", "requires", "Vector Databases"),
], columns=["subject", "relation", "object"])
Collaboration Network¶
collaboration = pd.DataFrame([
# Reporting
("EMP002", "reports_to", "EMP001"),
("EMP003", "reports_to", "EMP001"),
# Mentorship
("EMP004", "mentored_by", "EMP002"),
# Working together
("EMP002", "collaborates_with", "EMP003"),
], columns=["source", "relation", "target"])
Query Examples¶
1. Semantic Employee Search¶
Find employees with relevant expertise:
# hb is the hybi client handle (assumed connected earlier in the guide)
employees_coll = hb.collection("ekm_employees")
results = employees_coll.query(employees_schema).search(
"machine learning infrastructure and MLOps",
top_k=5
)
for r in results:
print(f"{r['name']} - {r['title']}")
print(f" Department: {r['department']}")
2. Find Experts for a Project¶
expertise_coll = hb.collection("ekm_expertise")
# Get project requirements
reqs = expertise_coll.query(expertise_schema).find(
subject="PROJ001", predicate="requires", top_k=10
)
required_skills = [r["object"] for r in reqs]
print(f"Required: {required_skills}")
# Find employees with those skills
for skill in required_skills:
experts = expertise_coll.query(expertise_schema).find(
predicate="expert_in", object=skill, top_k=5
)
for e in experts:
if e["subject"].startswith("EMP"):
print(f" {skill}: {e['subject']}")
3. Department Expertise Profile¶
# Get department members
dept_members = employees_coll.query(employees_schema).filter(
where=[("department", "==", "Data Science")], top_k=20
)
member_ids = [m["employee_id"] for m in dept_members]
# Aggregate skills
skill_counts = {}
for emp_id in member_ids:
skills = expertise_coll.query(expertise_schema).find(
subject=emp_id, predicate="expert_in", top_k=20
)
for s in skills:
skill = s["object"]
skill_counts[skill] = skill_counts.get(skill, 0) + 1
print("Team expertise:")
for skill, count in sorted(skill_counts.items(), key=lambda x: -x[1]):
print(f" {skill}: {count} experts")
HDC-Specific Capabilities¶
These queries are difficult or outright impossible to express in traditional databases.
Analogical Reasoning¶
"Alice is our ML expert in Data Science. Who plays a similar role in Platform?"
results = hb.analogy(
a="Alice Zhang",
b="Data Science",
c="Platform",
field_name="name",
collection="ekm_employees",
top_k=3,
)
for r in results:
print(f"→ {r['name']} - {r['title']}")
Bundle Search¶
Find employees with ANY of multiple skills:
results = hb.bundle_search(
values=["Python", "Kubernetes", "Machine Learning"],
field_name="object",
collection="ekm_expertise",
top_k=15,
)
employees_found = set()
for r in results:
emp_id = r["subject"]
if emp_id.startswith("EMP"):
employees_found.add(emp_id)
print(f"Found {len(employees_found)} employees with any of those skills")
Unified Semantic Search¶
Search across all entity types with one concept:
query = "machine learning production deployment"
print("Employees:")
for r in employees_coll.query(employees_schema).search(query, top_k=3):
print(f" {r['name']}")
print("Documents:")
for r in hb.collection("ekm_documents").query(documents_schema).search(query, top_k=3):
print(f" {r['title']}")
print("Projects:")
for r in hb.collection("ekm_projects").query(projects_schema).search(query, top_k=2):
print(f" {r['name']}")
Advanced Scenarios¶
Knowledge Risk Assessment¶
Find skills concentrated in a single person:
# Get all expertise facts
all_expertise = expertise_coll.query(expertise_schema).find(
predicate="expert_in", top_k=100
)
# Count experts per skill
skill_experts = {}
for e in all_expertise:
skill = e["object"]
emp = e["subject"]
if emp.startswith("EMP"):
skill_experts.setdefault(skill, []).append(emp)
# Find single-expert skills
print("Knowledge risk (single expert):")
for skill, experts in skill_experts.items():
if len(experts) == 1:
emp_name = get_employee_name(experts[0])  # helper (defined elsewhere): looks up the name in ekm_employees
print(f" {skill} → only {emp_name}")
Cross-Team Collaboration¶
Find bridges between departments:
# Get team members
ds_team = {e["employee_id"] for e in employees_coll.query(employees_schema).filter(
where=[("department", "==", "Data Science")], top_k=20
)}
platform_team = {e["employee_id"] for e in employees_coll.query(employees_schema).filter(
where=[("department", "==", "Platform")], top_k=20
)}
# Find cross-team collaborations
collaboration_coll = hb.collection("ekm_collaboration")
collabs = collaboration_coll.query(collaboration_schema).find(
edge="collaborates_with", top_k=50
)
print("Cross-team bridges:")
for c in collabs:
src, tgt = c["source"], c["target"]
if (src in ds_team and tgt in platform_team) or \
(src in platform_team and tgt in ds_team):
print(f" {src} ↔ {tgt}")
Key Takeaways¶
- Six compounds, one unified system - Each optimized for its data shape
- Cross-collection intelligence - Navigate relationships across collections
- HDC enables the impossible - Analogies, bundle search, semantic traversal
- Production patterns - Knowledge risk, staffing gaps, collaboration analysis
- Semantic + symbolic - Natural language queries with structured filters