Fields & Encoding¶
Configure how individual fields are encoded into hypervectors.
Field¶
The Field class configures a single column's encoding behavior.
from hybi.compose import Field, Encoding
Field(
name="description", # Column name (optional, inferred from slot)
encoding=Encoding.SEMANTIC, # How to encode values
weight=1.5, # Importance in similarity (default 1.0)
similar_within=0.1, # Scale for NUMERIC encoding
searchable=True, # Include in search (default True)
)
hybi.compose.Field
dataclass
¶
Configuration for a single field in a Compose schema.
Field.name specifies which DataFrame COLUMN to use for this slot. The slot name (e.g., "subject" in Triple) is separate from the column name.
Column Name Resolution
- If Field.name is provided, use that as the column name
- If Field.name is None, use the slot name as the column name
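The two rules above amount to a simple fallback. The following is a minimal sketch in plain Python; `resolve_column_name` is a hypothetical helper written only to illustrate the rule and is not part of hybi.

```python
from typing import Optional


def resolve_column_name(slot_name: str, field_name: Optional[str]) -> str:
    """Hypothetical helper: pick the DataFrame column for a slot."""
    # An explicit Field.name wins; otherwise fall back to the slot name.
    return field_name if field_name is not None else slot_name


print(resolve_column_name("subject", "entity_name"))  # entity_name
print(resolve_column_name("subject", None))           # subject
```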
Example
Column name matches slot name (most common)¶
Triple(
    subject=Field("subject"),  # Uses column "subject"
    predicate=Field("predicate"),
    object=Field("object"),
)
Column name differs from slot name¶
Triple(
    subject=Field("entity_name"),  # Uses column "entity_name" for subject slot
    predicate=Field("relation_type"),
    object=Field("target_entity"),
)
Field with custom encoding¶
Field("category", encoding=Encoding.EXACT)
Field with weight boost¶
Field("description", weight=2.0)
Numeric field with scale preset (recommended)¶
Field("price", encoding=Encoding.NUMERIC, similar_within=NumericScale.DOLLARS)
Numeric field with custom scale¶
Field("score", encoding=Encoding.NUMERIC, similar_within=25)
name = None
class-attribute
instance-attribute
¶
DataFrame column name to use for this slot.
If None, the slot name is used as the column name. Example: Triple(subject=Field()) uses the "subject" column.
encoding = Encoding.SEMANTIC
class-attribute
instance-attribute
¶
How values are encoded into hypervectors.
weight = 1.0
class-attribute
instance-attribute
¶
Importance weight in similarity calculations.
Higher weights make this field more influential in search. The final score is a weighted average: Σ(similarity × weight) / Σ(weight).
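To make the weighted-average formula concrete, here is a small worked example in plain Python. The per-field similarity values are made up for illustration, and `weighted_score` is a hypothetical helper, not a hybi function.

```python
def weighted_score(similarities: dict, weights: dict) -> float:
    """Weighted average of per-field similarities: sum(s * w) / sum(w)."""
    total_weight = sum(weights[name] for name in similarities)
    if total_weight == 0:
        raise ValueError("at least one field must have a non-zero weight")
    weighted_sum = sum(similarities[name] * weights[name] for name in similarities)
    return weighted_sum / total_weight


# Made-up per-field similarities for one candidate record.
similarities = {"title": 0.9, "description": 0.6, "category": 1.0}
weights = {"title": 2.0, "description": 1.0, "category": 0.5}

score = weighted_score(similarities, weights)
print(round(score, 4))  # 0.8286, i.e. (1.8 + 0.6 + 0.5) / 3.5
```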
similar_within = 0.1
class-attribute
instance-attribute
¶
Scale for NUMERIC encoding: the distance at which values are "similar".
Values exactly this distance apart have ~60% similarity; closer values score higher. Values at 2× this distance have ~14% similarity, and values at 3× this distance have ~1% similarity.
Use NumericScale presets for common data types
similar_within=NumericScale.DOLLARS     # $50 difference = similar
similar_within=NumericScale.RATING_5    # 0.5 star difference = similar
similar_within=NumericScale.PERCENTAGE  # 5 points difference = similar
Or use a custom number
similar_within=100 # 100 units difference = similar
Only used when encoding=Encoding.NUMERIC. Default: 0.1 (suitable for normalized 0-1 data).
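The quoted percentages (~60%, ~14%, ~1% at 1×, 2×, and 3× the scale) are consistent with a Gaussian kernel of the form exp(-d² / (2 · similar_within²)). The sketch below assumes that kernel; it is an illustration of the numbers, not hybi's actual encoder, and `numeric_similarity` is a hypothetical name.

```python
import math


def numeric_similarity(a: float, b: float, similar_within: float) -> float:
    """Assumed Gaussian kernel: exp(-d^2 / (2 * similar_within^2))."""
    if similar_within <= 0:
        raise ValueError("similar_within must be positive")
    distance = abs(a - b)
    return math.exp(-(distance ** 2) / (2 * similar_within ** 2))


# Compare a price of 100 against prices 0, 50, 100, and 150 units away.
for gap in (0, 50, 100, 150):
    print(gap, round(numeric_similarity(100, 100 + gap, similar_within=50), 3))
```

Under this assumption the loop prints 1.0, 0.607, 0.135, and 0.011, which lines up with the percentages stated above.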
searchable = True
class-attribute
instance-attribute
¶
Whether to include this field in search queries.
required = False
class-attribute
instance-attribute
¶
Whether the field must be present (not null) in data.
__init__(name=None, encoding=Encoding.SEMANTIC, weight=1.0, similar_within=0.1, searchable=True, required=False)
¶
Encoding¶
The Encoding enum specifies how values become vectors.
| Encoding | Behavior | Use When |
|---|---|---|
| SEMANTIC | Similar values → similar vectors | Text, names, descriptions |
| EXACT | Each unique value → distinct vector | IDs, categories, types |
| NUMERIC | Close numbers → similar vectors | Prices, counts, ratings |
| TEMPORAL | Time-aware encoding for dates/timestamps | Timestamps, dates, event times |
| HIERARCHICAL | Parent-child similarity encoding | Taxonomies, org structures |
from hybi.compose import Encoding
# Semantic: "apple" is similar to "fruit"
Field("description", encoding=Encoding.SEMANTIC)
# Exact: "category_a" is NOT similar to "category_b"
Field("type", encoding=Encoding.EXACT)
# Numeric: 100 is similar to 110 (within similar_within)
Field("price", encoding=Encoding.NUMERIC, similar_within=50)
hybi.compose.Encoding
¶
Bases: Enum
How a field value is encoded into hypervectors.
Different encoding types are suited for different data types and query patterns.
SEMANTIC = auto()
class-attribute
instance-attribute
¶
Similar values produce similar vectors (default).
Best for: text, embeddings, semantic content. Enables: similarity search, semantic matching.
EXACT = auto()
class-attribute
instance-attribute
¶
Each unique value gets a random orthogonal vector.
Best for: categorical values, IDs, enums. Enables: exact match queries, slot-based unbinding.
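One way to see why independent random vectors behave as "distinct" codes: in high dimensions, two random vectors are almost orthogonal. The sketch below assumes bipolar (±1) vectors and an illustrative dimensionality of 10,000; both are assumptions for demonstration, and hybi's internal representation may differ.

```python
import math
import random


def random_bipolar(dim: int, rng: random.Random) -> list:
    """Draw a random vector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def cosine(u: list, v: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


rng = random.Random(0)
dim = 10_000  # illustrative dimensionality, not a hybi setting
codebook = {value: random_bipolar(dim, rng) for value in ("category_a", "category_b")}

same = cosine(codebook["category_a"], codebook["category_a"])
different = cosine(codebook["category_a"], codebook["category_b"])
print(same, round(different, 3))
```

A value compared with itself scores 1.0, while two different values score close to 0, so "category_a" does not partially match "category_b".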
NUMERIC = auto()
class-attribute
instance-attribute
¶
Gaussian RBF encoding for continuous values.
Best for: prices, measurements, quantities. Enables: range queries, numeric similarity. Uses similar_within parameter for scale.
TEMPORAL = auto()
class-attribute
instance-attribute
¶
Time-aware encoding for dates and timestamps.
- Storage: Unix epoch seconds (i64 precision).
- Timezone: Naive datetime/strings use local time; timezone-aware respected.
- NULLs: Field omitted from storage; queries treat as "no match".
Best for: timestamps, dates, event times. Enables: temporal queries, time-range filtering.
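The timezone rule above matches how Python's standard `datetime.timestamp()` behaves: naive values are interpreted as local time, while timezone-aware values keep their offset. A minimal sketch using only the standard library (`to_epoch_seconds` is a hypothetical helper, not hybi's converter):

```python
from datetime import datetime, timezone


def to_epoch_seconds(value: datetime) -> int:
    """Convert a datetime to whole Unix epoch seconds.

    datetime.timestamp() treats naive values as local time and honors
    tzinfo on aware values.
    """
    return int(value.timestamp())


aware = datetime(1970, 1, 2, tzinfo=timezone.utc)
print(to_epoch_seconds(aware))  # 86400

naive = datetime(2024, 1, 1, 12, 0, 0)
print(to_epoch_seconds(naive))  # depends on the machine's local timezone
```

If reproducible results across machines matter, passing timezone-aware values avoids the dependence on local time.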
HIERARCHICAL = auto()
class-attribute
instance-attribute
¶
Parent-child similarity encoding.
Best for: categories, taxonomies, org structures. Enables: hierarchical queries, level-aware matching.
NumericScale¶
Pre-configured scales for numeric encoding.
from hybi.compose import Field, Encoding, NumericScale
# Prices in dollars (similar within $50)
Field("price", encoding=Encoding.NUMERIC, similar_within=NumericScale.DOLLARS)
# 5-star ratings
Field("rating", encoding=Encoding.NUMERIC, similar_within=NumericScale.RATING_5)
# Percentages
Field("score", encoding=Encoding.NUMERIC, similar_within=NumericScale.PERCENTAGE)
hybi.compose.NumericScale
¶
Preset scales for NUMERIC field encoding.
These presets define "similar_within" values for common data types. The value represents the distance at which two numbers have ~60% similarity.
Usage
Field("price", encoding=Encoding.NUMERIC, similar_within=NumericScale.DOLLARS)
Field("rating", encoding=Encoding.NUMERIC, similar_within=NumericScale.RATING_5)
How it works
similar_within=50 means values within 50 units have high similarity (~60%+):
- Values at exactly 50 apart have ~60% similarity.
- Values at 100 apart (2x) have ~14% similarity.
- Values at 150 apart (3x) have ~1% similarity.
Custom values
You can use any positive number: similar_within=100 for custom scales.
CENTS = 10
class-attribute
instance-attribute
¶
Price in cents: $0.10 difference = high similarity. For micro-transactions.
DOLLARS = 50
class-attribute
instance-attribute
¶
Price in dollars: $50 difference = high similarity. For consumer goods.
DOLLARS_LUXURY = 500
class-attribute
instance-attribute
¶
Price in dollars: $500 difference = high similarity. For luxury items.
RATING_5 = 0.5
class-attribute
instance-attribute
¶
5-star rating: 0.5 star difference = high similarity.
RATING_10 = 1.0
class-attribute
instance-attribute
¶
10-point rating: 1 point difference = high similarity.
RATING_100 = 10
class-attribute
instance-attribute
¶
100-point rating (percentages): 10 point difference = high similarity.
PERCENTAGE = 5
class-attribute
instance-attribute
¶
Percentage (0-100): 5 percentage points = high similarity.
FRACTION = 0.05
class-attribute
instance-attribute
¶
Fraction (0.0-1.0): 0.05 difference = high similarity.
TEMPERATURE_C = 2
class-attribute
instance-attribute
¶
Temperature in Celsius: 2°C difference = high similarity.
TEMPERATURE_F = 4
class-attribute
instance-attribute
¶
Temperature in Fahrenheit: 4°F difference = high similarity.
SMALL_COUNT = 1
class-attribute
instance-attribute
¶
Small counts (0-10): 1 unit difference = high similarity.
MEDIUM_COUNT = 10
class-attribute
instance-attribute
¶
Medium counts (0-100): 10 unit difference = high similarity.
LARGE_COUNT = 100
class-attribute
instance-attribute
¶
Large counts (0-1000): 100 unit difference = high similarity.
SECONDS = 5
class-attribute
instance-attribute
¶
Duration in seconds: 5 seconds = high similarity.
MINUTES = 1
class-attribute
instance-attribute
¶
Duration in minutes: 1 minute = high similarity.
HOURS = 0.5
class-attribute
instance-attribute
¶
Duration in hours: 30 minutes = high similarity.
DAYS = 1
class-attribute
instance-attribute
¶
Duration in days: 1 day = high similarity.
Field Weights¶
Weights control relative importance in similarity calculations:
from hybi.compose import Bundle, Field, Encoding
schema = Bundle(
fields={
"title": Field(encoding=Encoding.SEMANTIC, weight=2.0), # 2x importance
"description": Field(encoding=Encoding.SEMANTIC, weight=1.0),
"category": Field(encoding=Encoding.EXACT, weight=0.5), # Half weight
}
)
When searching, higher-weighted fields contribute more to the similarity score.
Encoding Selection Guide¶
flowchart LR
Q1{Categorical/ID?} -->|Yes| EXACT
Q1 -->|No| Q2{Numeric?}
Q2 -->|Yes| NUMERIC
Q2 -->|No| SEMANTIC
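The flowchart can be read as a two-question decision function. The sketch below mirrors it in plain Python and returns encoding names as strings so it stays self-contained; `choose_encoding` and its `is_categorical` flag are hypothetical. Like the flowchart, it covers only EXACT, NUMERIC, and SEMANTIC.

```python
from numbers import Number


def choose_encoding(sample_value, is_categorical: bool = False) -> str:
    """Return the name of the encoding the flowchart would pick."""
    if is_categorical:  # Q1: categorical value or ID?
        return "EXACT"
    # bool is a Number subclass in Python; treat it as non-numeric here.
    if isinstance(sample_value, Number) and not isinstance(sample_value, bool):
        return "NUMERIC"  # Q2: numeric?
    return "SEMANTIC"


print(choose_encoding("user_12345", is_categorical=True))  # EXACT
print(choose_encoding(999.99))                             # NUMERIC
print(choose_encoding("Lightweight laptop"))               # SEMANTIC
```

Whether a string column is categorical cannot be inferred from a single value, which is why the sketch takes it as an explicit flag.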
Examples¶
| Field Type | Example Values | Encoding |
|---|---|---|
| Product name | "MacBook Pro", "iPhone" | SEMANTIC |
| Category | "electronics", "clothing" | EXACT |
| Price | 999.99, 49.99 | NUMERIC |
| Description | "Lightweight laptop..." | SEMANTIC |
| User ID | "user_12345" | EXACT |
| Rating | 4.5, 3.0 | NUMERIC |
| Relationship type | "works_at", "knows" | EXACT |
FieldPath¶
The FieldPath dataclass represents a path to a field within a nested schema. It's returned by schema.resolve_field() and used internally for field resolution.
from hybi.compose.fields import FieldPath
# FieldPath for a nested field
path = schema.resolve_field("subject_type")
print(path.parts) # ("subject", "left")
print(path.column_name) # "subject_type"
print(path.dot_notation) # "subject.left"
print(path.root_slot) # "subject"
print(path.is_nested) # True
Properties¶
| Property | Type | Description |
|---|---|---|
| parts | Tuple[str, ...] | Tuple of slot names from root to field |
| column_name | str | The DataFrame column name |
| field | Field | The Field configuration |
| dot_notation | str | Path as dot-separated string |
| root_slot | str | The top-level slot name |
| is_nested | bool | True if path has multiple parts |
hybi.compose.fields.FieldPath
dataclass
¶
Represents a path to a field within a nested schema.
- For simple schemas: FieldPath(("subject",), "entity", field)
- For nested schemas: FieldPath(("subject", "left"), "subject_type", field)
This enables queries using column names (what users see in DataFrames) rather than requiring knowledge of the internal slot structure.
Attributes:
| Name | Type | Description |
|---|---|---|
| parts | Tuple[str, ...] | Tuple of slot names forming the path from root to leaf |
| column_name | str | The DataFrame column name this path maps to |
| field | Field | The Field configuration at this path |
Example
schema = Triple(
    subject=Pair(
        left=Field("subject_type"),
        right=Field("subject_name"),
    ),
    predicate=Field("relation"),
    object=Field("target"),
)
field_map = schema.get_field_map()
field_map["subject_type"]
# FieldPath(parts=("subject", "left"), column_name="subject_type", field=Field(...))
parts
instance-attribute
¶
Tuple of slot names from root to this field.
column_name
instance-attribute
¶
The DataFrame column name this path maps to.
field
instance-attribute
¶
The Field configuration at this path.
dot_notation
property
¶
Return the path as dot-separated string: 'subject.left'.
root_slot
property
¶
Return the top-level slot name.
is_nested
property
¶
Whether this is a nested path (more than one part).
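The three derived properties follow directly from `parts`. The stand-in class below shows one plausible way they could be computed; `FieldPathSketch` is a local illustration that does not import hybi, and the real FieldPath may be implemented differently.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class FieldPathSketch:
    """Stand-in for hybi's FieldPath, for illustration only."""
    parts: Tuple[str, ...]
    column_name: str
    field: Any  # a Field in hybi; left untyped to stay self-contained

    @property
    def dot_notation(self) -> str:
        # Join the slot names from root to leaf.
        return ".".join(self.parts)

    @property
    def root_slot(self) -> str:
        # The first part is the top-level slot.
        return self.parts[0]

    @property
    def is_nested(self) -> bool:
        # More than one part means the field sits below the top level.
        return len(self.parts) > 1


path = FieldPathSketch(("subject", "left"), "subject_type", field=None)
print(path.dot_notation)  # subject.left
print(path.root_slot)     # subject
print(path.is_nested)     # True
```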
SchemaEvolution¶
The SchemaEvolution enum controls how schema changes are handled during subsequent ingest operations on a collection.
When using ADAPTIVE mode, you can optionally suppress evolution warnings for bulk/backfill jobs.
from hybi.compose import SchemaEvolution
hb.ingest(df, collection="users", schema=schema, evolution=SchemaEvolution.STRICT)
# ADAPTIVE mode with warnings suppressed
hb.ingest(
df,
collection="users",
schema=schema,
evolution=SchemaEvolution.ADAPTIVE,
warn_schema_evolution=False,
)
| Mode | Behavior |
|---|---|
| ADAPTIVE | Default. Additive changes allowed with warnings. New fields auto-added. |
| STRICT | No changes allowed without explicit migration. Raises error on schema change. |
| LOCKED | Immutable schema. Even additive changes are blocked after first ingest. |

| Parameter | Type | Default | Description |
|---|---|---|---|
| warn_schema_evolution | bool | True | In ADAPTIVE mode, emit schema evolution warnings when additive changes are applied. Set to False to suppress warning noise in controlled pipelines. |
hybi.compose.SchemaEvolution
¶
Bases: Enum
Controls how schema changes are handled for a collection.
Schema evolution mode determines whether changes to the schema (adding fields, changing encodings, etc.) are allowed during subsequent ingest operations.
Example
hb.ingest(df, collection="users", schema=schema, evolution=SchemaEvolution.STRICT)
ADAPTIVE = 'adaptive'
class-attribute
instance-attribute
¶
Default mode: additive changes allowed with warnings.
- New fields in data are automatically added to schema
- Breaking changes (removing fields) require an explicit flag
- Encoding changes require explicit migration
- Best for: development, exploration, evolving data
STRICT = 'strict'
class-attribute
instance-attribute
¶
No changes allowed without explicit migration.
- Any schema change raises an error
- Requires explicit migration for changes
- Best for: production, regulated environments
LOCKED = 'locked'
class-attribute
instance-attribute
¶
Immutable schema - no changes allowed at all.
- Even additive changes are blocked
- Schema is frozen after first ingest
- Best for: audit logs, compliance, immutable data
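The difference between the three modes is easiest to see for an additive change, such as a new column appearing in a later ingest. The sketch below uses a local `Mode` enum that mirrors the documented values and a hypothetical `new_columns_allowed` check. It only returns a boolean, whereas hybi is described as warning or raising, and it ignores explicit migrations.

```python
from enum import Enum


class Mode(Enum):
    """Local stand-in mirroring SchemaEvolution's values."""
    ADAPTIVE = "adaptive"
    STRICT = "strict"
    LOCKED = "locked"


def new_columns_allowed(mode: Mode, known: set, incoming: set) -> bool:
    """Hypothetical check: may this ingest add the columns it introduces?"""
    added = incoming - known
    if not added:
        return True  # nothing new, so every mode accepts the ingest
    # Only ADAPTIVE accepts additive changes; STRICT and LOCKED reject them.
    return mode is Mode.ADAPTIVE


known = {"name", "email"}
incoming = {"name", "email", "signup_date"}
for mode in Mode:
    print(mode.value, new_columns_allowed(mode, known, incoming))
```

For the example data this prints True for adaptive and False for strict and locked.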