Fields & Encoding

Configure how individual fields are encoded into hypervectors.

Field

The Field class configures a single column's encoding behavior.

from hybi.compose import Field, Encoding

Field(
    name="description",           # Column name (optional, inferred from slot)
    encoding=Encoding.SEMANTIC,   # How to encode values
    weight=1.5,                   # Importance in similarity (default 1.0)
    similar_within=0.1,           # Scale for NUMERIC encoding (only used with Encoding.NUMERIC)
    searchable=True,              # Include in search (default True)
)

hybi.compose.Field dataclass

Configuration for a single field in a Compose schema.

Field.name specifies which DataFrame COLUMN to use for this slot. The slot name (e.g., "subject" in Triple) is separate from the column name.

Column Name Resolution

  1. If Field.name is provided, it is used as the column name.
  2. If Field.name is None, the slot name is used as the column name.

Example

Column name matches slot name (most common)

Triple(
    subject=Field("subject"),  # Uses column "subject"
    predicate=Field("predicate"),
    object=Field("object"),
)

Column name differs from slot name

Triple(
    subject=Field("entity_name"),  # Uses column "entity_name" for subject slot
    predicate=Field("relation_type"),
    object=Field("target_entity"),
)

Field with custom encoding

Field("category", encoding=Encoding.EXACT)

Field with weight boost

Field("description", weight=2.0)

Numeric field with preset scale

Field("price", encoding=Encoding.NUMERIC, similar_within=NumericScale.DOLLARS)

Numeric field with custom scale

Field("score", encoding=Encoding.NUMERIC, similar_within=25)

name = None class-attribute instance-attribute

DataFrame column name to use for this slot.

If None, the slot name is used as the column name. Example: Triple(subject=Field()) uses the "subject" column.
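
The resolution rule can be sketched in a few lines. The helper name `resolve_column_name` is hypothetical and not part of hybi; it only mirrors the two documented cases.

```python
from typing import Optional


def resolve_column_name(slot_name: str, field_name: Optional[str]) -> str:
    """Hypothetical helper mirroring the documented rule (not a hybi API)."""
    # An explicit Field.name wins; otherwise fall back to the slot name.
    return field_name if field_name is not None else slot_name


print(resolve_column_name("subject", None))           # subject
print(resolve_column_name("subject", "entity_name"))  # entity_name
```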

encoding = Encoding.SEMANTIC class-attribute instance-attribute

How values are encoded into hypervectors.

weight = 1.0 class-attribute instance-attribute

Importance weight in similarity calculations.

Higher weights make this field more influential in search. The final score is a weighted average: Σ(similarity × weight) / Σ(weight).
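
As a worked illustration of the weighted average, the sketch below combines per-field similarities by hand. The function `weighted_score` and the similarity values are made up for the example; only the formula comes from the text above.

```python
def weighted_score(similarities: dict, weights: dict) -> float:
    """Weighted average: sum(similarity * weight) / sum(weight)."""
    total_weight = sum(weights[name] for name in similarities)
    if total_weight == 0:
        raise ValueError("at least one field needs a non-zero weight")
    weighted_sum = sum(similarities[name] * weights[name] for name in similarities)
    return weighted_sum / total_weight


# Illustrative per-field similarities for one candidate row.
similarities = {"title": 0.9, "description": 0.5, "category": 1.0}
weights = {"title": 2.0, "description": 1.0, "category": 0.5}

score = weighted_score(similarities, weights)
print(round(score, 3))  # (1.8 + 0.5 + 0.5) / 3.5 = 0.8
```

Because the sum is divided by the total weight, doubling every weight leaves the score unchanged; only the ratios between weights matter.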

similar_within = 0.1 class-attribute instance-attribute

Scale for NUMERIC encoding: the distance at which values are "similar".

Values within this distance have ~60% similarity. Values at 2× this distance have ~14% similarity. Values at 3× this distance have ~1% similarity.

Use NumericScale presets for common data types

similar_within=NumericScale.DOLLARS     # $50 difference = similar
similar_within=NumericScale.RATING_5    # 0.5 star difference = similar
similar_within=NumericScale.PERCENTAGE  # 5 points difference = similar

Or use a custom number

similar_within=100 # 100 units difference = similar

Only used when encoding=Encoding.NUMERIC. Default: 0.1 (suitable for normalized 0-1 data).
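
The quoted percentages match a Gaussian kernel of the form exp(-d² / (2·s²)), where d is the distance between two values and s is similar_within. The sketch below assumes that form; the exact kernel hybi uses is not stated on this page, and `numeric_similarity` is a hypothetical name.

```python
import math


def numeric_similarity(a: float, b: float, similar_within: float) -> float:
    """Assumed Gaussian kernel: exp(-d^2 / (2 * s^2)). Not a hybi API."""
    if similar_within <= 0:
        raise ValueError("similar_within must be positive")
    distance = abs(a - b)
    return math.exp(-(distance ** 2) / (2 * similar_within ** 2))


# Distances of 1x, 2x and 3x the scale, using similar_within=50.
for multiple in (1, 2, 3):
    sim = numeric_similarity(100, 100 + 50 * multiple, similar_within=50)
    print(f"{multiple}x similar_within: {sim:.1%}")
# 1x similar_within: 60.7%
# 2x similar_within: 13.5%
# 3x similar_within: 1.1%
```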

searchable = True class-attribute instance-attribute

Whether to include this field in search queries.

required = False class-attribute instance-attribute

Whether the field must be present (not null) in data.

__init__(name=None, encoding=Encoding.SEMANTIC, weight=1.0, similar_within=0.1, searchable=True, required=False)


Encoding

The Encoding enum specifies how values become vectors.

| Encoding | Behavior | Use When |
|---|---|---|
| SEMANTIC | Similar values → similar vectors | Text, names, descriptions |
| EXACT | Each unique value → distinct vector | IDs, categories, types |
| NUMERIC | Close numbers → similar vectors | Prices, counts, ratings |
| TEMPORAL | Time-aware encoding for dates/timestamps | Timestamps, dates, event times |
| HIERARCHICAL | Parent-child similarity encoding | Taxonomies, org structures |

from hybi.compose import Field, Encoding

# Semantic: "apple" is similar to "fruit"
Field("description", encoding=Encoding.SEMANTIC)

# Exact: "category_a" is NOT similar to "category_b"
Field("type", encoding=Encoding.EXACT)

# Numeric: 100 is similar to 110 (within similar_within)
Field("price", encoding=Encoding.NUMERIC, similar_within=50)

hybi.compose.Encoding

Bases: Enum

How a field value is encoded into hypervectors.

Different encoding types are suited for different data types and query patterns.

SEMANTIC = auto() class-attribute instance-attribute

Similar values produce similar vectors (default).

Best for: text, embeddings, semantic content. Enables: similarity search, semantic matching.

EXACT = auto() class-attribute instance-attribute

Each unique value gets a random orthogonal vector.

Best for: categorical values, IDs, enums. Enables: exact match queries, slot-based unbinding.
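
To see why random vectors suit exact matching, the sketch below builds random ±1 vectors with the standard library and compares them. This is a generic illustration of near-orthogonality in high dimensions, not hybi's implementation; `exact_vector`, `cosine`, and the dimensionality are assumptions.

```python
import math
import random

DIM = 10_000  # illustrative dimensionality, not hybi's actual setting


def exact_vector(value: str, dim: int = DIM) -> list:
    """Random +/-1 vector seeded by the value, so equal values map to equal vectors."""
    rng = random.Random(value)
    return [rng.choice((-1, 1)) for _ in range(dim)]


def cosine(u: list, v: list) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


a = exact_vector("category_a")
b = exact_vector("category_b")

print(cosine(a, exact_vector("category_a")))  # 1.0: same value, same vector
print(abs(cosine(a, b)) < 0.1)                # True: different values are nearly orthogonal
```

Even values whose strings differ by one character get unrelated vectors, which is the behavior described for EXACT: matching is all-or-nothing rather than graded.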

NUMERIC = auto() class-attribute instance-attribute

Gaussian RBF encoding for continuous values.

Best for: prices, measurements, quantities. Enables: range queries, numeric similarity. Uses similar_within parameter for scale.

TEMPORAL = auto() class-attribute instance-attribute

Time-aware encoding for dates and timestamps.

  • Storage: Unix epoch seconds (i64 precision).
  • Timezone: Naive datetimes/strings use local time; timezone-aware values are respected.
  • NULLs: Field omitted from storage; queries treat as "no match".

Best for: timestamps, dates, event times. Enables: temporal queries, time-range filtering.
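
The timezone rule above matches how Python's own `datetime.timestamp()` behaves, so the standard library can illustrate it. The helper `to_epoch_seconds` is hypothetical; hybi's actual conversion code is not shown on this page.

```python
from datetime import datetime, timezone


def to_epoch_seconds(value: datetime) -> int:
    """Hypothetical helper: convert a datetime to whole Unix epoch seconds."""
    # datetime.timestamp() treats naive datetimes as local time
    # and honors tzinfo on timezone-aware ones.
    return int(value.timestamp())


aware = datetime(2024, 1, 1, tzinfo=timezone.utc)
naive = datetime(2024, 1, 1)

print(to_epoch_seconds(aware))  # 1704067200
print(to_epoch_seconds(naive))  # depends on the machine's local timezone
```

If ingest runs on machines in different timezones, attaching an explicit timezone before ingest avoids the same naive value mapping to different epoch seconds.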

HIERARCHICAL = auto() class-attribute instance-attribute

Parent-child similarity encoding.

Best for: categories, taxonomies, org structures. Enables: hierarchical queries, level-aware matching.


NumericScale

Pre-configured scales for numeric encoding.

from hybi.compose import Field, Encoding, NumericScale

# Prices in dollars (similar within $50)
Field("price", encoding=Encoding.NUMERIC, similar_within=NumericScale.DOLLARS)

# 5-star ratings
Field("rating", encoding=Encoding.NUMERIC, similar_within=NumericScale.RATING_5)

# Percentages
Field("score", encoding=Encoding.NUMERIC, similar_within=NumericScale.PERCENTAGE)

hybi.compose.NumericScale

Preset scales for NUMERIC field encoding.

These presets define "similar_within" values for common data types. The value represents the distance at which two numbers have ~60% similarity.

Usage

Field("price", encoding=Encoding.NUMERIC, similar_within=NumericScale.DOLLARS)
Field("rating", encoding=Encoding.NUMERIC, similar_within=NumericScale.RATING_5)

How it works

similar_within=50 means values within 50 units have high similarity (~60% or more):

  • Values exactly 50 apart have ~60% similarity.
  • Values 100 apart (2×) have ~14% similarity.
  • Values 150 apart (3×) have ~1% similarity.

Custom values

You can use any positive number: similar_within=100 for custom scales.

CENTS = 10 class-attribute instance-attribute

Price in cents: $0.10 difference = high similarity. For micro-transactions.

DOLLARS = 50 class-attribute instance-attribute

Price in dollars: $50 difference = high similarity. For consumer goods.

DOLLARS_LUXURY = 500 class-attribute instance-attribute

Price in dollars: $500 difference = high similarity. For luxury items.

RATING_5 = 0.5 class-attribute instance-attribute

5-star rating: 0.5 star difference = high similarity.

RATING_10 = 1.0 class-attribute instance-attribute

10-point rating: 1 point difference = high similarity.

RATING_100 = 10 class-attribute instance-attribute

100-point rating (percentages): 10 point difference = high similarity.

PERCENTAGE = 5 class-attribute instance-attribute

Percentage (0-100): 5 percentage points = high similarity.

FRACTION = 0.05 class-attribute instance-attribute

Fraction (0.0-1.0): 0.05 difference = high similarity.

TEMPERATURE_C = 2 class-attribute instance-attribute

Temperature in Celsius: 2°C difference = high similarity.

TEMPERATURE_F = 4 class-attribute instance-attribute

Temperature in Fahrenheit: 4°F difference = high similarity.

SMALL_COUNT = 1 class-attribute instance-attribute

Small counts (0-10): 1 unit difference = high similarity.

MEDIUM_COUNT = 10 class-attribute instance-attribute

Medium counts (0-100): 10 unit difference = high similarity.

LARGE_COUNT = 100 class-attribute instance-attribute

Large counts (0-1000): 100 unit difference = high similarity.

SECONDS = 5 class-attribute instance-attribute

Duration in seconds: 5 seconds = high similarity.

MINUTES = 1 class-attribute instance-attribute

Duration in minutes: 1 minute = high similarity.

HOURS = 0.5 class-attribute instance-attribute

Duration in hours: 30 minutes = high similarity.

DAYS = 1 class-attribute instance-attribute

Duration in days: 1 day = high similarity.


Field Weights

Weights control relative importance in similarity calculations:

from hybi.compose import Bundle, Field, Encoding

schema = Bundle(
    fields={
        "title": Field(encoding=Encoding.SEMANTIC, weight=2.0),  # 2x importance
        "description": Field(encoding=Encoding.SEMANTIC, weight=1.0),
        "category": Field(encoding=Encoding.EXACT, weight=0.5),  # Half weight
    }
)

When searching, higher-weighted fields contribute more to the similarity score.


Encoding Selection Guide

flowchart LR
    Q1{Categorical/ID?} -->|Yes| EXACT
    Q1 -->|No| Q2{Numeric?}
    Q2 -->|Yes| NUMERIC
    Q2 -->|No| SEMANTIC
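
The flowchart reads as a two-question decision, sketched below. The `Encoding` enum here is a local stand-in so the snippet runs without hybi installed, and `choose_encoding` is a hypothetical helper.

```python
from enum import Enum, auto


class Encoding(Enum):
    """Local stand-in for hybi.compose.Encoding (only the three flowchart outcomes)."""
    SEMANTIC = auto()
    EXACT = auto()
    NUMERIC = auto()


def choose_encoding(is_categorical_or_id: bool, is_numeric: bool) -> Encoding:
    """Follow the flowchart: categorical/ID first, then numeric, else semantic."""
    if is_categorical_or_id:
        return Encoding.EXACT
    if is_numeric:
        return Encoding.NUMERIC
    return Encoding.SEMANTIC


print(choose_encoding(True, False).name)   # EXACT    (e.g. a user ID)
print(choose_encoding(False, True).name)   # NUMERIC  (e.g. a price)
print(choose_encoding(False, False).name)  # SEMANTIC (e.g. a description)
```

Because the categorical question comes first, an ID that happens to be stored as a number still resolves to EXACT.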

Examples

| Field Type | Example Values | Encoding |
|---|---|---|
| Product name | "MacBook Pro", "iPhone" | SEMANTIC |
| Category | "electronics", "clothing" | EXACT |
| Price | 999.99, 49.99 | NUMERIC |
| Description | "Lightweight laptop..." | SEMANTIC |
| User ID | "user_12345" | EXACT |
| Rating | 4.5, 3.0 | NUMERIC |
| Relationship type | "works_at", "knows" | EXACT |

FieldPath

The FieldPath dataclass represents a path to a field within a nested schema. It's returned by schema.resolve_field() and used internally for field resolution.

from hybi.compose.fields import FieldPath

# FieldPath for a nested field
path = schema.resolve_field("subject_type")
print(path.parts)          # ("subject", "left")
print(path.column_name)    # "subject_type"
print(path.dot_notation)   # "subject.left"
print(path.root_slot)      # "subject"
print(path.is_nested)      # True

Properties

| Property | Type | Description |
|---|---|---|
| parts | Tuple[str, ...] | Tuple of slot names from root to field |
| column_name | str | The DataFrame column name |
| field | Field | The Field configuration |
| dot_notation | str | Path as dot-separated string |
| root_slot | str | The top-level slot name |
| is_nested | bool | True if path has multiple parts |

hybi.compose.fields.FieldPath dataclass

Represents a path to a field within a nested schema.

For simple schemas: FieldPath(("subject",), "entity", field)
For nested schemas: FieldPath(("subject", "left"), "subject_type", field)

This enables queries using column names (what users see in DataFrames) rather than requiring knowledge of the internal slot structure.

Attributes:

| Name | Type | Description |
|---|---|---|
| parts | Tuple[str, ...] | Tuple of slot names forming the path from root to leaf |
| column_name | str | The DataFrame column name this path maps to |
| field | Field | The Field configuration at this path |

Example

schema = Triple(
    subject=Pair(
        left=Field("subject_type"),
        right=Field("subject_name"),
    ),
    predicate=Field("relation"),
    object=Field("target"),
)
field_map = schema.get_field_map()
field_map["subject_type"]
# FieldPath(parts=("subject", "left"), column_name="subject_type", field=Field(...))

parts instance-attribute

Tuple of slot names from root to this field.

column_name instance-attribute

The DataFrame column name this path maps to.

field instance-attribute

The Field configuration at this path.

dot_notation property

Return the path as dot-separated string: 'subject.left'.

root_slot property

Return the top-level slot name.

is_nested property

Whether this is a nested path (more than one part).
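
The three derived properties follow directly from parts. The stand-in dataclass below, named `FieldPathSketch` to make clear it is not the real class, shows one way they could be computed; the actual implementation may differ.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class FieldPathSketch:
    """Stand-in for hybi.compose.fields.FieldPath, for illustration only."""
    parts: Tuple[str, ...]
    column_name: str
    field: Any = None  # would hold the Field configuration

    @property
    def dot_notation(self) -> str:
        return ".".join(self.parts)

    @property
    def root_slot(self) -> str:
        return self.parts[0]

    @property
    def is_nested(self) -> bool:
        return len(self.parts) > 1


path = FieldPathSketch(parts=("subject", "left"), column_name="subject_type")
print(path.dot_notation)  # subject.left
print(path.root_slot)     # subject
print(path.is_nested)     # True
```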


SchemaEvolution

The SchemaEvolution enum controls how schema changes are handled during subsequent ingest operations on a collection.

When using ADAPTIVE mode, you can optionally suppress evolution warnings for bulk/backfill jobs.

from hybi.compose import SchemaEvolution

hb.ingest(df, collection="users", schema=schema, evolution=SchemaEvolution.STRICT)

# ADAPTIVE mode with warnings suppressed
hb.ingest(
    df,
    collection="users",
    schema=schema,
    evolution=SchemaEvolution.ADAPTIVE,
    warn_schema_evolution=False,
)

| Mode | Behavior |
|---|---|
| ADAPTIVE | Default. Additive changes allowed with warnings. New fields auto-added. |
| STRICT | No changes allowed without explicit migration. Raises error on schema change. |
| LOCKED | Immutable schema. Even additive changes are blocked after first ingest. |

| Parameter | Type | Default | Description |
|---|---|---|---|
| warn_schema_evolution | bool | True | In ADAPTIVE mode, emit schema evolution warnings when additive changes are applied. Set to False to suppress warning noise in controlled pipelines. |

hybi.compose.SchemaEvolution

Bases: Enum

Controls how schema changes are handled for a collection.

Schema evolution mode determines whether changes to the schema (adding fields, changing encodings, etc.) are allowed during subsequent ingest operations.

Example

hb.ingest(df, collection="users", schema=schema,
          evolution=SchemaEvolution.STRICT)

ADAPTIVE = 'adaptive' class-attribute instance-attribute

Default mode: additive changes allowed with warnings.

  • New fields in data are automatically added to schema
  • Breaking changes (removing fields) require explicit flag
  • Encoding changes require explicit migration
  • Best for: development, exploration, evolving data

STRICT = 'strict' class-attribute instance-attribute

No changes allowed without explicit migration.

  • Any schema change raises an error
  • Requires explicit migration for changes
  • Best for: production, regulated environments

LOCKED = 'locked' class-attribute instance-attribute

Immutable schema - no changes allowed at all.

  • Even additive changes are blocked
  • Schema is frozen after first ingest
  • Best for: audit logs, compliance, immutable data
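
To make the difference between modes concrete, the sketch below models only the additive case: what happens when incoming data has columns the schema has not seen. Everything here is a simplified stand-in. The enum mirrors the documented values, while `apply_new_columns`, the warning text, and the exception type are assumptions; the migration path that separates STRICT from LOCKED is not modeled.

```python
import warnings
from enum import Enum


class SchemaEvolution(Enum):
    """Local stand-in mirroring the documented enum values."""
    ADAPTIVE = "adaptive"
    STRICT = "strict"
    LOCKED = "locked"


def apply_new_columns(mode: SchemaEvolution, known: set, incoming: set) -> set:
    """Hypothetical check for additive schema changes (not a hybi API)."""
    added = set(incoming) - set(known)
    if not added:
        return set(known)  # nothing changed, every mode accepts this
    if mode is SchemaEvolution.ADAPTIVE:
        # Additive changes are accepted, with a warning.
        warnings.warn(f"schema evolved, new fields: {sorted(added)}")
        return set(known) | added
    # STRICT and LOCKED both reject unplanned additions.
    raise ValueError(f"{mode.value} mode rejects new fields: {sorted(added)}")


print(sorted(apply_new_columns(SchemaEvolution.ADAPTIVE, {"id"}, {"id", "email"})))
# ['email', 'id'] (a UserWarning is also emitted)
```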