
POST /build_ingest_data/

Builds a new vector database or appends data to an existing one. Requires sufficient storage quota.


Request

Content-Type: multipart/form-data

| Parameter       | Type   | Required | Default      | Description |
|-----------------|--------|----------|--------------|-------------|
| file            | file   | Yes      | —            | CSV file containing rows to ingest |
| dim             | int    | Yes      | —            | Vector dimensionality (must be 128–8192 for new DBs) |
| seed            | int    | Yes      | —            | Random seed for the HyperBinder index |
| depth           | int    | Yes      | —            | Index depth parameter |
| db_name         | string | No       | "fractal_db" | Name of the database |
| namespace       | string | No       | "default"    | Namespace within the database |
| vector_col      | string | No       | null         | CSV column containing precomputed vectors (JSON arrays); excluded from metadata fields |
| dtype_config    | string | No       | null         | Optional dtype configuration (currently unused in routing logic) |
| template_name   | string | No       | null         | Name of the schema template to associate |
| template_schema | string | No       | null         | JSON string defining a schema with semantic_fields; auto-generated from column names if omitted |
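
For example, a schema template can be attached by sending template_name and template_schema as form fields. The sketch below reuses the server URL and API key header from the Example at the end of this page; the CSV file, database name, template name, and the exact schema layout beyond the semantic_fields key are illustrative assumptions.

import json
import requests

# Illustrative schema: list the CSV columns to embed as semantic fields.
# Everything beyond the "semantic_fields" key is an assumption.
schema = {"semantic_fields": ["title", "description"]}

with open("products.csv", "rb") as f:
    resp = requests.post(
        "http://hbserver:8000/build_ingest_data/",
        headers={"X-API-Key": "yourapitoken"},
        files={"file": ("products.csv", f, "text/csv")},
        data={
            "dim": 512,
            "seed": 42,
            "depth": 3,
            "db_name": "catalog_db",            # hypothetical database name
            "template_name": "product_schema",  # hypothetical template name
            "template_schema": json.dumps(schema),
        },
    )
resp.raise_for_status()
print(resp.json())  # vector_source is expected to be "generated"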

Behavior

Create vs. Append — If db_name + namespace already exists in the cache, rows are appended with IDs continuing from the last ingested row. If it's new, a fresh index is created.

Dimension enforcement — On append, the dim parameter must match the stored dimension, or the request is rejected.
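
As a sketch of the append path: assuming a database "my_db" in namespace "default" was already created with dim=512 (as in the Example at the end of this page), a follow-up request to the same db_name and namespace must pass the same dim, otherwise the server answers 400.

import requests

# Append additional rows to an existing database. The dim value must match
# the stored dimension (512 here) or the request is rejected with 400.
with open("more_rows.csv", "rb") as f:
    resp = requests.post(
        "http://hbserver:8000/build_ingest_data/",
        headers={"X-API-Key": "yourapitoken"},
        files={"file": ("more_rows.csv", f, "text/csv")},
        data={"dim": 512, "seed": 42, "depth": 3,
              "db_name": "my_db", "namespace": "default"},
    )
resp.raise_for_status()
print(resp.json()["mode"])  # expected: "Append"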

Vector handling — there are three modes:

  • generated — semantic_fields are defined in the schema, so embeddings are computed automatically
  • precomputed — vector_col is provided and parsed from the CSV
  • none — neither is present
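
A minimal sketch of the precomputed mode: the CSV contents and the "embedding" column name are illustrative; the only requirement stated on this page is that the column named by vector_col holds JSON arrays.

import csv
import json
import requests

# Write a small CSV whose "embedding" column holds JSON-encoded vectors.
rows = [
    {"text": "first row",  "embedding": json.dumps([0.1] * 512)},
    {"text": "second row", "embedding": json.dumps([0.2] * 512)},
]
with open("precomputed.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "embedding"])
    writer.writeheader()
    writer.writerows(rows)

# Ingest it, pointing vector_col at the embedding column.
with open("precomputed.csv", "rb") as f:
    resp = requests.post(
        "http://hbserver:8000/build_ingest_data/",
        headers={"X-API-Key": "yourapitoken"},
        files={"file": ("precomputed.csv", f, "text/csv")},
        data={"dim": 512, "seed": 42, "depth": 3,
              "vector_col": "embedding"},
    )
resp.raise_for_status()
print(resp.json()["vector_source"])  # expected: "precomputed"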

Schema inference — If template_schema is not provided, one is auto-generated from the CSV column names via auto_bundle_schema().


Responses

200 OK

{
  "status": "success",
  "mode": "Create",
  "rows_added": 150,
  "vector_source": "generated",
  "namespace": "default"
}
| Field         | Description |
|---------------|-------------|
| status        | Always "success" on a successful request |
| mode          | "Create" for new DBs, "Append" for existing ones |
| rows_added    | Number of rows ingested from the uploaded CSV |
| vector_source | "generated", "precomputed", or "none" |
| namespace     | The namespace used for this ingest |

Error Responses

| Status | Condition |
|--------|-----------|
| 400    | Dimension mismatch on append, dim out of range [128, 8192], or malformed vector_col data |
| 413    | Uploaded file exceeds the maximum allowed size |
| 500    | Unexpected internal server error |
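
These map directly onto HTTP status codes, so they can be handled with a plain requests error check. A sketch using the ingest_csv() helper defined in the Example section below; the format of the error body is not documented here, so only the status code is inspected.

import requests

try:
    result = ingest_csv("data.csv")  # helper from the Example section below
except requests.HTTPError as exc:
    status = exc.response.status_code
    if status == 400:
        print("Rejected: dim mismatch/out of range, or malformed vector_col data")
    elif status == 413:
        print("Rejected: file exceeds the maximum allowed upload size")
    else:
        print(f"Server error ({status})")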

Notes

  • db_name and namespace are sanitized via _sanitize_identifier() — avoid special characters.
  • Row IDs are assigned sequentially and are not reset on append; they continue from the last known next_row_id.
  • Storage usage is recorded after every successful ingest via record_ingest().

Example

import requests

SERVER_URL = "http://hbserver:8000"
API_KEY    = "yourapitoken"

def ingest_csv(filepath: str) -> dict:
    with open(filepath, "rb") as f:
        response = requests.post(
            f"{SERVER_URL}/build_ingest_data/",
            headers={"X-API-Key": API_KEY},
            files={
                "file": (filepath.split("/")[-1], f, "text/csv")
            },
            data={
                "dim":       512,
                "seed":      42,
                "depth":     3,
                "db_name":   "my_db",    # optional, defaults to "fractal_db"
                "namespace": "default",  # optional, defaults to "default"
            },
        )
    response.raise_for_status()
    return response.json()


result = ingest_csv(r"yourfilepath.csv")
print(result)

Expected output:

{
  "status": "success",
  "mode": "Create",
  "rows_added": 42,
  "vector_source": "none",
  "namespace": "default"
}