POST /build_ingest_data/¶
Builds a new vector database or appends data to an existing one. Requires sufficient storage quota.
Request¶
Content-Type: multipart/form-data
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | ✅ | — | CSV file containing rows to ingest |
| dim | int | ✅ | — | Vector dimensionality (must be 128–8192 for new DBs) |
| seed | int | ✅ | — | Random seed for the HyperBinder index |
| depth | int | ✅ | — | Index depth parameter |
| db_name | string | ❌ | "fractal_db" | Name of the database |
| namespace | string | ❌ | "default" | Namespace within the database |
| vector_col | string | ❌ | null | CSV column containing precomputed vectors (JSON arrays). Excluded from metadata fields. |
| dtype_config | string | ❌ | null | Optional dtype configuration (currently unused in routing logic) |
| template_name | string | ❌ | null | Name of the schema template to associate |
| template_schema | string | ❌ | null | JSON string defining a schema with semantic_fields. Auto-generated from column names if omitted. |
Behavior¶
Create vs. Append — If db_name + namespace already exists in the cache, rows are appended with IDs continuing from the last ingested row. If it's new, a fresh index is created.
Dimension enforcement — On append, the dim parameter must match the stored dimension, or the request is rejected.
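Because an append with a different dim is rejected, one way for a client to avoid drift is to pin the dimension in a single place and reuse it for every request. A minimal sketch (the constant and helper names are ours, not part of the API):

```python
# Keep the dimension in one constant so create and append requests
# can never disagree (a mismatched dim on append returns 400).
DB_DIM = 512

def ingest_params(seed: int, depth: int) -> dict:
    """Form fields shared by create and append calls (illustrative helper)."""
    return {"dim": DB_DIM, "seed": seed, "depth": depth}
```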
Vector handling — vectors come from one of three modes:
- generated — semantic_fields are defined in the schema, so embeddings are computed automatically
- precomputed — vector_col is provided and parsed from the CSV
- none — neither is present
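For the precomputed mode, each cell of the vector_col column holds a JSON array. A sketch of preparing such a CSV (the column name "embedding" and the row contents are illustrative):

```python
import csv
import json

# Write a CSV whose "embedding" column holds JSON-encoded vectors.
# Pass "embedding" as vector_col when uploading; that column is then
# parsed as vectors and excluded from metadata fields.
rows = [
    {"title": "first doc", "embedding": json.dumps([0.1, 0.2, 0.3])},
    {"title": "second doc", "embedding": json.dumps([0.4, 0.5, 0.6])},
]
with open("precomputed.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "embedding"])
    writer.writeheader()
    writer.writerows(rows)
```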
Schema inference — If template_schema is not provided, one is auto-generated from the CSV column names via auto_bundle_schema().
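When providing template_schema explicitly, it travels as a JSON string in the form data. A hedged sketch, assuming semantic_fields is a list of column names to embed (the field names, template name, and any schema keys beyond semantic_fields are illustrative):

```python
import json

# Columns listed under semantic_fields are embedded automatically
# (the "generated" vector mode). Names here are examples only.
schema = {"semantic_fields": ["title", "body"]}
data = {
    "dim": 512,
    "seed": 42,
    "depth": 3,
    "template_name": "articles_v1",        # illustrative template name
    "template_schema": json.dumps(schema),  # sent as a JSON string
}
```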
Responses¶
200 OK¶
```json
{
  "status": "success",
  "mode": "Create",
  "rows_added": 150,
  "vector_source": "generated",
  "namespace": "default"
}
```
| Field | Description |
|---|---|
| status | Always "success" on a successful request |
| mode | "Create" for new DBs, "Append" for existing ones |
| rows_added | Number of rows ingested from the uploaded CSV |
| vector_source | "generated", "precomputed", or "none" |
| namespace | The namespace used for this ingest |
Error Responses¶
| Status | Condition |
|---|---|
| 400 | Dimension mismatch on append, dim out of range [128, 8192], or malformed vector_col data |
| 413 | Uploaded file exceeds the maximum allowed size |
| 500 | Unexpected internal server error |
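These statuses can be mapped to actionable hints on the client. A sketch (the hint messages are ours, not the server's):

```python
def explain_ingest_error(status_code: int) -> str:
    """Translate known /build_ingest_data/ error statuses into hints
    (illustrative helper, not part of the API)."""
    hints = {
        400: "Check dim (128-8192, must match on append) and vector_col format.",
        413: "The CSV exceeds the maximum upload size; split it into batches.",
        500: "Server-side failure; retry or inspect server logs.",
    }
    return hints.get(status_code, f"Unexpected status {status_code}")
```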
Notes¶
- db_name and namespace are sanitized via _sanitize_identifier(); avoid special characters.
- Row IDs are assigned sequentially and are not reset on append; they continue from the last known next_row_id.
- Storage usage is recorded after every successful ingest via record_ingest().
Example¶
```python
import requests

SERVER_URL = "http://hbserver:8000"
API_KEY = "yourapitoken"

def ingest_csv(filepath: str) -> dict:
    with open(filepath, "rb") as f:
        response = requests.post(
            f"{SERVER_URL}/build_ingest_data/",
            headers={"X-API-Key": API_KEY},
            files={
                "file": (filepath.split("/")[-1], f, "text/csv")
            },
            data={
                "dim": 512,
                "seed": 42,
                "depth": 3,
                "db_name": "my_db",       # optional, defaults to "fractal_db"
                "namespace": "default",   # optional, defaults to "default"
            },
        )
    response.raise_for_status()
    return response.json()

result = ingest_csv(r"yourfilepath.csv")
print(result)
```
Expected output: