
POST /build_ingest_data/

Builds a new vector database or appends data to an existing one. Requires sufficient storage quota.


Request

Content-Type: multipart/form-data

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file | file | yes | | CSV file containing rows to ingest |
| dim | int | yes | | Vector dimensionality (must be 128–8192 for new DBs) |
| seed | int | yes | | Random seed for the HyperBinder index |
| depth | int | yes | | Index depth parameter |
| db_name | string | no | "fractal_db" | Name of the database |
| namespace | string | no | "default" | Namespace within the database |
| vector_col | string | no | null | CSV column containing precomputed vectors (JSON arrays). Excluded from metadata fields. |
| dtype_config | string | no | null | Optional dtype configuration (currently unused in routing logic) |
| template_name | string | no | null | Name of the schema template to associate |
| template_schema | string | no | null | JSON string defining schema with semantic_fields. Auto-generated from column names if omitted. |
| on_conflict | string | no | "error" | Behavior when an incoming row's primary key already exists: "error" (409), "update" (re-encode existing row), or "skip" (keep existing, drop incoming). |

Behavior

Create vs. Append — If db_name + namespace already exists in the cache, rows are appended with IDs continuing from the last ingested row. If it's new, a fresh index is created.

Dimension enforcement — On append, the dim parameter must match the stored dimension, or the request is rejected.
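Since an out-of-range or mismatched dim is rejected with a 400, a client may prefer to check the value before uploading a large file. The following is a minimal sketch of that check, not part of the API: the helper name check_dim and the stored_dim argument are hypothetical, and the caller is assumed to already know the dimension of an existing namespace (this page does not describe a way to query it).

```python
from typing import Optional

MIN_DIM, MAX_DIM = 128, 8192  # range documented for new databases


def check_dim(dim: int, stored_dim: Optional[int] = None) -> None:
    """Raise ValueError if dim would violate the documented rules.

    stored_dim is the dimension of an existing db_name + namespace as known
    to the caller (hypothetical input; pass None when creating a new DB).
    """
    if stored_dim is None:
        # Create: a new database must use a dim inside the documented range.
        if not MIN_DIM <= dim <= MAX_DIM:
            raise ValueError(f"dim must be in [{MIN_DIM}, {MAX_DIM}], got {dim}")
    elif dim != stored_dim:
        # Append: dim must equal the dimension stored at creation time.
        raise ValueError(f"dim {dim} does not match stored dimension {stored_dim}")
```

The server remains the authority here; a local check like this only saves a round trip.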

Vector handling has three modes:

  • generated: semantic_fields are defined in the schema, so embeddings are computed automatically
  • precomputed: vector_col is provided and parsed from the CSV
  • none: neither is present
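For the precomputed mode, each cell of the vector column holds a JSON array. One way to produce such a file is sketched below with the standard csv and json modules. The column name "embedding", the helper rows_to_csv, and the sample rows are illustrative assumptions; real vectors would presumably need as many elements as dim, and three are used here only for brevity.

```python
import csv
import io
import json


def rows_to_csv(rows, vector_col="embedding"):
    """Serialize dict rows to CSV text, JSON-encoding the vector column."""
    fieldnames = list(rows[0].keys())
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        out = dict(row)
        # Store the vector as a JSON array string, e.g. "[0.1, 0.2, 0.3]".
        # The csv module quotes the cell because it contains commas.
        out[vector_col] = json.dumps(out[vector_col])
        writer.writerow(out)
    return buf.getvalue()


# Hypothetical sample data; "embedding" would be passed as vector_col.
rows = [
    {"id": 1, "title": "first", "embedding": [0.1, 0.2, 0.3]},
    {"id": 2, "title": "second", "embedding": [0.4, 0.5, 0.6]},
]
csv_text = rows_to_csv(rows)
```

The resulting text can be written to disk and uploaded as file, with vector_col set to the name of the vector column.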

Schema inference — If template_schema is not provided, one is auto-generated from the CSV column names via auto_bundle_schema().

Primary-key conflict resolution — When the schema declares a primary key, each incoming row is checked against existing rows in the namespace. Resolution is controlled by on_conflict:

  • "error" (default) — any collision aborts the request with 409 Conflict. The response detail names the PK field and lists the conflicting value(s).
  • "update" — conflicting rows are re-encoded with the incoming values; new rows are inserted normally. Note: the update path is currently N+1 (one atomic re-encode per conflicting row) — a bulk path is not yet implemented.
  • "skip" — conflicting rows are left unchanged; only new rows are inserted.

For update and skip, the response reports per-mode counts plus the list of conflicting PK values.
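The per-mode counts can be turned into a one-line summary on the client side. This is a small sketch that relies only on the response fields documented on this page; the helper name summarize_ingest and the sample values are made up for illustration.

```python
def summarize_ingest(result: dict) -> str:
    """Build a short summary from the documented response counts."""
    inserted = result.get("rows_inserted", 0)
    updated = result.get("rows_updated", 0)
    skipped = result.get("rows_skipped", 0)
    conflicts = result.get("conflicts", [])
    # rows_added is documented as inserted + updated, excluding skipped.
    if result.get("rows_added") != inserted + updated:
        raise ValueError("rows_added does not equal rows_inserted + rows_updated")
    return (
        f"{result['on_conflict']}: {inserted} inserted, {updated} updated, "
        f"{skipped} skipped, {len(conflicts)} conflicting keys"
    )


# Hypothetical response for a 10-row upload with on_conflict="skip".
sample = {
    "status": "success",
    "mode": "Append",
    "rows_added": 8,
    "rows_inserted": 8,
    "rows_updated": 0,
    "rows_skipped": 2,
    "conflicts": ["a17", "b02"],
    "on_conflict": "skip",
}
summary = summarize_ingest(sample)
```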


Responses

200 OK

{
  "status": "success",
  "mode": "Create",
  "rows_added": 150,
  "rows_inserted": 150,
  "rows_updated": 0,
  "rows_skipped": 0,
  "conflicts": [],
  "on_conflict": "error",
  "vector_source": "generated",
  "namespace": "default"
}
| Field | Description |
| --- | --- |
| status | Always "success" on a successful request |
| mode | "Create" for new DBs, "Append" for existing ones |
| rows_added | Total rows processed (inserted + updated, excluding skipped) |
| rows_inserted | New rows written to the namespace |
| rows_updated | Existing rows re-encoded (only non-zero when on_conflict="update") |
| rows_skipped | Existing rows left unchanged (only non-zero when on_conflict="skip") |
| conflicts | Primary-key values that already existed in the namespace. Currently uncapped. |
| on_conflict | Echoes the mode applied to this ingest |
| vector_source | "generated", "precomputed", or "none" |
| namespace | The namespace used for this ingest |

Error Responses

| Status | Condition |
| --- | --- |
| 400 | Dimension mismatch on append, dim out of range [128, 8192], malformed vector_col data, or invalid on_conflict value |
| 409 | Primary-key collision with on_conflict="error". detail names the PK field and value(s). |
| 413 | Uploaded file exceeds the maximum allowed size |
| 500 | Unexpected internal server error |
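A client can map these statuses to short, actionable messages. The sketch below is one possible approach: the helper name describe_ingest_error and the hint wording are illustrative, and it assumes the error body is JSON with a detail key, which this page only confirms for 409.

```python
def describe_ingest_error(status_code: int, body: dict) -> str:
    """Map a documented error status to a short message for the caller."""
    # detail is documented for 409; its presence on other errors is assumed.
    detail = body.get("detail", "")
    hints = {
        400: "bad request: check dim, vector_col contents, and on_conflict",
        409: "primary-key collision: retry with on_conflict='update' or 'skip'",
        413: "file too large: split the CSV into smaller uploads",
        500: "server error: retry later or contact the operator",
    }
    hint = hints.get(status_code, "unexpected status")
    if detail:
        return f"{status_code} {hint} ({detail})"
    return f"{status_code} {hint}"
```

With requests, this could be called as describe_ingest_error(response.status_code, response.json()) once response.ok is false, provided the body parses as JSON.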

Notes

  • db_name and namespace are sanitized via _sanitize_identifier() — avoid special characters.
  • Row IDs are assigned sequentially and are not reset on append; they continue from the last known next_row_id.
  • Storage usage is recorded after every successful ingest via record_ingest().
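The row ID note can be pictured with a small simulation. This is only an illustration of sequential, non-resetting IDs: the helper assign_row_ids is hypothetical, the starting value of 0 is an assumption, and conflict handling is ignored.

```python
from typing import List, Tuple


def assign_row_ids(next_row_id: int, n_rows: int) -> Tuple[List[int], int]:
    """Return IDs for n_rows new rows and the updated next_row_id."""
    ids = list(range(next_row_id, next_row_id + n_rows))
    return ids, next_row_id + n_rows


# Create: the first ingest is assumed to start at 0.
first_ids, next_id = assign_row_ids(0, 3)
# Append: IDs continue from the stored next_row_id instead of resetting.
second_ids, next_id = assign_row_ids(next_id, 2)
```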

Example

import os

import requests

SERVER_URL = "http://18.220.128.24:8000"
API_KEY    = "yourapitoken"

def ingest_csv(filepath: str) -> dict:
    """Upload a CSV to /build_ingest_data/ and return the parsed JSON response."""
    with open(filepath, "rb") as f:
        response = requests.post(
            f"{SERVER_URL}/build_ingest_data/",
            headers={"X-API-Key": API_KEY},
            files={
                # os.path.basename uses the host OS path separator,
                # unlike splitting on "/" by hand
                "file": (os.path.basename(filepath), f, "text/csv")
            },
            data={
                "dim":       512,
                "seed":      42,
                "depth":     3,
                "db_name":   "my_db",    # optional, defaults to "fractal_db"
                "namespace": "default",  # optional, defaults to "default"
            },
        )
    # Raises requests.HTTPError for the 4xx/5xx responses listed above
    response.raise_for_status()
    return response.json()


result = ingest_csv(r"yourfilepath.csv")
print(result)

Example output for a new database and a 42-row CSV (values depend on the uploaded file):

{
  "status": "success",
  "mode": "Create",
  "rows_added": 42,
  "rows_inserted": 42,
  "rows_updated": 0,
  "rows_skipped": 0,
  "conflicts": [],
  "on_conflict": "error",
  "vector_source": "none",
  "namespace": "default"
}