`POST /build_ingest_data/`

Builds a new vector database or appends data to an existing one. Requires sufficient storage quota.

Request

Content-Type: multipart/form-data

Parameter	Type	Required	Default	Description
`file`	file	✅	—	CSV file containing rows to ingest
`dim`	int	✅	—	Vector dimensionality (must be 128–8192 for new DBs)
`seed`	int	✅	—	Random seed for the HyperBinder index
`depth`	int	✅	—	Index depth parameter
`db_name`	string	❌	`"fractal_db"`	Name of the database
`namespace`	string	❌	`"default"`	Namespace within the database
`vector_col`	string	❌	`null`	CSV column containing precomputed vectors (JSON arrays). Excluded from metadata fields.
`dtype_config`	string	❌	`null`	Optional dtype configuration (currently unused in routing logic)
`template_name`	string	❌	`null`	Name of the schema template to associate
`template_schema`	string	❌	`null`	JSON string defining schema with `semantic_fields`. Auto-generated from column names if omitted.
`on_conflict`	string	❌	`"error"`	Behavior when an incoming row's primary key already exists: `"error"` (409), `"update"` (re-encode existing row), or `"skip"` (keep existing, drop incoming).
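The parameter table above can be turned into a small client-side helper that assembles the form fields and catches two documented 400 conditions before uploading. This is a minimal sketch: `build_form_fields` and `ALLOWED_ON_CONFLICT` are hypothetical names introduced here, not part of the API.

```python
from typing import Optional

# The three on_conflict values listed in the parameter table.
ALLOWED_ON_CONFLICT = {"error", "update", "skip"}


def build_form_fields(
    dim: int,
    seed: int,
    depth: int,
    db_name: str = "fractal_db",
    namespace: str = "default",
    vector_col: Optional[str] = None,
    on_conflict: str = "error",
) -> dict:
    """Assemble the non-file form fields for POST /build_ingest_data/ (hypothetical helper)."""
    # Range documented for new databases; on append the server additionally
    # requires dim to match the stored dimension, which cannot be checked here.
    if not 128 <= dim <= 8192:
        raise ValueError(f"dim must be in [128, 8192], got {dim}")
    if on_conflict not in ALLOWED_ON_CONFLICT:
        raise ValueError(f"on_conflict must be one of {sorted(ALLOWED_ON_CONFLICT)}")

    fields = {
        "dim": dim,
        "seed": seed,
        "depth": depth,
        "db_name": db_name,
        "namespace": namespace,
        "on_conflict": on_conflict,
    }
    # Optional fields are sent only when set, so the server default applies otherwise.
    if vector_col is not None:
        fields["vector_col"] = vector_col
    return fields


fields = build_form_fields(dim=512, seed=42, depth=3, on_conflict="skip")
print(fields)
```

The returned dictionary is meant to be passed as the `data` argument of a multipart upload, alongside the `file` part.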

Behavior

Create vs. Append: If the `db_name` and `namespace` pair already exists in the cache, rows are appended and row IDs continue from the last ingested row. Otherwise, a fresh index is created.

Dimension enforcement: On append, the `dim` parameter must match the stored dimension; otherwise the request is rejected with `400`.

Vector handling has three modes:

"generated": `semantic_fields` are defined in the schema, so embeddings are computed automatically.
"precomputed": `vector_col` is provided and the vectors are parsed from the CSV.
"none": neither is present.
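For the precomputed mode, each cell of the vector column holds a JSON array. The sketch below writes such a CSV using only the standard library. The column name `vector`, the other column names, and the assumption that each vector has exactly `dim` entries are illustrative, not confirmed by this reference.

```python
import csv
import io
import json

DIM = 128  # assumed to equal the dim value sent with the request

# Illustrative rows; real vectors would come from your own embedding step.
rows = [
    {"id": "a1", "title": "first", "vector": [0.01 * i for i in range(DIM)]},
    {"id": "a2", "title": "second", "vector": [0.02 * i for i in range(DIM)]},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "title", "vector"])
writer.writeheader()
for row in rows:
    # Serialize each vector as a JSON array so it fits in a single CSV cell;
    # the csv module quotes the cell because it contains commas.
    writer.writerow({**row, "vector": json.dumps(row["vector"])})

csv_text = buffer.getvalue()

# Round-trip check: read the CSV back and decode the vector column.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
vectors = [json.loads(r["vector"]) for r in parsed]
print(len(vectors), len(vectors[0]))
```

A file written this way would be uploaded with `vector_col` set to `vector`, which also keeps that column out of the metadata fields.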

Schema inference — If template_schema is not provided, one is auto-generated from the CSV column names via auto_bundle_schema().

Primary-key conflict resolution — When the schema declares a primary key, each incoming row is checked against existing rows in the namespace. Resolution is controlled by on_conflict:

"error" (default) — any collision aborts the request with 409 Conflict. The response detail names the PK field and lists the conflicting value(s).
"update" — conflicting rows are re-encoded with the incoming values; new rows are inserted normally. Note: the update path is currently N+1 (one atomic re-encode per conflicting row) — a bulk path is not yet implemented.
"skip" — conflicting rows are left unchanged; only new rows are inserted.

For "update" and "skip", the response reports the per-mode counts (`rows_inserted`, `rows_updated`, `rows_skipped`) and lists the conflicting primary-key values in `conflicts`.
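The three modes can be modeled locally to see how the counts in the response relate to each other. This is a simplified sketch of the documented behavior, not the server's code: `resolve_conflicts` and `ConflictError` are hypothetical names, rows are plain dictionaries, and duplicate keys within a single incoming batch are not modeled.

```python
class ConflictError(Exception):
    """Stands in for the 409 Conflict response (hypothetical name)."""


def resolve_conflicts(existing: dict, incoming: list, pk: str, on_conflict: str = "error") -> dict:
    """Apply incoming rows to `existing` (keyed by primary key) and return the counts."""
    if on_conflict not in ("error", "update", "skip"):
        raise ValueError(f"invalid on_conflict: {on_conflict!r}")

    # Keys present before this ingest; only these count as conflicts.
    known = set(existing)
    conflicts = [row[pk] for row in incoming if row[pk] in known]
    if conflicts and on_conflict == "error":
        # Any collision aborts the whole request; nothing is written.
        raise ConflictError(f"primary key {pk!r} already exists for: {conflicts}")

    inserted = updated = skipped = 0
    for row in incoming:
        key = row[pk]
        if key not in known:
            existing[key] = row      # new row, inserted normally
            inserted += 1
        elif on_conflict == "update":
            existing[key] = row      # existing row replaced with incoming values
            updated += 1
        else:
            skipped += 1             # "skip": keep existing, drop incoming

    return {
        "rows_added": inserted + updated,  # excludes skipped rows
        "rows_inserted": inserted,
        "rows_updated": updated,
        "rows_skipped": skipped,
        "conflicts": conflicts,
        "on_conflict": on_conflict,
    }


existing = {"u1": {"id": "u1", "name": "Ada"}}
incoming = [{"id": "u1", "name": "Ada L."}, {"id": "u2", "name": "Grace"}]
result = resolve_conflicts(dict(existing), incoming, pk="id", on_conflict="skip")
print(result)
```

With `on_conflict="skip"` the sample reports one inserted row, one skipped row, and `"u1"` as the only conflict; switching to `"update"` moves that row from skipped to updated.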

Responses

200 OK

{
  "status": "success",
  "mode": "Create",
  "rows_added": 150,
  "rows_inserted": 150,
  "rows_updated": 0,
  "rows_skipped": 0,
  "conflicts": [],
  "on_conflict": "error",
  "vector_source": "generated",
  "namespace": "default"
}

Field	Description
`status`	Always `"success"` on a successful request
`mode`	`"Create"` for new DBs, `"Append"` for existing ones
`rows_added`	Total rows processed (inserted + updated, excluding skipped)
`rows_inserted`	New rows written to the namespace
`rows_updated`	Existing rows re-encoded (only non-zero when `on_conflict="update"`)
`rows_skipped`	Existing rows left unchanged (only non-zero when `on_conflict="skip"`)
`conflicts`	Primary-key values that already existed in the namespace. Currently uncapped.
`on_conflict`	Echoes the mode applied to this ingest
`vector_source`	`"generated"`, `"precomputed"`, or `"none"`
`namespace`	The namespace used for this ingest
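A client can sanity-check a response body against the field descriptions above. The sketch below verifies the documented relationship between `rows_added`, `rows_inserted`, and `rows_updated`, then builds a one-line summary. `summarize_ingest` is a hypothetical helper, and the sample body is the 200 OK example from this page.

```python
def summarize_ingest(body: dict) -> str:
    """Validate the documented count relationship and return a short summary."""
    inserted = body["rows_inserted"]
    updated = body["rows_updated"]
    skipped = body["rows_skipped"]
    # rows_added is documented as inserted + updated, excluding skipped rows.
    if body["rows_added"] != inserted + updated:
        raise ValueError("rows_added does not equal rows_inserted + rows_updated")
    return (
        f"{body['mode']} into namespace {body['namespace']!r}: "
        f"{inserted} inserted, {updated} updated, {skipped} skipped "
        f"({len(body['conflicts'])} conflicting keys, vectors: {body['vector_source']})"
    )


sample = {
    "status": "success",
    "mode": "Create",
    "rows_added": 150,
    "rows_inserted": 150,
    "rows_updated": 0,
    "rows_skipped": 0,
    "conflicts": [],
    "on_conflict": "error",
    "vector_source": "generated",
    "namespace": "default",
}
summary = summarize_ingest(sample)
print(summary)
```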

Error Responses

Status	Condition
`400`	Dimension mismatch on append, `dim` out of range [128, 8192], malformed `vector_col` data, or invalid `on_conflict` value
`409`	Primary-key collision with `on_conflict="error"`. `detail` names the PK field and value(s).
`413`	Uploaded file exceeds the maximum allowed size
`500`	Unexpected internal server error
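One way to surface these errors to a caller is to map each documented status code to a short hint. The sketch below is a pure function with no network access. `ERROR_HINTS` and `explain_error` are hypothetical names, and the idea that the error body is JSON with a `detail` key is an assumption based on the 409 description above.

```python
# Hints paraphrased from the error table; wording is illustrative.
ERROR_HINTS = {
    400: "bad request: check dim (range and match on append), vector_col data, and on_conflict",
    409: "primary-key collision with on_conflict='error'",
    413: "uploaded file exceeds the maximum allowed size",
    500: "unexpected internal server error",
}


def explain_error(status_code: int, detail: str = "") -> str:
    """Return a readable message for a documented error status."""
    hint = ERROR_HINTS.get(status_code, "status code not documented for this endpoint")
    message = f"{status_code}: {hint}"
    # Append the server-provided detail when one is available.
    if detail:
        message += f" ({detail})"
    return message


print(explain_error(409, "id: ['u1']"))
print(explain_error(413))
```

With the `requests` library, a caller might pass `response.status_code` together with the `detail` value read from the error body, assuming that body is JSON.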

Notes

db_name and namespace are sanitized via _sanitize_identifier() — avoid special characters.
Row IDs are assigned sequentially and are not reset on append; they continue from the last known next_row_id.
Storage usage is recorded after every successful ingest via record_ingest().
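The row ID behavior across a create followed by an append can be illustrated with a tiny counter model. This is a sketch only: `assign_row_ids` is a hypothetical function, and starting the first database at ID 0 is an assumption that this reference does not confirm.

```python
def assign_row_ids(next_row_id: int, row_count: int) -> tuple:
    """Return the IDs for a batch and the next_row_id to store afterwards."""
    ids = list(range(next_row_id, next_row_id + row_count))
    return ids, next_row_id + row_count


# Create: three rows, starting from an assumed initial ID of 0.
create_ids, next_row_id = assign_row_ids(0, 3)
# Append: two more rows continue from the stored counter instead of resetting.
append_ids, next_row_id = assign_row_ids(next_row_id, 2)
print(create_ids, append_ids, next_row_id)
```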

Example

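import os

import requests

SERVER_URL = "http://18.220.128.24:8000"
API_KEY    = "yourapitoken"


def ingest_csv(filepath: str) -> dict:
    """Upload a CSV to /build_ingest_data/ and return the parsed JSON response."""
    with open(filepath, "rb") as f:
        response = requests.post(
            f"{SERVER_URL}/build_ingest_data/",
            headers={"X-API-Key": API_KEY},
            files={
                # basename() uses the host OS path separator, unlike split("/")
                "file": (os.path.basename(filepath), f, "text/csv")
            },
            data={
                "dim":       512,        # required, 128 to 8192 for new DBs
                "seed":      42,         # required
                "depth":     3,          # required
                "db_name":   "my_db",    # optional, defaults to "fractal_db"
                "namespace": "default",  # optional, defaults to "default"
            },
        )
    # Raises requests.HTTPError on 4xx/5xx responses such as 400, 409, or 413.
    response.raise_for_status()
    return response.json()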


result = ingest_csv(r"yourfilepath.csv")
print(result)

Example output (the counts depend on the rows in your CSV):

{
  "status": "success",
  "mode": "Create",
  "rows_added": 42,
  "rows_inserted": 42,
  "rows_updated": 0,
  "rows_skipped": 0,
  "conflicts": [],
  "on_conflict": "error",
  "vector_source": "none",
  "namespace": "default"
}
