POST /build_ingest_data/
Builds a new vector database or appends data to an existing one. Requires sufficient storage quota.
Request
Content-Type: multipart/form-data
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | ✅ | - | CSV file containing rows to ingest |
| dim | int | ✅ | - | Vector dimensionality (must be 128–8192 for new DBs) |
| seed | int | ✅ | - | Random seed for the HyperBinder index |
| depth | int | ✅ | - | Index depth parameter |
| db_name | string | ❌ | "fractal_db" | Name of the database |
| namespace | string | ❌ | "default" | Namespace within the database |
| vector_col | string | ❌ | null | CSV column containing precomputed vectors (JSON arrays). Excluded from metadata fields. |
| dtype_config | string | ❌ | null | Optional dtype configuration (currently unused in routing logic) |
| template_name | string | ❌ | null | Name of the schema template to associate |
| template_schema | string | ❌ | null | JSON string defining schema with semantic_fields. Auto-generated from column names if omitted. |
| on_conflict | string | ❌ | "error" | Behavior when an incoming row's primary key already exists: "error" (409), "update" (re-encode existing row), or "skip" (keep existing, drop incoming). |
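To illustrate the vector_col parameter, the sketch below writes a CSV whose vector column holds one JSON array per row, which is the only format detail the table states. The column names (`id`, `title`, `embedding`) and the helper `write_vector_csv` are hypothetical, and the vectors are shortened to three values for readability; real vectors would need to match dim.

```python
import csv
import io
import json


def write_vector_csv(rows, vector_col="embedding"):
    """Serialize rows to CSV text, JSON-encoding the vector column.

    `rows` is a list of dicts that all share the same keys; each dict
    holds its vector as a list of floats under `vector_col`.
    Column names are illustrative, not mandated by the API.
    """
    fieldnames = list(rows[0].keys())
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        encoded = dict(row)
        # The csv module quotes this cell because the JSON contains commas.
        encoded[vector_col] = json.dumps(row[vector_col])
        writer.writerow(encoded)
    return buf.getvalue()


rows = [
    {"id": 1, "title": "first", "embedding": [0.1, 0.2, 0.3]},
    {"id": 2, "title": "second", "embedding": [0.4, 0.5, 0.6]},
]
csv_text = write_vector_csv(rows)
```

The resulting text could be saved to a file and uploaded with vector_col set to the name of the vector column.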
Behavior
Create vs. Append — If db_name + namespace already exists in the cache, rows are appended with IDs continuing from the last ingested row. If it's new, a fresh index is created.
Dimension enforcement — On append, the dim parameter must match the stored dimension, or the request is rejected.
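A client can catch both dimension errors before uploading. The following is a minimal client-side sketch, not the server's implementation; the helper name `check_dim` and the idea of tracking the stored dimension locally are assumptions.

```python
MIN_DIM, MAX_DIM = 128, 8192  # range stated in the parameter table


def check_dim(dim, stored_dim=None):
    """Return None if `dim` looks acceptable, otherwise an error message.

    `stored_dim` is the dimension of an existing db_name + namespace,
    or None when a new database is being created.
    """
    if stored_dim is None:
        # New database: only the documented range applies.
        if not MIN_DIM <= dim <= MAX_DIM:
            return f"dim must be in [{MIN_DIM}, {MAX_DIM}], got {dim}"
        return None
    # Append: dim must match what was stored at creation time.
    if dim != stored_dim:
        return f"dim {dim} does not match stored dimension {stored_dim}"
    return None
```

For example, `check_dim(512)` passes for a new database, while `check_dim(256, stored_dim=512)` reports a mismatch that the server would reject with a 400.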
Vector handling has three modes:
- generated — semantic_fields are defined in the schema, so embeddings are computed automatically
- precomputed — vector_col is provided and parsed from the CSV
- none — neither is present
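The three modes can be summarized as a small decision function. This is a sketch under one assumption: when both semantic_fields and vector_col are present, it follows the order of the list above and reports "generated". The source does not say which one the server prefers, and `vector_source` is a hypothetical helper name.

```python
def vector_source(semantic_fields, vector_col):
    """Pick the vector handling mode from the request inputs.

    semantic_fields: list of field names from the schema (may be empty or None)
    vector_col: name of the precomputed-vector column (may be None)
    """
    if semantic_fields:
        return "generated"    # embeddings computed from the semantic fields
    if vector_col:
        return "precomputed"  # vectors parsed from the CSV column
    return "none"             # no vectors attached to the rows
```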
Schema inference — If template_schema is not provided, one is auto-generated from the CSV column names via auto_bundle_schema().
Primary-key conflict resolution: When the schema declares a primary key, each incoming row is checked against existing rows in the namespace. Resolution is controlled by on_conflict:
- "error" (default): any collision aborts the request with 409 Conflict. The response detail names the PK field and lists the conflicting value(s).
- "update": conflicting rows are re-encoded with the incoming values; new rows are inserted normally. Note: the update path is currently N+1 (one atomic re-encode per conflicting row); a bulk path is not yet implemented.
- "skip": conflicting rows are left unchanged; only new rows are inserted.
For update and skip, the response reports per-mode counts plus the list of conflicting PK values.
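The three on_conflict modes can be modeled with an in-memory dictionary keyed by primary key. This is an illustrative sketch, not the server's code: `resolve_conflicts` and `ConflictError` are hypothetical names, re-encoding is reduced to overwriting the stored row, and the incoming batch is assumed to contain no duplicate primary keys.

```python
class ConflictError(Exception):
    """Stands in for the 409 Conflict response."""


def resolve_conflicts(existing, incoming, pk, on_conflict="error"):
    """Apply `incoming` rows to `existing` (a dict keyed by PK value)."""
    if on_conflict not in ("error", "update", "skip"):
        raise ValueError(f"invalid on_conflict value: {on_conflict!r}")

    conflicts = [row[pk] for row in incoming if row[pk] in existing]
    if conflicts and on_conflict == "error":
        # Abort before writing anything, naming the PK field and values.
        raise ConflictError(f"{pk} already exists: {conflicts}")

    inserted = updated = skipped = 0
    for row in incoming:
        key = row[pk]
        if key not in existing:
            existing[key] = row
            inserted += 1
        elif on_conflict == "update":
            existing[key] = row  # one overwrite per conflicting row
            updated += 1
        else:
            skipped += 1         # keep the existing row, drop the incoming one

    return {
        "rows_added": inserted + updated,
        "rows_inserted": inserted,
        "rows_updated": updated,
        "rows_skipped": skipped,
        "conflicts": conflicts,
        "on_conflict": on_conflict,
    }
```

The returned dictionary mirrors the per-mode counters that the endpoint reports, so the sketch can double as a reference when checking a real response.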
Responses
200 OK
{
  "status": "success",
  "mode": "Create",
  "rows_added": 150,
  "rows_inserted": 150,
  "rows_updated": 0,
  "rows_skipped": 0,
  "conflicts": [],
  "on_conflict": "error",
  "vector_source": "generated",
  "namespace": "default"
}
| Field | Description |
|---|---|
| status | Always "success" on a successful request |
| mode | "Create" for new DBs, "Append" for existing ones |
| rows_added | Total rows processed (inserted + updated, excluding skipped) |
| rows_inserted | New rows written to the namespace |
| rows_updated | Existing rows re-encoded (only non-zero when on_conflict="update") |
| rows_skipped | Existing rows left unchanged (only non-zero when on_conflict="skip") |
| conflicts | Primary-key values that already existed in the namespace. Currently uncapped. |
| on_conflict | Echoes the mode applied to this ingest |
| vector_source | "generated", "precomputed", or "none" |
| namespace | The namespace used for this ingest |
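The field descriptions imply a few relationships between the counters, which a client can verify after each ingest. The sketch below encodes only what the table states; the helper name `check_ingest_response` is hypothetical.

```python
def check_ingest_response(body):
    """Return a list of inconsistencies found in a 200 OK response body."""
    problems = []
    inserted = body.get("rows_inserted", 0)
    updated = body.get("rows_updated", 0)
    skipped = body.get("rows_skipped", 0)
    mode = body.get("on_conflict")

    # rows_added counts inserted + updated rows and excludes skipped ones.
    if body.get("rows_added") != inserted + updated:
        problems.append("rows_added != rows_inserted + rows_updated")
    # rows_updated is only non-zero under on_conflict="update".
    if updated and mode != "update":
        problems.append("rows_updated is non-zero but on_conflict is not 'update'")
    # rows_skipped is only non-zero under on_conflict="skip".
    if skipped and mode != "skip":
        problems.append("rows_skipped is non-zero but on_conflict is not 'skip'")
    return problems
```

An empty list means the counters agree with the documented semantics.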
Error Responses
| Status | Condition |
|---|---|
| 400 | Dimension mismatch on append, dim out of range [128, 8192], malformed vector_col data, or invalid on_conflict value |
| 409 | Primary-key collision with on_conflict="error". detail names the PK field and value(s). |
| 413 | Uploaded file exceeds the maximum allowed size |
| 500 | Unexpected internal server error |
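On the client side, the status codes above can be translated into actionable hints. This is a small sketch with a hypothetical helper, `describe_ingest_error`; the hint wording is illustrative, and passing the detail text in as a plain string sidesteps any assumption about the exact shape of the error body.

```python
ERROR_HINTS = {
    400: "Check dim (range and match on append), vector_col data, and on_conflict.",
    409: "Primary-key collision; retry with on_conflict='update' or 'skip' if appropriate.",
    413: "File too large; split the CSV into smaller uploads.",
    500: "Unexpected server error; retry later or contact the operator.",
}


def describe_ingest_error(status_code, detail=""):
    """Build a readable message for a failed /build_ingest_data/ call."""
    hint = ERROR_HINTS.get(status_code, "Unexpected status; inspect the response body.")
    message = f"{status_code}: {hint}"
    if detail:
        message += f" (detail: {detail})"
    return message
```

With the requests library, this could be called with `response.status_code` and whatever detail text the error body carries.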
Notes
- db_name and namespace are sanitized via _sanitize_identifier(); avoid special characters.
- Row IDs are assigned sequentially and are not reset on append; they continue from the last known next_row_id.
- Storage usage is recorded after every successful ingest via record_ingest().
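The row ID behavior can be pictured as a counter that persists across ingests. The class below is a hypothetical model, not the server's data structure, and starting the counter at 0 is an assumption; the source only states that IDs are sequential and continue from next_row_id.

```python
class NamespaceState:
    """Toy model of per-namespace row ID allocation."""

    def __init__(self, first_id=0):
        # first_id=0 is an assumption; the real starting value is not documented.
        self.next_row_id = first_id

    def assign_ids(self, n_rows):
        """Reserve `n_rows` consecutive IDs and advance the counter."""
        start = self.next_row_id
        self.next_row_id += n_rows
        return list(range(start, start + n_rows))


state = NamespaceState()
create_ids = state.assign_ids(3)  # first ingest ("Create")
append_ids = state.assign_ids(2)  # later ingest ("Append") continues the sequence
```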
Example
import os

import requests

SERVER_URL = "http://18.220.128.24:8000"
API_KEY = "yourapitoken"


def ingest_csv(filepath: str) -> dict:
    """Upload a CSV file to /build_ingest_data/ and return the parsed JSON response."""
    with open(filepath, "rb") as f:
        response = requests.post(
            f"{SERVER_URL}/build_ingest_data/",
            headers={"X-API-Key": API_KEY},
            files={
                # basename follows the platform's path rules instead of assuming "/"
                "file": (os.path.basename(filepath), f, "text/csv")
            },
            data={
                "dim": 512,
                "seed": 42,
                "depth": 3,
                "db_name": "my_db",      # optional, defaults to "fractal_db"
                "namespace": "default",  # optional, defaults to "default"
            },
        )
    # Raise requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.json()


result = ingest_csv(r"yourfilepath.csv")
print(result)
Expected output: