POST /build_ingest_data/¶
Builds a new vector database or appends data to an existing one. Requires sufficient storage quota.
Request¶
Content-Type: multipart/form-data
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | ✅ | — | CSV file containing rows to ingest |
| dim | int | ✅ | — | Vector dimensionality (must be 128–8192 for new DBs) |
| seed | int | ✅ | — | Random seed for the HyperBinder index |
| depth | int | ✅ | — | Index depth parameter |
| db_name | string | ❌ | "fractal_db" | Name of the database |
| namespace | string | ❌ | "default" | Namespace within the database |
| vector_col | string | ❌ | null | CSV column containing precomputed vectors (JSON arrays). Excluded from metadata fields. |
| dtype_config | string | ❌ | null | Optional dtype configuration (currently unused in routing logic) |
| template_name | string | ❌ | null | Name of the schema template to associate |
| template_schema | string | ❌ | null | JSON string defining a schema with semantic_fields. Auto-generated from column names if omitted. |
Behavior¶
Create vs. Append — If db_name + namespace already exists in the cache, rows are appended with IDs continuing from the last ingested row. If it's new, a fresh index is created.
Dimension enforcement — On append, the dim parameter must match the stored dimension, or the request is rejected.
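Because an append with a different dim is rejected, one way for a client to avoid drift is to pin the dimension in a single place and reuse it for every request. A minimal sketch (the constant and helper names are ours, not part of the API):

```python
# Keep the dimension in one constant so create and append requests
# can never disagree (a mismatched dim on append returns 400).
DB_DIM = 512

def ingest_params(seed: int, depth: int) -> dict:
    """Form fields shared by create and append calls (illustrative helper)."""
    return {"dim": DB_DIM, "seed": seed, "depth": depth}
```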
Vector handling — vectors come from one of three modes:
- generated — semantic_fields are defined in the schema, so embeddings are computed automatically
- precomputed — vector_col is provided and parsed from the CSV
- none — neither is present
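For the precomputed mode, each cell of the vector_col column holds a JSON array. A sketch of preparing such a CSV (the column name "embedding" and the row contents are illustrative):

```python
import csv
import json

# Write a CSV whose "embedding" column holds JSON-encoded vectors.
# Pass "embedding" as vector_col when uploading; that column is then
# parsed as vectors and excluded from metadata fields.
rows = [
    {"title": "first doc", "embedding": json.dumps([0.1, 0.2, 0.3])},
    {"title": "second doc", "embedding": json.dumps([0.4, 0.5, 0.6])},
]
with open("precomputed.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "embedding"])
    writer.writeheader()
    writer.writerows(rows)
```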
Schema inference — If template_schema is not provided, one is auto-generated from the CSV column names via auto_bundle_schema().
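When providing template_schema explicitly, it travels as a JSON string in the form data. A hedged sketch, assuming semantic_fields is a list of column names to embed (the field names, template name, and any schema keys beyond semantic_fields are illustrative):

```python
import json

# Columns listed under semantic_fields are embedded automatically
# (the "generated" vector mode). Names here are examples only.
schema = {"semantic_fields": ["title", "body"]}
data = {
    "dim": 512,
    "seed": 42,
    "depth": 3,
    "template_name": "articles_v1",        # illustrative template name
    "template_schema": json.dumps(schema),  # sent as a JSON string
}
```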
Responses¶
200 OK¶
```json
{
  "status": "success",
  "mode": "Create",
  "rows_added": 150,
  "vector_source": "generated",
  "namespace": "default"
}
```
| Field | Description |
|---|---|
| status | Always "success" on a successful request |
| mode | "Create" for new DBs, "Append" for existing ones |
| rows_added | Number of rows ingested from the uploaded CSV |
| vector_source | "generated", "precomputed", or "none" |
| namespace | The namespace used for this ingest |
Error Responses¶
| Status | Condition |
|---|---|
| 400 | Dimension mismatch on append, dim out of range [128, 8192], or malformed vector_col data |
| 413 | Uploaded file exceeds the maximum allowed size |
| 500 | Unexpected internal server error |
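These statuses can be mapped to actionable hints on the client. A sketch (the hint messages are ours, not the server's):

```python
def explain_ingest_error(status_code: int) -> str:
    """Translate known /build_ingest_data/ error statuses into hints
    (illustrative helper, not part of the API)."""
    hints = {
        400: "Check dim (128-8192, must match on append) and vector_col format.",
        413: "The CSV exceeds the maximum upload size; split it into batches.",
        500: "Server-side failure; retry or inspect server logs.",
    }
    return hints.get(status_code, f"Unexpected status {status_code}")
```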
Notes¶
- db_name and namespace are sanitized via _sanitize_identifier(); avoid special characters.
- Row IDs are assigned sequentially and are not reset on append; they continue from the last known next_row_id.
- Storage usage is recorded after every successful ingest via record_ingest().
Example¶
```python
import requests

SERVER_URL = "http://hbserver:8000"
API_KEY = "yourapitoken"

def ingest_csv(filepath: str) -> dict:
    with open(filepath, "rb") as f:
        response = requests.post(
            f"{SERVER_URL}/build_ingest_data/",
            headers={"X-API-Key": API_KEY},
            files={
                "file": (filepath.split("/")[-1], f, "text/csv")
            },
            data={
                "dim": 512,
                "seed": 42,
                "depth": 3,
                "db_name": "my_db",       # optional, defaults to "fractal_db"
                "namespace": "default",   # optional, defaults to "default"
            },
        )
    response.raise_for_status()
    return response.json()

result = ingest_csv(r"yourfilepath.csv")
print(result)
```
Expected output: