POST /upload_document/
Uploads and indexes a document (PDF, TXT, DOCX, or JSON) into the vector database. The file is automatically parsed, chunked, and embedded. Requires sufficient storage quota.
Request
Content-Type: multipart/form-data
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `file` | file | ✅ | n/a | Document to upload. Supported types: .pdf, .txt, .docx, .json |
| `dim` | int | ❌ | 512 | Vector dimensionality |
| `seed` | int | ❌ | 42 | Random seed for the HyperBinder index |
| `depth` | int | ❌ | 3 | Index depth parameter |
| `vector_col` | string | ❌ | null | Reserved for precomputed vector passthrough (JSON documents only) |
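Because an unsupported file type is rejected by the server, a client may want to check the extension before sending anything. The following is a minimal sketch of such a check. The helper name `is_supported_document` is hypothetical, and lowercasing the extension is an assumption: this page does not say whether the server matches extensions case-insensitively.

```python
from pathlib import Path

# Extensions listed in the parameter table for the `file` field.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".json"}


def is_supported_document(filepath: str) -> bool:
    """Return True if the file extension is one the endpoint lists as supported.

    Assumption: extensions are compared case-insensitively on the client side.
    """
    return Path(filepath).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_document("report.PDF"))   # True
print(is_supported_document("slides.pptx"))  # False
```

This only inspects the file name; it does not verify that the contents match the extension.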
Behavior
File parsing — The file is routed to the appropriate processor based on its type:
- PDF / TXT / DOCX — Full text is extracted, then split into overlapping chunks (size: 2500 chars, overlap: 200 chars). Each chunk is labeled with a role, parent, and chunk ID via TextLabeler.
- JSON — Cells are extracted directly via JSONProcessor, preserving structure.
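To make the chunk size and overlap concrete, here is a rough sketch of overlapping character-window chunking with the stated values (2500 characters, 200 overlap). It is an illustration only: the function name `chunk_text` is hypothetical, and the server's actual splitter may differ, for example by breaking on sentence or word boundaries, which this page does not specify.

```python
def chunk_text(text: str, size: int = 2500, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Assumption: a plain sliding window; the real splitter is not documented here.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # with the defaults, each window starts 2300 chars later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(6000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [2500, 2500, 1400]
```

Under this sketch, the last 200 characters of one chunk are the first 200 characters of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.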
Namespace isolation — Each upload gets its own auto-generated namespace in the format document_upload_{8-char UUID}, so uploads never overwrite each other. All uploads share the fractal_db database.
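For illustration, a namespace of this shape could be produced and validated as shown below. This is a sketch under an assumption: it takes the first 8 hexadecimal characters of a random UUID, which matches the format of the example value `document_upload_3f9a1c2e` but is not confirmed as the server's method. Both function names are hypothetical.

```python
import re
import uuid

NAMESPACE_PATTERN = re.compile(r"document_upload_[0-9a-f]{8}")


def make_upload_namespace() -> str:
    """Build a namespace like document_upload_3f9a1c2e (assumed derivation)."""
    return f"document_upload_{uuid.uuid4().hex[:8]}"


def looks_like_upload_namespace(value: str) -> bool:
    """Check that a stored value has the documented namespace shape."""
    return NAMESPACE_PATTERN.fullmatch(value) is not None


print(looks_like_upload_namespace(make_upload_namespace()))  # True
```

A check like `looks_like_upload_namespace` can be useful on the client side to catch a mistyped or truncated namespace before issuing a query.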
Vector handling has two modes:
- generated — Embeddings are computed server-side using the app's sentence model
- precomputed — If cells carry an embedding field (JSON documents), those vectors are used directly and auto-normalized for cosine similarity
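Normalizing for cosine similarity usually means scaling each vector to unit length, so that a dot product between two vectors equals their cosine similarity. The sketch below shows that idea in plain Python. The function names are hypothetical, and how the server treats edge cases such as an all-zero vector is not stated on this page; raising an error here is an assumption.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # Assumption: a zero vector has no direction, so it is rejected here.
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity computed as the dot product of unit-length vectors."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


print(l2_normalize([3.0, 4.0]))                   # [0.6, 0.8]
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
```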
Schema — The schema is fixed for all document uploads with the fields: value, chunk_id, parent, role.
Responses
200 OK
{
  "status": "success",
  "namespace": "document_upload_3f9a1c2e",
  "total_cells": 87,
  "vector_source": "generated"
}
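A client can turn a response body like the one above into a typed object, so that the namespace is harder to lose or misuse later. The following is one possible sketch; `UploadResult` and `parse_upload_response` are hypothetical names and not part of the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadResult:
    status: str
    namespace: str
    total_cells: int
    vector_source: str


def parse_upload_response(payload: dict) -> UploadResult:
    """Convert the 200 OK JSON body into an UploadResult."""
    result = UploadResult(
        status=payload["status"],
        namespace=payload["namespace"],
        total_cells=int(payload["total_cells"]),
        vector_source=payload["vector_source"],
    )
    # The documented values for vector_source are "generated" and "precomputed".
    if result.vector_source not in {"generated", "precomputed"}:
        raise ValueError(f"unexpected vector_source: {result.vector_source!r}")
    return result


example = parse_upload_response({
    "status": "success",
    "namespace": "document_upload_3f9a1c2e",
    "total_cells": 87,
    "vector_source": "generated",
})
print(example.namespace)  # document_upload_3f9a1c2e
```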
| Field | Description |
|---|---|
| `status` | Always "success" on a successful request |
| `namespace` | Auto-generated namespace assigned to this upload; save this to query the document later |
| `total_cells` | Number of chunks/cells indexed |
| `vector_source` | "generated" or "precomputed" |
Error Responses
| Status | Condition |
|---|---|
| `400` | No file provided, or unsupported file type |
| `413` | Uploaded file exceeds the maximum allowed size |
| `500` | Unexpected internal server error |
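On the client side, the documented status codes can be mapped to readable messages. The sketch below covers only the three codes listed on this page. The function name `describe_upload_error` is hypothetical, and the format of the error response body is not documented here, so the sketch relies on the status code alone.

```python
# Conditions taken from the error table on this page.
ERROR_MEANINGS = {
    400: "No file provided, or unsupported file type",
    413: "Uploaded file exceeds the maximum allowed size",
    500: "Unexpected internal server error",
}


def describe_upload_error(status_code: int) -> str:
    """Return the documented meaning of a status code, or a fallback message."""
    return ERROR_MEANINGS.get(status_code, f"Undocumented status {status_code}")


print(describe_upload_error(413))  # Uploaded file exceeds the maximum allowed size
print(describe_upload_error(404))  # Undocumented status 404
```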
Notes
- Save the returned `namespace`: it is the only way to reference this document in subsequent query calls.
- Unlike `/build_ingest_data/`, the namespace is always auto-generated and cannot be set manually.
- All document uploads share `db_name = "fractal_db"` internally.
- Storage usage is recorded after every successful upload via `record_ingest()`.
Example
import os

import requests

SERVER_URL = "http://hbserver:8000"
API_KEY = "yourapitoken"


def upload_document(filepath: str) -> dict:
    """Upload one document and return the parsed JSON response."""
    with open(filepath, "rb") as f:
        response = requests.post(
            f"{SERVER_URL}/upload_document/",
            headers={"X-API-Key": API_KEY},
            files={
                # basename handles OS-specific path separators, not just "/"
                "file": (os.path.basename(filepath), f)
            },
            data={
                "dim": 512,   # optional, defaults to 512
                "seed": 42,   # optional, defaults to 42
                "depth": 3,   # optional, defaults to 3
            },
        )
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.json()


result = upload_document("yourpdf.pdf")
print(result)
Expected output: