`POST /upload_document/`

Uploads and indexes a document (PDF, TXT, DOCX, or JSON) into the vector database. The file is automatically parsed, chunked, and embedded. Requires sufficient storage quota.

Request

Content-Type: multipart/form-data

Parameter	Type	Required	Default	Description
`file`	file	✅	—	Document to upload. Supported types: `.pdf`, `.txt`, `.docx`, `.json`
`dim`	int	❌	`512`	Vector dimensionality
`seed`	int	❌	`42`	Random seed for the HyperBinder index
`depth`	int	❌	`3`	Index depth parameter
`vector_col`	string	❌	`null`	Reserved for precomputed vector passthrough (JSON documents only)
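
Because an unsupported file type is rejected with a `400`, a client may want to check the extension before sending anything. The following is a minimal sketch of such a check; the helper name `is_supported_document` is hypothetical, and case-insensitive matching is an assumption (the server may be stricter).

```python
from pathlib import Path

# Extensions listed as supported for the `file` parameter.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".json"}


def is_supported_document(filepath: str) -> bool:
    """Return True if the file's final extension is one the endpoint lists."""
    # Assumption: matching is case-insensitive; the server may be stricter.
    return Path(filepath).suffix.lower() in SUPPORTED_EXTENSIONS
```

This only inspects the file name. It does not verify that the file contents actually match the extension.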

Behavior

File parsing: The file is routed to the appropriate processor based on its type:

- PDF / TXT / DOCX: Full text is extracted, then split into overlapping chunks (size: 2500 chars, overlap: 200 chars). Each chunk is labeled with a role, parent, and chunk ID via TextLabeler.
- JSON: Cells are extracted directly via JSONProcessor, preserving structure.
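
To make the chunking parameters concrete, here is a rough sketch of fixed-size chunking with overlap, using the documented size (2500) and overlap (200). The function `chunk_text` is hypothetical and is not the server's implementation; in particular, whether the server snaps chunk boundaries to words or sentences is not documented.

```python
def chunk_text(text: str, size: int = 2500, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` characters.

    Consecutive windows share `overlap` characters, so each new window
    starts `size - overlap` characters after the previous one.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a window has reached the end of the text.
        if start + size >= len(text):
            break
    return chunks
```

Under these assumptions, a 6000-character document yields three chunks starting at offsets 0, 2300, and 4600, which gives a feel for how `total_cells` scales with document length.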

Namespace isolation — Each upload gets its own auto-generated namespace in the format document_upload_{8-char UUID}, so uploads never overwrite each other. All uploads share the fractal_db database.
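
As an illustration of the namespace format, the sketch below builds and recognizes names of the form document_upload_{8-char UUID}. Both helpers are hypothetical. It assumes the suffix is the first 8 lowercase hex digits of a random UUID, which matches the sample value `3f9a1c2e` but is not confirmed by the server code.

```python
import re
import uuid

# Assumption: the 8-character suffix is lowercase hexadecimal.
NAMESPACE_PATTERN = re.compile(r"document_upload_[0-9a-f]{8}")


def make_namespace() -> str:
    """Build a namespace in the documented format (illustrative only)."""
    return f"document_upload_{uuid.uuid4().hex[:8]}"


def looks_like_upload_namespace(value: str) -> bool:
    """Return True if `value` has the shape of an auto-generated namespace."""
    return NAMESPACE_PATTERN.fullmatch(value) is not None
```

A check like `looks_like_upload_namespace` can help a client catch a mistyped or truncated namespace before it is used in a query.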

Vector handling has two modes:

- generated: Embeddings are computed server-side using the app's sentence model.
- precomputed: If cells carry an embedding field (JSON documents), those vectors are used directly and auto-normalized for cosine similarity.
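
Normalizing for cosine similarity usually means scaling each vector to unit length, after which a plain dot product equals the cosine of the angle between two vectors. The sketch below shows that idea in pure Python; `l2_normalize` and `dot` are hypothetical names, and the server's actual normalization routine may differ in details such as how zero vectors are handled.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        # Assumption: zero vectors are returned unchanged rather than rejected.
        return list(vec)
    return [x / norm for x in vec]


def dot(a: list[float], b: list[float]) -> float:
    """Dot product; for unit vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))
```

One consequence is that precomputed vectors need not be normalized by the client, since their original magnitudes are discarded either way.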

Schema — The schema is fixed for all document uploads with the fields: value, chunk_id, parent, role.

Responses

200 OK

{
  "status": "success",
  "namespace": "document_upload_3f9a1c2e",
  "total_cells": 87,
  "vector_source": "generated"
}

Field	Description
`status`	Always `"success"` on a successful request
`namespace`	Auto-generated namespace assigned to this upload — save this to query the document later
`total_cells`	Number of chunks/cells indexed
`vector_source`	`"generated"` or `"precomputed"`
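
A client might turn the success payload into a small typed object so that the namespace is harder to lose. The following is one possible sketch; `UploadResult` and `parse_upload_response` are hypothetical names, and the validation rules simply mirror the field descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadResult:
    namespace: str       # needed to query the document later
    total_cells: int     # number of chunks/cells indexed
    vector_source: str   # "generated" or "precomputed"


def parse_upload_response(payload: dict) -> UploadResult:
    """Validate a 200 OK payload and convert it to an UploadResult."""
    if payload.get("status") != "success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    source = payload["vector_source"]
    if source not in ("generated", "precomputed"):
        raise ValueError(f"unexpected vector_source: {source!r}")
    return UploadResult(
        namespace=payload["namespace"],
        total_cells=int(payload["total_cells"]),
        vector_source=source,
    )
```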

Error Responses

Status	Condition
`400`	No file provided, or unsupported file type
`413`	Uploaded file exceeds the maximum allowed size
`500`	Unexpected internal server error
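
On the client side, the status codes in this table can be mapped to readable messages. The sketch below is one way to do that; the helper names are hypothetical, and treating only 5xx responses as retryable is an assumption rather than documented server behavior.

```python
# Conditions as listed in the error table for POST /upload_document/.
UPLOAD_ERRORS = {
    400: "No file provided, or unsupported file type",
    413: "Uploaded file exceeds the maximum allowed size",
    500: "Unexpected internal server error",
}


def explain_upload_status(status_code: int) -> str:
    """Return a human-readable description for a response status code."""
    if status_code == 200:
        return "Upload succeeded"
    return UPLOAD_ERRORS.get(status_code, f"Undocumented status {status_code}")


def is_retryable(status_code: int) -> bool:
    """Assumption: only server-side failures (5xx) are worth retrying."""
    return 500 <= status_code < 600
```

For instance, a caller could log `explain_upload_status(response.status_code)` before raising, and resend the same file only when `is_retryable` returns True.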

Notes

Save the returned namespace — it is the only way to reference this document in subsequent query calls.
Unlike /build_ingest_data/, the namespace is always auto-generated and cannot be set manually.
All document uploads share db_name = "fractal_db" internally.
Storage usage is recorded after every successful upload via record_ingest().

Example

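import json
import os

import requests

SERVER_URL = "http://hbserver:8000"
API_KEY    = "yourapitoken"

def upload_document(filepath: str) -> dict:
    """Upload one document and return the parsed JSON response."""
    with open(filepath, "rb") as f:
        response = requests.post(
            f"{SERVER_URL}/upload_document/",
            headers={"X-API-Key": API_KEY},
            files={
                # basename() uses the local platform's path separators
                "file": (os.path.basename(filepath), f)
            },
            data={
                "dim":   512,  # optional, defaults to 512
                "seed":  42,   # optional, defaults to 42
                "depth": 3,    # optional, defaults to 3
            },
            timeout=120,  # seconds; an arbitrary choice, adjust for large files
        )
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.json()


result = upload_document("yourpdf.pdf")
print(json.dumps(result, indent=2))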

Example output (the namespace and cell count will differ for each upload):

{
  "status": "success",
  "namespace": "document_upload_3f9a1c2e",
  "total_cells": 87,
  "vector_source": "generated"
}
