POST /upload_document/

Uploads and indexes a document (PDF, TXT, DOCX, or JSON) into the vector database. The file is automatically parsed, chunked, and embedded. Requires sufficient storage quota.


Request

Content-Type: multipart/form-data

Parameter    Type    Required  Default  Description
file         file    Yes       -        Document to upload. Supported types: .pdf, .txt, .docx, .json
dim          int     No        512      Vector dimensionality
seed         int     No        42       Random seed for the HyperBinder index
depth        int     No        3        Index depth parameter
vector_col   string  No        null     Reserved for precomputed vector passthrough (JSON documents only)
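
Because an unsupported file type is rejected with a 400, it can be convenient to check the extension on the client before uploading. The following is a minimal sketch: the helper name is_supported_document is hypothetical and not part of the API, the case-insensitive comparison is an assumption, and the server may apply further checks beyond the extension.

```python
from pathlib import Path

# Extensions listed as supported in the parameter table above.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".json"}


def is_supported_document(filepath: str) -> bool:
    """Return True if the file extension is one the endpoint documents as supported.

    Hypothetical client-side helper. Comparing case-insensitively is an
    assumption about how the server treats extensions.
    """
    return Path(filepath).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_document("report.PDF"))   # True
print(is_supported_document("slides.pptx"))  # False
```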

Behavior

File parsing: The file is routed to the appropriate processor based on its type:

  • PDF / TXT / DOCX: Full text is extracted, then split into overlapping chunks (size: 2500 chars, overlap: 200 chars). Each chunk is labeled with a role, parent, and chunk ID via TextLabeler.
  • JSON: Cells are extracted directly via JSONProcessor, preserving structure.
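
The overlapping split used for PDF / TXT / DOCX text can be illustrated with a small character-based sketch. The function chunk_text is hypothetical and is not the server's implementation, which may, for example, split on word or sentence boundaries. Only the size (2500) and overlap (200) values come from the description above.

```python
def chunk_text(text: str, size: int = 2500, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most `size` characters.

    Consecutive chunks share `overlap` characters. Illustrative only:
    the server-side splitter may behave differently.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")

    step = size - overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(5000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [2500, 2500, 400]
```

With the default values each new chunk starts 2300 characters after the previous one, so the last 200 characters of a chunk reappear at the start of the next.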

Namespace isolation — Each upload gets its own auto-generated namespace in the format document_upload_{8-char UUID}, so uploads never overwrite each other. All uploads share the fractal_db database.
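
As an illustration of the namespace format, the sketch below builds and validates names of the form document_upload_{8 chars}. It assumes the eight characters are the first eight hex digits of a random UUID; the source only says "8-char UUID", so the server's exact derivation may differ. Both make_namespace and NAMESPACE_PATTERN are hypothetical names.

```python
import re
import uuid

# Assumed shape: "document_upload_" followed by 8 lowercase hex characters.
NAMESPACE_PATTERN = re.compile(r"^document_upload_[0-9a-f]{8}$")


def make_namespace() -> str:
    """Build a namespace in the documented format (assumed derivation)."""
    return f"document_upload_{uuid.uuid4().hex[:8]}"


def looks_like_upload_namespace(namespace: str) -> bool:
    """Check whether a string matches the assumed namespace shape."""
    return NAMESPACE_PATTERN.match(namespace) is not None


print(looks_like_upload_namespace("document_upload_3f9a1c2e"))  # True
print(looks_like_upload_namespace(make_namespace()))            # True
```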

Vector handling has two modes:

  • generated: Embeddings are computed server-side using the app's sentence model.
  • precomputed: If cells carry an embedding field (JSON documents), those vectors are used directly and auto-normalized for cosine similarity.
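
Normalizing for cosine similarity usually means scaling each vector to unit length, so that a plain dot product equals the cosine of the angle between two vectors. The sketch below shows that idea in pure Python. It is not the server's code, and leaving a zero vector unchanged is an assumption.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # assumption: a zero vector is left as-is
    return [x / norm for x in vector]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity computed as the dot product of unit vectors."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
```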

Schema — The schema is fixed for all document uploads with the fields: value, chunk_id, parent, role.


Responses

200 OK

{
  "status": "success",
  "namespace": "document_upload_3f9a1c2e",
  "total_cells": 87,
  "vector_source": "generated"
}

Field          Description
status         Always "success" on a successful request
namespace      Auto-generated namespace assigned to this upload; save it to query the document later
total_cells    Number of chunks/cells indexed
vector_source  "generated" or "precomputed"
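
On the client side, the response body can be turned into a small typed object so that a missing or unexpected field is caught early. This is a sketch only: UploadResult is a hypothetical class, and the field names and the two allowed vector_source values are taken from the table above.

```python
from dataclasses import dataclass

ALLOWED_VECTOR_SOURCES = {"generated", "precomputed"}


@dataclass(frozen=True)
class UploadResult:
    """Typed view of the 200 OK response body (hypothetical helper)."""
    status: str
    namespace: str
    total_cells: int
    vector_source: str

    @classmethod
    def from_response(cls, payload: dict) -> "UploadResult":
        # A KeyError here means the response lacks a documented field.
        result = cls(
            status=payload["status"],
            namespace=payload["namespace"],
            total_cells=int(payload["total_cells"]),
            vector_source=payload["vector_source"],
        )
        if result.vector_source not in ALLOWED_VECTOR_SOURCES:
            raise ValueError(f"unexpected vector_source: {result.vector_source!r}")
        return result


example = UploadResult.from_response({
    "status": "success",
    "namespace": "document_upload_3f9a1c2e",
    "total_cells": 87,
    "vector_source": "generated",
})
print(example.namespace)  # document_upload_3f9a1c2e
```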

Error Responses

Status  Condition
400     No file provided, or unsupported file type
413     Uploaded file exceeds the maximum allowed size
500     Unexpected internal server error
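
A client may want to translate these status codes into readable messages. The sketch below maps the documented codes to their conditions. The names UPLOAD_ERRORS and explain_upload_error are hypothetical, and the fallback message for undocumented codes is an assumption.

```python
# Status codes and conditions as listed in the error table above.
UPLOAD_ERRORS = {
    400: "No file provided, or unsupported file type",
    413: "Uploaded file exceeds the maximum allowed size",
    500: "Unexpected internal server error",
}


def explain_upload_error(status_code: int) -> str:
    """Return a readable message for an error status from /upload_document/."""
    reason = UPLOAD_ERRORS.get(status_code)
    if reason is None:
        # Not a documented error for this endpoint.
        return f"Unexpected status {status_code}"
    return f"{status_code}: {reason}"


print(explain_upload_error(413))  # 413: Uploaded file exceeds the maximum allowed size
```

In practice this could be called with response.status_code when the status is not 200, before or instead of response.raise_for_status().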

Notes

  • Save the returned namespace — it is the only way to reference this document in subsequent query calls.
  • Unlike /build_ingest_data/, the namespace is always auto-generated and cannot be set manually.
  • All document uploads share db_name = "fractal_db" internally.
  • Storage usage is recorded after every successful upload via record_ingest().
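
Since the namespace cannot be chosen or recovered later, one option is to record it locally as soon as the upload returns. The sketch below keeps a filename-to-namespace mapping in a JSON file. The helper names and the registry file are hypothetical and not part of the API.

```python
import json
from pathlib import Path


def remember_namespace(registry_path: str, filename: str, namespace: str) -> dict:
    """Store filename -> namespace in a local JSON file and return the mapping."""
    path = Path(registry_path)
    registry = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    registry[filename] = namespace
    path.write_text(json.dumps(registry, indent=2), encoding="utf-8")
    return registry


def lookup_namespace(registry_path: str, filename: str):
    """Return the saved namespace for a file, or None if it was never recorded."""
    path = Path(registry_path)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8")).get(filename)
```

For example, after an upload returns result, calling remember_namespace("namespaces.json", "yourpdf.pdf", result["namespace"]) would save the value for later query calls.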

Example

import os

import requests

SERVER_URL = "http://hbserver:8000"
API_KEY = "yourapitoken"


def upload_document(filepath: str) -> dict:
    """Upload a document and return the parsed JSON response."""
    with open(filepath, "rb") as f:
        response = requests.post(
            f"{SERVER_URL}/upload_document/",
            headers={"X-API-Key": API_KEY},
            # Send only the file name, using the platform's path separators
            files={"file": (os.path.basename(filepath), f)},
            data={
                "dim": 512,   # optional, defaults to 512
                "seed": 42,   # optional, defaults to 42
                "depth": 3,   # optional, defaults to 3
            },
        )
    # Raises requests.HTTPError on a 4xx/5xx response (see Error Responses)
    response.raise_for_status()
    return response.json()


result = upload_document("yourpdf.pdf")
print(result)

Expected output:

{
  "status": "success",
  "namespace": "document_upload_3f9a1c2e",
  "total_cells": 87,
  "vector_source": "generated"
}