`POST /query_rag_symbolic/`¶

Retrieves the most relevant chunks from an ingested document namespace using a hybrid semantic + symbolic search pipeline. Results are ranked and returned directly — no LLM inference is performed.

Request¶

Content-Type: application/json

Parameter	Type	Required	Default	Description
`query`	string	✅	—	Natural language query string
`db_name`	string	❌	`"fractal_db"`	Name of the database to query
`namespace`	string	❌	`"pdf_upload"`	Namespace to query within the database
`role`	string	❌	`"value"`	Filter results to a specific cell role (e.g. `"value"`, `"header"`, `"all"`)
`top_k`	int	❌	`10`	Number of top results to return

Behavior¶

Hybrid retrieval — Candidates are fetched using a combination of semantic similarity (sentence embeddings) and keyword matching via hybrid_search_matrix(). The initial retrieval pool is top_k * 5 to allow for downstream filtering and re-ranking.

Symbolic encoding — The role field is encoded symbolically and used to bind results to the requested cell role during both retrieval and scoring.

Role filtering — After retrieval, results whose role field does not match the requested role are discarded (unless role is "all").

Re-ranking — Surviving candidates are re-scored using a weighted average: - 60% hybrid score (semantic + keyword) - 40% symbolic score (structural similarity)

Results are then sorted by final score and trimmed to top_k.

Responses¶

200 OK¶

{
  "query": "what is the refund policy?",
  "role_restriction": "value",
  "namespace": "document_upload_3f9a1c2e",
  "top_k": 10,
  "chunks_retrieved": 4,
  "results": [
    {
      "row_id": 17,
      "score": 0.812,
      "hybrid_score": 0.874,
      "symbolic_score": 0.721,
      "role": "value",
      "value": "Refunds are processed within 5–7 business days...",
      "parent": "chunk_3"
    }
  ]
}

results array — each item contains:

Field	Description
`row_id`	Internal row identifier
`score`	Final weighted score (0.6 × hybrid + 0.4 × symbolic)
`hybrid_score`	Raw hybrid retrieval score
`symbolic_score`	Symbolic vector similarity score
`role`	Cell role label (e.g. `"value"`, `"header"`)
`value`	Text content of the chunk
`parent`	Parent chunk identifier

Error Responses¶

Status	Condition
`500`	Symbolic encoding failed, namespace not found, or unexpected internal error

Notes¶

The namespace must already be ingested via /upload_document/ or /build_ingest_data/ before querying.
Setting role to "all" disables role filtering and returns results across all cell types.
No GPT or LLM inference is performed — this endpoint returns raw ranked chunks only.

Example¶

import requests

SERVER_URL = "http://hbserver:8000"
API_KEY    = "yourapitoken"

def query_rag_symbolic(query: str, namespace: str) -> dict:
    response = requests.post(
        f"{SERVER_URL}/unstructured/query_rag_symbolic/",
        headers={"X-API-Key": API_KEY},
        json={
            "query":     query,
            "db_name":   "fractal_db",
            "namespace": namespace,
            "role":      "value",   # optional, defaults to "value"
            "top_k":     10,        # optional, defaults to 10
        },
    )
    response.raise_for_status()
    return response.json()


result = query_rag_symbolic(
    query="what is the refund policy?",
    namespace="document_upload_3f9a1c2e",
)
print(result)

Expected output:

{
  "query": "what is the refund policy?",
  "role_restriction": "value",
  "namespace": "document_upload_3f9a1c2e",
  "top_k": 10,
  "chunks_retrieved": 4,
  "results": [
    {
      "row_id": 17,
      "score": 0.812,
      "hybrid_score": 0.874,
      "symbolic_score": 0.721,
      "role": "value",
      "value": "Refunds are processed within 5–7 business days...",
      "parent": "chunk_3"
    }
  ]
}

POST /query_rag_symbolic/¶