
POST /query_rag_symbolic/

Retrieves the most relevant chunks from an ingested document namespace using a hybrid semantic + symbolic search pipeline. Results are ranked and returned directly — no LLM inference is performed.


Request

Content-Type: application/json

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| query | string | Yes | (none) | Natural language query string |
| db_name | string | No | "fractal_db" | Name of the database to query |
| namespace | string | No | "pdf_upload" | Namespace to query within the database |
| role | string | No | "value" | Filter results to a specific cell role (e.g. "value", "header", "all") |
| top_k | int | No | 10 | Number of top results to return |
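
As a rough illustration of how the documented defaults combine into a request body, the sketch below assembles the JSON payload on the client side. The helper name `build_query_payload` is hypothetical, and the two sanity checks are client-side assumptions, not documented server behavior. Treating `query` as the only required field is also an assumption, based on it being the only parameter without a listed default.

```python
from typing import Any, Dict


def build_query_payload(
    query: str,
    db_name: str = "fractal_db",
    namespace: str = "pdf_upload",
    role: str = "value",
    top_k: int = 10,
) -> Dict[str, Any]:
    """Assemble the JSON body using the documented defaults (hypothetical helper)."""
    # Client-side sanity checks; the server's own validation is not documented.
    if not query.strip():
        raise ValueError("query must be a non-empty string")
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    return {
        "query": query,
        "db_name": db_name,
        "namespace": namespace,
        "role": role,
        "top_k": top_k,
    }


payload = build_query_payload(
    "what is the refund policy?",
    namespace="document_upload_3f9a1c2e",
)
print(payload)
```

Only `query` and `namespace` are passed here, so `db_name`, `role`, and `top_k` fall back to the defaults from the table.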

Behavior

Hybrid retrieval — Candidates are fetched using a combination of semantic similarity (sentence embeddings) and keyword matching via hybrid_search_matrix(). The initial retrieval pool is top_k * 5 to allow for downstream filtering and re-ranking.

Symbolic encoding — The role field is encoded symbolically and used to bind results to the requested cell role during both retrieval and scoring.

Role filtering — After retrieval, results whose role field does not match the requested role are discarded (unless role is "all").

Re-ranking — Surviving candidates are re-scored using a weighted average:

  • 60% hybrid score (semantic + keyword)
  • 40% symbolic score (structural similarity)

Results are then sorted by final score and trimmed to top_k.
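
The post-retrieval steps described above (role filtering, the 60/40 weighted score, sorting, and trimming) can be sketched roughly as follows. This is not the server's implementation: the `Candidate` class, the `rerank` function, and the sample scores are hypothetical, and the hybrid and symbolic scores are taken as given rather than computed by `hybrid_search_matrix()`.

```python
from dataclasses import dataclass
from typing import List, Tuple

HYBRID_WEIGHT = 0.6    # documented weight of the hybrid score
SYMBOLIC_WEIGHT = 0.4  # documented weight of the symbolic score
POOL_MULTIPLIER = 5    # the initial retrieval pool is top_k * 5


@dataclass
class Candidate:
    # Hypothetical container; field names follow the response schema.
    row_id: int
    role: str
    hybrid_score: float
    symbolic_score: float


def rerank(candidates: List[Candidate], role: str, top_k: int) -> List[Tuple[float, Candidate]]:
    """Filter by role, apply the 60/40 weighted score, sort, and trim to top_k."""
    # Role filtering: "all" keeps every candidate.
    kept = [c for c in candidates if role == "all" or c.role == role]
    scored = [
        (HYBRID_WEIGHT * c.hybrid_score + SYMBOLIC_WEIGHT * c.symbolic_score, c)
        for c in kept
    ]
    # Highest final score first, then cut down to the requested size.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


top_k = 2
pool_size = top_k * POOL_MULTIPLIER  # number of candidates fetched before filtering

# Made-up candidates for illustration only.
pool = [
    Candidate(17, "value", 0.874, 0.721),
    Candidate(4, "header", 0.910, 0.880),
    Candidate(22, "value", 0.650, 0.900),
    Candidate(9, "value", 0.700, 0.300),
]

for score, cand in rerank(pool, role="value", top_k=top_k):
    print(cand.row_id, round(score, 4))
```

Over-fetching by a factor of five gives the role filter room to discard candidates while still leaving enough to fill `top_k`.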


Responses

200 OK

{
  "query": "what is the refund policy?",
  "role_restriction": "value",
  "namespace": "document_upload_3f9a1c2e",
  "top_k": 10,
  "chunks_retrieved": 4,
  "results": [
    {
      "row_id": 17,
      "score": 0.812,
      "hybrid_score": 0.874,
      "symbolic_score": 0.721,
      "role": "value",
      "value": "Refunds are processed within 5–7 business days...",
      "parent": "chunk_3"
    }
  ]
}

results array — each item contains:

| Field | Description |
|-------|-------------|
| row_id | Internal row identifier |
| score | Final weighted score (0.6 × hybrid + 0.4 × symbolic) |
| hybrid_score | Raw hybrid retrieval score |
| symbolic_score | Symbolic vector similarity score |
| role | Cell role label (e.g. "value", "header") |
| value | Text content of the chunk |
| parent | Parent chunk identifier |
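
A minimal sketch of reading these fields on the client side, using the sample 200 OK body from above. The helper names `best_chunk` and `recomputed_score` are hypothetical. Recomputing the weighted score from the sample values gives roughly 0.8128 against a reported 0.812, so the comparison uses a loose tolerance on the assumption that the server rounds or truncates the reported score.

```python
import math
from typing import Optional

# Sample body copied from the 200 OK response.
response_body = {
    "query": "what is the refund policy?",
    "role_restriction": "value",
    "namespace": "document_upload_3f9a1c2e",
    "top_k": 10,
    "chunks_retrieved": 4,
    "results": [
        {
            "row_id": 17,
            "score": 0.812,
            "hybrid_score": 0.874,
            "symbolic_score": 0.721,
            "role": "value",
            "value": "Refunds are processed within 5–7 business days...",
            "parent": "chunk_3",
        }
    ],
}


def best_chunk(body: dict) -> Optional[dict]:
    """Return the highest-scoring result, or None when nothing was retrieved."""
    return max(body.get("results", []), key=lambda r: r["score"], default=None)


def recomputed_score(item: dict) -> float:
    """Re-derive the final score from its two documented components."""
    return 0.6 * item["hybrid_score"] + 0.4 * item["symbolic_score"]


top = best_chunk(response_body)
if top is not None:
    print(top["row_id"], top["parent"], top["value"])
    # Loose tolerance: the reported score may be rounded or truncated.
    print(math.isclose(recomputed_score(top), top["score"], abs_tol=1e-3))
```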

Error Responses

| Status | Condition |
|--------|-----------|
| 500 | Symbolic encoding failed, namespace not found, or unexpected internal error |
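
Because a single 500 status covers several distinct failures, a caller may want to surface the response body when one occurs. The sketch below shows one way to do that using only the Python standard library; the same pattern applies with `requests`. The `QueryFailed` exception and `post_query` function are hypothetical names, and the format of the error body is not documented, so it is passed through as raw text.

```python
import json
import urllib.error
import urllib.request


class QueryFailed(RuntimeError):
    """Raised when the query cannot be completed (hypothetical wrapper exception)."""


def post_query(url: str, api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST the payload as JSON and return the decoded response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # HTTPError must be caught before URLError, which is its parent class.
        detail = exc.read().decode("utf-8", errors="replace")[:200]
        if exc.code == 500:
            # Encoding failure, missing namespace, or another internal error;
            # the status alone does not say which, so keep the body text.
            raise QueryFailed(f"server error (500): {detail}") from exc
        raise
    except urllib.error.URLError as exc:
        raise QueryFailed(f"could not reach server: {exc.reason}") from exc
```

Since a missing namespace is reported as 500 and not 404, checking that the namespace has been ingested is a sensible first step when this exception is raised.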

Notes

  • The namespace must already be ingested via /upload_document/ or /build_ingest_data/ before querying.
  • Setting role to "all" disables role filtering and returns results across all cell types.
  • No GPT or LLM inference is performed — this endpoint returns raw ranked chunks only.

Example

import requests

SERVER_URL = "http://hbserver:8000"
API_KEY    = "yourapitoken"

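def query_rag_symbolic(query: str, namespace: str) -> dict:
    """POST a query to the endpoint and return the decoded JSON response."""
    response = requests.post(
        f"{SERVER_URL}/unstructured/query_rag_symbolic/",
        headers={"X-API-Key": API_KEY},
        json={
            "query":     query,
            "db_name":   "fractal_db",
            "namespace": namespace,
            "role":      "value",   # optional, defaults to "value"
            "top_k":     10,        # optional, defaults to 10
        },
        timeout=30,  # avoid hanging indefinitely if the server is unreachable
    )
    # Raises requests.HTTPError on 4xx/5xx responses, including the documented 500.
    response.raise_for_status()
    return response.json()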


result = query_rag_symbolic(
    query="what is the refund policy?",
    namespace="document_upload_3f9a1c2e",
)
print(result)

Expected output:

{
  "query": "what is the refund policy?",
  "role_restriction": "value",
  "namespace": "document_upload_3f9a1c2e",
  "top_k": 10,
  "chunks_retrieved": 4,
  "results": [
    {
      "row_id": 17,
      "score": 0.812,
      "hybrid_score": 0.874,
      "symbolic_score": 0.721,
      "role": "value",
      "value": "Refunds are processed within 5–7 business days...",
      "parent": "chunk_3"
    }
  ]
}