POST /query_rag_symbolic/¶
Retrieves the most relevant chunks from an ingested document namespace using a hybrid semantic + symbolic search pipeline. Results are ranked and returned directly — no LLM inference is performed.
Request¶
Content-Type: application/json
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
query |
string | ✅ | — | Natural language query string |
db_name |
string | ❌ | "fractal_db" |
Name of the database to query |
namespace |
string | ❌ | "pdf_upload" |
Namespace to query within the database |
role |
string | ❌ | "value" |
Filter results to a specific cell role (e.g. "value", "header", "all") |
top_k |
int | ❌ | 10 |
Number of top results to return |
Behavior¶
Hybrid retrieval — Candidates are fetched using a combination of semantic similarity (sentence embeddings) and keyword matching via hybrid_search_matrix(). The initial retrieval pool is top_k * 5 to allow for downstream filtering and re-ranking.
Symbolic encoding — The role field is encoded symbolically and used to bind results to the requested cell role during both retrieval and scoring.
Role filtering — After retrieval, results whose role field does not match the requested role are discarded (unless role is "all").
Re-ranking — Surviving candidates are re-scored using a weighted average:
- 60% hybrid score (semantic + keyword)
- 40% symbolic score (structural similarity)
Results are then sorted by final score and trimmed to top_k.
Responses¶
200 OK¶
{
"query": "what is the refund policy?",
"role_restriction": "value",
"namespace": "document_upload_3f9a1c2e",
"top_k": 10,
"chunks_retrieved": 4,
"results": [
{
"row_id": 17,
"score": 0.812,
"hybrid_score": 0.874,
"symbolic_score": 0.721,
"role": "value",
"value": "Refunds are processed within 5–7 business days...",
"parent": "chunk_3"
}
]
}
results array — each item contains:
| Field | Description |
|---|---|
row_id |
Internal row identifier |
score |
Final weighted score (0.6 × hybrid + 0.4 × symbolic) |
hybrid_score |
Raw hybrid retrieval score |
symbolic_score |
Symbolic vector similarity score |
role |
Cell role label (e.g. "value", "header") |
value |
Text content of the chunk |
parent |
Parent chunk identifier |
Error Responses¶
| Status | Condition |
|---|---|
500 |
Symbolic encoding failed, namespace not found, or unexpected internal error |
Notes¶
- The namespace must already be ingested via
/upload_document/or/build_ingest_data/before querying. - Setting
roleto"all"disables role filtering and returns results across all cell types. - No GPT or LLM inference is performed — this endpoint returns raw ranked chunks only.
Example¶
import requests
SERVER_URL = "http://hbserver:8000"
API_KEY = "yourapitoken"
def query_rag_symbolic(query: str, namespace: str) -> dict:
response = requests.post(
f"{SERVER_URL}/unstructured/query_rag_symbolic/",
headers={"X-API-Key": API_KEY},
json={
"query": query,
"db_name": "fractal_db",
"namespace": namespace,
"role": "value", # optional, defaults to "value"
"top_k": 10, # optional, defaults to 10
},
)
response.raise_for_status()
return response.json()
result = query_rag_symbolic(
query="what is the refund policy?",
namespace="document_upload_3f9a1c2e",
)
print(result)
Expected output:
{
"query": "what is the refund policy?",
"role_restriction": "value",
"namespace": "document_upload_3f9a1c2e",
"top_k": 10,
"chunks_retrieved": 4,
"results": [
{
"row_id": 17,
"score": 0.812,
"hybrid_score": 0.874,
"symbolic_score": 0.721,
"role": "value",
"value": "Refunds are processed within 5–7 business days...",
"parent": "chunk_3"
}
]
}