Search
The Ragex search pipeline takes your natural language query and returns the most relevant document chunks. Three search modes are available: vector (default), keyword, and hybrid.
Search Modes
Vector Search (default)
Uses semantic embeddings to find chunks that are conceptually similar to your query, even if the exact words differ. Best for natural language questions and conceptual queries.
{ "query": "How do I configure authentication?", "mode": "vector" }
Keyword Search
Uses full-text search to find chunks containing specific terms. No embedding is generated, making it faster and cheaper. Best for exact term matching, error codes, or proper nouns.
{ "query": "authentication", "mode": "keyword" }
Hybrid Search
Runs both vector and keyword search in parallel, then fuses the results using Reciprocal Rank Fusion (RRF). Combines the precision of keyword matching with the recall of semantic search.
{ "query": "How do I configure OAuth?", "mode": "hybrid", "alpha": 0.6 }
The alpha parameter controls the weighting between vector and keyword results:
- alpha: 1.0 is vector only (same as mode: "vector")
- alpha: 0.0 is keyword only (same as mode: "keyword")
- alpha: 0.6 (default) slightly favors semantic search
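A minimal sketch of how an alpha-weighted Reciprocal Rank Fusion could combine the two ranked lists. The exact formula Ragex uses is not stated here, so treat this as an illustration: the function name `fuse_rrf`, the smoothing constant `k = 60` (a common choice for RRF), and the way alpha scales each list are all assumptions.

```python
from collections import defaultdict


def fuse_rrf(vector_ids, keyword_ids, alpha=0.6, k=60):
    """Fuse two ranked lists of chunk IDs with alpha-weighted RRF.

    Each list contributes weight / (k + rank) per chunk, with rank starting
    at 1. alpha weights the vector list, (1 - alpha) the keyword list.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scores = defaultdict(float)
    for rank, chunk_id in enumerate(vector_ids, start=1):
        scores[chunk_id] += alpha / (k + rank)
    for rank, chunk_id in enumerate(keyword_ids, start=1):
        scores[chunk_id] += (1.0 - alpha) / (k + rank)
    # Highest fused score first; chunks that only appear in a zero-weight
    # list score 0 and are dropped, so alpha=1.0 behaves like vector only.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [chunk_id for chunk_id in ranked if scores[chunk_id] > 0.0]
```

With this weighting, a chunk that appears in both lists tends to outrank one that appears near the top of only one list.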
You can also provide a separate keyword query to use different terms for each search component:
{ "query": "How do I configure single sign-on?", "keyword": "SSO SAML OAuth", "mode": "hybrid" }
Pipeline Overview (Vector Mode)
Query → Embed → Vector Search (ANN) → Rerank → Filter → Results
1. Embed the Query
Your query text is converted into a vector using the same embedding model used during document ingestion. This ensures query vectors and chunk vectors live in the same semantic space.
Caching: Repeated queries are served from cache, reducing latency.
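The caching idea can be sketched as a lookup keyed on the query text, assuming the cache holds query embeddings. This is an illustration only: `embed_fn` stands in for the real embedding model, and the class name, whitespace normalization, and LRU eviction policy are assumptions rather than Ragex internals.

```python
from collections import OrderedDict


class EmbeddingCache:
    """Small LRU cache mapping query text to its embedding vector."""

    def __init__(self, embed_fn, max_entries=1024):
        self._embed_fn = embed_fn            # hypothetical embedding call
        self._max_entries = max_entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, query):
        key = query.strip()                  # assumption: ignore outer whitespace
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)   # mark as most recently used
            return self._entries[key]
        self.misses += 1
        vector = self._embed_fn(key)
        self._entries[key] = vector
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return vector
```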
2. Vector Search (ANN)
The query vector is matched against the chunk vectors in the knowledge base using approximate nearest neighbor (ANN) search, which returns the top candidates ranked by cosine similarity.
More candidates than your requested top_k are retrieved to give the reranker a larger pool to score from.
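To make the over-fetching concrete, here is a small sketch that ranks chunks by cosine similarity and returns more candidates than `top_k`. It uses exact (brute-force) search as a stand-in for an ANN index, and the `oversample` factor of 3 is an assumed value; the multiplier Ragex applies is not documented here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_candidates(query_vec, chunk_vecs, top_k=10, oversample=3):
    """Return up to top_k * oversample (chunk_id, score) pairs, best first.

    This visits every chunk; a real ANN index trades a little recall for
    speed by skipping most vectors.
    """
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[: top_k * oversample]
```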
3. Rerank
A cross-encoder reranker re-scores the candidates by examining the full query-chunk text pairs. Cross-encoders are more accurate than embedding similarity alone because they see both texts together.
Reranking is enabled by default (rerank: true). You can disable it for lower latency if your use case doesn’t need the accuracy boost.
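The shape of the rerank step can be sketched as scoring each query-chunk pair and re-sorting. The `score_pair` argument is a placeholder for a cross-encoder model call, and `token_overlap` is a toy scorer included only so the example runs; it is not a cross-encoder and not what Ragex uses.

```python
def rerank(query, candidates, score_pair, top_k=10):
    """Re-score (chunk_id, text) candidates together with the query.

    score_pair(query, text) -> float is a placeholder for a cross-encoder
    model call; higher means more relevant.
    """
    rescored = [(cid, text, score_pair(query, text)) for cid, text in candidates]
    rescored.sort(key=lambda item: item[2], reverse=True)
    return rescored[:top_k]


def token_overlap(query, text):
    """Toy stand-in scorer: fraction of query tokens that appear in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)
```

Because every candidate needs its own scoring call, this step scales with the size of the candidate pool, which is why disabling it lowers latency.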
4. Filter and Return
Results are filtered by:
- Metadata filters: match on document metadata fields using operators like $eq, $gt, $in, etc.
- Score threshold: drop results below a minimum relevance score
The final top_k results are returned with scores, chunk text, and metadata.
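The filter stage could look roughly like the sketch below. The filter shape `{"field": {"$op": value}}` and the result shape (`score`, `metadata` keys) are assumptions made for illustration, and only three operators are implemented; the actual filter syntax is defined by the API.

```python
import operator

# Assumed subset of operators; the full list supported by the API may differ.
OPERATORS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$in": lambda value, allowed: value in allowed,
}


def matches(metadata, metadata_filter):
    """True if every field condition in the filter holds for this metadata."""
    for field, condition in metadata_filter.items():
        if field not in metadata:
            return False
        for op_name, operand in condition.items():
            if not OPERATORS[op_name](metadata[field], operand):
                return False
    return True


def filter_results(results, metadata_filter=None, score_threshold=0.0, top_k=10):
    """Apply the metadata filter and score threshold, then keep the first top_k."""
    kept = [
        r for r in results
        if r["score"] >= score_threshold
        and (metadata_filter is None or matches(r["metadata"], metadata_filter))
    ]
    return kept[:top_k]
```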
Graceful Degradation
The API is designed to return results even when components fail:
| Failure | Behavior |
|---|---|
| Reranker times out | Returns un-reranked results. Response includes rerank_applied: false. |
| Reranker errors | Same as timeout — un-reranked results returned. |
| Embedding fails | Returns 502 EXTERNAL_SERVICE_ERROR. |
Check the usage.rerank_applied field in the response to know whether reranking was applied.
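The reranker rows of the table can be sketched as a try-and-fall-back wrapper. Here `rerank_fn` is a hypothetical callable that raises `TimeoutError` or another exception when the reranker is unavailable, and the `results` key in the returned dictionary is an assumed response shape; only `usage.rerank_applied` comes from the documentation above. Embedding failures are not covered, since those surface as an error response instead.

```python
def search_with_fallback(candidates, rerank_fn):
    """Try to rerank; on timeout or error, return the original order."""
    try:
        results = rerank_fn(candidates)
        rerank_applied = True
    except Exception:
        # Timeouts and errors are treated the same way: keep the
        # un-reranked ordering and report that reranking was skipped.
        results = list(candidates)
        rerank_applied = False
    return {"results": results, "usage": {"rerank_applied": rerank_applied}}


def rerank_was_applied(response):
    """Client-side check of usage.rerank_applied (False if the field is absent)."""
    return bool(response.get("usage", {}).get("rerank_applied", False))
```

A caller that depends on reranked ordering could use the second helper to decide whether to retry or to warn the user.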
Search Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | (required) | Natural language search query (1-2000 chars) |
| top_k | integer | 10 | Number of results to return (1-50) |
| rerank | boolean | true | Apply cross-encoder reranking |
| mode | string | "vector" | Search mode: vector, keyword, or hybrid |
| keyword | string | none | Separate keyword query for hybrid mode (defaults to query if omitted) |
| alpha | float | 0.6 | Vector/keyword weighting in hybrid mode (0 = keyword only, 1 = vector only) |
| filter | object | none | Metadata filter (see Search & Filtering) |
| score_threshold | float | 0.0 | Minimum relevance score (0-1) |
| include_metadata | boolean | true | Include chunk and document metadata |
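A client-side helper can check these ranges before sending a request. The sketch below mirrors the defaults and limits from the table; the function name `build_search_request`, the `metadata_filter` argument name, and the decision to send `alpha` and `keyword` only in hybrid mode are choices made for this example, not documented API behavior.

```python
VALID_MODES = {"vector", "keyword", "hybrid"}


def build_search_request(query, top_k=10, rerank=True, mode="vector",
                         keyword=None, alpha=0.6, metadata_filter=None,
                         score_threshold=0.0, include_metadata=True):
    """Validate parameters against the documented ranges and build the body."""
    if not 1 <= len(query) <= 2000:
        raise ValueError("query must be 1-2000 characters")
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    if mode not in VALID_MODES:
        raise ValueError("mode must be vector, keyword, or hybrid")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0 and 1")

    body = {
        "query": query,
        "top_k": top_k,
        "rerank": rerank,
        "mode": mode,
        "score_threshold": score_threshold,
        "include_metadata": include_metadata,
    }
    if mode == "hybrid":
        # Choice made for this sketch: hybrid-only fields are sent only in hybrid mode.
        body["alpha"] = alpha
        if keyword is not None:
            body["keyword"] = keyword
    if metadata_filter is not None:
        body["filter"] = metadata_filter
    return body
```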
Performance Characteristics
| Metric | Typical Value |
|---|---|
| Cache hit (repeated query) | ~50-150ms |
| Full pipeline (embed + ANN + rerank) | ~300-800ms |
| Without reranking | ~150-400ms |
| Keyword search | ~50-200ms |
| Hybrid search | ~300-800ms |
Latency depends on the number of chunks in the knowledge base and whether the query embedding is cached.
Search Results Cache
Search results are cached for performance. The cache is automatically invalidated when documents are added to or removed from the knowledge base.
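One way to picture this invalidation is a cache partitioned by knowledge base and cleared whenever that knowledge base changes. The sketch below is an assumption about the general approach, not a description of the Ragex implementation: `kb_id`, the class name, and keying on the canonical JSON of the request are all hypothetical.

```python
import json


class SearchResultsCache:
    """Cache search responses per knowledge base; clear on document changes."""

    def __init__(self):
        self._by_kb = {}

    @staticmethod
    def _key(request):
        # Canonical JSON so equal requests map to the same key
        # regardless of the order their fields were written in.
        return json.dumps(request, sort_keys=True)

    def get(self, kb_id, request):
        return self._by_kb.get(kb_id, {}).get(self._key(request))

    def put(self, kb_id, request, response):
        self._by_kb.setdefault(kb_id, {})[self._key(request)] = response

    def invalidate(self, kb_id):
        """Call when documents are added to or removed from the knowledge base."""
        self._by_kb.pop(kb_id, None)
```

Clearing the whole partition is coarse but simple: no cached response can outlive the set of documents it was computed from.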