Hybrid search

Hybrid search runs a keyword retriever (usually BM25) and a dense vector retriever in parallel against the same corpus, then fuses the two ranked lists into one.

Bottom line
Hybrid search is the production default for modern RAG because BM25 and dense embeddings have complementary blind spots, and the fused result beats either arm alone on the BEIR benchmark.

What hybrid search actually is

Hybrid search is a retrieval pattern, not a single algorithm. You run two retrievers over the same corpus, in parallel, against the same user query, and then merge their ranked results into one list before passing the top candidates to whatever consumes them (an LLM, a reranker, a UI).

The two retrievers are almost always:

  1. A lexical, term-frequency engine, typically BM25. This arm scores documents by how often the query's literal tokens appear, weighted by inverse document frequency and length-normalized.
  2. A semantic, embedding-based engine, typically dense retrieval over a vector index. This arm encodes the query into a high-dimensional vector and finds the documents whose embedding vectors are closest by cosine similarity.
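
To make the two arms concrete, here is a minimal sketch of how each one scores a document. It assumes pre-tokenized text and pre-computed embedding vectors; the function names and the k1 and b defaults are illustrative choices, not taken from any particular engine, and the IDF uses the common Lucene-style variant.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with BM25.

    `query` is a list of tokens and `docs` is a list of token lists.
    k1 and b are common defaults, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF, which is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates term frequency; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

The lexical arm only rewards literal token overlap, while the dense arm only sees vector geometry. Neither score is comparable to the other, which is why a fusion step is needed.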

The fusion step combines the two ranked lists. The dominant method in 2026 is Reciprocal Rank Fusion (RRF), which scores each document by the sum of 1 / (k + rank) across the two result sets, with k typically set to 60. Some systems use a weighted score sum instead, normalizing BM25 and cosine scores onto a common scale and blending them with an alpha parameter.

Either way, the output is one ranked list of the top N candidates. That list goes downstream.

How hybrid search works (the fusion step)

The interesting engineering happens in the fusion step, because BM25 scores and dense similarity scores live in different unit systems and cannot be added directly. There are two common approaches.

Rank-based fusion (RRF). Ignore the raw scores. Look only at the rank position each document holds in each list. A document that appears at rank 3 in BM25 and rank 7 in dense gets 1/(60+3) + 1/(60+7) ≈ 0.0308. A document that appears in only one list still gets credit, just less. Because ranks are already on a common scale, RRF needs no score normalization and works well at the default k of 60 without tuning. Elasticsearch 8.8+ ships RRF as a native query option for exactly this reason.
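
A minimal sketch of RRF in application code, assuming each retriever returns an ordered list of document IDs with the best hit first. The function name and the sample IDs are hypothetical; the example reproduces the rank 3 and rank 7 case above.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion.

    Ranks are 1-based. A doc missing from a list simply adds nothing
    for that list.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists: doc42 is rank 3 in BM25 and rank 7 in dense.
bm25_hits = ["a", "b", "doc42", "c"]
dense_hits = ["d", "e", "f", "g", "h", "i", "doc42"]
fused = rrf_fuse([bm25_hits, dense_hits])
```

Here doc42 ends up first even though neither arm ranked it at the top, because it is the only document both arms agree on.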

Score-based fusion (alpha blending). Normalize each retriever's scores to [0, 1], then take a weighted average. Weaviate exposes this as an alpha parameter where alpha=1.0 is pure vector, alpha=0.0 is pure BM25, and the default is 0.75. This is more tunable but more fragile, because score distributions shift as your corpus grows.
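
A rough sketch of alpha blending, assuming min-max normalization and the same alpha convention as above (1.0 is pure vector, 0.0 is pure BM25). This is a generic illustration, not Weaviate's actual implementation, and the function names are made up for the example.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map onto [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def alpha_blend(bm25_scores, dense_scores, alpha=0.75):
    """Weighted average of normalized scores from the two arms."""
    bm25_n, dense_n = min_max(bm25_scores), min_max(dense_scores)
    fused = {}
    for doc_id in set(bm25_n) | set(dense_n):
        # A doc missing from one arm contributes 0 from that arm.
        dense_part = alpha * dense_n.get(doc_id, 0.0)
        bm25_part = (1 - alpha) * bm25_n.get(doc_id, 0.0)
        fused[doc_id] = dense_part + bm25_part
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

Note that min-max depends on the best and worst score in each result set, so one outlier hit can compress every other score. That sensitivity is part of what makes score-based fusion more fragile than rank-based fusion.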

In Postgres land, neither is built in. You run a full-text search query and a pgvector similarity query, then do RRF in application code.
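
One way this could look, as a sketch only. The table docs and its columns id, tsv, and embedding are hypothetical, the parameter style assumes a psycopg-like driver, and run_query is a caller-supplied function so the fusion logic stays independent of any database connection. Postgres full-text ranking with ts_rank is not BM25, so treat the lexical arm here as an approximation.

```python
from collections import defaultdict

# Hypothetical schema: docs(id, tsv tsvector, embedding vector).
LEXICAL_SQL = """
SELECT id FROM docs
WHERE tsv @@ plainto_tsquery('english', %(q)s)
ORDER BY ts_rank(tsv, plainto_tsquery('english', %(q)s)) DESC
LIMIT %(n)s
"""

# `<=>` is pgvector's cosine-distance operator; smaller means closer.
DENSE_SQL = """
SELECT id FROM docs
ORDER BY embedding <=> %(vec)s::vector
LIMIT %(n)s
"""


def hybrid_search(run_query, query_text, query_vec, n=20, k=60):
    """Run both queries, then fuse the two ID lists with RRF.

    run_query(sql, params) must return a list of rows whose first
    column is the doc ID. query_vec is assumed to be a pgvector text
    literal such as "[0.1,0.2]".
    """
    lexical = [row[0] for row in run_query(LEXICAL_SQL, {"q": query_text, "n": n})]
    dense = [row[0] for row in run_query(DENSE_SQL, {"vec": query_vec, "n": n})]
    scores = defaultdict(float)
    for ranking in (lexical, dense):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:n]]
```

Passing run_query in as an argument also makes the fusion easy to unit test with canned rows before pointing it at a real database.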

Why hybrid search matters for AI chatbots

The complementarity argument is the whole pitch. BM25 and dense retrieval fail on different inputs:

  • BM25 nails exact tokens: SKUs like MX-7841-B, names like useEffect, version numbers, error codes, acronyms the embedding model never saw during pretraining. It fails on paraphrase. A query asking "how do I cancel my plan" will miss a doc that says "ending your subscription."
  • Dense retrieval nails paraphrase: it understands that "cancel" and "end subscription" mean the same thing. It fails on exact tokens, especially out-of-vocabulary identifiers, because the embedding has nothing meaningful to encode.

In a chatbot context this matters constantly. A user might ask "what's the price of the SX-200" (BM25 wins, dense will fuzzy-match unrelated products), and the next user asks "what does that thing cost" referring to a recent message (dense wins, BM25 has no tokens to grip).

The Thakur et al. 2021 BEIR paper is the cleanest evidence that hybrid wins in aggregate. Across 18 heterogeneous IR datasets covering fact-checking, question answering, biomedical, and scientific retrieval, hybrid models pushed nDCG@10 from BM25's 43.42 up to 52.59. Hybrid was the strongest non-reranking approach on the majority of datasets, and the result has held up across follow-up work.

That is why hybrid is the production default for Retrieval-Augmented Generation in 2026. Elasticsearch, Weaviate, Vespa, Qdrant, OpenSearch, and Pinecone all support hybrid out of the box. OpenAI's Assistants v2 file-search tool uses hybrid retrieval under the hood. The conversation has shifted from "should we use hybrid" to "which fusion method and where do we put the reranking pass."

ChatRaj's retrieval is hybrid by default: BM25 + dense vectors, fused with RRF.

When hybrid is overkill vs essential

Hybrid is not free. It means building and storing a second index, adds latency for the extra query and the fusion step, and complicates the ops story. So when does it actually pay off?

Overkill territory. A tiny corpus of a few hundred documents in one language, all FAQ-shaped, where the user vocabulary matches the doc vocabulary. Dense retrieval alone, or even BM25 alone, will saturate the achievable quality. Adding the second arm and a fusion step buys nothing measurable and costs latency.

Essential territory. Product catalogs with SKUs and model numbers. Technical documentation with code identifiers, library names, and error strings. Multilingual corpora where the same concept appears in different surface forms. Any RAG system over arbitrary user-uploaded content, because you have no idea in advance whether queries will be paraphrase-heavy or identifier-heavy. In all of these, the BM25 arm catches exact-match queries that the dense arm whiffs on, and the lift is large and reproducible.

A safe default for any chatbot whose corpus might grow past a few thousand documents, or whose users are not under your control, is to ship hybrid from day one. Retrofitting later means re-indexing the entire corpus on the lexical side, which is the kind of migration that gets postponed until something breaks in production.

The short version: in 2026, hybrid is not a fancy upgrade. It is the floor.

FAQ

Common hybrid search questions

What is hybrid search?

Two retrieval algorithms running together: a keyword retriever (BM25) and a semantic retriever (dense vectors). Both search the same corpus in parallel, and their ranked results are fused into one list.
