What does hybrid in hybrid search mean?

Two retrieval algorithms running together: a keyword retriever (BM25) and a semantic retriever (dense vectors). Both search the same corpus in parallel, and their ranked results are fused into one list.

Is hybrid search always better than dense alone?

On most benchmarks, yes, especially when exact-match queries (SKUs, names, identifiers, error codes) are common. The lift is marginal on pure-paraphrase corpora, where the dense arm already captures most of what there is to find.

What is the difference between hybrid search and reranking?

Hybrid combines two retrieval passes that run in parallel. Reranking is a downstream scoring pass: a slower cross-encoder re-scores the top hybrid candidates. They stack. Most production RAG pipelines use both.

Does OpenAI's Assistants API use hybrid search?

Yes. The file-search tool in Assistants v2 has used hybrid retrieval (keyword plus semantic) under the hood since 2024, with no configuration needed from the developer.

Can I implement hybrid search in Postgres?

Yes. Use Postgres full-text search for the lexical arm (its built-in ts_rank is not true BM25, but it fills the same role), pgvector for the dense arm, run both queries from your app, and combine the two ranked lists with Reciprocal Rank Fusion in application code.

What is Hybrid Search? (BM25 + Dense Retrieval, Fused)

What hybrid search actually is

Hybrid search is a retrieval pattern, not a single algorithm. You run two retrievers over the same corpus, in parallel, against the same user query, and then merge their ranked results into one list before passing the top candidates to whatever consumes them (an LLM, a reranker, a UI).

The two retrievers are almost always:

A lexical, term-frequency engine, typically BM25. This arm scores documents by how often the query's literal tokens appear, weighted by inverse document frequency and normalized for document length.
A semantic, embedding-based engine, typically dense retrieval over a vector index. This arm encodes the query into a high-dimensional vector and finds the documents whose embedding vectors are closest by cosine similarity.
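
To make the two scoring rules concrete, here is a minimal sketch of each in plain Python. It assumes documents are already tokenized into lists of lowercase strings and that embeddings are plain lists of floats. The function names, the k1 and b defaults, and the particular IDF variant are illustrative choices, not taken from any specific engine.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    k1 and b are commonly used defaults; real engines may differ.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # a missing literal token contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

With this sketch, a query tokenized as ["cancel", "plan"] scores zero against a document tokenized as ["ending", "your", "subscription"], because no literal token overlaps. That is the gap the dense arm is there to fill.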

The fusion step combines the two ranked lists. The dominant method in 2026 is Reciprocal Rank Fusion (RRF), which scores each document by the sum of 1 / (k + rank) across the two result sets, with k typically set to 60. Some systems use a weighted score sum instead, normalizing BM25 and cosine scores onto a common scale and blending them with an alpha parameter.
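
Written out, the two fusion rules look roughly like this. The notation is an assumption made for illustration: R is the set of ranked lists (here BM25 and dense), rank_r(d) is the 1-based position of document d in list r, the tilde marks a score normalized to [0, 1], and alpha is taken to weight the dense arm, which is one common convention.

```latex
% Reciprocal Rank Fusion: sum over the lists in which d appears
\mathrm{RRF}(d) = \sum_{r \in R,\; d \in r} \frac{1}{k + \operatorname{rank}_r(d)}, \qquad k = 60

% Weighted score sum (alpha blending) over normalized scores
s(d) = \alpha \, \tilde{s}_{\text{dense}}(d) + (1 - \alpha) \, \tilde{s}_{\text{BM25}}(d), \qquad 0 \le \alpha \le 1
```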

Either way, the output is one ranked list of the top N candidates. That list goes downstream.

How hybrid search works (the fusion step)

The interesting engineering happens in the fusion step, because BM25 scores and dense similarity scores live on different scales (BM25 is unbounded, cosine similarity falls in [-1, 1]) and cannot be added directly. There are two common approaches.

Rank-based fusion (RRF). Ignore the raw scores and look only at the rank each document holds in each list. A document at rank 3 in BM25 and rank 7 in dense scores 1/(60+3) + 1/(60+7), about 0.0308. A document that appears in only one list still gets credit, just less: rank 1 in a single list earns 1/61, about 0.0164. RRF works out of the box with k left at 60, so there are no weights to tune. Elasticsearch 8.8+ ships RRF as a native query option for exactly this reason: hybrid scoring with no parameter tuning needed.
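
A minimal sketch of RRF in Python, assuming each retriever returns a list of document IDs ordered best first. The function name and the sample IDs are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs into (doc_id, score) pairs, best first."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# "doc" sits at rank 3 in the BM25 list and rank 7 in the dense list.
bm25_hits = ["a", "b", "doc", "c"]
dense_hits = ["d", "e", "f", "g", "h", "i", "doc"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Here "doc" ends up first in the fused list even though neither retriever ranked it first, because it is the only document both arms agree on.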

Score-based fusion (alpha blending). Normalize each retriever's scores to [0, 1], then take a weighted average. Weaviate exposes this as an alpha parameter where alpha=1.0 is pure vector, alpha=0.0 is pure BM25, and the default is 0.75. This is more tunable but more fragile, because score distributions shift as your corpus grows, so an alpha tuned today may need re-tuning later.
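
A minimal sketch of alpha blending, assuming each retriever hands back a {doc_id: raw_score} map. Min-max normalization is used here as one simple way to reach [0, 1]; it is not necessarily what Weaviate or any other engine does internally, and the function names are illustrative.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} map onto [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as a full-strength match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def alpha_blend(bm25_scores, dense_scores, alpha=0.75):
    """Blend two score maps; alpha=1.0 is pure dense, alpha=0.0 is pure BM25."""
    bm25_n = min_max_normalize(bm25_scores)
    dense_n = min_max_normalize(dense_scores)
    blended = {
        # A document missing from one arm counts as 0 for that arm.
        doc_id: alpha * dense_n.get(doc_id, 0.0)
        + (1 - alpha) * bm25_n.get(doc_id, 0.0)
        for doc_id in set(bm25_n) | set(dense_n)
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)
```

Note that min-max scaling depends on the best and worst score in each result set, which is one reason a blend like this is sensitive to changes in the corpus.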

In Postgres land, neither is built in. You run a full-text search query and a pgvector similarity query, then do RRF in application code.
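
A rough sketch of that Postgres pattern follows. Everything about the schema is an assumption: a hypothetical documents table with an id column, a text column named body, and a pgvector column named embedding. run_query is a hypothetical callable you would supply (for example, a thin wrapper around your driver's cursor execute and fetchall) that takes SQL plus a parameter dict and returns rows whose first column is the id. The sketch runs the two queries one after the other for simplicity.

```python
# Lexical arm: Postgres full-text search, ordered by ts_rank.
LEXICAL_SQL = """
SELECT id
FROM documents
WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %(query)s)
ORDER BY ts_rank(to_tsvector('english', body),
                 plainto_tsquery('english', %(query)s)) DESC
LIMIT %(limit)s
"""

# Dense arm: pgvector cosine distance (<=>), smallest distance first.
DENSE_SQL = """
SELECT id
FROM documents
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(limit)s
"""


def hybrid_search(run_query, query, query_vec, limit=20, k=60):
    """Run both arms through run_query and fuse the ID lists with RRF."""
    # pgvector accepts a text literal such as '[0.1,0.2,0.3]'.
    vec_literal = "[" + ",".join(str(float(x)) for x in query_vec) + "]"
    lexical_rows = run_query(LEXICAL_SQL, {"query": query, "limit": limit})
    dense_rows = run_query(DENSE_SQL, {"query_vec": vec_literal, "limit": limit})

    scores = {}
    for rows in (lexical_rows, dense_rows):
        for rank, row in enumerate(rows, start=1):
            doc_id = row[0]
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:limit]
```

In a real deployment you would likely store a precomputed tsvector column with a GIN index and add a vector index on the embedding column, rather than computing to_tsvector per row at query time.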

Why hybrid search matters for AI chatbots

The complementarity argument is the whole pitch. BM25 and dense retrieval fail on different inputs:

BM25 nails exact tokens: SKUs like MX-7841-B, names like useEffect, version numbers, error codes, acronyms the embedding model never saw during pretraining. It fails on paraphrase. A query asking "how do I cancel my plan" will miss a doc that says "ending your subscription."
Dense retrieval nails paraphrase: it understands that "cancel" and "end subscription" mean the same thing. It fails on exact tokens, especially out-of-vocabulary identifiers, because the embedding has nothing meaningful to encode.

In a chatbot context this matters constantly. A user might ask "what's the price of the SX-200" (BM25 wins, dense will fuzzy-match unrelated products), and the next user asks "what does that thing cost" referring to a recent message (dense wins, BM25 has no tokens to grip).

The Thakur et al. 2021 BEIR paper is the cleanest evidence that hybrid wins in aggregate. Across 18 heterogeneous IR datasets covering fact-checking, question answering, biomedical, and scientific retrieval, hybrid models pushed nDCG@10 from BM25's 43.42 up to 52.59. Hybrid was the strongest non-reranking approach on the majority of datasets, and the result has held up across follow-up work.

That is why hybrid is the production default for Retrieval-Augmented Generation in 2026. Elasticsearch, Weaviate, Vespa, Qdrant, OpenSearch, and Pinecone all support hybrid out of the box. OpenAI's Assistants v2 file-search tool uses hybrid retrieval under the hood. The conversation has shifted from "should we use hybrid" to "which fusion method and where do we put the reranking pass."

ChatRaj's retrieval is hybrid by default: BM25 + dense vectors, fused with RRF.

When hybrid is overkill vs essential

Hybrid is not free. It adds a second index to build, store, and keep in sync, adds latency, and complicates the ops story. So when does it actually pay off?

Overkill territory. A tiny corpus of a few hundred documents in one language, all FAQ-shaped, where the user vocabulary matches the doc vocabulary. Dense retrieval alone, or even BM25 alone, will saturate the achievable quality. Adding the second arm and a fusion step buys nothing measurable and costs latency.

Essential territory. Product catalogs with SKUs and model numbers. Technical documentation with code identifiers, library names, and error strings. Multilingual corpora where the same concept appears in different surface forms. Any RAG system over arbitrary content uploaded by users, because you have no idea in advance whether queries will be heavy on paraphrase or heavy on identifiers. In all of these, the BM25 arm catches exact-match queries that the dense arm whiffs on, and the lift is large and reproducible.

A safe default for any chatbot whose corpus might grow past a few thousand documents, or whose users are not under your control, is to ship hybrid from day one. Retrofitting later means re-indexing the entire corpus on the lexical side, which is the kind of migration that gets postponed until something breaks in production.

The short version: in 2026, hybrid is not a fancy upgrade. It is the floor.
