What is k in the RRF formula?

k is a dampening constant added to each rank before taking the reciprocal. The Cormack 2009 paper used k = 60 and most production systems (Elasticsearch, Weaviate, Vespa) keep that default. A higher k flattens the rank-position weight curve so top items dominate less; a lower k makes the first-place document overwhelm everything else.

Why not just average the scores from BM25 and cosine similarity?

Their scales are totally different. BM25 scores are unbounded sums of IDF-weighted term-frequency contributions; cosine similarity is bounded in [-1, 1]. Averaging without normalization lets BM25 dominate, and normalization is its own problem because score distributions shift per query. RRF avoids normalization by using ranks instead of raw scores.

Does Elasticsearch support RRF natively?

Yes. Native RRF for hybrid search landed in Elasticsearch 8.8 in May 2023 via a new rank parameter on the search API. The RRF retriever, added in later releases, has since become the recommended way to fuse BM25 with vector or ELSER queries.

Is RRF the same as reranking?

No. RRF is a fusion step. A reranker re-scores candidates with a stronger model, typically a cross-encoder. RRF just combines existing rankings from multiple retrievers into one ranked list. Most modern pipelines do both: fuse with RRF, then rerank the top-N.

What is a good number of input rankings for RRF?

Two is the most common setup, usually BM25 plus a dense vector retriever. Adding a third signal (a sparse neural method like SPLADE or a domain-specific filter) gives a small additional lift on diverse benchmarks, with diminishing returns. Beyond four or five rankings, the marginal improvement is usually not worth the latency.

What is Reciprocal Rank Fusion (RRF)? Hybrid Search Ranking

What Reciprocal Rank Fusion actually is

Reciprocal Rank Fusion is a way to combine multiple ranked lists of documents into a single, better ranked list. You do not need the scores from each system. You only need the position (rank) of each document in each list. That makes RRF the simplest, most boringly reliable answer to a problem that haunts every hybrid search stack: how do you merge a BM25 ranking with a dense retrieval ranking when their scores live on completely different scales?

The trick is to ignore the scores entirely. Treat each retriever as a black box that returns an ordered list. Then fuse those lists using ranks alone. The result is a fused ordering that consistently beats any individual retriever on standard IR benchmarks, with no tuning, no normalization, and almost no code.

RRF was introduced by Gordon Cormack, Charles Clarke, and Stefan Buettcher in their 2009 SIGIR paper, "Reciprocal Rank Fusion outperforms Condorcet and Individual Rank Learning Methods." The paper showed that this two-line formula beats more elaborate fusion strategies, including learned rank fusion methods of the era. Seventeen years later it is still the default.

The RRF formula and why k=60

The formula for a single document d, fused across n input rankings, is:

score(d) = Σ  1 / (k + rank_i(d))
         i=1..n

Where rank_i(d) is the position of document d in input list i, counting from 1, and k is a dampening constant. If a document is missing from one list, that list simply contributes 0 to its score.
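The formula above translates almost directly into code. Here is a minimal sketch, assuming each retriever hands back a best-first list of document IDs. The function name rrf_fuse and the IDs d1 to d4 are made up for illustration and do not come from any particular library.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse best-first lists of document IDs with Reciprocal Rank Fusion.

    Returns (doc_id, score) pairs sorted by descending fused score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks count from 1. A document absent from a list adds nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Break score ties by document ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical document IDs, for illustration only.
lexical = ["d1", "d2", "d3"]
dense = ["d3", "d1", "d4"]

fused = rrf_fuse([lexical, dense])
print([doc_id for doc_id, _ in fused])  # ['d1', 'd3', 'd2', 'd4']
```

The tie-break on document ID is a choice made for this sketch. The formula itself does not say how to order documents with equal scores.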

Cormack and colleagues set k to 60 in the original paper, and that value has stuck as the de facto default in Elasticsearch, Weaviate, Vespa, OpenSearch, and Qdrant. Why 60? It is not magical; it is empirically robust. A small k (say 1) makes the top-ranked item dominate. A large k (say 1000) flattens the curve so position barely matters. Sixty sits in a moderate zone: it gives meaningful weight to the top of each list without letting a single number-one result outvote agreement across systems. The Cormack paper found that performance was stable across a wide range of mid-sized k values, so the constant became a convention rather than a tuned hyperparameter.
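To put numbers on that curve, the sketch below compares how much a first-place position contributes relative to a tenth-place position for the three k values mentioned above. The helper names rrf_weight and top_vs_tenth are illustrative only.

```python
def rrf_weight(rank, k):
    """Contribution of one list position to a document's RRF score."""
    return 1.0 / (k + rank)


def top_vs_tenth(k):
    """How many times more rank 1 contributes than rank 10 for a given k."""
    return rrf_weight(1, k) / rrf_weight(10, k)


for k in (1, 60, 1000):
    print(f"k={k}: rank 1 counts {top_vs_tenth(k):.2f}x as much as rank 10")

# k=1: rank 1 counts 5.50x as much as rank 10
# k=60: rank 1 counts 1.15x as much as rank 10
# k=1000: rank 1 counts 1.01x as much as rank 10
```

At k = 60 the top position still matters, but only by about 15 percent over tenth place, which is why agreement across lists can outweigh a single first-place vote.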

A worked example

Suppose your hybrid pipeline returns two rankings for a query.

BM25 list: A, X, B, Y, Z
Dense list: Y, B, Z, W, A

Using k = 60:

Document A: 1/(60+1) from BM25 plus 1/(60+5) from dense = 0.01639 + 0.01538 = 0.03177
Document B: 1/(60+3) from BM25 plus 1/(60+2) from dense = 0.01587 + 0.01613 = 0.03200
Document Y: 1/(60+4) + 1/(60+1) = 0.01563 + 0.01639 = 0.03202
Document Z: 1/(60+5) + 1/(60+3) = 0.01538 + 0.01587 = 0.03125

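Fused order: Y, B, A, Z, then X and W (each appearing in only one list). Notice that B finishes second even though neither retriever ranked it first, and that it beats A, which BM25 put at number one, because both retrievers ranked B near the top. Y wins because its first place in the dense list is backed by a fourth place in BM25. RRF rewards consensus.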
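The arithmetic in this example is easy to check in a few lines. The sketch below recomputes the fused order from the two lists and then reruns it with k = 1 to show how a small constant changes the outcome. The function name fused_order is illustrative, and ties are broken alphabetically as a convenience for this sketch.

```python
bm25 = ["A", "X", "B", "Y", "Z"]
dense = ["Y", "B", "Z", "W", "A"]


def fused_order(rankings, k):
    """Return document IDs ordered by descending RRF score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Alphabetical tie-break keeps the result deterministic.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


print(fused_order([bm25, dense], k=60))  # ['Y', 'B', 'A', 'Z', 'X', 'W']
print(fused_order([bm25, dense], k=1))   # ['Y', 'A', 'B', 'Z', 'X', 'W']
```

With k = 1, document A's single first-place finish is enough to push it past B. With k = 60, B's two near-the-top placements win out.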

Why score normalization is hard without RRF

BM25 scores are unbounded sums of IDF-weighted term-frequency contributions that depend on corpus statistics. They can range from near zero to 30 or more depending on query length and term rarity. Dense cosine similarity scores are bounded between -1 and 1, usually clustered between 0.3 and 0.9 in practice. If you naively averaged them, BM25 would almost always dominate. If you min-max normalized inside each query, you would lose calibration and amplify noise on short result lists. If you trained a weighting function, you would need labeled relevance data per domain.

RRF sidesteps the whole problem. Ranks are scale-free. A first-place document in BM25 contributes exactly as much as a first-place document in dense retrieval. The math no longer cares what the underlying score units look like.
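A small sketch makes the scale mismatch concrete. The scores below are invented for illustration and are not taken from any real index. They are chosen so the two retrievers disagree completely about the order.

```python
# Hypothetical per-query scores, invented to illustrate the scale mismatch.
bm25_scores = {"A": 12.4, "B": 9.1, "C": 3.0}
cosine_scores = {"A": 0.35, "B": 0.62, "C": 0.88}


def order_by(scores):
    """Document IDs sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def ranks(scores):
    """Map each document to its 1-based position in the ordering."""
    return {doc: pos for pos, doc in enumerate(order_by(scores), start=1)}


naive_avg = {d: (bm25_scores[d] + cosine_scores[d]) / 2 for d in bm25_scores}

print(order_by(bm25_scores))    # ['A', 'B', 'C']
print(order_by(cosine_scores))  # ['C', 'B', 'A']
print(order_by(naive_avg))      # ['A', 'B', 'C'], same as BM25 alone

# Converted to ranks, both retrievers speak the same units.
print(ranks(bm25_scores))       # {'A': 1, 'B': 2, 'C': 3}
print(ranks(cosine_scores))     # {'C': 1, 'B': 2, 'A': 3}
```

In this example the dense retriever's opinion has no effect on the averaged ordering, because its entire score range is smaller than the gaps between BM25 scores.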

Why RRF matters for AI chatbots

Modern Retrieval-Augmented Generation pipelines almost always run two or more retrievers in parallel: lexical for exact match terms and product codes, dense for semantic paraphrases, and sometimes a sparse vector method like SPLADE for both. The retrieval quality of the candidate set directly bounds answer quality. Garbage in, hallucinations out.

RRF is the cheapest fusion step that meaningfully lifts recall@k on the merged candidate pool. A better pool gives the reranker (if you use one) more relevant documents to choose from, which in turn gives the LLM better grounding context. ChatRaj's hybrid retrieval fuses BM25 and dense vector rankings via RRF before passing the top-N candidates to a cross-encoder reranker. The whole pipeline costs about one extra millisecond per query and reliably reduces "I could not find that in your docs" failures on long-tail queries.
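The fuse-then-rerank shape can be sketched in a few lines. This is not ChatRaj's implementation. It is a toy outline in which a word-overlap scorer stands in for a real cross-encoder, and the documents, query, and function names are all hypothetical.

```python
def rrf_fuse(rankings, k=60):
    """Return document IDs ordered by descending RRF score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


def fuse_then_rerank(query, rankings, rerank_score, top_n=3, k=60):
    """Fuse with RRF, keep the top_n candidates, then re-score only those."""
    candidates = rrf_fuse(rankings, k)[:top_n]
    return sorted(candidates, key=lambda doc: rerank_score(query, doc), reverse=True)


# Hypothetical corpus and retriever outputs, for illustration only.
docs = {
    "A": "reset your password from the login page",
    "B": "password reset email not arriving",
    "C": "billing and invoices",
    "D": "change account email",
}


def overlap_score(query, doc_id):
    """Toy stand-in for a cross-encoder: count of shared words."""
    return len(set(query.split()) & set(docs[doc_id].split()))


lexical = ["B", "A", "D"]
dense = ["A", "B", "C"]

result = fuse_then_rerank("password reset email", [lexical, dense], overlap_score)
print(result)  # ['B', 'A', 'C']
```

The point of the split is cost: the expensive scorer only ever sees top_n candidates, while RRF does the cheap work of deciding which candidates those are.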

Adoption tells the same story. Elasticsearch shipped native RRF support in version 8.8 in May 2023. Weaviate, Vespa, OpenSearch, and Qdrant all offer RRF as a built-in fusion strategy. Microsoft's Azure AI Search uses RRF for its hybrid scoring. When every major vector and search database converges on the same algorithm, it is usually because that algorithm is hard to beat with anything more clever.

RRF vs weighted score sums vs learned fusion

Weighted score sums (alpha * BM25 + (1 - alpha) * cosine) require you to normalize the two score distributions and tune alpha. They can outperform RRF when you have labeled data and a stable corpus, but they are brittle across domains and document distributions.
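To see why alpha needs tuning, here is a minimal sketch of a weighted sum over min-max normalized scores. The scores are invented for illustration, and min-max is only one of several possible normalization choices.

```python
def min_max(scores):
    """Rescale scores to [0, 1] within a single query's result set."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_sum(bm25, cosine, alpha):
    """alpha * normalized BM25 + (1 - alpha) * normalized cosine."""
    b, c = min_max(bm25), min_max(cosine)
    # A document missing from one retriever contributes 0 from that side.
    return {
        doc: alpha * b.get(doc, 0.0) + (1 - alpha) * c.get(doc, 0.0)
        for doc in set(b) | set(c)
    }


# Hypothetical per-query scores, invented for illustration.
bm25_scores = {"A": 12.4, "B": 9.1, "C": 3.0}
cosine_scores = {"A": 0.35, "B": 0.62, "C": 0.88}

for alpha in (0.7, 0.5, 0.3):
    fused = weighted_sum(bm25_scores, cosine_scores, alpha)
    print(alpha, max(fused, key=fused.get))

# 0.7 A
# 0.5 B
# 0.3 C
```

Three values of alpha produce three different winners on the same inputs, which is the brittleness in miniature: the right alpha depends on data you may not have.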

Learned rank fusion (training a gradient boosted tree or small neural model on rank features) can edge out RRF on specific benchmarks, but you need relevance judgments and ongoing retraining. The Cormack paper's central result was that RRF beat the learned methods available at the time without any training data at all.

Reciprocal Rank Fusion wins on simplicity, robustness, and zero data requirements. It is what you reach for first. If you later have the labels, the traffic, and the time to tune something better, you can replace it. Most teams never bother.
