Cosine similarity

Cosine similarity scores how close two vectors point in the same direction, ignoring their length.

Bottom line
Cosine similarity is the workhorse metric for embedding search: a score near 1 means two texts are semantically related, and a score near 0 means they are unrelated. Many retrieval-based AI chatbots use it to rank document chunks against a question.

What cosine similarity actually is

Cosine similarity is a number between -1 and 1 that tells you how aligned two vectors are. If the vectors point in exactly the same direction, the score is 1. If they are perpendicular (totally unrelated, in the geometric sense), the score is 0. If they point in opposite directions, the score is -1.

The key idea: cosine similarity looks at direction only. It throws away length. That property is the whole reason it became the default metric for ranking text by semantic closeness.

In a retrieval pipeline, you turn every document chunk and every user question into a high-dimensional vector with an embedding model. Cosine similarity is then the scoring function that decides which chunk best answers the question. ChatRaj's semantic search ranks document chunks by cosine similarity between the question embedding and each chunk's embedding.

How cosine similarity is computed

The formula is straightforward:

cos(θ) = (A · B) / (||A|| × ||B||)

Where:

  • A · B is the dot product of the two vectors (sum of pairwise products of their components).
  • ||A|| and ||B|| are the Euclidean lengths (magnitudes) of each vector.

So the numerator captures how much the two vectors "agree" component by component, and the denominator divides out their lengths. What survives is the cosine of the angle between them.
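
The formula maps almost line for line onto code. What follows is a minimal sketch in plain Python, not a production implementation. The function name cosine_similarity and the decision to raise an error on zero-length vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return cos(theta) between two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    # Numerator: dot product, the sum of pairwise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: product of the two Euclidean lengths.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle is undefined for a zero vector (assumption: fail loudly).
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
```

The three calls correspond to the three cases described earlier: a score close to 1, a score of 0, and a score close to -1, up to floating-point rounding.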

The general range is [-1, 1]. For real text embeddings, the score almost always lands in [0, 1]: embedding models tend to pack their outputs into a fairly narrow region of the vector space, so even unrelated texts seldom score below 0, and antonymous content does not produce a vector pointing in the exact opposite direction. What counts as a "good" match in a RAG system depends heavily on the model. Somewhere around 0.7 to 0.9 is a common range, but some models score relevant pairs noticeably lower, so calibrate any threshold on your own data.

Concrete example. Imagine two short queries: "How do I reset my password?" and "I forgot my login credentials, what now?" These sentences differ in length and share almost no words, so a keyword scorer would underrate the second one. But their embeddings point in nearly the same direction, so the cosine similarity between them is high (often above 0.85, though the exact figure varies by model). Cosine similarity is what lets the chatbot recognize them as the same intent.

Why cosine similarity matters for AI chatbots

Magnitude invariance is the practical reason cosine wins for embedding search. When a model does not normalize its outputs, embedding magnitude can vary with properties of the input such as its length, so a long document's vector may be larger than a short snippet's. If you used raw Euclidean distance on such vectors, the long document could either dominate or be unfairly penalized, depending on the data. Cosine sidesteps that entirely. It asks: "regardless of how 'loud' each vector is, are they pointing the same way?"

For a chatbot doing dense retrieval, this matters because you want the model to find the chunk whose meaning matches the question, not the chunk whose embedding happens to have the right size. A two-sentence FAQ answer and a five-paragraph blog post should be ranked by how relevant they are, not by length.
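
To make the ranking step concrete, here is a small sketch with hand-made 3-dimensional vectors standing in for real embeddings. The chunk names, the vectors, and the helper rank_chunks are all hypothetical. A real system would get its vectors from an embedding model and would usually query a vector index rather than loop in Python.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_chunks(question_vec, chunk_vecs, top_k=2):
    """Return the top_k (name, score) pairs, highest cosine similarity first."""
    scored = [
        (name, cosine_similarity(question_vec, vec))
        for name, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors (made up for illustration, not real embeddings).
question = [0.9, 0.1, 0.0]
chunks = {
    "short_faq_answer": [0.8, 0.2, 0.0],  # small magnitude, well aligned
    "long_blog_post": [4.0, 6.0, 8.0],    # large magnitude, poorly aligned
    "pricing_page": [0.0, 0.1, 0.9],      # nearly perpendicular
}

ranking = rank_chunks(question, chunks, top_k=3)

# For contrast: the raw dot product, which also rewards magnitude.
raw_dot = {
    name: sum(q * c for q, c in zip(question, vec))
    for name, vec in chunks.items()
}
```

With these toy numbers, cosine similarity puts short_faq_answer first, while the raw dot product would put long_blog_post first simply because its vector is larger.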

There is a useful shortcut that production systems lean on heavily. If you normalize every vector to unit length (divide by its magnitude up front), then the denominator in the cosine formula becomes 1, and:

cos(θ) = A · B   (when ||A|| = ||B|| = 1)

The cosine similarity is exactly the dot product. Many modern embedding models, including OpenAI's text-embedding-3 family and the BGE series, return unit-normalized vectors by default, though it is worth confirming this in the model's documentation or by checking a few vector lengths yourself. On normalized vectors, a vector database can compute a plain inner product and still report cosine similarity. It is faster (no division, no square roots) and gives the same results up to floating-point rounding.
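
The shortcut is easy to verify numerically. The sketch below uses plain Python with made-up vectors. The helper names normalize and dot are chosen here for illustration and do not come from any particular library.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

full_formula = cosine_similarity(a, b)      # divides by both lengths
shortcut = dot(normalize(a), normalize(b))  # plain dot product on unit vectors

# Cheap sanity check worth running on a real model's output
# before relying on the shortcut.
unit_a = normalize(a)
is_unit_length = math.isclose(math.sqrt(dot(unit_a, unit_a)), 1.0)
```

Both full_formula and shortcut come out to 11/15 here, up to floating-point rounding.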

Postgres with the pgvector extension exposes this directly. The <=> operator computes cosine distance, where:

cosine_distance = 1 - cosine_similarity

To index for it you create an HNSW or IVFFlat index with vector_cosine_ops. For already-normalized vectors, pgvector's <#> operator (which returns the negative inner product and is indexed with vector_ip_ops) gives the same ranking as cosine distance with less arithmetic, which is why production teams often switch to it once they confirm their model normalizes outputs.
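
The claim that the two operators order normalized vectors identically can be checked outside the database. The sketch below imitates them in plain Python. cosine_distance and negative_inner_product are stand-in names chosen here, not pgvector functions, and the vectors are made up.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def cosine_distance(a, b):
    """Stand-in for the cosine distance operator: 1 - cosine similarity."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def negative_inner_product(a, b):
    """Stand-in for the inner product operator: the dot product, negated."""
    return -dot(a, b)


query = normalize([1.0, 1.0, 0.0])
rows = {
    "doc_a": normalize([1.0, 0.9, 0.1]),
    "doc_b": normalize([0.2, 1.0, 0.5]),
    "doc_c": normalize([0.0, 0.1, 1.0]),
}

# Both orderings are ascending, like ORDER BY <distance> in SQL.
by_cosine = sorted(rows, key=lambda name: cosine_distance(query, rows[name]))
by_inner = sorted(rows, key=lambda name: negative_inner_product(query, rows[name]))
```

The two lists match because, on unit vectors, cosine distance is just 1 plus the negative inner product, and adding a constant never changes the sort order.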

Cosine similarity vs dot product vs Euclidean distance

These three are the metrics you will encounter in any vector store. They are related but not interchangeable.

Cosine similarity measures angle. Magnitude does not matter. Range [-1, 1], higher is more similar. Use it for embedding-based semantic search by default.

Dot product measures angle and magnitude together. It is equal to cosine similarity only when both vectors are unit-normalized. If your model does not normalize, dot product will reward longer vectors, which is sometimes what you want (some recommendation systems exploit this on purpose) and sometimes a bug.

Euclidean distance (also called L2) measures straight-line distance between the two vector tips. Two vectors pointing the same direction but with very different magnitudes are far apart in Euclidean terms but identical in cosine terms. For un-normalized text embeddings, Euclidean tends to underperform cosine because it lets vector length leak into the score. On unit-normalized vectors the two agree on ranking, because squared Euclidean distance equals 2 - 2 × cos(θ).
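
A quick way to see the difference between the three metrics is to score one pair of vectors that share a direction but not a length. This is an illustrative sketch in plain Python with made-up vectors. The function names are chosen here and are not tied to any vector store's API.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the length

cos_ab = cosine_similarity(a, b)  # angle only: about 1.0
dot_ab = dot(a, b)                # 28.0, grows with magnitude
dot_aa = dot(a, a)                # 14.0, so a scores lower against itself than against b
l2_ab = euclidean_distance(a, b)  # sqrt(14), about 3.74, despite identical direction
```

Cosine treats the pair as a perfect match, the dot product ranks the longer vector above an exact copy of the query, and Euclidean distance reports a gap that comes purely from length.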

Rule of thumb: if your embedding model normalizes outputs (many do), pick cosine or inner product; they will give the same ranking. If your model does not normalize and length carries meaning, use dot product. If you are doing geometric clustering on positions in physical space, use Euclidean. For a chatbot retrieving documentation chunks, cosine similarity is the right answer almost every time.

FAQ

Common cosine similarity questions

In theory, cosine similarity is bounded by -1 and 1. For text embeddings from modern models, scores almost always land between 0 and 1, because embedding models tend to pack their outputs into a fairly narrow region of the vector space, so even truly opposite content rarely produces a negative score in practice.
