AI Chatbot Glossary

35 plain-English definitions across retrieval, LLM internals, chatbot architecture, and chunking.

Retrieval & search

(10)

BM25
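BM25 ranks documents by keyword relevance using term frequency, inverse document frequency, and document-length normalization. The default scoring function in Lucene and Elasticsearch. Postgres's built-in full-text search ranks with its own ts_rank function rather than BM25.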
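To make the three ingredients concrete, here is a minimal sketch of BM25 scoring in plain Python. It assumes pre-tokenized documents, the commonly used parameters k1=1.2 and b=0.75, and the non-negative IDF variant used by Lucene. The function name `bm25_scores` and the toy corpus are illustrative only.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: documents longer than average are penalized.
        norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates as it grows, controlled by k1.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores

# Hypothetical toy corpus, already lowercased and tokenized.
docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park every day".split(),
    "the cat chased the cat toy".split(),
]
scores = bm25_scores(["cat"], docs)
```

The third document mentions "cat" twice at the same length as the first, so it scores higher. The second contains only "cats", which is a different token without stemming, so it scores zero.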
TF-IDF
TF-IDF scores how distinctive a word is in a document, weighing how often it appears there against how rare it is across the corpus. The 50-year-old foundation of keyword search.
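A small sketch of that weighting, assuming one of several common TF-IDF variants: term frequency normalized by document length, multiplied by the natural log of (documents in corpus / documents containing the term). The function name `tf_idf` and the sample documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical toy corpus.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
weights = tf_idf(docs)
```

Here "the" appears in every document, so its weight is zero everywhere, while "dog" appears in only one document and outweighs "cat", which appears in two.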
Cosine similarity
Cosine similarity measures the cosine of the angle between two vectors, scoring semantic closeness regardless of vector magnitude. The default similarity metric for embedding-based search.
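The magnitude independence is easy to see in a few lines of Python. This is a minimal sketch that assumes two non-zero vectors of equal length; the helper name `cosine_similarity` is illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same direction, different magnitude: similarity is (numerically) 1.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Perpendicular vectors: similarity is 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```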
Dense retrieval
Dense retrieval encodes queries and documents as fixed-dimension vectors and finds matches by vector similarity, surfacing semantically related content that keyword search misses.
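The matching step can be sketched as a brute-force nearest-neighbor search. The three-dimensional vectors below are hand-written stand-ins for real embeddings (an assumption, since no embedding model is called here), and `top_k` is a hypothetical helper. Production systems typically replace the full scan with an approximate index.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    return heapq.nlargest(k, doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]))

# Hypothetical document embeddings keyed by document id.
doc_vecs = {
    "refunds": [0.9, 0.1, 0.0],
    "shipping": [0.1, 0.9, 0.0],
    "returns": [0.8, 0.2, 0.1],
}
results = top_k([1.0, 0.0, 0.0], doc_vecs, k=2)
```

With these made-up vectors the query lands closest to "refunds" and then "returns", even though no keywords are compared at all.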
Sparse retrieval
Sparse retrieval represents documents as high-dimensional vectors with mostly zero values, stored in an inverted index for fast keyword lookup. The category that includes BM25, TF-IDF, and SPLADE.
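A minimal sketch of the inverted index behind that lookup: each term maps to the set of documents containing it, so a query only touches the postings for its own terms. Whitespace tokenization, the function names, and the sample documents are simplifying assumptions.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def lookup(index, query):
    """Return ids of documents that contain every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

# Hypothetical help-center snippets.
docs = ["Refund policy for orders", "Shipping policy", "Track orders online"]
index = build_inverted_index(docs)
policy_docs = lookup(index, "policy")
policy_and_orders = lookup(index, "policy orders")
```

A real sparse retriever would store term weights alongside each posting and rank the matches, for example with BM25, rather than returning an unordered set.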
Reciprocal Rank Fusion (RRF)
Reciprocal Rank Fusion (RRF) combines multiple ranked lists into one by summing 1/(k+rank) across systems. The standard fusion algorithm for hybrid search.
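The 1/(k+rank) sum can be sketched in a few lines. This example assumes ranks start at 1 and uses k=60, a widely used default; the function name and the two input rankings are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1/(k + rank) for every document it ranks.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of a keyword retriever and a semantic retriever.
keyword_ranking = ["a", "b", "c"]
semantic_ranking = ["b", "c", "a"]
fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
```

Document "b" wins because it is near the top of both lists, while "a" is first in one list but last in the other. Because only ranks are used, the raw scores of the two retrievers never need to be put on a common scale.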
ColBERT
ColBERT is a late-interaction retrieval model that stores per-token embeddings and computes relevance via MaxSim, aiming for accuracy close to a cross-encoder while keeping document embeddings precomputable, at the cost of a larger index.
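MaxSim itself is simple: for each query token, take its best match among the document's token embeddings, then sum those maxima. The sketch below assumes unit-normalized token vectors so the dot product acts as cosine similarity; the two-dimensional vectors are invented for illustration and do not come from a real model.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

def maxsim(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best similarity with any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Hypothetical per-token embeddings: one vector per token.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # has an exact match for both query tokens
doc_b = [[1.0, 0.0], [0.6, 0.8]]   # exact match for one token, partial for the other

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
```

Because document token embeddings do not depend on the query, they can be computed once at indexing time, and only this cheap interaction step runs at query time.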
Hybrid search
Hybrid search runs keyword and semantic retrieval in parallel and fuses their rankings. A common default architecture for modern RAG, often beating either approach alone on benchmarks such as BEIR.
Vector database
A vector database indexes high-dimensional embeddings for fast nearest-neighbor search. It powers RAG, semantic search, and recommendation systems. Postgres with pgvector is often enough.
Embedding model
An embedding model converts text into a fixed-dimension vector that encodes meaning. The choice among providers such as OpenAI, Cohere, and Voyage AI, or open-source models such as BGE, strongly affects RAG quality.

LLM internals

(10)

Application & chatbot architecture

(10)

Architecture & chunking

(5)
