Do I need a vector database for RAG?

Not necessarily. Postgres with the pgvector extension handles up to several million vectors comfortably with an HNSW index. Dedicated vector databases become useful at much larger scale or when you need specialized features like managed multi-region replication.

What's the best vector database?

It depends on scale. Pgvector is the most pragmatic choice for under 10 million vectors. Pinecone, Qdrant, Weaviate, and Milvus are stronger for larger deployments or when you want managed convenience and hybrid search out of the box.

Is pgvector slower than Pinecone?

At small to mid scale, query latency is comparable once both indexes are warm. Pinecone scales further with less operational work, and its serverless tier (generally available May 2024) decouples storage from compute so cost tracks usage.

What is HNSW?

HNSW stands for Hierarchical Navigable Small World, the dominant ANN algorithm in 2026. It builds a layered graph of vectors so similarity queries reduce to a greedy walk through the graph. The result is fast queries, tunable recall, and modest indexing cost.

How big are typical embedding vectors?

Common dimensions are 384, 768, 1024, 1536, and 3072. Each float32 component takes 4 bytes, so a 1536-dimensional vector occupies 6,144 bytes (about 6 KB) on disk before any index overhead.
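The same arithmetic extends to a whole corpus. The sketch below is a back-of-the-envelope calculator, not a sizing tool: it counts raw float32 payload only and ignores index structures, row headers, and metadata. The function names are made up for illustration.

```python
BYTES_PER_FLOAT32 = 4


def vector_bytes(dims: int) -> int:
    """Raw size of one float32 vector, excluding index and row overhead."""
    return dims * BYTES_PER_FLOAT32


def corpus_gib(num_vectors: int, dims: int) -> float:
    """Raw vector payload for a whole corpus, in GiB (2**30 bytes)."""
    return num_vectors * vector_bytes(dims) / 2**30


if __name__ == "__main__":
    # Raw payload for 10 million vectors at each common dimension.
    for dims in (384, 768, 1024, 1536, 3072):
        print(dims, vector_bytes(dims), round(corpus_gib(10_000_000, dims), 1))
```

Ten million 1536-dimensional vectors come to roughly 57 GiB of raw floats, which is why dimension count matters as much as row count when estimating memory.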

What is a Vector Database? (When You Actually Need One)

What a vector database actually is

A vector database is a storage system that indexes high-dimensional embedding vectors and answers similarity queries against them. The defining feature is not storing the vectors (any database can do that) but indexing them with an approximate nearest neighbor (ANN) algorithm so that a query of the form "find the 10 vectors closest to this one" returns in under 100 milliseconds even when the index holds tens of millions of vectors.

Concretely, here is the pipeline a vector database sits inside. An embedding model converts a piece of text, an image, or an audio clip into a dense float vector with somewhere between 384 and 3072 dimensions. The vector database stores that vector together with metadata (document id, chunk position, tenant id, timestamp). At query time, another embedding is produced from the user's question and the database returns the k nearest stored vectors, ranked by a distance metric such as cosine similarity or dot product.
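The pipeline above can be sketched in a few lines of plain Python. This is a toy under loud assumptions: `toy_embed` is a hashed bag of words standing in for a real embedding model, `ToyVectorStore` is a hypothetical in-memory class, and the query does an exact scan rather than using an ANN index.

```python
import hashlib
import math
from dataclasses import dataclass, field

DIMS = 64  # toy size; real models emit 384 to 3072 dimensions


def toy_embed(text: str) -> list[float]:
    """Stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIMS
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIMS
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Record:
    vector: list[float]
    metadata: dict


@dataclass
class ToyVectorStore:
    records: list[Record] = field(default_factory=list)

    def add(self, text: str, **metadata) -> None:
        """Embed the text and store the vector next to its metadata."""
        self.records.append(Record(toy_embed(text), {"text": text, **metadata}))

    def query(self, question: str, k: int = 3) -> list[tuple[float, dict]]:
        """Embed the question and return the k most similar records."""
        q = toy_embed(question)
        scored = [(cosine(q, r.vector), r.metadata) for r in self.records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add("how to reset your password", doc_id=1, tenant_id=7)
    store.add("billing invoices and refunds", doc_id=2, tenant_id=7)
    for score, meta in store.query("reset password", k=2):
        print(round(score, 3), meta["doc_id"])
```

A production system swaps `toy_embed` for a real model and the linear scan for an index, but the shape stays the same: embed, store with metadata, embed the query, rank by distance.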

This pattern is the storage half of dense retrieval. The model produces the vectors; the database stores and searches them. The two roles are easy to confuse but they live in different boxes.

What an ANN index does (HNSW, IVF, ScaNN)

The word approximate in approximate nearest neighbor is doing real work. An exact nearest neighbor scan over ten million 1536-dimensional vectors requires roughly ten million dot products per query, which is far too slow for an online chatbot. ANN algorithms trade a small amount of recall (typically 95 to 99 percent of the true neighbors) for a 100x to 1000x speedup.
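To make the trade-off concrete, here is a minimal sketch of the two quantities involved: the work an exact scan does, and how recall is measured against it. The function names are illustrative, and the multiply-add count is a rough proxy for cost, not a latency prediction.

```python
def multiply_adds_per_exact_query(num_vectors: int, dims: int) -> int:
    """An exact scan computes one dims-long dot product per stored vector."""
    return num_vectors * dims


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(query, vectors, k: int) -> list[int]:
    """Brute-force ground truth: score every vector, keep the k best ids."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: dot(query, vectors[i]),
                    reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids) -> float:
    """Fraction of the true top-k neighbors that the ANN result also found."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


if __name__ == "__main__":
    # Ten million 1536-dimensional vectors: about 15 billion multiply-adds.
    print(multiply_adds_per_exact_query(10_000_000, 1536))
```

A recall of 0.95 at k=10 means that, on average, 9.5 of the 10 true nearest neighbors show up in the approximate result.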

Three algorithms dominate the landscape in 2026:

HNSW (Hierarchical Navigable Small World) is the default choice nearly everywhere. It builds a layered graph where each node links to a handful of close neighbors at multiple scales, then searches by greedy descent. HNSW underpins pgvector, Qdrant, Weaviate, Milvus, FAISS, and most newer libraries. Query latency is excellent, recall is tunable through the ef parameter, and the only real downside is memory: the graph lives in RAM.
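The greedy walk is easier to see in code. The sketch below is a deliberately simplified, single-layer version: it omits the layer hierarchy entirely and builds neighbor lists by brute force, whereas real HNSW inserts nodes incrementally and descends through several layers. What it does keep is the core search loop with an `ef`-sized result set. All names are illustrative.

```python
import heapq
import math


def build_knn_graph(points, m: int) -> dict[int, list[int]]:
    """Link each point to its m nearest neighbors (brute force, for the demo)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:m]
    return graph


def greedy_search(points, graph, query, entry: int, ef: int, k: int) -> list[int]:
    """Walk the graph from `entry`, keeping the ef closest nodes seen so far."""
    visited = {entry}
    d0 = math.dist(query, points[entry])
    candidates = [(d0, entry)]  # min-heap: closest unexplored node first
    best = [(-d0, entry)]       # max-heap via negation: current ef best nodes
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break  # nearest candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, points[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the farthest kept node
    # Return the k closest kept nodes, nearest first.
    return [node for _, node in sorted((-nd, node) for nd, node in best)[:k]]


if __name__ == "__main__":
    points = [(float(i), 0.0) for i in range(20)]
    graph = build_knn_graph(points, m=2)
    print(greedy_search(points, graph, (15.2, 0.0), entry=0, ef=3, k=3))
```

Raising `ef` widens the set of nodes the walk keeps, which raises recall at the cost of more distance computations. That is the tuning knob the paragraph above refers to.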

IVF (Inverted File) clusters vectors with k-means, stores them in cluster buckets, and at query time searches only the closest nprobe buckets. Recall is lower than HNSW for the same speed, but memory cost is much lower because the index does not need to hold a graph in RAM.
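A minimal IVF sketch in plain Python follows. It assumes Euclidean distance, a small fixed number of k-means iterations, and seeded random initialization. Production libraries add quantization and far better clustering, so treat this as an illustration of the bucket-and-probe idea only.

```python
import math
import random


def nearest_centroid(v, centroids) -> int:
    return min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))


def kmeans(vectors, k: int, iters: int = 10, seed: int = 0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ivf(vectors, k: int):
    """Return (centroids, inverted lists of vector ids per cluster)."""
    centroids = kmeans(vectors, k)
    lists = [[] for _ in range(k)]
    for idx, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(idx)
    return centroids, lists


def ivf_search(query, vectors, centroids, lists, nprobe: int, top_k: int):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    candidates.sort(key=lambda i: math.dist(query, vectors[i]))
    return candidates[:top_k]


if __name__ == "__main__":
    rng = random.Random(1)
    vecs = [(rng.random(), rng.random()) for _ in range(200)]
    centroids, lists = build_ivf(vecs, k=8)
    print(ivf_search((0.5, 0.5), vecs, centroids, lists, nprobe=2, top_k=5))
```

With `nprobe` equal to the number of clusters the search degenerates into an exact scan. Lowering it skips buckets, which is where both the speedup and the recall loss come from.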

ScaNN is Google's algorithm, used internally for large-scale retrieval. It combines quantization with partitioned search and is among the fastest published options at very large scale, though most of the ecosystem still defaults to HNSW for operational simplicity.

Why vector databases matter for AI chatbots

Every AI chatbot built on retrieval-augmented generation (RAG) needs a place to put its embeddings. When a user asks a support question, the chatbot embeds the question, looks up the closest chunks of indexed documentation, stuffs those chunks into the prompt, and asks the language model to answer. The retrieval step is exactly an ANN query against a vector store.
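The "stuff the chunks into the prompt" step might look like the sketch below. The prompt wording, the character budget, and the `build_prompt` name are all assumptions for illustration. Real systems usually budget in tokens rather than characters.

```python
def build_prompt(question: str, chunks: list[str],
                 max_context_chars: int = 2000) -> str:
    """Pack retrieved chunks (assumed sorted best match first) into a prompt."""
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_context_chars:
            break  # stop at the first chunk that would exceed the budget
        kept.append(chunk)
        used += len(chunk)
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(kept))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    retrieved = ["Go to Settings, then Security, then Reset password.",
                 "Password resets expire after 24 hours."]
    print(build_prompt("How do I reset my password?", retrieved))
```

The budget check is the reason retrieval quality matters: only the first few chunks fit, so the ranking decides what the model gets to see.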

Without that step the chatbot would have to fit the entire knowledge base into the context window of every request, which is both prohibitively expensive and noticeably worse at picking out the relevant passage. A well tuned vector index gives the model exactly the few chunks it needs and nothing else.

Vector databases also power semantic search inside the chatbot itself, deduplication of incoming conversations, recommendation of next questions, and clustering of unanswered queries for content gap analysis. The same index that handles RAG often handles those side jobs without modification.

ChatRaj uses Postgres with the pgvector extension. No separate vector database service needed at our scale. That keeps the data plane on a single primary, lets us join vector results against tenant tables in one query, and removes a moving part from the on-call rotation.
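For readers who have not seen pgvector, the sketch below shows what "join vector results against tenant tables in one query" can look like. The table and column names (`tenants`, `chunks`, `embedding`) are hypothetical and are not ChatRaj's actual schema. The `%s` placeholders assume a psycopg-style driver, which is not imported here so the block stays dependency-free. The `vector` type, the `hnsw` index method, `vector_cosine_ops`, the `<=>` cosine distance operator, and the `hnsw.ef_search` setting are pgvector features.

```python
# Hypothetical schema; assumes a `tenants (id, name)` table already exists.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    tenant_id bigint NOT NULL REFERENCES tenants (id),
    body      text NOT NULL,
    embedding vector(1536) NOT NULL
);

CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# Optional per-session recall/speed knob for HNSW scans.
TUNE_SQL = "SET hnsw.ef_search = 100;"

# Nearest chunks for one tenant, joined to the tenant row in the same query.
# Parameters, in order: query vector, tenant id, query vector, limit.
SEARCH_SQL = """
SELECT c.id, c.body, t.name AS tenant_name,
       c.embedding <=> %s::vector AS cosine_distance
FROM chunks AS c
JOIN tenants AS t ON t.id = c.tenant_id
WHERE c.tenant_id = %s
ORDER BY c.embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values) -> str:
    """Format floats in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def search_params(query_vector, tenant_id: int, limit: int = 10) -> tuple:
    """Build the parameter tuple that SEARCH_SQL expects."""
    literal = to_vector_literal(query_vector)
    return (literal, tenant_id, literal, limit)
```

With a driver connection in hand, usage would be along the lines of `cursor.execute(SEARCH_SQL, search_params(vec, tenant_id))`. The tenant filter and the similarity ranking run in the same statement, against the same database that holds the rest of the application data.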

Do you need a dedicated vector DB? (probably not)

The marketing around vector databases makes them sound like a hard requirement for any AI feature. They are not. The honest version is: under roughly 10 million vectors, Postgres with pgvector and an HNSW index is fast enough, cheap enough, and operationally simpler than running a second database.

The pgvector extension is open source, has roughly 13,000 GitHub stars, and added HNSW support in version 0.5.0 in 2023. Performance is within the same order of magnitude as dedicated vector databases at small to mid scale, and you keep all the operational tooling you already use for Postgres: backups, replicas, point-in-time recovery, IAM, the works.

Dedicated vector databases earn their keep when you cross into territory where Postgres genuinely strains. Pinecone went generally available with its serverless architecture in May 2024 and is the pragmatic choice for teams that want managed infrastructure without thinking about pods or shards. Qdrant, Weaviate, and Milvus are open-source dedicated stores with strong support for hybrid search, rich metadata filtering, and payload schemas. LanceDB and Chroma are newer entrants oriented around embedded and notebook friendly workflows.

The decision tree is roughly:

Under roughly 10 million vectors, single tenant, want minimal moving parts: Postgres plus pgvector.
Tens of millions of vectors or strict latency SLAs across regions: managed Pinecone or self-hosted Qdrant.
Hybrid search or complex payload filtering as a first class need: Weaviate or Qdrant.
Want everything on one box for a prototype: Chroma or LanceDB.
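The same decision tree can be written as a small function, which makes the implicit precedence explicit. The 10 million threshold comes from the rough figure used earlier in this article. The order in which the rules are checked and the parameter names are assumptions of this sketch, not a recommendation engine.

```python
def suggest_store(num_vectors: int,
                  multi_region_sla: bool = False,
                  needs_hybrid_search: bool = False,
                  prototype: bool = False) -> str:
    """Map the rough decision tree above onto a suggestion string."""
    if prototype:
        # Everything on one box: embedded stores are the least setup.
        return "Chroma or LanceDB"
    if needs_hybrid_search:
        # Hybrid search or complex payload filtering as a first-class need.
        return "Weaviate or Qdrant"
    if num_vectors >= 10_000_000 or multi_region_sla:
        # Tens of millions of vectors or strict cross-region latency SLAs.
        return "managed Pinecone or self-hosted Qdrant"
    # Default: keep a single database.
    return "Postgres plus pgvector"


if __name__ == "__main__":
    print(suggest_store(2_000_000))
    print(suggest_store(50_000_000))
    print(suggest_store(500_000, needs_hybrid_search=True))
```

Note where the default branch sits: every dedicated option needs a specific reason, and the absence of one lands on Postgres.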

The trap is reaching for a dedicated vector DB before you have a scale problem. You then own two databases, two backup pipelines, two access control models, and two failure modes, all to serve a corpus that would have fit comfortably inside the one database you already operate.
