What an AI hallucination actually is
A hallucination is text generated by a large language model that reads as confident and fluent but is factually wrong, fabricated, or unsupported by any verifiable source. The model is not lying in any intentional sense. It is doing the only thing it knows how to do, which is predict the next plausible-sounding token. When plausibility and truth diverge, the model picks plausibility.
The term gets used loosely to mean "the AI got something wrong," but a clean definition matters. A hallucination is specifically a confident-sounding output presented as fact when the model had no grounded reason to produce it. When a retrieval pipeline returns the wrong passage and the model faithfully summarizes it, the model is not really hallucinating; it is being mis-fed. A model that is given the correct passage and then invents a quote or a statistic that is not in it is hallucinating in the strict sense.
This distinction matters because the mitigations are completely different. Bad retrieval is fixed by better retrieval. True hallucination is addressed with output-side controls like citation grounding, refusal prompts, and verifiers.
The two main types: intrinsic and extrinsic
The Ji et al. (2023) survey on hallucination in natural language generation popularized the intrinsic/extrinsic taxonomy that most of the field now uses. Both types are worth knowing because they fail in different ways and respond to different defenses.
Intrinsic hallucination is when the generated output contradicts the input context. You hand the model a source passage that says "the refund window is 30 days," and the model produces "the refund window is 60 days." The truth was sitting right there in the prompt. The model overrode it with something more familiar or more fluent. Intrinsic hallucinations are particularly damaging for retrieval-augmented generation systems because they break the core promise: if the retrieved passage is correct but the answer contradicts it, the whole grounding chain has failed.
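One crude way to catch the refund-window example automatically is to compare the numbers in the answer against the numbers in the source passage. The sketch below is only a heuristic, not a general contradiction detector, and the function names are hypothetical. It assumes the precise facts of interest are numeric and that a number missing from the source is worth flagging for review.

```python
import re


def numbers_in(text: str) -> set[str]:
    """Return every integer or decimal literal that appears in the text."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def unsupported_numbers(source: str, answer: str) -> set[str]:
    """Numbers stated in the answer that never appear in the source passage."""
    return numbers_in(answer) - numbers_in(source)


source = "The refund window is 30 days."
answer = "The refund window is 60 days."
print(unsupported_numbers(source, answer))  # {'60'}
```

A check like this misses contradictions that do not involve numbers, and it will flag harmless rephrasings such as "one month," so it works best as a cheap first filter in front of a stronger verifier.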
Extrinsic hallucination is when the generated output introduces information that was not in any input at all. Examples include fabricated citations, invented case law, fake URLs, made-up quotes from real people, and statistics that do not exist anywhere. Extrinsic hallucinations are the ones that make headlines, partly because they are the easiest to spot once a human checks the references.
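Fake URLs are among the easier extrinsic inventions to detect mechanically, because a URL either appeared in the inputs or it did not. The following is a minimal sketch under that assumption. The helper names and the example.com addresses are illustrative only, and the URL pattern is deliberately simple rather than standards-complete.

```python
import re

# Simplified pattern: stops at whitespace and common closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def extract_urls(text: str) -> set[str]:
    """Collect URLs, trimming trailing sentence punctuation."""
    return {url.rstrip(".,;:") for url in URL_PATTERN.findall(text)}


def invented_urls(inputs: list[str], answer: str) -> set[str]:
    """URLs in the answer that appear in none of the inputs."""
    known: set[str] = set()
    for text in inputs:
        known |= extract_urls(text)
    return extract_urls(answer) - known


inputs = ["See https://example.com/docs/refunds for the policy."]
answer = (
    "The policy is at https://example.com/docs/refunds "
    "and was upheld in https://example.com/legal/case-1234."
)
print(invented_urls(inputs, answer))  # {'https://example.com/legal/case-1234'}
```

The same set-difference idea extends to other checkable strings such as case numbers or DOIs, provided the inputs are the only place such strings are allowed to come from.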
Most production chatbot failures are a mix. A user asks about a feature. The retrieval layer surfaces a partially relevant doc. The model fills the gap with extrinsic invention while contradicting the small piece that was correctly retrieved.
Why hallucinations happen for AI chatbots
The root cause is the training objective itself. A language model is rewarded for producing text that looks like text from the training distribution. "I do not know" is statistically rare in the training data. Confident, specific, helpful-sounding answers are common. So the model learns to sound confident and specific even when it should not.
There is no fact-checking layer inside the model. The weights encode patterns of co-occurrence, not a truth database. When the model writes "the CEO of Acme Corp is Jane Smith," nothing internal verified that claim. The sentence is produced because, given the preceding tokens, each next token in it had high probability.
A second contributor is the long-tail problem. Common facts repeated thousands of times in pretraining tend to come out reliably. Rare facts seen once or twice get blended with similar-shaped facts. This is why hallucinations cluster around obscure entities, specific dates, exact numbers, and citation strings: the parts of an answer that demand precision the model cannot guarantee.
Decoding settings amplify this. Higher temperature gives more creative phrasing but more variance in factual claims. Long-form generation gives more opportunities to drift. Anything that asks the model to invent structure (a list of five sources, three case studies, four objections) creates pressure to confabulate to meet the requested shape.
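The effect of temperature can be seen in a small worked example. Sampling typically divides the model's raw scores (logits) by the temperature before applying softmax, so a higher temperature flattens the distribution and gives less likely tokens a bigger share. The logits below are made-up values for three candidate tokens, not output from any real model.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the candidate tokens "30", "60", and "90".
logits = [4.0, 2.0, 1.0]
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

With these numbers the top token holds roughly 98% of the probability mass at temperature 0.5 and roughly 71% at 1.5, which means the two wrong alternatives are sampled far more often at the higher setting.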
How to reduce hallucinations: RAG, refusal, citations
You cannot eliminate hallucinations from a generative model. You can stack mitigations until the residual rate is tolerable for the use case.
Retrieval-augmented generation. Instead of relying on the model's parametric memory, fetch relevant passages from a trusted knowledge base and ask the model to answer using those passages. RAG addresses extrinsic hallucination directly: if the answer must come from the retrieved chunk, there is less room to invent. RAG does not stop intrinsic hallucination on its own, because the model can still contradict a correct passage, which is why retrieval alone is not enough.
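The retrieve-then-answer flow can be sketched in a few lines. This is a toy illustration: word overlap stands in for the embedding or vector search a real system would likely use, the knowledge base is invented, and the final model call is left out entirely. All function names are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Rank passages by word overlap with the question (toy stand-in for vector search)."""
    q = tokenize(question)
    scored = [(len(q & tokenize(p)), p) for p in passages]
    scored = [(s, p) for s, p in scored if s > 0]  # drop passages with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def build_prompt(question: str, retrieved: list[str]) -> str:
    """Number the retrieved passages and place them ahead of the question."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(retrieved, start=1))
    return (
        "Answer using only the passages below.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


knowledge_base = [
    "The refund window is 30 days from the delivery date.",
    "Support is available on weekdays from 9am to 5pm.",
    "Shipping is free for orders over 50 dollars.",
]
question = "How long is the refund window?"
prompt = build_prompt(question, retrieve(question, knowledge_base, top_k=1))
print(prompt)
```

Numbering the passages in the prompt is a deliberate choice here: it gives the model something concrete to cite, which the later mitigations depend on.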
Citation grounding. Require the model to attach an inline reference to each substantive claim, then validate that those references actually point to passages that support the claim. Done well, this turns the model into a "quote and link" engine rather than a freestyle writer. See the separate citation grounding entry for the implementation details.
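The validation step might look something like the sketch below. It assumes the model was asked to put a bracketed marker such as [1] before the end of each sentence, and it uses word overlap as a very rough proxy for "the passage supports the claim." Both the marker format and the 0.5 threshold are assumptions made for illustration; a production check would more plausibly use an entailment model or a verifier call.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def check_citations(
    answer: str, passages: list[str], min_overlap: float = 0.5
) -> list[tuple[str, str]]:
    """Return (sentence, problem) pairs for claims whose citation fails validation."""
    problems = []
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    for sentence in sentences:
        ids = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if not ids:
            problems.append((sentence, "no citation"))
            continue
        # Compare the claim's words (markers removed) with each cited passage.
        claim = tokenize(re.sub(r"\[\d+\]", "", sentence))
        for i in ids:
            if not 1 <= i <= len(passages):
                problems.append((sentence, f"citation [{i}] does not exist"))
                continue
            overlap = len(claim & tokenize(passages[i - 1])) / max(len(claim), 1)
            if overlap < min_overlap:
                problems.append((sentence, f"citation [{i}] does not support the claim"))
    return problems


passages = [
    "The refund window is 30 days from the delivery date.",
    "Support is available on weekdays from 9am to 5pm.",
]
answer = (
    "The refund window is 30 days [1]. "
    "Refunds are paid in store credit only [2]. "
    "Phone support costs extra [3]."
)
for sentence, problem in check_citations(answer, passages):
    print(problem, "->", sentence)
```

Word overlap would still accept "the refund window is 60 days [1]," since most of the words match, so this catches missing and misdirected citations rather than subtle contradictions.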
Refusal prompts. Explicitly instruct the model to say "I do not know" when the retrieved context does not contain a clear answer, rather than reaching into parametric memory. This is often the cheapest intervention relative to its payoff, yet it is frequently skipped because builders assume the model will refuse by default. It will not.
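A refusal instruction can be as simple as a fixed sentence in the prompt plus a check for it in the output. The wording below is one possible phrasing rather than a tested or recommended template, and the function names are hypothetical.

```python
REFUSAL_TEXT = "I do not know."


def build_refusal_prompt(question: str, passages: list[str]) -> str:
    """Build a prompt that tells the model to refuse when the context has no answer."""
    context = "\n".join(f"- {p}" for p in passages) or "(no passages retrieved)"
    return (
        "Answer the question using only the context below.\n"
        f'If the context does not clearly contain the answer, reply exactly: "{REFUSAL_TEXT}"\n'
        "Do not use outside knowledge and do not guess.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def is_refusal(model_output: str) -> bool:
    """Detect the refusal so the application can route to a fallback."""
    normalized = model_output.strip().lower().rstrip(".")
    return normalized == REFUSAL_TEXT.lower().rstrip(".")


print(build_refusal_prompt("What is the refund window?", []))
```

Asking for an exact refusal string makes the refusal machine-detectable, so the application can hand off to a human or a search page instead of showing a dead end.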
Output verifiers. A second model call, sometimes a smaller model, reads the generated answer alongside the retrieved sources and flags claims that are not supported. Verifiers add latency and cost but catch the residual cases the primary model gets wrong. They also pair well with AI guardrails, which gate the output before it ever reaches the user.
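The verifier pattern is easy to express if the second model is treated as a pluggable function. In the sketch below, `judge` is a hypothetical callable that would wrap the second model call in a real system. The `substring_judge` used in the demo is a toy stand-in so the example runs without any model, and the sentence splitting is deliberately naive.

```python
import re
from typing import Callable

# A judge takes (claim, sources) and returns True when the sources support the claim.
Judge = Callable[[str, list[str]], bool]


def verify_answer(answer: str, sources: list[str], judge: Judge) -> list[str]:
    """Return the sentences of the answer that the judge marks as unsupported."""
    claims = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [c for c in claims if not judge(c, sources)]


def gate(answer: str, sources: list[str], judge: Judge,
         fallback: str = "I do not know.") -> str:
    """Release the answer only when every claim passes; otherwise return the fallback."""
    return answer if not verify_answer(answer, sources, judge) else fallback


def substring_judge(claim: str, sources: list[str]) -> bool:
    """Toy judge for the demo: the claim must appear verbatim in some source."""
    needle = claim.rstrip(".!?").lower()
    return any(needle in s.lower() for s in sources)


sources = ["The refund window is 30 days. Refunds go to the original payment method."]
good = "The refund window is 30 days."
bad = "The refund window is 30 days. Refunds are issued within 24 hours."

print(gate(good, sources, substring_judge))
print(verify_answer(bad, sources, substring_judge))
print(gate(bad, sources, substring_judge))
```

Replacing the whole answer with a fallback is the most conservative choice. A gentler design could drop only the flagged sentences, at the risk of leaving an answer that no longer reads coherently.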
ChatRaj's refusal pattern explicitly tells the model to say "I do not know" rather than guess, and grounds every answer in retrieved passages with inline citations. Combined with operator-side analytics that surface low-confidence answers, this keeps the hallucination rate low enough for customer-facing deployments without removing the model's ability to actually be helpful.