What AI hallucination actually means
In the world of large language models, a hallucination is any case where the model outputs fluent, confident text that is factually wrong, unsupported by any real source, or in direct contradiction with the input it was given. The term is borrowed by analogy from human perception: a human hallucinates when they perceive something that is not there. An LLM hallucinates when it states something that is not true.
The word can be misleading because it suggests that the model "sees" something. It does not. A language model has no perception, no beliefs, and no model of truth. It is a function that takes a sequence of tokens and produces probabilities over the next token. When that function emits a confident-sounding sentence about a court case that never existed, the model is not lying and it is not mistaken in any psychologically meaningful sense. It is doing exactly what it was trained to do: produce text that pattern-matches the shape of the training data. The training data is full of confident sentences. The model has learned to imitate that shape. Whether the content is true is a separate question that the training objective does not directly address.
The clearest working definition comes from a widely cited 2023 survey by Ziwei Ji and colleagues, "Survey of Hallucination in Natural Language Generation" (arXiv:2202.03629). They define hallucination as "the generated content that is nonsensical or unfaithful to the provided source content." That definition does two important things. It separates hallucination from simple grammatical errors (the output can be perfectly fluent). And it grounds the definition in faithfulness to a source. A model is hallucinating when the output does not match what the source says or what is verifiably true in the world.
The two-type taxonomy: intrinsic vs extrinsic
The Ji 2023 survey popularized the cleanest split between the two main flavors of hallucination, and that split has become the standard reference point in the literature.
An intrinsic hallucination is output that contradicts the input the model was given. If you paste an article into the prompt and ask the model to summarize it, and the summary states something the article does not say, that is intrinsic. The "source" is right there in the prompt, and the model failed to be faithful to it. Intrinsic hallucinations show up most often in summarization, translation, and any task where the truth is supposed to be defined by an input document.
An extrinsic hallucination is output that the input neither supports nor contradicts, so it cannot be checked against the input at all. Ji et al. point out that such content is occasionally correct; the cases that cause trouble are the ones that turn out to be false in the world. If you ask a model "who is the CEO of Acme Corp?" and the model says "Jane Smith" when no one named Jane Smith has ever worked at Acme, that is extrinsic. There is no input passage to contradict; the model is filling in a fact from its training memory, and the fact is wrong. Extrinsic hallucinations are the dominant failure mode in open-ended question answering, chatbots, and any task where the user is treating the model as a knowledge source.
The two types matter because they get caught differently. Intrinsic hallucinations can be detected by checking the output against the input. Extrinsic hallucinations require a separate ground truth source, which is what retrieval-augmented generation (RAG) is designed to provide.
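Checking output against input can be sketched crudely in a few lines. The following is a minimal Python sketch, assuming a plain lexical-overlap heuristic: it flags summary sentences whose content words mostly do not appear in the source. The stopword list and the 0.6 threshold are arbitrary assumptions, and production systems usually rely on entailment (NLI) models or an LLM judge rather than word overlap.

```python
import re

# Tiny hand-picked stopword list; an assumption for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "it", "that", "on", "for"}

def content_words(text: str) -> set[str]:
    """Lowercased word tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def unsupported_sentences(source: str, summary: str, threshold: float = 0.6) -> list[str]:
    """Return summary sentences whose content words are mostly absent from the source."""
    src = content_words(source)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        words = content_words(sentence)
        if not words:
            continue
        support = len(words & src) / len(words)  # fraction of words the source backs up
        if support < threshold:
            flagged.append(sentence)
    return flagged

source = "Acme reported revenue of 40 million dollars in 2023. The CEO is John Doe."
summary = "Acme reported 40 million dollars in revenue. The CEO, Jane Smith, resigned in March."
print(unsupported_sentences(source, summary))  # prints ['The CEO, Jane Smith, resigned in March.']
```

Word overlap misses paraphrase and subtle distortion, so treat a check like this as a first filter, not a verdict.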
A finer-grained taxonomy: five kinds
In practice, "intrinsic vs extrinsic" is a useful but coarse split. Production systems care about more specific failure shapes because each shape needs a different mitigation. Here are five kinds that show up over and over in real deployments.
Kind one is the fabricated fact. The model invents a specific, checkable claim that is just wrong. "The CEO is Jane Smith" when the CEO is John Doe. "The product ships in three colors" when it ships in five. Fabricated facts are the canonical hallucination and the easiest to catch when the ground truth is known, because a single lookup falsifies them.
Kind two is the fabricated citation. The model invents a reference to a source that does not exist. A made-up court case, a fake academic paper, a quote attributed to a person who never said it. The legal-brief case described later in this article is the textbook example. Fabricated citations are particularly dangerous because the presence of a citation looks like evidence of grounding when in fact the citation is itself part of the hallucination.
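The practical lesson of a fabricated citation is that its existence has to be checked outside the model; asking the model to confirm its own citation proves nothing. A minimal Python sketch of that check, assuming the citations have already been extracted from the output and that TRUSTED_INDEX is a hypothetical stand-in for a query against an authoritative legal or bibliographic database:

```python
# Hypothetical trusted index. A real system would query an authoritative
# database (a court-records service, a bibliographic index), not a hard-coded set.
TRUSTED_INDEX = {"Mata v. Avianca, Inc."}

def normalize(citation: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't cause misses."""
    return " ".join(citation.lower().split())

def unverified_citations(cited: list[str], trusted_index: set[str]) -> list[str]:
    """Return every cited source that cannot be found in the independent index."""
    known = {normalize(c) for c in trusted_index}
    return [c for c in cited if normalize(c) not in known]

cited = ["Mata v. Avianca, Inc.", "Varghese v. China Southern Airlines", "Shaboon v. Egyptair"]
print(unverified_citations(cited, TRUSTED_INDEX))
# prints ['Varghese v. China Southern Airlines', 'Shaboon v. Egyptair']
```

Exact name matching is deliberately strict. A real checker would more likely match on the reporter citation (volume, reporter, page), since case names are formatted inconsistently.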
Kind three is the numeric hallucination. The model emits a wrong number: a wrong date, a wrong price, a wrong percentage, a wrong dose. Numbers are an especially common failure point because, in a given context, many different digit sequences are roughly equally plausible, and nothing in the generation step distinguishes the plausible-looking number from the correct one. That makes numeric claims among the first things worth checking in any model output.
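In grounded tasks, a cheap guard is to require that every number in the answer also appears in the source. A minimal Python sketch, assuming numbers are written as digits with optional comma thousands separators; spelled-out numbers such as "five" are not caught:

```python
import re

# Digits with optional comma/period groups, e.g. "3", "1,299", "4.5".
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def numbers_in(text: str) -> set[str]:
    """Extract numeric tokens, stripping comma thousands separators."""
    return {m.replace(",", "") for m in NUMBER.findall(text)}

def ungrounded_numbers(answer: str, source: str) -> set[str]:
    """Numbers the answer states that never appear in the source."""
    return numbers_in(answer) - numbers_in(source)

source = "The product ships in 5 colors and costs $1,299."
answer = "It ships in 3 colors and costs $1,299."
print(ungrounded_numbers(answer, source))  # prints {'3'}
```

The check says nothing about a number that does appear in the source but is attached to the wrong thing (the right price quoted for the wrong product), which is closer to the confused-entity and source-distortion cases.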
Kind four is the confused entity. The model conflates two real things. It correctly knows that there is a person named John Smith who is a poet and another person named John Smith who is a tennis coach, and the output attributes the tennis-coaching career to the poet. The output is partially grounded in reality, which makes confused-entity hallucinations harder to flag than pure fabrication.
Kind five is source distortion. The model has been given a real source passage and gets the passage wrong: misquotes it, leaves out a critical qualifier, or paraphrases in a way that changes the meaning. Source distortion is the intrinsic-hallucination flavor that shows up most in RAG systems. The retriever returns a correct passage, the model summarizes it, and the summary is subtly wrong in a way the user cannot detect without going back to the source.
These five categories cover almost every hallucination you will see in a production chatbot. The comparison table below puts them side by side with how each is detected and what the visitor risk looks like.
Why hallucinations happen
To understand why hallucinations happen at all, it helps to look at how an LLM actually generates text. The model produces one token at a time. At each step, it takes the sequence of tokens so far, runs that sequence through a stack of transformer layers, and ends up with a vector of scores, called logits, with one score per token in the vocabulary. A softmax turns those scores into probabilities. The model then either samples a token from that distribution or takes the single most likely token (greedy decoding). That token gets appended to the sequence, and the loop repeats until the model emits a stop token.
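The loop can be made concrete with a toy. In the Python sketch below, toy_logits is a hypothetical stand-in for the transformer stack (its scores depend only on the last token, which no real model does), and decoding is greedy: softmax, take the most likely token, append, stop at <stop>. Beyond the shape of the loop, nothing here resembles a real model's internals.

```python
import math

VOCAB = ["<stop>", "the", "ceo", "is", "jane", "john", "smith", "doe"]

def toy_logits(tokens: list[str]) -> list[float]:
    """Hypothetical stand-in for the transformer stack: one score per vocabulary entry."""
    table = {
        "the": {"ceo": 3.0},
        "ceo": {"is": 3.0},
        "is": {"jane": 1.2, "john": 1.0},  # near-tie: no real knowledge behind either name
        "jane": {"smith": 3.0},
        "john": {"doe": 3.0},
    }
    scores = table.get(tokens[-1], {"<stop>": 3.0})
    return [scores.get(tok, 0.0) for tok in VOCAB]

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_decode(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(tokens))
        next_token = VOCAB[max(range(len(VOCAB)), key=probs.__getitem__)]  # argmax
        if next_token == "<stop>":
            break
        tokens.append(next_token)
    return tokens

print(" ".join(greedy_decode(["the"])))  # prints "the ceo is jane smith"
```

Note the step after "is": "jane" scores 1.2 and "john" 1.0, a near-tie, yet greedy decoding commits to "jane smith" and emits it with the same fluency as every other token.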
Three properties of that process produce hallucinations as a structural consequence, not as a bug.
First, the training objective. Pretraining optimizes the model to predict the next token given the previous tokens, across a giant corpus of internet text. The loss function rewards probability mass on the token that actually appeared in the training data. It does not check whether the resulting sentence is true. A model that produces fluent, confident, well-structured falsehoods gets a lower loss than a model that produces stilted true statements, because the training data is itself full of confident-sounding writing. The model has learned the shape of confident assertion. Truth was never in the loss.
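Written out in standard notation, the pretraining objective for one training sequence of tokens x_1 through x_T and model parameters θ is the next-token cross-entropy loss (summed over all sequences in the corpus in practice):

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```

Every term rewards the probability the model assigns to the token x_t that actually appeared next. No term asks whether the sentence being completed is true.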
Second, the softmax always produces an answer. There is no "I don't know" token in the vocabulary by default. At every step, the model picks something. If the model has no information about a specific question, the next-token distribution will still concentrate on the tokens that are most plausible given the context. Those tokens often form sentences that look exactly like correct answers. The model is not aware that it is guessing. From the inside of the decoding loop, every token feels the same.
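The point is easy to demonstrate: softmax turns any vector of scores into a valid probability distribution, so argmax always returns a token, even when the scores carry no information at all. The minimal Python sketch below also computes the distribution's entropy as one possible uncertainty signal; treating entropy this way is an illustrative assumption, not a hallucination detector, since a model can be confidently wrong with low entropy.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats: near 0 when one token dominates, log(n) when all are equal."""
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = softmax([8.0, 0.0, 0.0, 0.0])
clueless = softmax([0.0, 0.0, 0.0, 0.0])  # the scores carry no information

for name, probs in (("confident", confident), ("clueless", clueless)):
    pick = max(range(len(probs)), key=probs.__getitem__)  # argmax still picks something
    print(f"{name}: picks token {pick}, sum={sum(probs):.3f}, entropy={entropy(probs):.3f}")
```

Both distributions sum to 1 and both produce a pick; only the entropy differs, and nothing in the decoding loop forces the model to act on that difference.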
Third, there is no fact-check pass. The base architecture has no separate step that asks "is this output true?" before emitting it. Modern systems bolt on tool use, retrieval, and refusal training to compensate, but the underlying generation step does not, on its own, distinguish between "I generated this because I have strong evidence" and "I generated this because it sounds like the kind of thing I would say next."
A common misunderstanding is that hallucinations are a sign the model is broken or buggy. They are not. They are the predictable output of an architecture that optimizes for plausibility and does not, by default, optimize for truth. Mitigating hallucinations means adding architecture on top of the base model: retrieval, citation grounding, refusal training, and human review.
Famous real-world cases
The technical definition is abstract. The real-world cases are what make hallucination concrete for non-engineers, and they are what regulators, judges, and journalists now point to when they talk about AI risk.
Case 1: Mata v. Avianca (2023), the lawyer who cited fake cases
In June 2023, a federal judge in the Southern District of New York sanctioned attorneys Steven Schwartz and Peter LoDuca and their firm Levidow, Levidow and Oberman for filing a legal brief that cited six judicial opinions that did not exist. The case was Mata v. Avianca, Inc., 678 F.Supp.3d 443. Mr. Schwartz had used ChatGPT to research the brief and had asked the model to find supporting case law on tolling under the Montreal Convention. The model produced six confident-looking case citations, including Varghese v. China Southern Airlines, Martinez v. Delta Airlines, Shaboon v. Egyptair, Petersen v. Iran Air, Estate of Durden v. KLM Royal Dutch Airlines, and Miller v. United Airlines. None of those cases existed.
When opposing counsel could not find the cases, Mr. Schwartz asked ChatGPT to confirm they were real, and the model assured him they were. Judge P. Kevin Castel imposed a $5,000 penalty under Federal Rule of Civil Procedure 11, payable jointly and severally by Mr. Schwartz, Mr. LoDuca, and the firm. The case became the canonical example of fabricated-citation hallucination and is now a standard reference in continuing-legal-education programs on AI.
Case 2: Moffatt v. Air Canada (2024), the chatbot that invented a refund policy
In February 2024, the British Columbia Civil Resolution Tribunal ruled in Moffatt v. Air Canada (2024 BCCRT 149) that Air Canada was liable for incorrect information its website chatbot had given a customer named Jake Moffatt. Mr. Moffatt was booking a flight to attend his grandmother's funeral. The chatbot told him he could book at the regular fare and then apply for a bereavement-fare refund within 90 days. He did so. Air Canada then refused the refund, arguing that its actual bereavement policy did not allow retroactive claims and that it could not be held liable for information provided by its chatbot. The tribunal summarized that position as suggesting the chatbot was "a separate legal entity that is responsible for its own actions."
Tribunal member Christopher C. Rivers rejected that argument. He found that Air Canada owed its chatbot users a duty of care, that the chatbot's output amounted to negligent misrepresentation, and that the airline was responsible for all the information on its website, whether it came from a static page or a chatbot. He ordered Air Canada to pay C$812.02 in damages, pre-judgment interest, and tribunal fees. The ruling has been widely cited internationally as an early example of a company being held responsible for what its AI chatbot told a customer.
Case 3: The New York Times v. OpenAI evidence
In late 2023, the New York Times sued OpenAI and Microsoft alleging that ChatGPT reproduced large portions of Times articles verbatim. The Times complaint included examples where the model, when prompted, generated text that closely matched copyrighted articles. The legal core of that case is about training-data overlap and reproduction, which is a distinct issue from hallucination. But the same case file documented several examples of misattributed quotes and fabricated Times-style content that the model presented as if it came from Times reporting. The combination matters because it shows that, even when a model is reproducing real content from memory, it can still mix that real content with fabricated content of the same style, and the visitor cannot tell the difference from the surface text alone.
Case 4: medical advice errors in published research
Several peer-reviewed studies between 2023 and 2025 evaluated general-purpose LLMs on medical question answering and found hallucinated drug interactions, fabricated study citations, and confidently wrong dosing recommendations at non-trivial rates. The pattern across these studies is the same: the model produces fluent, professional-sounding clinical text; the text contains specific claims that look like the kind of thing a clinician would write; and the claims are sometimes wrong in ways that would harm a patient if acted on without review. This category of hallucination is why serious medical-AI deployments typically ship with explicit refusal training, citation grounding to specific guideline documents, and a "this is not medical advice" disclaimer on their outputs.
How hallucinations are measured
Because hallucination is a structural property of LLMs rather than a bug, the field has developed standardized benchmarks for measuring it. Three are particularly common.
TruthfulQA, introduced by Lin, Hilton, and Evans in 2021 ("TruthfulQA: Measuring How Models Mimic Human Falsehoods," arXiv:2109.07958), is a hand-crafted benchmark of 817 questions across 38 categories where the answer most likely given by humans is, in fact, false. The benchmark exists specifically to measure whether models reproduce common human misconceptions. Higher TruthfulQA scores mean the model is better at not regurgitating popular falsehoods.
HaluEval, introduced by Li et al. in 2023 ("HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models," arXiv:2305.11747), is a 35,000-example benchmark covering question answering, dialogue, and summarization, where the task is to classify whether a given model output is hallucinated. It is a widely used reference for fine-grained hallucination evaluation across multiple task types.
HELM, the Holistic Evaluation of Language Models from Stanford, runs LLMs across dozens of scenarios and reports accuracy, calibration, and several robustness measures. HELM does not isolate hallucination as a single number, but the accuracy and calibration measures together capture the same underlying property: how often does the model confidently emit something wrong.
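Calibration asks whether stated confidence matches observed accuracy. One common metric is expected calibration error (ECE): bucket predictions by confidence, then take the sample-weighted average gap between mean confidence and accuracy in each bucket. A minimal Python sketch with equal-width bins; HELM's own calibration computation may differ in binning and other details.

```python
def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Sample-weighted average |mean confidence - accuracy| over equal-width confidence bins."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need one correctness label per confidence, and at least one item")
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        bins[min(int(c * n_bins), n_bins - 1)].append(i)  # confidence 1.0 lands in the top bin
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(confidences[i] for i in members) / len(members)
            accuracy = sum(correct[i] for i in members) / len(members)
            ece += len(members) / n * abs(mean_conf - accuracy)
    return ece

# Four answers, each stated with 0.9 confidence, only two of them correct.
print(expected_calibration_error([0.9] * 4, [True, True, False, False]))
```

A model that reports 0.9 confidence and is right half the time scores an ECE of about 0.4: it is confidently wrong, which is what hallucination looks like in calibration terms.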
In production, most teams complement these public benchmarks with internal evaluations on their own knowledge base. A typical setup: hand-curate a few hundred questions with verified answers, run them through the chatbot, and score the outputs for correctness and refusal behavior. The internal benchmark catches domain-specific hallucinations that the public benchmarks cannot.
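That internal benchmark fits in a short harness. A minimal Python sketch, assuming the chatbot is any function from question to answer, that "correct" means the verified answer appears as a substring of the reply, and that refusals contain one of two hypothetical marker phrases; real setups usually grade with human reviewers or an LLM judge.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical phrases the bot is prompted to use when it declines to answer.
REFUSAL_MARKERS = ("i don't know", "does not cover")

@dataclass
class EvalItem:
    question: str
    expected: Optional[str]  # None: the knowledge base has no answer, so the bot should refuse

def is_refusal(answer: str) -> bool:
    text = answer.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def score(items: list[EvalItem], chatbot: Callable[[str], str]) -> dict[str, int]:
    """Tally correctness and refusal behavior over a hand-curated question set."""
    counts = dict.fromkeys(
        ["correct", "wrong", "correct_refusal", "missed_refusal", "over_refusal"], 0
    )
    for item in items:
        answer = chatbot(item.question)
        refused = is_refusal(answer)
        if item.expected is None:
            counts["correct_refusal" if refused else "missed_refusal"] += 1
        elif refused:
            counts["over_refusal"] += 1
        elif item.expected.lower() in answer.lower():
            counts["correct"] += 1
        else:
            counts["wrong"] += 1
    return counts

items = [
    EvalItem("Who is the CEO?", "John Doe"),
    EvalItem("How many colors does it ship in?", "five"),
    EvalItem("Is there a bereavement refund window?", None),
]
canned = {  # hypothetical chatbot replies
    "Who is the CEO?": "The CEO is John Doe.",
    "How many colors does it ship in?": "It ships in three colors.",
    "Is there a bereavement refund window?": "Yes, you can claim a refund within 90 days.",
}
print(score(items, canned.__getitem__))
```

The bucket that matters most is missed_refusal: questions the knowledge base cannot answer that the bot answered anyway. Those are extrinsic hallucinations caught before a customer sees them.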
Mitigation in the abstract
This page is about the definition, not the playbook, but a quick overview of the mitigation surface helps complete the picture.
Retrieval-augmented generation (RAG) is the primary structural mitigation. By retrieving relevant passages from a knowledge base and instructing the model to answer using only those passages, RAG addresses the extrinsic-hallucination case where the model would otherwise be filling in facts from memory. RAG does not eliminate hallucinations; the model can still distort the passages it was given, ignore them, or be misled by irrelevant retrievals. But it reduces hallucination rates substantially in well-tuned systems.
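The retrieve-then-instruct pattern can be sketched as follows. This is a minimal Python illustration, assuming a toy word-overlap retriever in place of the embedding search most RAG systems use, and a prompt wording that is an example rather than a recommended template.

```python
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, passages: list[str], k: int = 2) -> list[tuple[int, str]]:
    """Rank passages by word overlap with the question; keep the top k that overlap at all.

    A toy stand-in for embedding search, used only to keep the sketch self-contained.
    """
    q = tokens(question)
    ranked = sorted(enumerate(passages), key=lambda p: len(q & tokens(p[1])), reverse=True)
    return [(i, text) for i, text in ranked[:k] if q & tokens(text)]

def build_prompt(question: str, hits: list[tuple[int, str]]) -> str:
    """Number the retrieved passages and tell the model to answer only from them."""
    context = "\n".join(f"[{i}] {text}" for i, text in hits)
    return (
        "Answer using only the numbered passages below, and cite them like [0].\n"
        "If they do not contain the answer, say the documentation does not cover this.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

passages = [
    "Bereavement fares must be requested before travel.",
    "Checked bags cost $30 each.",
    "Our CEO is John Doe.",
]
question = "When must bereavement fares be requested?"
print(build_prompt(question, retrieve(question, passages)))
```

Chunking, embedding retrieval, and reranking are all missing here; the sketch only shows where the "answer using only these passages" instruction and the numbered sources come from.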
Refusal training teaches the model to say "I don't know" or "the documentation does not cover this" instead of guessing. This is added during the instruction-tuning and reinforcement-learning stages and is critical for any production chatbot. A model without adequate refusal behavior will confidently answer questions it has no basis for, which is exactly the failure mode behind Mata v. Avianca and the Air Canada ruling.
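Refusal training itself happens inside the model and cannot be shown in a few lines. A complementary, application-level guard can: refuse before calling the model at all when retrieval finds nothing relevant. A minimal Python sketch, assuming a word-overlap relevance test with an arbitrary min_overlap of 2 and a generate callable that stands in for the model call; this is a swapped-in technique, not refusal training.

```python
import re
from typing import Callable

REFUSAL = "The documentation does not cover this."

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def answer_or_refuse(
    question: str,
    passages: list[str],
    generate: Callable[[str, list[str]], str],
    min_overlap: int = 2,
) -> str:
    """Call the model only if some passage shares enough words with the question."""
    q = words(question)
    supported = [p for p in passages if len(q & words(p)) >= min_overlap]
    if not supported:
        return REFUSAL  # refuse rather than let the model answer from memory
    return generate(question, supported)

def echo_model(question: str, context: list[str]) -> str:
    """Hypothetical stand-in for an LLM call: returns the first supporting passage."""
    return context[0]

passages = ["Checked bags cost $30 each."]
print(answer_or_refuse("How much do checked bags cost?", passages, echo_model))
print(answer_or_refuse("Who is the CEO of Acme?", passages, echo_model))
```

A keyword gate like this is blunt: a question that happens to share two common words with an irrelevant passage slips through. It complements refusal training rather than replacing it.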
Citation grounding requires the model to attach a source reference to every factual claim. The user can then click the citation and verify the claim in the source. Citation grounding does not prevent the model from making things up, but it makes the hallucination visible and verifiable. A claim without a working citation can then be treated as low-confidence by the downstream UI.
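A downstream check along those lines can be sketched simply. A minimal Python illustration, assuming the model cites passages with bracketed ids such as [1] placed before each sentence's final punctuation; a claim is labeled grounded only if it cites at least one id and every id it cites was actually retrieved.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")

def label_claims(answer: str, retrieved_ids: set[int]) -> list[tuple[str, str]]:
    """Split an answer into sentences and label each by whether its citations check out."""
    labeled = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        cited = {int(n) for n in CITATION.findall(sentence)}
        grounded = bool(cited) and cited <= retrieved_ids
        labeled.append((sentence, "grounded" if grounded else "low-confidence"))
    return labeled

answer = ("Checked bags cost $30 [1]. Refunds are allowed within 90 days. "
          "The CEO is John Doe [7].")
for sentence, label in label_claims(answer, retrieved_ids={0, 1}):
    print(f"{label:>14}  {sentence}")
```

Passing this check means only that a citation points at a real retrieved passage. Whether the passage actually supports the claim still needs a second check or a human, which is the source-distortion case from the taxonomy.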
ChatRaj uses refusal-first prompts, citation grounding to the operator's own crawled content, and dashboard surfacing of low-confidence answers so the operator can review them. The combination keeps hallucinations rare and, when they do happen, visible to the operator instead of hidden. The operator-facing details (how to monitor, how to triage, how to fix root causes in source content) live in the companion playbook page linked below.
What we deliberately did not cover
This article stopped at the definition, the taxonomy, the root cause, the famous cases, and a brief pointer at mitigation. It did not cover the operator-side playbook of how to detect hallucinations in your own chatbot, how to set up review queues, how to wire up source-content audits, or how to design refusal prompts. Those topics live in a dedicated operator playbook at /answers/how-to-handle-ai-chatbot-hallucinations and in the shorter reference definition at /glossary/hallucination. If you are evaluating LLM products and need the definition, you are in the right place. If you are running a chatbot in production and need the playbook, follow the link.