You have a friend who is always certain. That friend will tell you, with complete confidence, that the Battle of Hastings was in 1067 (it was 1066), that water boils at 102 degrees Celsius at sea level (it does not), and that Iceland lies inside the Arctic Circle (the mainland sits just south of it). Your friend is not lying. Your friend genuinely believes these things. And you cannot tell which facts are right and which are wrong without checking. The confidence is real. The accuracy is not.
Large language models have the same problem. They generate fluent, confident text. They do not generate false text deliberately. They generate text that is statistically consistent with their training data, and that statistical consistency does not guarantee factual accuracy. Hallucination is not lying. It is being wrong with high confidence because the training signal was fluency, not truth. The friend read a lot of sources and internalized patterns; the friend did not check facts against a ground truth.
Understanding why models hallucinate is a prerequisite to knowing how to detect and mitigate it. A language model is trained to predict the next token given the previous tokens. The training objective is coherence and fluency, not truth. When the model generates “the Battle of Hastings was in 1067,” it is generating a sequence of tokens that is statistically plausible in the context of historical discussion. The model has seen similar sequences in its training data and reproduces them. The compression was lossy; the dates blurred.
The model does not have access to a fact database it queries. It does not check claims against a ground truth. It has a compressed representation of statistical relationships in human text, and it generates from that representation. The compression is lossy. Specific facts get averaged with similar facts. Dates get blurred. Precise figures get rounded. And the model has no signal that this happened. The friend has no internal fact-checker; the friend has pattern recognition.
The hallucination problem is most acute for recent events, specialized knowledge, proprietary information, and anything that was rare in the training data. A model with a knowledge cutoff of 2023 cannot know what happened in 2024. A model that has never seen a specific internal policy document cannot answer questions about it accurately. A model that has seen a rare disease described in only a handful of papers may generate plausible-sounding but inaccurate descriptions of it. The friend is most wrong about the things they know least about.
Compositional hallucination is subtler. The model may know the individual facts but combine them incorrectly. It may know that Person A was born in Year X and that Person A achieved Y, but generate a plausible but wrong year for Y. It may know that Company Z acquired Company W in Year V, but generate a plausible wrong acquisition year. These compound errors are harder to detect because each component is plausible, but the composition is wrong. The friend gets the individual facts right but connects them incorrectly.
Detection and Mitigation Approaches
Retrieval augmentation (RAG) is the most practical mitigation for production systems. If the model answers from a retrieved passage, the hallucination surface shrinks to cases where the passage itself is wrong or the model misinterprets it. The passage becomes a ground truth the model must stay close to. This is not perfect: the model can still misread a correct passage. But it removes the model’s dependence on its training data for factual content. The friend checks the encyclopedia before answering.
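The pattern can be sketched in a few lines. This is a minimal illustration, not a production implementation: `retrieve` and `generate` are hypothetical stand-ins for your search index and model client, and the prompt wording is an assumption.

```python
# Minimal RAG sketch. `retrieve` and `generate` are hypothetical
# callables standing in for a search index and a model client.
def answer_with_rag(question, retrieve, generate, top_k=3):
    """Ground the model's answer in retrieved passages."""
    passages = retrieve(question, top_k)  # fetch supporting text
    context = "\n\n".join(passages)
    prompt = (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say 'I do not know.'\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```

The instruction to answer only from the context, and to refuse when the context is silent, is what shrinks the hallucination surface: the model is steered toward the passage rather than its compressed training memory.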
RAG grounds the model’s output but does not eliminate hallucination. The model can still confabulate details that are not in the passage, or infer relationships the passage does not support. A retrieval-grounded answer about a legal contract can still misstate which party bears a particular obligation if the model misreads the clause. Grounding reduces hallucination frequency; it does not eliminate it. The friend read the right book but misunderstood a sentence.
Self-consistency sampling asks the model the same question multiple times and checks whether answers converge. If the model gives three different answers to “What year was the Magna Carta signed?”, you have a signal that its confidence is misplaced. This is useful for high-stakes outputs but comes with real costs. You are making three or five or ten model calls instead of one. The latency and cost multiply accordingly. Self-consistency is appropriate when the stakes of a wrong answer justify the cost of verification.
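The convergence check itself is simple. Here is a sketch under stated assumptions: `generate` is a hypothetical sampled model call (temperature above zero, so repeated calls can differ), and the agreement threshold is illustrative.

```python
from collections import Counter

# Self-consistency sketch: ask the same question n times and measure
# how often the most common answer appears. `generate` is a hypothetical
# stand-in for a sampled (non-deterministic) model call.
def self_consistent_answer(question, generate, n=5, threshold=0.6):
    answers = [generate(question) for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    agreement = count / n
    # Low agreement is the signal: the model's confidence is misplaced.
    return best, agreement, agreement >= threshold
```

Note the cost structure in the code: `n` model calls per question, so latency and spend scale linearly with the verification budget.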
Chain-of-thought prompting surfaces the reasoning path. This does not eliminate hallucination but makes it easier to spot. If the error is in the reasoning, the chain-of-thought makes it visible: the model cites a specific year as fact, and that citation is where the error originates. Without the chain, the error hides in the conclusion. With it, you can trace back to where the model went wrong. The friend explains how they reached the conclusion; the explanation reveals where they went wrong.
Calibrated confidence output attempts to have the model express uncertainty explicitly. Some models support this. Most are poorly calibrated in the sense that they express high confidence even on wrong answers. The model is trained to produce fluent text, not to accurately assess its own knowledge. Calibration is an active research area, and most production systems cannot rely on the model’s expressed confidence as a trustworthy signal. The friend does not know what they do not know.
The Confidence Calibration Problem
LLMs are notoriously overconfident. They produce fluent text about topics they know nothing about with the same confidence as topics they know well. A model asked about a fictitious legal case will generate a detailed description with citations and holdings, all confidently presented. The fluency creates an illusion of knowledge that is not there. The confident tone is not correlated with accuracy.
This is not a bug that will be fixed in the next model version. It is a structural property of how language models work. The training objective does not penalize confident wrong answers, only incoherent ones. Until training methods change to penalize calibration error, confident incorrect outputs will remain a feature of the species. The friend will keep being wrong with certainty.
For high-stakes applications, this means you cannot use the model as-is without additional guardrails. The confident tone does not convey reliable information. You need external verification, retrieval grounding, or human review for any output where factual accuracy is non-negotiable. Treat the model like a research assistant who never says “I do not know.”
Entailment verification is an underused approach. Given the retrieved documents and the generated answer, use a separate model call to check whether the answer is actually supported by the documents. This second call flags cases where the model went beyond the grounding material. The cost is an additional model call, but it catches hallucination that RAG alone does not prevent. The friend writes an answer; a second friend checks if the answer matches the sources.
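A second-call check of this kind can be sketched as follows. The prompt wording and the SUPPORTED / NOT_SUPPORTED verdict format are assumptions; `judge` is a hypothetical verifier model call, not a named API.

```python
# Entailment-verification sketch: a second model call judges whether
# the generated answer is supported by the retrieved passages.
# `judge` is a hypothetical verifier call returning a verdict string.
def verify_grounding(answer, passages, judge):
    prompt = (
        "Do the passages below fully support the answer? "
        "Reply SUPPORTED or NOT_SUPPORTED.\n\n"
        "Passages:\n" + "\n".join(passages) +
        f"\n\nAnswer: {answer}"
    )
    verdict = judge(prompt)
    return verdict.strip().upper() == "SUPPORTED"
```

Answers that fail the check can be blocked, regenerated, or routed to human review rather than shown to the user.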
Where Hallucination Is Most Dangerous
Hallucination risk is not uniform across use cases. In creative writing, hallucination is not a meaningful problem: there is no ground truth to violate. In factual summarization, hallucination is a serious problem. In advisory contexts (legal, medical, financial), hallucination can cause real harm and create liability. The question is whether there is a ground truth that matters.
Regulated industries have specific hallucination risks. A model that generates incorrect medical advice may expose the deploying organization to liability. A model that generates incorrect legal analysis may cause a client to make a wrong decision. A model that generates incorrect financial information may violate securities regulations. These domains require not just mitigation but demonstrable mitigation with audit trails. The friend is dangerous when the stakes are high and the friend does not know they are wrong.
The user population matters. Expert users who can recognize hallucinated citations or implausible facts are better equipped to catch errors. Non-expert users who accept the model’s confident tone as ground truth are more vulnerable. Consumer-facing applications with non-expert users require more robust hallucination mitigation than professional tools used by domain experts. The friend is less dangerous to people who can catch them.
Residual Risk
No detection mechanism catches all hallucinations. The goal is reduction to acceptable levels. If your application cannot tolerate any factual error, you need human review or retrieval grounding on every output. If your application is creative writing, the risk is lower because there is no factual ground truth to violate.
Acceptable residual risk is a business and ethical decision, not a technical one. A system that generates marketing copy can tolerate more hallucination than a system that generates medical advice. A system that generates historical summaries can tolerate less hallucination than one that generates fiction. The tolerance level should drive the mitigation investment.
When you accept residual hallucination risk, make that acceptance explicit and documented. A system that is known to hallucinate 5% of citations is manageable. A system that is believed to be accurate but is actually hallucinating 5% of citations is dangerous. Knowing your error rate is safer than assuming you have no error rate. The friend’s track record should be documented.
A layered mitigation approach is more robust than any single technique. First layer: retrieval grounding for factual queries. Second layer: entailment verification on grounded answers. Third layer: human review for highest-stakes outputs. Each layer catches hallucination that the previous layers missed. Defense in depth.
Not every query needs all three layers. A simple factual lookup might only need retrieval grounding. A complex analytical query that synthesizes across multiple documents needs entailment verification. A query that informs a consequential decision needs human review. Apply mitigation proportional to the stakes. The friend should not be checked by three people for every sentence; but for consequential claims, triple-checking is appropriate.
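Routing by stakes can be made explicit in code. A minimal sketch, assuming illustrative stakes tiers and layer names (these are not a standard taxonomy):

```python
# Layered-mitigation routing sketch. The stakes tiers ("low", "medium",
# "high") and layer names are illustrative assumptions.
def mitigation_layers(stakes):
    layers = ["retrieval_grounding"]  # every factual query gets grounding
    if stakes in ("medium", "high"):
        layers.append("entailment_verification")  # synthesis across documents
    if stakes == "high":
        layers.append("human_review")  # consequential decisions
    return layers
```

Making the routing explicit also makes it auditable: you can show, per query, which layers were applied and why.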
Monitoring hallucination rates in production is essential. Track the rate at which entailment verification flags non-grounded answers. Track the rate at which human reviewers catch hallucination in reviewed outputs. These rates should inform whether your mitigation investment is adequate or needs to increase. If the friend is wrong 10% of the time, you need more mitigation than if they are wrong 1% of the time.
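The bookkeeping is trivial but worth making explicit. A sketch of a flag-rate tracker; the class and attribute names are illustrative, and a real deployment would persist these counts to its metrics system.

```python
# Production-monitoring sketch: track how often verification flags a
# non-grounded answer. Names are illustrative, not a standard API.
class HallucinationMonitor:
    def __init__(self):
        self.checked = 0  # answers run through verification
        self.flagged = 0  # answers flagged as not grounded

    def record(self, grounded):
        self.checked += 1
        if not grounded:
            self.flagged += 1

    @property
    def flag_rate(self):
        # Fraction of checked answers that failed verification.
        return self.flagged / self.checked if self.checked else 0.0
```

A rising flag rate is the actionable signal: it tells you the mitigation budget needs to grow before users tell you.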
Reduce hallucination risk when factual accuracy matters for your use case, when you are in a regulated domain where errors have consequences, when users cannot easily verify the outputs themselves, when the model’s knowledge cutoff means it may lack current information, and when the cost of a wrong answer exceeds the cost of verification. Choose your mitigation: retrieval augmentation (RAG) for factual grounding at reasonable cost, self-consistency sampling for high-stakes outputs with budget for latency, chain-of-thought for transparency in reasoning errors, human review for the highest-stakes outputs, structured output constraints to limit confabulation space, and entailment verification as a second check on grounded answers.
Accept residual hallucination risk when the task is primarily generative or creative, when verification is feasible at the application layer, when the cost of perfect accuracy exceeds the harm of occasional errors, and when the user has context to evaluate the output (expert users who know when to double-check). Your confident friend is useful company for some conversations. Know which ones require double-checking, and do not let the confident tone fool you into skipping it.