LLMs still hallucinate – Here’s what you need to know
Written by Aapo Tanskanen.
Sam Altman predicted in 2023 that the LLM hallucination problem would be solved within a year or two, and that we would no longer be talking about it. Fast forward two years, however, and the issue remains unresolved, with people still debating the topic.
As a business leader, you don’t need to understand the inner workings of LLMs, but understanding a few basic fundamentals is enough to steer your AI initiatives towards reliable outcomes, rather than exposing your organisation to unnecessary business risks from hallucinations.
So, are you ready to explore a couple of fundamental misconceptions about hallucinations, and some ways to mitigate the risks?
Misconception 1: hallucinations can be fixed
Hallucinations mean that LLMs sometimes generate outputs which sound plausible but are, in fact, incorrect or unfounded. This cannot be entirely solved, despite whatever LLM service providers might claim in their marketing. Current LLMs are trained to predict the next word (token), even when there is insufficient information to predict the correct one. The anthropomorphised term “hallucination” can mislead non-technical people into believing that LLMs are imagining things like a human brain, when in reality the model is doing exactly what it was designed to do: using learned statistical patterns from vast amounts of text data to generate new text, without any human-like understanding of reality. For this reason, reframing hallucinations as “confabulations” is often suggested.
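To make this concrete, here is a deliberately simplified Python sketch; the vocabulary and probabilities are invented purely for illustration and are not taken from any real model. The point it shows is that the sampling step always produces some next token from a learned probability distribution, with no notion of whether any option is actually true.

```python
import random

# Toy illustration, not a real LLM: invented next-token probabilities for the
# prompt "The capital of the fictional country Zulgaria is ...".
next_token_probs = {
    "Varnau": 0.40,   # plausible-sounding
    "Drelsk": 0.35,   # also plausible-sounding
    "Paris": 0.25,    # clearly wrong, but still assigned some probability
}

tokens, weights = zip(*next_token_probs.items())

# The model has no ground truth to consult: a token is always emitted,
# even though no correct answer exists here at all.
print(random.choices(tokens, weights=weights, k=1)[0])
```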
You can also use simple maths to think about how hallucinations appear in LLM outputs, and in particular how they compound. Every single word (token) an LLM produces, one by one, can contain a hallucination, i.e. a small error. So, for example, if you assume 99.9% reliability per generated word, the reliability of the overall text falls exponentially, such that after 100 words the text is only about 90% reliable.
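The same back-of-the-envelope calculation, written out in Python with the assumed 99.9% per-word reliability:

```python
# Compounding of small per-word errors: overall reliability decays
# exponentially with output length (assumed figures, for illustration only).
per_word_reliability = 0.999

for n_words in (10, 100, 1000):
    overall = per_word_reliability ** n_words
    print(f"{n_words:>4} words -> ~{overall:.1%} overall reliability")

# Prints roughly: 99.0% at 10 words, 90.5% at 100 words, 36.8% at 1000 words.
```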
It is usually more practical to think at a slightly higher level, but the same exponential principle applies. For instance, if you intend to use AI agents, powered by LLMs, to automate business processes, each step the agent takes carries a small chance of error, and those errors compound across multi-step processes. In one real-world monthly business accounting process, for example, AI agents were 99% accurate for the first couple of months, but after 12 months accuracy had dropped to around 83%.
Misconception 2: new LLMs hallucinate less
If you tried ChatGPT when it first launched in late 2022 and were disappointed by its outputs — hallucinated or otherwise — and have since abandoned LLMs altogether, you might be surprised at how much better the latest models perform today, just a few years on. Broadly speaking, LLMs have improved considerably: they hallucinate less often, and they’re better at refusing to answer when they don’t actually know the answer. That said, as discussed before, hallucinations cannot be eliminated entirely, and even the latest LLMs still produce them.
What’s particularly interesting, however, is that the newest LLMs are not necessarily less prone to hallucinations than their predecessors. By 2025, much of the focus in LLM development has shifted to so-called “reasoning” models, such as OpenAI’s o-series. For instance, the newer o3 model has been found to hallucinate more than the earlier o1 and non-reasoning models like GPT-4o, in some evaluations roughly twice as often. This stems from a new reinforcement learning-based training technique, which has made the models less consistent in practice: some outputs are outstanding, while others fail disastrously, riddled with odd hallucinations, even in cases where older models had previously produced good results for the very same prompts.
So, although LLMs have generally improved and hallucinate less overall, the new techniques used to develop their capabilities sometimes mean taking a few steps backwards when it comes to hallucinations.
From abstract to concrete: the mitigation strategy
Every leader needs to understand what LLM hallucinations are in order to build relevant mitigation strategies.
Mitigation strategy 1: context engineering
LLMs perform well within the domains covered by their training data, but if you prompt a model on something beyond that scope, it will begin to hallucinate. Without having to train your own LLM, you can also “teach” the model something new by including that new knowledge within the context of your input prompt. Chances are your company has developed its own lingo over the years, or operates in a niche business area that general LLMs haven’t been trained on. That’s why carefully engineering the context of your input prompt is crucial if you want the model to understand the particularities of your business.
Context engineering involves designing and building a dynamic system that can deliver the right information, examples and tools at the right time, so the LLM has everything it needs to complete the task successfully. A key part of this is retrieving relevant information from your company’s internal knowledge bases — which is often where the real problems lie.
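As a rough sketch of what that looks like in practice, the skeleton below assembles retrieved snippets into a prompt before calling the model. The functions `search_knowledge_base` and `call_llm` are hypothetical placeholders for your own retrieval layer and LLM provider, not real APIs.

```python
# Minimal context-engineering skeleton (illustrative only).

def search_knowledge_base(question: str, top_k: int = 3) -> list[str]:
    """Placeholder: return the top_k most relevant snippets from internal docs."""
    raise NotImplementedError("Wire this to your company's search or vector index.")

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to whichever LLM provider you use."""
    raise NotImplementedError("Wire this to your LLM provider's API.")

def answer_with_context(question: str) -> str:
    snippets = search_knowledge_base(question)
    context = "\n\n".join(snippets)
    # Ground the model in retrieved context and give it an explicit way out,
    # which reduces (but does not eliminate) hallucinations.
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)
```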
It’s highly likely that your internal documents and other information sources are of poor quality, or even contain inaccuracies. No wonder the LLM hallucinates if the context you feed it is filled with errors. Today’s LLMs are already capable enough that, when supplied with high-quality, relevant context, they very rarely hallucinate. So rather than waiting around for future LLMs to magically solve hallucinations, start by fixing your company’s data.
Mitigation strategy 2: process redesign
Since the AI boom, many have rushed to retrofit LLMs onto existing processes. Without understanding the fundamentals — such as the fact that LLM hallucinations can never be completely eliminated and that their errors compound across multi-step workflows — these retrofitted solutions have resulted in hallucinated legal cases and chatbots offering customers false discounts.
While retrofitting can create value if executed well, a more effective approach is to fundamentally redesign the entire business process and then redistribute the work between humans, LLMs, and traditional software according to their respective strengths. Redesigning processes at this level is not easy, but this strategic redistribution of work produces far more efficient, scalable, and resilient processes than simply bolting an LLM onto today’s workflows. Accept the inherent unreliability of LLMs, and redesign your processes to manage it.
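As one deliberately simplified illustration of designing for that unreliability, a redesigned process might route low-confidence or high-stakes LLM outputs to a human reviewer rather than letting them flow straight through. The names and threshold below are assumptions, not a prescription.

```python
from dataclasses import dataclass

@dataclass
class LlmResult:
    text: str
    confidence: float  # e.g. from a separate verification step or scoring heuristic

def route_step(result: LlmResult, high_stakes: bool, threshold: float = 0.9) -> str:
    """Send risky or low-confidence outputs to a human instead of auto-approving."""
    if high_stakes or result.confidence < threshold:
        return f"ESCALATE TO HUMAN REVIEW: {result.text}"
    return f"AUTO-APPROVED: {result.text}"

print(route_step(LlmResult("Refund approved for order #123", 0.72), high_stakes=False))
print(route_step(LlmResult("Standard delivery takes 3-5 working days", 0.97), high_stakes=False))
```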
Key takeaways
Hallucinations cannot be fully resolved with the current LLM architecture, and genuine research breakthroughs will be required to address this. While some of the newer LLMs — particularly the so-called “reasoning” models — may in fact hallucinate more than earlier ones, overall the newer models tend to perform better. If you stopped using LLMs after disappointing experiences a few years ago, it’s worth updating your perspective.
Errors in LLM outputs compound when LLMs are used across long-horizon workflows. This is why redesigning business processes has consistently led to the most valuable results. A redesigned process should also include a continuous feedback loop that improves your organisation’s internal data over time, as high-quality data is essential for engineering the right context for the LLM to complete tasks successfully.
Hallucinations are not roadblocks to AI adoption but rather design constraints that demand thoughtful engineering. Successful outcomes are certainly achievable when AI and LLMs are approached with the right strategy.