Anatomy of a withdrawn AI report: the verification step nobody built in

A firm of roughly 270,000 people that sells AI advisory had to pull its flagship report after an outside check found only five of forty-five citations held up and four named organizations said the case studies about them were wrong. The story reads as a case of AI hallucination. It is really about the human review step that was never built into the work.

The instructive part of this story is not the part people reach for first. Yes, an AI model made things up, but models do that, and everyone who works with them knows it. What actually deserves attention is that a firm whose business is advising other companies on AI shipped a flagship, branded report to the public with nothing standing between an AI draft and the printed page. The hallucination itself was ordinary. It was the missing review stage that turned ordinary model behaviour into a public liability, and that makes this an operating-model question every leadership team now faces.

What happened, and what makes it worse

In October 2025, KPMG published a report titled Total Experience: Redefining Excellence in the Age of Agentic AI. By mid-June 2026 the firm had removed it from its websites while it investigated, and the reason is unflattering. GPTZero, an outside tool, sampled the report's citations and found that only five of forty-five pointed correctly to their source. It called the pattern "vibe citing": references that look authoritative and read plausibly, attached to material that does not actually say what the report claims. The footnotes carried the confidence of real scholarship and the substance of guesswork.

The citation problem would have been embarrassing on its own. What deepened it is that four named organizations, UBS, NHS Greater Manchester, Swiss Federal Railways, and Transport for London, came forward to say the case studies written about them were inaccurate, factually incorrect, or misleading. The report had not only cited sources that did not check out; it had also described real institutions doing things they say they did not do, and those institutions were reachable, on the record, and willing to correct it in public. For a firm that is one of the Big Four and sells AI advisory, the document meant to demonstrate command of the subject became a demonstration of the opposite.

The model did its job, the workflow did not

It is worth being precise about where the failure sits, because the convenient reading lets too many people off the hook. A language model that fabricates a citation and writes a confident case study about an organization it has never examined is behaving exactly as designed. That is the expected output of these tools when they are asked to produce something they cannot ground, and it will keep being the expected output for the foreseeable future, so anyone who builds a process on the assumption that the model will simply stop doing this has misunderstood the tool. The model did exactly what models do.

The breakdown happened one step later, in the place where a person was supposed to be and was not. Somewhere in the path from draft to publication, a human being should have pulled five of those forty-five citations and checked whether they led anywhere, and called two of the four named organizations to ask whether the story was true. That check is dull, slow, and entirely human, and on this report nobody owned it. The output reached the public carrying the firm's name because the review stage that would have caught it had never been built into how the work got made.

A draft becomes a document the moment someone is willing to put their name on it. That step is the one no model performs for you.

Bolting AI on is how the gap gets built in

The reason this is worth more than a moment of schadenfreude is that the same gap is sitting, unnoticed, inside most companies right now. When a team adds an AI tool to a workflow that was designed for human pace and human volume, it speeds up the production of drafts without adding any new capacity to verify them. The old workflow assumed that the person writing the document was also the person who knew whether it was true, because writing it slowly was how they came to know. AI severs that link. The writing gets fast and the knowing does not come along with it, so a step that used to happen automatically now has to be designed in on purpose, and almost nobody designs it in.

What you get instead is liability produced at speed. The volume of plausible-looking output goes up, the rate at which anyone is checking it stays where it was, and the gap between the two fills with exactly the kind of error that ended up on KPMG's website. The blame belongs to neither the tool nor the person who used it; it belongs to a workflow that was never rebuilt to account for what changes when a machine can generate a finished-looking artefact faster than any human can confirm it. An AI capability dropped into an unchanged operating model does worse than fail to help. It manufactures risk, and it does so quietly, until the day an outside party runs the check the company skipped.

Trust is an operating-model property

The lesson leaders are tempted to draw from this is narrow: proofread your AI. That advice is true and almost useless, because it treats verification as a personal habit rather than a structural commitment. Whether AI output can be trusted inside a company turns less on how careful any individual happens to be on a given afternoon than on whether the way the company runs has a named, accountable place where output is checked before it carries weight, and whether that place is staffed, resourced, and impossible to skip when a deadline is tight.

That is what makes this an operating-model question and not a quality-control footnote. A report can be withdrawn and reissued. The harder thing to fix is the absence of a stage that should have existed, because adding it back means redrawing who is responsible for what, when in the process the check happens, and what is allowed to ship without it. The companies that come through the next few years with their credibility intact will be the ones that treated the arrival of fast AI output as a reason to rebuild that part of how they work, so that trust is a property of the operating model itself rather than a thing they hope each employee remembers to supply.
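One way to picture a stage that is impossible to skip is a release gate that refuses to publish without a named reviewer and a completed check. The Python sketch below is illustrative only: the Draft fields, the release function, and the rule that every citation and every named organization must be confirmed are hypothetical choices, not a description of any firm's actual process.

```python
from dataclasses import dataclass
from typing import Optional


class UnverifiedOutputError(Exception):
    """Raised when a draft tries to ship without an accountable check."""


@dataclass
class Draft:
    # All field names are hypothetical, chosen for illustration.
    title: str
    citations_total: int
    citations_verified: int = 0
    named_parties_total: int = 0
    named_parties_confirmed: int = 0
    reviewer: Optional[str] = None  # the named human who owns the check


def release(draft: Draft) -> str:
    """Return a publication record, or raise if the review stage was skipped."""
    if not draft.reviewer:
        raise UnverifiedOutputError(f"{draft.title!r}: no named reviewer")
    if draft.citations_verified < draft.citations_total:
        raise UnverifiedOutputError(
            f"{draft.title!r}: {draft.citations_verified} of "
            f"{draft.citations_total} citations verified"
        )
    if draft.named_parties_confirmed < draft.named_parties_total:
        raise UnverifiedOutputError(
            f"{draft.title!r}: {draft.named_parties_confirmed} of "
            f"{draft.named_parties_total} named organizations confirmed"
        )
    return f"{draft.title!r} released, checked by {draft.reviewer}"


if __name__ == "__main__":
    # A draft with no reviewer and most checks outstanding is blocked.
    draft = Draft("Example report", citations_total=45,
                  citations_verified=5, named_parties_total=4)
    try:
        release(draft)
    except UnverifiedOutputError as err:
        print("Blocked:", err)
```

The design choice worth noticing is that the gate raises an error instead of logging a warning: a tight deadline cannot quietly route around it, and the reviewer's name travels with the published record.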

Exhibit 1

The check that was never run.

45: citations an outside tool sampled in the report.

5: citations that correctly pointed to their source.

Source: GPTZero analysis, reported by The Register and the Financial Times, June 2026.

The numbers are the cleanest way to see the absence. An outside tool pulled forty-five citations and found that five of them led where they claimed to. No one inside the process had done that arithmetic before the report went out, which is the whole point: the check was available, it was cheap, and it simply was not part of the work.
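To put a rough number on how cheap that check was, here is a minimal sketch in Python. It assumes the five-in-forty-five split GPTZero reported describes the whole pool a reviewer would draw from, and that the reviewer pulls citations at random without replacement; both are simplifying assumptions, and the function name is illustrative.

```python
from math import comb


def chance_spot_check_catches(total: int, sound: int, sample: int) -> float:
    """Probability that a random sample contains at least one unsound citation."""
    if not (0 <= sound <= total and 0 <= sample <= total):
        raise ValueError("need 0 <= sound <= total and 0 <= sample <= total")
    # The check misses only if every sampled citation comes from the sound ones
    # (a hypergeometric draw without replacement).
    p_all_sound = comb(sound, sample) / comb(total, sample)
    return 1.0 - p_all_sound


TOTAL_CITATIONS = 45  # figure reported for the GPTZero sample
SOUND_CITATIONS = 5   # citations that pointed correctly to their source

share_sound = SOUND_CITATIONS / TOTAL_CITATIONS
p_catch_one = chance_spot_check_catches(TOTAL_CITATIONS, SOUND_CITATIONS, 1)
p_catch_five = chance_spot_check_catches(TOTAL_CITATIONS, SOUND_CITATIONS, 5)

print(f"Share of citations that held up: {share_sound:.1%}")
print(f"Chance one random citation exposes a problem: {p_catch_one:.1%}")
print(f"Chance a sample of five exposes a problem: {p_catch_five:.6%}")
```

Under those assumptions a single randomly pulled citation exposes the problem eight times in nine, and a sample of five misses it only when all five happen to be the sound ones, which is one draw out of the 1,221,759 possible samples.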

The uncomfortable question worth sitting with

Where in your company does AI output already reach a customer, a board, or a regulator without a named human accountable for checking it first? The honest answer for most organizations is that they do not know, because the AI tools arrived faster than anyone mapped where their output ends up, and the assumption that someone, somewhere, is still reading it carefully has quietly stopped being true. The report that got pulled belonged to a firm with deep expertise, formal review structures, and every reason to be careful, and the step still went missing. The question is not whether your people are diligent. It is whether the way your company runs would have caught this one, or whether you would have found out the way KPMG did.