Why Starbucks pulled AI from 11,000 stores: the shop floor leadership couldn't see

Nine months after putting an AI counting tool in roughly 11,000 stores, Starbucks pulled it. The tool was marketed at 99% accuracy. On the floor, a small miscount meant staff recounted everything by hand, so the AI added work instead of removing it. Accuracy was never the real problem. The real problem was the gap between what leadership saw and how the work actually runs.

The story everyone is telling is that an AI counting tool was not accurate enough, so Starbucks killed it, and the lesson is that AI inventory counting is not ready yet. But the accuracy number was never the test the rollout had to pass. A tool was chosen at the center and pushed to roughly 11,000 stores without anyone at the top having a clear view of how counting actually happens at the back of a busy store, and that blind spot is what decided it.

The reporting gives us the facts cleanly. In September 2025, as part of the Back to Starbucks turnaround under CEO Brian Niccol, the company rolled out NomadGo's computer-vision tool, Automated Counting, across its North American stores, marketed at 99% accuracy. About nine months later, Starbucks retired it, because in practice it miscounted and mislabeled often enough that store staff went back to counting by hand. The news broke on May 21, 2026, in CNBC reporting that cited Reuters, with follow-up from Fortune a week later.

What the headline says, and what it quietly admits

Read at face value, the headline is a verdict on the technology: the tool was wrong too often, so it had to go. That reading is comfortable because it puts the blame on the model and lets everyone else off the hook, and it carries an implied excuse that any leader writing checks for AI will want to reach for, which is that the category is simply premature. The same sentence admits something larger if you sit with it. A tool that ships at 99% accuracy and still gets pulled from 11,000 stores failed on contact with how the work is actually done, in thousands of rooms that the people who approved it had never stood in during a real count. The headline reads as a story about a counting tool, but what it quietly admits is a gap between the people deciding and the people doing.

The accuracy number was never the test the rollout had to pass

Ninety-nine percent sounds like a passing grade, and in a slide deck it reads like one. On the floor of a store, the math runs the other way. A counting tool exists to remove the manual count, so its real test is simple: when it finishes, does anyone have to count again? If the answer is yes even once in a while, the tool has added a layer on top of the work instead of removing it. A staff member who cannot trust the number has to verify it, and verifying a machine count against reality means recounting, which is the exact task the tool was bought to delete. That is why a 1% miss does not cost you 1% of the labor. When a small error forces a full manual recount to find it, a tool that is right almost every time still leaves the entire counting job in place, plus the time spent discovering that this was one of the times it was wrong.
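The arithmetic behind that claim can be sketched in a few lines. This is a rough illustration, not a model of the actual Starbucks rollout: it assumes the 99% figure is a per-item accuracy, that errors on different items are independent, and that any single error forces a full manual recount. The function names and every number below (200 items, a 60-minute manual count, a 10-minute scan) are hypothetical, chosen only to show the shape of the math.

```python
def recount_probability(items: int, per_item_accuracy: float) -> float:
    """Chance that at least one of `items` counts is wrong.

    Assumes each item is counted correctly with the same probability
    and that errors are independent (a simplifying assumption).
    """
    return 1.0 - per_item_accuracy ** items


def expected_minutes_with_tool(
    manual_minutes: float,
    scan_minutes: float,
    items: int,
    per_item_accuracy: float,
) -> float:
    """Expected time per count when any error triggers a full manual recount.

    The scan time is always spent; the manual count is added on top
    whenever the scan contains at least one error.
    """
    p_recount = recount_probability(items, per_item_accuracy)
    return scan_minutes + p_recount * manual_minutes


if __name__ == "__main__":
    # Hypothetical inputs, not figures from the reporting.
    manual = 60.0    # minutes for a full manual count
    scan = 10.0      # minutes to run the tool
    items = 200      # distinct items in one count
    accuracy = 0.99  # treated here as per-item accuracy

    p = recount_probability(items, accuracy)
    total = expected_minutes_with_tool(manual, scan, items, accuracy)
    print(f"Counts with at least one error: {p:.1%}")
    print(f"Expected minutes with tool: {total:.1f} (manual alone: {manual:.1f})")
```

Under these assumed inputs, roughly 87% of counts contain at least one error, and the expected time with the tool comes out slightly above the manual count it was meant to replace. The sketch is also generous to the tool, because it assumes staff somehow know which counts contain an error. If they cannot tell, every count has to be verified, which is the situation the paragraph above describes.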

Exhibit 1

The number the tool was sold on, and the one that decided it.

99%

Accuracy the counting tool was marketed on.

Hours of counting it actually removed on the floor.

Reported across CNBC (citing Reuters) and Fortune, May 2026. The floor outcome is the reporting on staff reverting to manual counts, not a measured figure.

This is the part the marketed figure could never capture, because it was measuring the wrong thing. Accuracy answered a question about the model, whereas the question that decided the rollout was about the process the model landed in, and nobody had asked it where it could be answered honestly, which is on the floor.

A tool that is right almost every time, in a place that cannot afford to be wrong, just moves the work somewhere quieter.

Leadership was optimizing a process it could not see

The decision to standardize one counting tool across 11,000 stores is a reasonable instinct in the abstract, promising consistency, one vendor, and one number to manage. It only works if the people approving it understand how counting actually runs at the store level, with its odd corners, its mislabeled cases, its rushes when a count gets squeezed between customers. The reporting suggests that understanding was missing, because the tool that looked clean from the center turned out to fight the work at the edge. This is the same blind spot that sits underneath most AI investment that produces no measurable return. McKinsey's State of AI 2025 found that 61% of organizations report no measurable EBIT impact from their AI spend, and the common thread is rarely the technology. It is that the tool was chosen against a picture of the work that leadership held in its head rather than the work as it is actually performed. You cannot optimize a process you cannot see, and a process you only see in summary is one you cannot see.

Every AI investment lands on someone, and they decide if it lives

However a rollout is approved, it eventually arrives in the hands of a person doing a task, and that person quietly renders the real verdict. At Starbucks, that person was a store employee with a count to finish and customers waiting, and when the tool's number could not be trusted, the rational move was to ignore it and count by hand. Multiply that decision across thousands of stores and you have the reversal, settled on ten thousand floors one shift at a time rather than in a boardroom. The uncomfortable mechanism here is that this verdict travels slowly upward, if it travels at all, so while the people on the floor knew within weeks, the decision to retire the tool took about nine months. That lag is the cost of a rollout designed without a clear line of sight to the people it lands on, and it is the lag that turns a small process mismatch into an 11,000-store reversal.

The uncomfortable question worth sitting with

If your next AI rollout quietly doubled the work of the people on the floor, how long before you would hear about it, and through whom? The honest answer for most companies is months, routed through whoever is brave enough to say the expensive thing out loud, which means the number that would tell you is exactly the number you do not have. Starbucks had real talent, a serious vendor, and a tool that passed its own accuracy test, and it still spent nine months running a counting tool that removed no counting. The failure was not the model. It was that the work the tool had to survive was never fully visible to the people who funded it, and a number that high can hide a process that broken for a long time before anyone with the authority to stop it can see why.