The token-price paradox: why 98% cheaper AI tripled the bill

The cost of running a leading AI model has fallen about 98% since 2022. Between 2024 and 2026, the average enterprise AI bill rose roughly 320%. A new industry body wants to bring discipline to the spend. Cheaper inputs and a standards group both miss the cause: spend runs away when AI is added to work that was never redesigned for it.

The price of running AI keeps falling while the bill for it keeps climbing, and the contradiction points to where the cost actually comes from. Spend tracks how much of the work runs through AI, not the price of a token, and when AI gets added to a process that was never redesigned for it, usage can grow faster than the price falls. The news this month gives the paradox a shape worth looking at directly.

Two numbers moving the wrong way

The Next Web reported in early June that the cost of running a GPT-4-equivalent model has dropped to roughly $0.40 per million tokens, down from about $20 per million in late 2022. That is a 98% reduction in the price of the raw input. Over a shorter, overlapping window, by The Next Web's own aggregate, the average enterprise AI budget went from about $1.2M a year in 2024 to roughly $7M in 2026, a rise it puts at around 320%. Those budget figures are a journalistic estimate rather than a single audited dataset, and the rounded endpoints imply an even steeper climb than the quoted percentage, so treat them as a directional read. The direction is what matters: the input got about fifty times cheaper and the bill more than tripled.

Exhibit 1: Cheaper to run, costlier to own.

98% lower: the price to run a leading-grade model compared with 2022, from roughly $20 to $0.40 per million tokens.

320% higher: the average enterprise AI bill from 2024 to 2026, from roughly $1.2M to $7M a year.

Source: The Next Web, June 2026. Budget figures are TNW's aggregate, not a single dataset.

Read on its own, the price drop looks like good news that should arrive on the budget any quarter now. It has not, and the reason it will not is built into how the spending is created in the first place.

Cheaper inputs feed runaway usage

A lower unit price does not lower a bill when the number of units is rising faster than the price is falling. Consumption is the part nobody quotes in the headline, and it is the part that actually sets the invoice. As the price per token fell, the easy move was to point AI at more things: more drafts, more summaries, more agents looping over the same documents, more steps in a process now running a model where a person used to think for a second and stop. Each of those is cheap on its own. Together they are a usage curve that climbs steeper than the price curve drops, which is exactly how an input that got fifty times cheaper still produces a bill that more than tripled.
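The arithmetic behind that claim can be sketched in a few lines. This is a back-of-the-envelope illustration, not a figure from The Next Web: it assumes the whole bill is price times usage and that the two headline percentages cover comparable periods. Neither assumption holds exactly, since enterprise budgets also include licences, staff, and infrastructure, and the function name is made up for the example.

```python
def implied_usage_multiple(price_drop_pct: float, bill_rise_pct: float) -> float:
    """Usage growth implied by bill = price x usage (a simplifying assumption)."""
    price_multiple = 1 - price_drop_pct / 100  # 98% lower  -> 0.02x the old price
    bill_multiple = 1 + bill_rise_pct / 100    # 320% higher -> 4.2x the old bill
    # If bill = price * usage, then usage multiple = bill multiple / price multiple.
    return bill_multiple / price_multiple


multiple = implied_usage_multiple(98, 320)
print(f"Implied usage growth: about {multiple:.0f}x")
```

Under those assumptions the headline numbers imply consumption grew on the order of 210 times, which is the part of the story the price chart never shows.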

The deeper cause is that most of this AI is added rather than designed in. A company keeps the workflow it already had and lays a model over the top, so the AI does the same work the old way, only now it runs constantly and meters every call. Nothing about the process was rebuilt to use the capability well, which means there is no natural ceiling on how much it consumes. The spend has no shape because the work underneath it has not changed shape, and a cheaper token simply lets the shapeless usage grow with less friction.

When the input gets cheaper and the bill gets bigger, the price was never the problem.

A standards body cannot fix a usage problem

The Linux Foundation announced on June 3 that it intends to launch a Tokenomics Foundation, a standards body meant to bring FinOps-style cost discipline to AI tokens, with a formal launch planned for July. The instinct behind it is reasonable. When a cost is large and hard to read, a shared language for measuring it helps, and better visibility into token spend is genuinely useful. What a standards body cannot do is decide what the AI is for. It can tell a company in fine detail how many tokens each team burned, and it can make the invoice legible, but legibility is not the same as control. The runaway spend was caused by pointing AI at work that was never redesigned for it, and no amount of measurement reaches back to that decision. A clearer meter on a tap that should never have been left running still leaves the tap running.

Cost discipline is an operating-model decision

Spend comes back under control when a company decides what the AI is actually for and rebuilds the work around that answer, rather than waiting for a cheaper input or a new foundation to do it. The leaders who keep the bill in proportion to the value are not the ones who negotiated the lowest token price but the ones who chose a small number of places where AI genuinely changes the economics of a process, redesigned those processes so the AI is doing fewer things that matter more, and let everything else stay manual until it earns a place. That is an operating-model choice, made before any tool is bought. It puts a natural ceiling on usage because the work itself defines how much AI it needs, so spend tracks value rather than how easy it has become to call a model one more time. Cheaper tokens make a well-designed operating model slightly cheaper to run, and they make a badly designed one far more expensive, because they remove the last bit of friction that was holding the usage back.
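The contrast in that last sentence can be made concrete with a toy model. Everything here is hypothetical except the two price points quoted earlier: the token volumes, the function names, and the constant-elasticity demand curve (with an elasticity of 1.4) are assumptions chosen for illustration, not anything measured by the article's sources.

```python
def designed_usage(price: float, tokens_needed: float = 1_000.0) -> float:
    """Redesigned work: the process fixes how many tokens (in millions) it needs."""
    return tokens_needed  # price does not change the amount of work


def bolted_on_usage(price: float, base_price: float = 20.0,
                    base_usage: float = 1_000.0, elasticity: float = 1.4) -> float:
    """Added-on AI: usage (in millions of tokens) expands as the price falls.

    Constant-elasticity demand is an assumption; elasticity > 1 means usage
    grows faster than the price drops.
    """
    return base_usage * (base_price / price) ** elasticity


def spend(price: float, usage: float) -> float:
    """Dollars spent: price per million tokens times millions of tokens."""
    return price * usage


for price in (20.0, 2.0, 0.40):  # dollars per million tokens
    designed = spend(price, designed_usage(price))
    bolted_on = spend(price, bolted_on_usage(price))
    print(f"${price:>5.2f}/M  designed: ${designed:>10,.0f}  bolted-on: ${bolted_on:>10,.0f}")
```

In this sketch the designed workflow's spend falls in step with the price, while the bolted-on workflow's spend rises as the price drops, because nothing in the work itself caps how often the model gets called.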

The uncomfortable question worth sitting with

If the price of every token went to zero tomorrow, would your AI spend make more sense, or would it just grow faster?

The honest answer sorts companies into two groups more cleanly than any maturity model. If free tokens would make the spend more sensible, the work has already been designed around the capability and the price was a real constraint. If free tokens would only make the bill climb faster, then the price was never the thing holding the cost in check, and no drop in it, and no standards body measuring it, will change where the spend ends up. The number on the invoice was always a decision about what the AI is for, made or skipped long before the first token was billed.