AI product pricing is cracking: the per-token era is ending

May 19, 2026 5 min read Singrey

By mid-2026 the per-token pricing model for AI products is cracking. Outcome pricing, agent hours, success-guaranteed plans — a short map of what comes next.

Between 2023 and 2025, AI product pricing had a clear default: per token, billed at month-end. It fit the lab's cost structure but never matched the user's sense of value. By mid-2026 the cracks are visible: large platforms are testing new models, and smaller products are heading off in different directions.

Why the old model is cracking

Per-token billing assumed three things: users understand how many tokens they're burning, spend is proportional to the value they get, and the bill is predictable. All three turned out to be wrong.

A solo developer who expects "this month will be $80" and gets a $340 invoice often just drops the product. For enterprise customers the problem is sharper: the finance team that owns the budget can't forecast per-token spend, so procurement drags on.

Candidates for the new model

Three main directions are being tested in the market right now:

• Outcome pricing: pay for the result of the user's job, such as "one successful code review", "one verified bug fix", or "one accepted design iteration". Cognition's Devin and several agent products are pushing this.

• Agent hours: pay for how long an agent ran, much like AWS spot instances are billed by running time. The trend I flagged when I wrote about Windsurf 2.0 shifting to an Agent Command Center is maturing in this direction.

• Flat + success guarantee: monthly flat rate, refunded if accuracy drops below a threshold. Enterprise SaaS practice transplanted into AI.

None are perfect, but all of them sit one step beyond the token model: they make it easier for the user to understand what they're buying.
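To make the comparison concrete, here is a rough Python sketch that bills the same month of usage under the old per-token model and under each of the three candidates. Every rate, threshold, and field name is an assumption made up for illustration, not a real vendor's price list.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    """One month of usage for a single customer (hypothetical fields)."""
    tokens: int                # tokens consumed across all runs
    successful_outcomes: int   # e.g. accepted code reviews
    agent_seconds: int         # wall-clock time agents spent running
    measured_accuracy: float   # fraction of outputs that passed checks


def per_token_bill(u: MonthlyUsage, usd_per_million_tokens: float = 10.0) -> float:
    # Old model: the bill scales with tokens, which the user rarely tracks.
    return u.tokens / 1_000_000 * usd_per_million_tokens


def outcome_bill(u: MonthlyUsage, usd_per_outcome: float = 2.0) -> float:
    # Outcome pricing: only successful results are billed.
    return u.successful_outcomes * usd_per_outcome


def agent_hours_bill(u: MonthlyUsage, usd_per_hour: float = 6.0) -> float:
    # Agent hours: the bill scales with how long the agent ran.
    return u.agent_seconds / 3600 * usd_per_hour


def flat_with_guarantee_bill(u: MonthlyUsage, flat_usd: float = 99.0,
                             accuracy_threshold: float = 0.9) -> float:
    # Flat + success guarantee: full refund if accuracy falls below the threshold.
    return flat_usd if u.measured_accuracy >= accuracy_threshold else 0.0


usage = MonthlyUsage(tokens=34_000_000, successful_outcomes=120,
                     agent_seconds=54_000, measured_accuracy=0.93)
```

With these made-up numbers the per-token bill is the $340 surprise from earlier, while the other three each quote a figure the user can reason about before the month starts: a price per result, a price per hour, or a fixed fee.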

The platform vs product distinction is sharpening

The interesting split: the model layer (Anthropic, OpenAI, Google) keeps per-token pricing because its cost structure is directly tied to tokens. But products built on top of that layer no longer pass tokens to the end user — they embed tokens inside their own value unit.

When I covered Anthropic's Colossus compute deal with SpaceX, the point I underlined still holds: the model layer is pumping capital to expand capacity, which pushes token prices down over time. The product layer captures that drop as margin rather than passing it straight to users.
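The margin effect is simple arithmetic. A minimal sketch, assuming a product that charges a fixed price per job and spends a fixed number of tokens per job; the function name and all figures are hypothetical:

```python
def margin_per_job(price_per_job_usd: float, tokens_per_job: int,
                   usd_per_million_tokens: float) -> float:
    """Gross margin on one job when the user pays per job, not per token."""
    token_cost = tokens_per_job / 1_000_000 * usd_per_million_tokens
    return price_per_job_usd - token_cost


# Same job, same user-facing price, before and after a token price cut.
before = margin_per_job(price_per_job_usd=2.0, tokens_per_job=100_000,
                        usd_per_million_tokens=10.0)
after = margin_per_job(price_per_job_usd=2.0, tokens_per_job=100_000,
                       usd_per_million_tokens=5.0)
```

In this example, halving the token price lifts the margin from roughly $1.00 to $1.50 per job while the user's price stays at $2.00. Under per-token billing the same cut would have gone straight to the user's invoice.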

What a solo developer should do

In Cubitz I face this same question. My current approach:

• Price the user on output units — "how many X did you produce this month" — and hide the token underneath.

• Pre-filter calls to the expensive model with a small one (I covered this in small models eating the big market).

• Make a flat monthly plan the default — "no surprise bill" trust shortens sales conversations.
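The first two points can be sketched together. This is a simplified illustration rather than Cubitz's actual code: the model functions are stand-in stubs, and the confidence floor, token counts, and names are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Meter:
    units_produced: int = 0    # the only number the user's plan is based on
    internal_tokens: int = 0   # tracked for cost control, never shown to the user


def handle_request(prompt: str,
                   small_model: Callable[[str], Tuple[str, float, int]],
                   large_model: Callable[[str], Tuple[str, int]],
                   meter: Meter,
                   confidence_floor: float = 0.8) -> str:
    # Try the cheap model first; it reports an answer, a confidence, and tokens used.
    answer, confidence, tokens = small_model(prompt)
    meter.internal_tokens += tokens
    if confidence < confidence_floor:
        # Escalate to the expensive model only when the small one is unsure.
        answer, tokens = large_model(prompt)
        meter.internal_tokens += tokens
    # The user is charged one output unit regardless of which model answered.
    meter.units_produced += 1
    return answer


def small_model_stub(prompt: str) -> Tuple[str, float, int]:
    # Hypothetical stand-in: confident on short prompts, unsure on long ones.
    if len(prompt) < 40:
        return ("short answer", 0.95, 50)
    return ("unsure", 0.40, 50)


def large_model_stub(prompt: str) -> Tuple[str, int]:
    # Hypothetical stand-in for the expensive model.
    return ("detailed answer", 900)
```

The design choice is that routing and token spend stay on the product's side of the line. The user's plan counts units produced, so a request that escalates costs the product more but costs the user the same.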

Putting the token price in the user's face is technically the most transparent path but emotionally the worst experience. In 2026 the products that win are the ones that balance those two best.

Where it's going

My prediction: by 2027 "per token" pricing survives only at the API layer. Every end-user interface prices in jobs, not tokens. That's a sign of AI's product economy maturing into SaaS — not exciting, but healthy.

Singrey's note

As both a consumer and a founder, I never liked per-token pricing. As a founder I can't plan well; as a consumer I can't tell what I'm buying. The products that win will be the ones that find a unit that reassures both sides. The token is an engineering unit; the user of the product shouldn't have to see it.