Per-seat pricing is about to break. Here's what running agents actually costs.

GitHub Copilot moved to usage-based billing on June 1. Claude Code reverted its $100/month flat plan after burning through margin in a quarter. Cursor's $20 tier added rate limits everyone hated. Replit's Agent had to introduce credits.

Pattern, not coincidence.

The CFOs I've talked to over the last few months are all asking the same question, and most agent vendors don't have a clean answer for it: what does running this thing actually cost us per month, per user, at steady state?

If you don't know the answer to that, you're about to get a bill you didn't budget for. Or you bought a tool whose vendor is about to go bankrupt and rip the contract up. Both happen.

The 1000x token problem

A chat session with an LLM burns tokens in the hundreds to low thousands. A user types a question, the model writes a paragraph, conversation moves on.

An agent session is a different physics problem. It's reading documents, calling tools, running searches, writing code, validating output, calling more tools, retrying when things fail. Anthropic's own data shows agent runs burning roughly 1000x the tokens of a chat session, with a variance of about 30x between cheap runs and expensive ones.

That variance is the part nobody talks about. Two users running "the same" workflow can produce token bills that differ by 30x. The 95th-percentile user isn't 30% more expensive than the median user. They can be 20x more expensive.

Per-seat pricing assumes usage clusters around an average, the way a normal distribution does. Agent usage looks more like a power law: a few heavy users drive most of the cost. The math doesn't work.
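To see why the average seat is a poor guide, here is a rough sketch. It assumes per-seat monthly cost follows a lognormal distribution, which is a common stand-in for heavy-tailed usage and not a true power law or anything fitted to real bills. The only input taken from the text is the 20x gap between the 95th percentile and the median. The function name heavy_tail_profile is made up for illustration.

```python
import math
from statistics import NormalDist


def heavy_tail_profile(p95_over_median: float) -> dict:
    """Summarize a lognormal cost distribution from its p95/median ratio.

    ASSUMPTION: per-seat cost is lognormal. This is an illustrative
    stand-in for heavy-tailed usage, not a fitted model.
    """
    z95 = NormalDist().inv_cdf(0.95)         # standard normal 95th percentile
    sigma = math.log(p95_over_median) / z95  # spread on the log scale
    # For a lognormal, mean / median = exp(sigma^2 / 2).
    mean_over_median = math.exp(sigma ** 2 / 2)
    # Share of seats costing more than the mean: P(Z > sigma / 2).
    share_above_mean = 1 - NormalDist().cdf(sigma / 2)
    return {
        "sigma": sigma,
        "mean_over_median": mean_over_median,
        "share_above_mean": share_above_mean,
    }


profile = heavy_tail_profile(20)
print(f"mean / median cost: {profile['mean_over_median']:.2f}x")
print(f"seats above the mean: {profile['share_above_mean']:.0%}")
```

Under these assumptions the average seat costs roughly five times the median seat, and fewer than one in five seats costs more than the average. A price anchored on the typical user misses most of the bill.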

The math, with real numbers

Let's run a 50-seat team. Each seat has access to 5 agents. Each user runs maybe 20 sessions a week on each agent.

50 users x 5 agents x 20 sessions x 4 weeks = 20,000 agent runs a month.

At a conservative average of $0.30 in model costs per run (some are $0.05, some are $5+), that's $6,000/month in raw inference cost. Before margin, before infrastructure, before support.

If the vendor sells this for $30/seat/month, they're collecting $1,500/month against $6,000 in COGS. They're losing $90 per seat.

If they sell it for $50/seat/month, they're collecting $2,500 against $6,000. Still upside-down.

The break-even on a 50-seat deployment with that usage profile is around $120/seat/month, just to cover inference. That's before they pay engineers, before they pay AWS, before anyone takes home a salary.
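The arithmetic above is easy to rerun with your own numbers. This is a minimal sketch that reproduces the 50-seat example. It reads the 20 sessions a week as per agent, which is the reading that matches the 20,000-run total. All constant and function names are illustrative.

```python
SEATS = 50
AGENTS_PER_SEAT = 5
SESSIONS_PER_AGENT_PER_WEEK = 20  # ASSUMPTION: 20 sessions a week on each agent
WEEKS_PER_MONTH = 4
AVG_COST_PER_RUN = 0.30           # dollars of model cost per run, from the example


def monthly_runs() -> int:
    """Total agent runs per month across the whole team."""
    return SEATS * AGENTS_PER_SEAT * SESSIONS_PER_AGENT_PER_WEEK * WEEKS_PER_MONTH


def monthly_inference_cost() -> float:
    """Raw model cost per month, before margin, infrastructure, or support."""
    return round(monthly_runs() * AVG_COST_PER_RUN, 2)


def break_even_price() -> float:
    """Per-seat price that covers inference cost and nothing else."""
    return round(monthly_inference_cost() / SEATS, 2)


def margin_per_seat(price_per_seat: float) -> float:
    """Monthly seat price minus inference cost per seat (negative means a loss)."""
    return round(price_per_seat - break_even_price(), 2)


print("runs per month:", monthly_runs())
print("inference cost:", monthly_inference_cost())
print("break-even per seat:", break_even_price())
for price in (30, 50, 120):
    print(f"${price}/seat -> margin per seat: {margin_per_seat(price)}")
```

Swap in your own seat count, session volume, and cost per run to see where a quoted price sits relative to break-even.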

This is why Cursor added rate limits. Why Copilot moved to usage-based billing. Why Claude Code killed the flat plan. The vendors weren't being greedy. The flat plan was already losing them money on every power user.

The three pricing models that actually survive

What's emerging in 2026 sorts into three buckets.

The capped per-seat. Flat fee per seat, hard usage cap per seat per month, overages billed per token or per run. Predictable for buyers, sustainable for vendors. The downside is friction: power users hit the cap and stop, which kills the exact behavior you wanted to encourage.

Pure usage-based. Buyers pay per run, per token, or per task completed. The cleanest economics on the vendor side, the messiest budgeting on the buyer side. CFOs hate variability. Procurement hates open-ended commitments.

Hybrid. Base platform fee covers seats, governance, audit, and admin. Usage charges sit on top for actual agent runtime. This is where the market is heading. Salesforce, Microsoft, and the serious incumbents have all moved here. The agent platforms that survive 2026 will be on this model.
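One way to compare the three models is to price the same usage profile under each. The sketch below does that for a 10-seat team with one heavy user. Every rate, cap, and fee here is hypothetical and chosen only to make the comparison concrete; none comes from a real vendor's price list.

```python
from typing import List


def capped_per_seat_bill(runs_per_seat: List[int], seat_fee: float,
                         cap_runs: int, overage_per_run: float) -> float:
    """Flat fee per seat, plus a per-run charge for runs above the cap."""
    total = 0.0
    for runs in runs_per_seat:
        overage_runs = max(0, runs - cap_runs)
        total += seat_fee + overage_runs * overage_per_run
    return round(total, 2)


def usage_based_bill(runs_per_seat: List[int], price_per_run: float) -> float:
    """Pay per run, with no seat fee."""
    return round(sum(runs_per_seat) * price_per_run, 2)


def hybrid_bill(runs_per_seat: List[int], platform_fee_per_seat: float,
                price_per_run: float) -> float:
    """Base platform fee per seat, plus usage charges for agent runtime."""
    base = platform_fee_per_seat * len(runs_per_seat)
    return round(base + sum(runs_per_seat) * price_per_run, 2)


# Hypothetical month: eight light users, one moderate user, one power user.
usage = [100] * 8 + [400, 2000]

print("capped per-seat:", capped_per_seat_bill(usage, seat_fee=120,
                                                cap_runs=400, overage_per_run=0.40))
print("usage-based:    ", usage_based_bill(usage, price_per_run=0.40))
print("hybrid:         ", hybrid_bill(usage, platform_fee_per_seat=40,
                                      price_per_run=0.30))
```

The useful exercise is to change the usage list, for example by multiplying the power user's runs, and watch which bill moves and by how much. That shows who carries the usage risk under each model.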

The pure per-seat plans you see today are either subsidized by VC dollars or about to get re-priced. Both end the same way for the buyer: the price you signed isn't the price you'll pay next year.

Five questions to ask before signing

Before you sign any agent contract longer than 6 months, run these.

What are the unit economics on a high-usage seat? If the vendor can't show you the math on a power user, they don't have a sustainable price. You're not asking them to share trade secrets. You're asking whether their business model survives the customer they say they want.

What happens at 5x my projected usage? Does the contract get re-priced? Are there caps? Are there overages? "We'll work with you" is not an answer.

How is runtime cost passed through? Is the model cost baked into the seat price, or are you on the hook for it separately? Vendors who eat model cost completely are going to feel the squeeze first.

What's the renewal pricing trigger? Most agent contracts are getting written with usage-based renewal triggers. You signed at $30/seat. If usage went up, you renew at $80. Read the renewal clause carefully.

Can I see your top customer's bill? You won't get to see the actual number, but the vendor's reaction tells you everything. Confident vendors talk about their power users with pride. Worried vendors change the subject.

What this means for buyers in 2026

If your CFO hasn't asked you "what's our agent runtime budget" yet, they will by Q3. The answer "it's bundled into the seat price" is going to age badly when the renewal comes through.

The buyers I've watched make smart decisions on this are doing two things. They're modeling agent usage like cloud usage (variable, with monitoring and alerts) instead of like SaaS usage (flat, predictable). And they're picking vendors whose pricing model survives a 10x usage scenario without forcing a renegotiation.
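Treating agent spend like cloud spend mostly means projecting the month-end bill from the run rate and alerting before it lands. The sketch below is a minimal version of that idea. It assumes spend accrues at a constant daily rate, which real usage will not do exactly, and the function names, the 80% warning threshold, and the dollar figures are all illustrative.

```python
def project_month_end(spend_to_date: float, day_of_month: int,
                      days_in_month: int = 30) -> float:
    """Project month-end spend, ASSUMING a constant daily run rate."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must be between 1 and days_in_month")
    daily_rate = spend_to_date / day_of_month
    return daily_rate * days_in_month


def budget_status(projected: float, budget: float, warn_ratio: float = 0.8) -> str:
    """Classify projected spend against the budget: 'ok', 'warning', or 'over'."""
    if projected > budget:
        return "over"
    if projected >= warn_ratio * budget:
        return "warning"
    return "ok"


# Hypothetical example: $6,000 monthly runtime budget, checked on day 10.
budget = 6000.0
for spend in (1000.0, 1700.0, 3000.0):
    projected = project_month_end(spend, day_of_month=10)
    print(f"spent ${spend:,.0f} by day 10 -> projected ${projected:,.0f}: "
          f"{budget_status(projected, budget)}")
```

A real setup would pull spend from the vendor's usage export or billing API, if one exists, and route the warning to whoever owns the budget.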

That's the model that holds up. Everything else gets repriced eventually.

See how Vybe approaches agent pricing and forecast runtime costs you can actually plan around.