98% of Companies Now Track AI Costs. The Other 2% Are About to Learn Why.
By JR Intelligence
Two years ago, 31% of FinOps practitioners managed AI spend as part of their job. Today that number is 98%, according to the State of FinOps 2026 report published by the FinOps Foundation in February. The Foundation didn't just publish a statistic — it changed its own mission statement, from "maximizing the value of cloud" to "maximizing the value of technology," explicitly to accommodate the AI cost management work that now dominates the discipline.
That's not a trend. That's a structural shift in how businesses operate, and it happened in 24 months.
If you're running an SMB with any meaningful AI tooling in place — and 58% of SMBs are, up from 40% in 2024 — the question is no longer whether AI costs need to be managed. It's whether you're the business that figured that out in Q1 2026 or the one that figures it out after the quarterly invoice.
Inference Bill Shock: Why This Isn't Like Your Old SaaS Bills
The businesses getting surprised by AI costs share a common assumption: they treated AI tools like software licenses.
Software licenses are predictable. You pay $X per seat per month. Usage varies, cost doesn't. You can budget it once and move on.
Inference billing doesn't work that way. You pay per token — the units of text your AI processes, both input and output. Add conversational memory to a customer-facing chatbot, and every conversation now sends the full chat history as context with each new message. That's not a flat cost. It grows with every exchange, every session, every user who talks to your bot for more than two minutes.
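The growth described above can be made concrete with arithmetic. A minimal sketch, using an illustrative 200-token message size (real message sizes and per-token rates vary by provider and model):

```python
def cumulative_input_tokens(turns: int, tokens_per_message: int = 200) -> int:
    """Total input tokens billed across a conversation where each new
    turn resends all prior messages as context.

    Turn k sends k messages of history plus the new message,
    so it is billed for (k + 1) * tokens_per_message input tokens.
    """
    return sum((k + 1) * tokens_per_message for k in range(turns))

# What a stateless bot bills for a 10-turn conversation: 10 * 200 = 2,000 tokens.
flat = 10 * 200

# What a bot with naive conversational memory bills for the same
# conversation: 11,000 tokens. The growth is quadratic in conversation
# length, which is why long sessions dominate the invoice.
with_memory = cumulative_input_tokens(10)

print(flat, with_memory)  # 2000 11000
```

The point of the comparison: the memory feature doesn't add a fixed surcharge, it changes the cost curve's shape.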
One mid-market tech firm documented an 8x increase in their monthly inference bill after shipping a conversational memory feature — not because the feature was poorly designed, but because nobody implemented a sliding window to truncate old context. The tokens kept accumulating. The bill reflected it.
A second example: an e-commerce brand running AI-powered product recommendations hit a retry loop bug. When the API returned an error, their integration retried automatically. Ten retries per failed call, with full token cost billed on each attempt. The bug ran for a weekend before anyone noticed. The invoice was not what they expected.
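The defense against this failure mode is a hard retry cap with backoff. A minimal sketch (the `call` argument and the generic exception handling are placeholders; a real integration would catch the provider's specific error types and treat rate-limit errors differently from hard failures):

```python
import time

def call_with_bounded_retries(call, max_retries: int = 3, base_delay: float = 0.5):
    """Retry a billable API call with a hard attempt cap and exponential
    backoff. Without the cap, a persistent error means paying token cost
    on every attempt until a human notices the invoice.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:  # in practice: catch specific provider errors
            last_error = exc
            if attempt < max_retries:
                # Back off before retrying: 0.5s, 1s, 2s, ...
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Gave up after {max_retries + 1} attempts") from last_error
```

Three retries instead of ten, with a ceiling that holds even when the upstream error never clears, turns a weekend-long leak into a bounded cost.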
Neither of these is exotic. They're the ordinary failure modes of treating inference like it's compute — deterministic, bounded, predictable. It isn't. And the optimization playbook for cloud infrastructure (rightsizing instances, reserved capacity, spot pricing) doesn't translate. You can't "reserve" inference capacity the same way. You need to govern the workflows that drive consumption.
The Subscription Floor Is Dropping Out
The pricing context matters here. Flat-rate AI subscriptions — the all-you-can-eat model that made budgeting simple — are eroding.
Techaisle's 2026 forecast is direct: usage-based AI pricing is replacing flat subscriptions across the market. OpenAI and Anthropic have been building toward per-token billing for their API products for years. Snowflake and Databricks moved to consumption-based models. Enterprise software vendors are following because that's where the margin is when usage varies 10x between customers.
For SMBs, this means predictable monthly AI bills are becoming the exception. The operational implication: you can't treat AI costs as a fixed overhead line item much longer. They're becoming variable — and variable costs need to be governed differently than fixed ones.
(The pricing model shift itself — from per-seat to per-task, and what it means for SMB buying decisions — is covered in our April 22 piece. The point here is narrower: governance, not purchasing.)
The Discipline Gap Is the Advantage Gap
Here's the bullish case, and it's real.
87% of AI-using small businesses report positive business impact, per Enova's January 2026 report. 75% of SMBs are investing in AI agents, per Salesforce. The adoption is happening, and the businesses doing it are seeing results.
The discipline gap isn't between businesses using AI and those not using it. It's between businesses that treat AI spend like a P&L line item and those that treat it like a software subscription they never need to look at again.
Cost discipline compounds. Every dollar you don't waste on bloated context windows, uncapped retry logic, or inference jobs running on production models when a smaller model would do the same work — that dollar is available for the next AI initiative. The businesses governing their AI costs rigorously are the same ones with budget headroom to ship the next workflow, the next agent, the next capability.
The businesses running inference unchecked are spending that dollar twice: once on the waste itself, and again as the opportunity cost of the initiative that dollar could have funded.
That gap widens over time. Cost discipline isn't just accounting hygiene. It's a compounding strategic advantage.
The SMB AI Cost Playbook
The operational steps here are not complicated. They're just not the default.
Set inference budgets per workflow, not per tool. A per-tool budget tells you how much you spent on OpenAI. A per-workflow budget tells you how much your customer support bot costs versus your sales prospecting pipeline. The second number is the one that connects to ROI.
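Per-workflow budgeting can be as simple as a tagged ledger. A minimal sketch, with illustrative workflow names, budgets, and per-token rates (real pricing varies by model and provider):

```python
from collections import defaultdict

class WorkflowLedger:
    """Attribute inference spend to business workflows rather than vendors,
    so overruns map to a specific bot or pipeline instead of a provider's
    undifferentiated invoice total.
    """

    def __init__(self, monthly_budgets_usd: dict[str, float]):
        self.budgets = monthly_budgets_usd
        self.spend = defaultdict(float)

    def record(self, workflow: str, tokens: int, usd_per_1k_tokens: float):
        """Log one billable call against the workflow that triggered it."""
        self.spend[workflow] += tokens / 1000 * usd_per_1k_tokens

    def over_budget(self) -> list[str]:
        """Workflows that have exceeded their monthly budget."""
        return [w for w, budget in self.budgets.items() if self.spend[w] > budget]
```

Usage looks like tagging each call site with its workflow name: `ledger.record("support_bot", tokens_used, rate)`. That one tag is the difference between "we spent $4,000 on OpenAI" and "the support bot cost $3,100 and the prospecting pipeline cost $900."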
Monitor token usage weekly, not monthly. Monthly reviews catch cost problems after the damage is done. Weekly monitoring catches the retry loop on a Monday morning instead of at invoice time. Most inference providers expose usage APIs. If you're not pulling from them, you're flying blind.
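The weekly check itself is trivial once usage data is in hand. A minimal sketch, assuming you've already pulled daily token totals from your provider's usage API into a date-keyed mapping (the fetch step is provider-specific and omitted; the 1.5x spike threshold is an illustrative default, not a standard):

```python
from datetime import date, timedelta

def weekly_token_total(daily_usage: dict[date, int], week_ending: date) -> int:
    """Sum the seven days of token usage ending on `week_ending`.
    `daily_usage` maps each day to its total billed tokens, as pulled
    from your inference provider's usage reporting.
    """
    days = (week_ending - timedelta(days=i) for i in range(7))
    return sum(daily_usage.get(d, 0) for d in days)

def spike_alert(this_week: int, last_week: int, threshold: float = 1.5) -> bool:
    """Flag a week-over-week jump, e.g. a runaway retry loop or an
    unexpectedly chatty new feature."""
    return last_week > 0 and this_week > threshold * last_week
```

A cron job running this every Monday morning is the difference between catching the retry loop on day three and discovering it on the invoice.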
Implement sliding windows on any conversational AI. This is the single highest-impact technical decision for businesses running customer-facing chatbots or internal AI assistants. A sliding window truncates old conversation history so context length stays bounded. Without it, every long conversation is a cost that grows without ceiling.
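A sliding window is a few lines of code. A minimal sketch, using a rough chars-divided-by-four token estimate as a stand-in (for real counts, use your provider's tokenizer; message shape follows the common role/content convention but is an assumption here):

```python
def sliding_window(history: list[dict], max_tokens: int,
                   count_tokens=lambda m: len(m["content"]) // 4) -> list[dict]:
    """Keep only the most recent messages that fit in the token budget.

    Walks the history newest-first, accumulating estimated token cost,
    and stops once the budget would be exceeded. Older messages fall off,
    so context length stays bounded no matter how long the session runs.
    """
    kept, total = [], 0
    for message in reversed(history):
        cost = count_tokens(message)
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    return list(reversed(kept))
```

Production variants get fancier (always keep the system prompt, summarize truncated history rather than dropping it), but even this naive version converts unbounded per-conversation cost into a fixed ceiling.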
Negotiate usage-based contracts with caps. When your AI vendor offers you a usage-based pricing tier, negotiate a hard cap or a notification threshold before you sign. A cap protects you from the 8x scenario. A notification at 80% of projected spend gives you time to respond. Neither is standard — both are available if you ask.
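Even before the contract terms exist, the same thresholds are worth enforcing internally. A minimal sketch of the check, with the 80% notification level from above as the default:

```python
def spend_status(month_to_date_usd: float, projected_monthly_usd: float,
                 notify_at: float = 0.80) -> str:
    """Classify current spend against the projected monthly budget.

    Returns "cap" at or past the budget, "notify" past the notification
    threshold, "ok" otherwise. The notify state is the one that buys you
    time: you can throttle, investigate, or renegotiate before the cap.
    """
    ratio = month_to_date_usd / projected_monthly_usd
    if ratio >= 1.0:
        return "cap"
    if ratio >= notify_at:
        return "notify"
    return "ok"
```

What you do at "cap" is a business decision (hard-stop the workflow, fall back to a cheaper model, page someone); the point is that the decision gets made by policy, not by the invoice.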
Track cost-per-outcome, not cost-per-seat. How much does it cost your AI to close a qualified lead? To resolve a support ticket? To generate a campaign draft that converts? These numbers tell you whether your AI investment is working. Cost-per-seat tells you what you signed. The outcome metric is the one worth managing.
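The metric itself is one division, which is part of why there's no excuse for not tracking it. A minimal sketch with illustrative numbers (these are made-up figures for the example, not benchmarks):

```python
def cost_per_outcome(spend_usd: float, outcomes: int) -> float:
    """Dollars of inference spend per business outcome: a resolved
    ticket, a qualified lead, a converted draft. Returns infinity when
    spend produced zero outcomes, which is itself the signal to act on.
    """
    return spend_usd / outcomes if outcomes else float("inf")

# Illustrative: $420 of support-bot spend resolving 1,400 tickets,
# versus $380 of prospecting spend producing 19 qualified leads.
support = cost_per_outcome(420.0, 1400)    # $0.30 per resolved ticket
prospecting = cost_per_outcome(380.0, 19)  # $20.00 per qualified lead
```

Whether $20 per qualified lead is good or terrible depends on your deal size, and that's the point: this number plugs into a business judgment, while cost-per-seat plugs into nothing.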
None of this requires a dedicated FinOps function. It requires treating AI spend with the same rigor you apply to any other variable cost in the business.
The Bottom Line
The AI adoption wave is real and accelerating — we covered the JPMorgan transaction data on that this morning. Adoption is not the constraint anymore.
The constraint is governance. The businesses winning on AI in 2026 are not necessarily the ones who adopted earliest. They're the ones who built operational discipline around what they adopted.
That's a solvable problem. The FinOps playbook exists. The monitoring tools exist. The contract terms are negotiable. The companies that put these systems in place now are building an advantage that compounds every month — in headroom, in efficiency, and in the ability to move faster on the next AI initiative because they know exactly what the last one cost.
The other 2% will learn eventually. The question is whether you're positioned to benefit when they do.
JR Intelligence works with SMBs on AI operations — from workflow design to cost governance to measurable ROI. If you're not sure what your AI stack actually costs or what it's producing, start with a conversation.
Ready to Build
See what this looks like for your operation.
One audit. We map your workflow, find the leverage, and show you the automated version of your business.