Your CFO can tell you, to the cent, what the company spent on cloud infrastructure last quarter. They can break down headcount costs by department, software licences by vendor, and travel expenses by region. Ask them what the company spent on AI last month, and you will likely get a blank stare, or worse, a confident answer that is catastrophically wrong.
This is not their fault. The tooling does not exist in most organisations to make AI spend visible at the executive level. What finance teams typically see is a single line item on an AWS or Azure invoice labelled something like “Amazon Bedrock: $47,312.89.” That number tells you almost nothing. It does not tell you whether that money was well spent, which teams drove the cost, or whether you could achieve the same outcomes for half the price.
The companies that will dominate the next phase of AI adoption are the ones that treat LLM spend with the same rigour they apply to every other operational cost. That starts with surfacing the right metrics to the right people.
Here are the five numbers your CFO should be looking at every Monday morning.
1. Cost Per Successful Completion
What it is: The total LLM spend divided by the number of requests that actually achieved their intended purpose.
Why it matters: Raw API spend is a vanity metric. A $50,000 monthly bill means nothing without context. If your customer support agent resolved 200,000 tickets with that spend, you are paying $0.25 per resolution, almost certainly cheaper than a human agent. If it resolved 500 tickets because the other 199,500 failed, hallucinated, or required human escalation anyway, you are paying $100 per resolution and would have been better off hiring temps.
Cost Per Successful Completion forces your teams to define what “success” actually means for each AI use case, and then measure whether they are achieving it economically. It is the single most important metric for determining whether your AI investment is generating returns or burning runway.
What good looks like: This varies wildly by use case, but the trend matters more than the absolute number. If this metric is climbing week over week, something is degrading: prompt quality, model performance after an update, or the complexity of incoming requests.
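The calculation itself is trivial; the hard part is defining "success" per use case. A minimal sketch, assuming a request log with hypothetical `cost_usd` and `succeeded` fields (not a real schema):

```python
# Hypothetical sketch: cost per successful completion from a request log.
# The field names ("cost_usd", "succeeded") are illustrative assumptions.

def cost_per_successful_completion(requests):
    """Total spend divided by the count of requests that met their goal."""
    total_cost = sum(r["cost_usd"] for r in requests)
    successes = sum(1 for r in requests if r["succeeded"])
    if successes == 0:
        return float("inf")  # all spend, no results: the worst case
    return total_cost / successes

# Example: $50,000 of spend across 200,000 resolved tickets
log = [{"cost_usd": 0.25, "succeeded": True}] * 200_000
print(f"${cost_per_successful_completion(log):.2f}")  # $0.25
```

The point of the `float("inf")` branch is that a system with spend but zero successes should set off alarms, not divide-by-zero errors.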
2. Token Waste Ratio
What it is: The percentage of tokens consumed that did not contribute to the final output delivered to the user or downstream system.
Why it matters: Most enterprise AI systems are haemorrhaging tokens. Every retry, every failed function call, every overstuffed context window, and every verbose system prompt is money evaporating into the ether. In a typical RAG application, we routinely see 60–80% of input tokens consisting of retrieved context that the model never actually references in its response. You are paying to send the model a 50-page document so it can extract a single paragraph.
Token Waste Ratio makes this invisible tax visible. When your CFO sees that 72% of tokens are being wasted, they will ask the obvious question: “Can we get that number down?” And the answer, almost always, is yes.
What good looks like: Below 40% for chat applications. Below 50% for RAG systems. Anything above 70% is a red flag screaming for prompt optimisation or context windowing improvements.
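In code, the ratio is one subtraction once you can split billed tokens into "useful" and "wasted". How you define "useful" is the real work; the sketch below simply assumes you can count tokens the model actually drew on (cited context plus delivered output):

```python
# Hypothetical sketch of a Token Waste Ratio calculation. Deciding which
# tokens count as "useful" is an assumption left to your instrumentation.

def token_waste_ratio(total_tokens, useful_tokens):
    """Fraction of billed tokens that never contributed to the final output."""
    if total_tokens == 0:
        return 0.0
    return 1.0 - (useful_tokens / total_tokens)

# A 50-page document (~25,000 tokens) sent so the model could draw on one
# paragraph (~150 tokens), plus a ~350-token answer:
ratio = token_waste_ratio(total_tokens=25_500, useful_tokens=500)
print(f"{ratio:.0%}")  # 98%
```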
3. Model-Task Mismatch Rate
What it is: The percentage of API calls where the model used was more capable (and expensive) than the task required.
Why it matters: This is the metric that directly quantifies the “Model Laziness” problem. If your engineering team is routing every request to GPT-5 or Claude Opus regardless of complexity, your Mismatch Rate will be sky-high, and so will your bill.
A healthy organisation should see a Mismatch Rate below 20%. If more than half your calls are mismatched, you are likely overspending by 40–60% with zero impact on output quality.
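One way to compute this is to rank model tiers and flag any call routed above the tier the task required. The tier names and log fields below are illustrative assumptions, not a real routing schema:

```python
# Hypothetical sketch: flag calls where the model tier exceeds the task's
# required tier. Tier ranking and field names are assumptions.

TIERS = {"small": 0, "mid": 1, "frontier": 2}

def mismatch_rate(calls):
    """Share of calls routed to a more capable tier than the task needed."""
    if not calls:
        return 0.0
    mismatched = sum(
        1 for c in calls if TIERS[c["model_tier"]] > TIERS[c["required_tier"]]
    )
    return mismatched / len(calls)

calls = [
    {"model_tier": "frontier", "required_tier": "small"},    # mismatch
    {"model_tier": "frontier", "required_tier": "frontier"}, # fine
    {"model_tier": "mid", "required_tier": "small"},         # mismatch
    {"model_tier": "small", "required_tier": "small"},       # fine
]
print(f"{mismatch_rate(calls):.0%}")  # 50%
```

Determining `required_tier` is itself a judgment call; in practice teams infer it from task category, prompt complexity, or spot-checking whether a cheaper model produces equivalent output.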
4. Agent Loop Depth
What it is: The average number of LLM calls an autonomous agent makes before completing (or abandoning) a task.
Why it matters: Agentic AI is the fastest-growing cost vector in enterprise AI, and it is the hardest to predict. A well-designed agent might resolve a task in 3–5 LLM calls. A poorly designed one, or one encountering an edge case, might spiral into 50, 100, or 500 calls before hitting a timeout, each one burning tokens at frontier-model prices.
Even a modest increase in average loop depth from 5 to 8 calls per task represents a 60% cost increase that will never show up in a standard cloud bill. It will just look like “AI costs went up.”
What good looks like: Establish baselines per agent type and track the distribution, not just the average. A mean of 6 with a max of 200 tells a very different story than a mean of 6 with a max of 9. The outliers are where the money hides.
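Tracking the distribution rather than just the mean can be as simple as reporting mean, p95, and max per agent type. A minimal sketch, with illustrative data:

```python
# Hypothetical sketch: summarise agent loop depth as a distribution, not just
# an average, so runaway outlier loops stay visible.

def loop_depth_summary(depths):
    """Mean, p95, and max of LLM calls per agent task."""
    ordered = sorted(depths)
    n = len(ordered)
    p95 = ordered[min(n - 1, int(n * 0.95))]  # crude nearest-rank p95
    return {"mean": sum(ordered) / n, "p95": p95, "max": ordered[-1]}

# Two agents can look similar on average while telling very different
# cost stories once the max is visible:
healthy = [5, 6, 6, 7, 6, 6]      # mean 6, max 7
runaway = [3, 4, 3, 4, 4, 200]    # one edge case burning tokens in a loop
print(loop_depth_summary(healthy))
print(loop_depth_summary(runaway))
```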
5. Cost Per Business Outcome
What it is: The total AI spend attributed to producing a specific, measurable business result: a closed support ticket, a processed invoice, a generated lead, a completed code review.
Why it matters: This is the metric that bridges the gap between engineering and the boardroom. Your CFO does not care about tokens. They do not care about model versions. They care about unit economics. If AI-powered invoice processing costs $0.12 per invoice compared to $4.50 for manual processing, that is a story the board understands.
| Business Outcome | AI Cost | Manual Cost | Savings |
|---|---|---|---|
| Support ticket resolved | $0.25 | $12.00 | 97.9% |
| Invoice processed | $0.12 | $4.50 | 97.3% |
| Code review completed | $2.30 | $35.00 | 93.4% |
| Contract clause flagged | $0.85 | $18.00 | 95.3% |
Without this metric, AI remains a mysterious line item that finance tolerates during good times and targets during cost cuts. With it, AI becomes a quantifiable investment with measurable returns, which is exactly what it should be.
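The savings column in a table like the one above is a one-line calculation. A sketch using the illustrative figures from that table (they are examples, not benchmarks):

```python
# Hypothetical sketch: unit-economics comparison per business outcome.
# The dollar figures are illustrative, mirroring the example table.

def savings_pct(ai_cost, manual_cost):
    """Percentage saved by the AI path relative to the manual path."""
    return (1 - ai_cost / manual_cost) * 100

outcomes = [
    ("Support ticket resolved", 0.25, 12.00),
    ("Invoice processed", 0.12, 4.50),
    ("Code review completed", 2.30, 35.00),
    ("Contract clause flagged", 0.85, 18.00),
]
for name, ai_cost, manual_cost in outcomes:
    print(f"{name}: {savings_pct(ai_cost, manual_cost):.1f}% saved")
```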
Putting It All Together: The Monday Morning Dashboard
None of these metrics are useful in isolation. The power comes from seeing them together, on a single screen, every Monday morning.
| Metric | This Week | Last Week | Trend |
|---|---|---|---|
| Cost / Successful Completion | $0.31 | $0.28 | ↑ 10.7% |
| Token Waste Ratio | 58% | 62% | ↓ 4pts |
| Model-Task Mismatch | 34% | 41% | ↓ 7pts |
| Avg Agent Loop Depth | 6.2 | 5.8 | ↑ 6.9% |
| Cost / Business Outcome | $0.47 | $0.52 | ↓ 9.6% |
A dashboard like this tells a complete story in thirty seconds. The CFO can immediately see that while overall cost per business outcome is improving (good), the cost per completion is creeping up and agent loop depth is rising (investigate). Token waste is improving, probably because the team just optimised their RAG pipeline. Model mismatch is down, suggesting that routing improvements are taking effect.
This is the kind of operational intelligence that turns AI from a speculative expense into a managed investment.
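Generating the trend column is mechanical: dollar metrics compare as relative percentages, while percentage metrics compare in points. A sketch of that convention, using the dashboard's example figures:

```python
# Hypothetical sketch: format week-over-week trends for the dashboard.
# Which metrics are "points" vs "relative %" is a reporting convention.

def wow_trend(this_week, last_week, pts=False):
    """Format a week-over-week change as a relative % or a point delta."""
    if pts:
        delta = (this_week - last_week) * 100  # percentage-point difference
    else:
        delta = (this_week / last_week - 1) * 100  # relative change
    arrow = "↑" if delta > 0 else "↓"
    if pts:
        return f"{arrow} {abs(delta):.0f}pts"
    return f"{arrow} {abs(delta):.1f}%"

print(wow_trend(0.31, 0.28))            # ↑ 10.7%
print(wow_trend(0.58, 0.62, pts=True))  # ↓ 4pts
```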
Your AI Bill Deserves the Same Scrutiny as Your Headcount
We have entered an era where AI spend will rival, and for some companies exceed, traditional cloud infrastructure costs. The organisations that thrive will not be the ones that spend the most on AI. They will be the ones that spend the most intelligently.
That starts with measurement. You cannot optimise what you cannot see. And right now, most enterprises are flying blind.
These five metrics give your finance team, your engineering leadership, and your board a shared language for discussing AI economics. They transform vague concerns about “AI costs going up” into specific, actionable insights that drive real optimisation.