Your CFO can tell you, to the cent, what the company spent on cloud infrastructure last quarter. They can break down headcount costs by department, software licences by vendor, and travel expenses by region. Ask them what the company spent on AI last month, and you will likely get a blank stare, or worse, a confident answer that is catastrophically wrong.
This is not their fault. The tooling does not exist in most organisations to make AI spend visible at the executive level. What finance teams typically see is a single line item on an AWS or Azure invoice labelled something like “Amazon Bedrock: $47,312.89.” That number tells you almost nothing. It does not tell you whether that money was well spent, which teams drove the cost, or whether you could achieve the same outcomes for half the price.
The companies that will dominate the next phase of AI adoption are the ones that treat LLM spend with the same rigour they apply to every other operational cost. That starts with surfacing the right metrics to the right people.
Here are the five numbers your CFO should be looking at every Monday morning.
1. Cost Per Successful Completion
What it is: The total LLM spend divided by the number of requests that actually achieved their intended purpose.
Why it matters: Raw API spend is a vanity metric. A $50,000 monthly bill means nothing without context. If your customer support agent resolved 200,000 tickets with that spend, you are paying $0.25 per resolution, almost certainly cheaper than a human agent. If it resolved 500 tickets because the other 199,500 failed, hallucinated, or required human escalation anyway, you are paying $100 per resolution and would have been better off hiring temps.
Cost Per Successful Completion forces your teams to define what “success” actually means for each AI use case, and then measure whether they are achieving it economically. It is the single most important metric for determining whether your AI investment is generating returns or burning runway.
What good looks like: This varies wildly by use case, but the trend matters more than the absolute number. If this metric is climbing week over week, something is degrading: prompt quality, model performance after an update, or the complexity of incoming requests.
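The calculation itself is trivial; the hard part is defining "success" per use case. A minimal sketch, assuming a request log with hypothetical `cost_usd` and `succeeded` fields (not a real schema):

```python
# Hypothetical sketch: cost per successful completion from a request log.
# The field names ("cost_usd", "succeeded") are illustrative assumptions.

def cost_per_successful_completion(requests):
    """Total spend divided by the count of requests that met their goal."""
    total_cost = sum(r["cost_usd"] for r in requests)
    successes = sum(1 for r in requests if r["succeeded"])
    if successes == 0:
        return float("inf")  # all spend, no results: the worst case
    return total_cost / successes

# Example: $50,000 of spend across 200,000 resolved tickets
log = [{"cost_usd": 0.25, "succeeded": True}] * 200_000
print(f"${cost_per_successful_completion(log):.2f}")  # $0.25
```

The point of the `float("inf")` branch is that a system with spend but zero successes should set off alarms, not divide-by-zero errors.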
2. Token Waste Ratio
What it is: The percentage of tokens consumed that did not contribute to the final output delivered to the user or downstream system.
Why it matters: Most enterprise AI systems are haemorrhaging tokens. Every retry, every failed function call, every overstuffed context window, and every verbose system prompt is money evaporating into the ether. In a typical RAG application, we routinely see 60–80% of input tokens consisting of retrieved context that the model never actually references in its response. You are paying to send the model a 50-page document so it can extract a single paragraph.
Token Waste Ratio makes this invisible tax visible. When your CFO sees that 72% of tokens are being wasted, they will ask the obvious question: “Can we get that number down?” And the answer, almost always, is yes.
What good looks like: Below 40% for chat applications. Below 50% for RAG systems. Anything above 70% is a red flag screaming for prompt optimisation or context windowing improvements.
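In code, the ratio is one subtraction once you can split billed tokens into "useful" and "wasted". How you define "useful" is the real work; the sketch below simply assumes you can count tokens the model actually drew on (cited context plus delivered output):

```python
# Hypothetical sketch of a Token Waste Ratio calculation. Deciding which
# tokens count as "useful" is an assumption left to your instrumentation.

def token_waste_ratio(total_tokens, useful_tokens):
    """Fraction of billed tokens that never contributed to the final output."""
    if total_tokens == 0:
        return 0.0
    return 1.0 - (useful_tokens / total_tokens)

# A 50-page document (~25,000 tokens) sent so the model could draw on one
# paragraph (~150 tokens), plus a ~350-token answer:
ratio = token_waste_ratio(total_tokens=25_500, useful_tokens=500)
print(f"{ratio:.0%}")  # 98%
```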
3. Model-Task Mismatch Rate
What it is: The percentage of API calls where the model used was more capable (and expensive) than the task required.
Why it matters: This is the metric that directly quantifies the “Model Laziness” problem. If your engineering team is routing every request to GPT-5 or Claude Opus regardless of complexity, your Mismatch Rate will be sky-high, and so will your bill.
A healthy organisation should see a Mismatch Rate below 20%. If more than half your calls are mismatched, you are likely overspending by 40–60% with zero impact on output quality.
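One way to compute this is to rank model tiers and flag any call routed above the tier the task required. The tier names and log fields below are illustrative assumptions, not a real routing schema:

```python
# Hypothetical sketch: flag calls where the model tier exceeds the task's
# required tier. Tier ranking and field names are assumptions.

TIERS = {"small": 0, "mid": 1, "frontier": 2}

def mismatch_rate(calls):
    """Share of calls routed to a more capable tier than the task needed."""
    if not calls:
        return 0.0
    mismatched = sum(
        1 for c in calls if TIERS[c["model_tier"]] > TIERS[c["required_tier"]]
    )
    return mismatched / len(calls)

calls = [
    {"model_tier": "frontier", "required_tier": "small"},    # mismatch
    {"model_tier": "frontier", "required_tier": "frontier"}, # fine
    {"model_tier": "mid", "required_tier": "small"},         # mismatch
    {"model_tier": "small", "required_tier": "small"},       # fine
]
print(f"{mismatch_rate(calls):.0%}")  # 50%
```

Determining `required_tier` is itself a judgment call; in practice teams infer it from task category, prompt complexity, or spot-checking whether a cheaper model produces equivalent output.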
4. Agent Loop Depth
What it is: The average number of LLM calls an autonomous agent makes before completing (or abandoning) a task.
Why it matters: Agentic AI is the fastest-growing cost vector in enterprise AI, and it is the hardest to predict. A well-designed agent might resolve a task in 3–5 LLM calls. A poorly designed one, or one encountering an edge case, might spiral into 50, 100, or 500 calls before hitting a timeout, each one burning tokens at frontier-model prices.
Even a modest increase in average loop depth from 5 to 8 calls per task represents a 60% cost increase that will never show up in a standard cloud bill. It will just look like “AI costs went up.”
What good looks like: Establish baselines per agent type and track the distribution, not just the average. A mean of 6 with a max of 200 tells a very different story than a mean of 6 with a max of 9. The outliers are where the money hides.
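Tracking the distribution rather than just the mean can be as simple as reporting mean, p95, and max per agent type. A minimal sketch, with illustrative data:

```python
# Hypothetical sketch: summarise agent loop depth as a distribution, not just
# an average, so runaway outlier loops stay visible.

def loop_depth_summary(depths):
    """Mean, p95, and max of LLM calls per agent task."""
    ordered = sorted(depths)
    n = len(ordered)
    p95 = ordered[min(n - 1, int(n * 0.95))]  # crude nearest-rank p95
    return {"mean": sum(ordered) / n, "p95": p95, "max": ordered[-1]}

# Two agents can look similar on average while telling very different
# cost stories once the max is visible:
healthy = [5, 6, 6, 7, 6, 6]      # mean 6, max 7
runaway = [3, 4, 3, 4, 4, 200]    # one edge case burning tokens in a loop
print(loop_depth_summary(healthy))
print(loop_depth_summary(runaway))
```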
5. Cost Per Business Outcome
What it is: The total AI spend attributed to producing a specific, measurable business result: a closed support ticket, a processed invoice, a generated lead, a completed code review.
Why it matters: This is the metric that bridges the gap between engineering and the boardroom. Your CFO does not care about tokens. They do not care about model versions. They care about unit economics. If AI-powered invoice processing costs $0.12 per invoice compared to $4.50 for manual processing, that is a story the board understands.
| Business Outcome | AI Cost | Manual Cost | Savings |
|---|---|---|---|
| Support ticket resolved | $0.25 | $12.00 | 97.9% |
| Invoice processed | $0.12 | $4.50 | 97.3% |
| Code review completed | $2.30 | $35.00 | 93.4% |
| Contract clause flagged | $0.85 | $18.00 | 95.3% |
Without this metric, AI remains a mysterious line item that finance tolerates during good times and targets during cost cuts. With it, AI becomes a quantifiable investment with measurable returns, which is exactly what it should be.
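The savings column in a table like the one above is a one-line calculation. A sketch using the illustrative figures from that table (they are examples, not benchmarks):

```python
# Hypothetical sketch: unit-economics comparison per business outcome.
# The dollar figures are illustrative, mirroring the example table.

def savings_pct(ai_cost, manual_cost):
    """Percentage saved by the AI path relative to the manual path."""
    return (1 - ai_cost / manual_cost) * 100

outcomes = [
    ("Support ticket resolved", 0.25, 12.00),
    ("Invoice processed", 0.12, 4.50),
    ("Code review completed", 2.30, 35.00),
    ("Contract clause flagged", 0.85, 18.00),
]
for name, ai_cost, manual_cost in outcomes:
    print(f"{name}: {savings_pct(ai_cost, manual_cost):.1f}% saved")
```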
Putting It All Together: The Monday Morning Dashboard
None of these metrics are useful in isolation. The power comes from seeing them together, on a single screen, every Monday morning.
| Metric | This Week | Last Week | Trend |
|---|---|---|---|
| Cost / Successful Completion | $0.31 | $0.28 | ↑ 10.7% |
| Token Waste Ratio | 58% | 62% | ↓ 4pts |
| Model-Task Mismatch | 34% | 41% | ↓ 7pts |
| Avg Agent Loop Depth | 6.2 | 5.8 | ↑ 6.9% |
| Cost / Business Outcome | $0.47 | $0.52 | ↓ 9.6% |
A dashboard like this tells a complete story in thirty seconds. The CFO can immediately see that while overall cost per business outcome is improving (good), the cost per completion is creeping up and agent loop depth is rising (investigate). Token waste is improving, probably because the team just optimised their RAG pipeline. Model mismatch is down, suggesting that routing improvements are taking effect.
This is the kind of operational intelligence that turns AI from a speculative expense into a managed investment.
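Generating the trend column is mechanical: dollar metrics compare as relative percentages, while percentage metrics compare in points. A sketch of that convention, using the dashboard's example figures:

```python
# Hypothetical sketch: format week-over-week trends for the dashboard.
# Which metrics are "points" vs "relative %" is a reporting convention.

def wow_trend(this_week, last_week, pts=False):
    """Format a week-over-week change as a relative % or a point delta."""
    if pts:
        delta = (this_week - last_week) * 100  # percentage-point difference
    else:
        delta = (this_week / last_week - 1) * 100  # relative change
    arrow = "↑" if delta > 0 else "↓"
    if pts:
        return f"{arrow} {abs(delta):.0f}pts"
    return f"{arrow} {abs(delta):.1f}%"

print(wow_trend(0.31, 0.28))            # ↑ 10.7%
print(wow_trend(0.58, 0.62, pts=True))  # ↓ 4pts
```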
Your AI Bill Deserves the Same Scrutiny as Your Headcount
We have entered an era where AI spend will rival, and for some companies exceed, traditional cloud infrastructure costs. The organisations that thrive will not be the ones that spend the most on AI. They will be the ones that spend the most intelligently.
That starts with measurement. You cannot optimise what you cannot see. And right now, most enterprises are flying blind.
These five metrics give your finance team, your engineering leadership, and your board a shared language for discussing AI economics. They transform vague concerns about “AI costs going up” into specific, actionable insights that drive real optimisation.