Fluent, Confident, and Wrong
Thirteen enterprise AI failures later, a pattern no board is naming: the executive who cannot evaluate what they are deploying.
I have sat in enough boardroom AI briefings to recognise the tell. It is not anything the presenter says. It is the questions the senior AI leader does not ask.
Not: how does this model handle distribution shift over time? Not: what is the per-inference unit cost at production scale? Not: has anyone independently validated the vendor’s accuracy claims against our actual data environment? Those questions do not come. Instead, the conversation pivots quickly to use-case portfolios, transformation narratives, and roadmap timelines.
The technology investment gets approved. The governance gap stays invisible. And eighteen months later, the organisation is dealing with a write-down, a regulatory reversal, or a public embarrassment that everyone assures themselves was unforeseeable.
It was not unforeseeable. It was predictable at the point of hire.
Thirteen Companies. One Pattern.
Over the past several months, I have systematically reviewed thirteen enterprise AI failures spanning banking, fintech, technology, automotive, retail, entertainment, and logistics, drawing on primary engineering disclosures, regulatory records, corporate filings, and verified press reporting. The cases include organisations that have become cautionary industry shorthand: IBM Watson Health, Klarna, Zillow, General Motors Cruise, McDonald’s automated ordering, Air Canada’s chatbot, and our very own Commonwealth Bank of Australia’s voice-bot reversal. They also include more recent disclosures from Uber, Microsoft, JPMorgan Chase, LinkedIn, Netflix, and eBay.
The question I was trying to answer was not ‘what went wrong technically’. The technical failures were usually well-documented. The question was: who was in charge, and what did their background tell us about why the failure took the shape it did?
The pattern that emerged is consistent enough that I am prepared to state it directly.
Leaders with deep technical credentials but limited business formation tend to produce recoverable failures. Leaders with deep business credentials but limited technical formation tend to produce categorical ones. The categorical failures are larger, more expensive, and harder to reverse.
This is not an argument that technical leaders are better executives. It is an argument about the specific failure modes of AI governance and which credential gap is more dangerous in that context.
What a Recoverable Failure Looks Like
In April 2026, Uber’s Chief Technology Officer Praveen Neppalli Naga disclosed to reporters at The Information that the company had exhausted its entire planned 2026 AI tools budget within four months of the fiscal year. The culprit was Claude Code, Anthropic’s agentic coding assistant, rolled out to Uber’s engineering division in December 2025. Adoption accelerated faster than anyone modelled: from around 32 per cent of engineers in February to 84 per cent in March and approximately 95 per cent by April. Around 70 per cent of committed code was AI-generated by the time the budget ran out.
Per-engineer monthly costs ranged from $150 to $250 for typical users, rising to $500 to $2,000 for heavy users. According to Fortune, in one documented instance, the CTO himself generated $1,200 in token costs within two hours during a hands-on demonstration. A compounding factor was structural: Uber had built internal leaderboards ranking teams by Claude Code activity levels, optimising for token volume rather than business output. Uber’s Chief Operating Officer described the CTO’s disclosure as a ‘head-exploding moment’, acknowledging that nobody had established a link between rising token consumption and shipped consumer features.
The governance response — a per-employee monthly cap of $1,500, enforced through an internal dashboard with executive approval required to exceed the limit — was implemented in weeks. It was a one-meeting correction.
Neppalli Naga’s profile is that of a deep systems engineer: Master of Science in Computer Science, seven years at LinkedIn building core data architecture, nine years of engineering leadership at Uber. The leaderboard incentive that drove the overrun is a recognisable engineering-management artefact: optimising for adoption metrics rather than unit economics. The fix was bounded and immediate because the leader understood exactly what had gone wrong.
Microsoft’s parallel situation — cancelling most Claude Code licences for its Experiences and Devices division and moving engineers back to GitHub Copilot CLI — reflects a technically informed trade-off between tool costs and strategic product alignment. Kevin Scott, Microsoft’s CTO, holds multiple patents, peer-reviewed publications, and board-level commercial exposure across three major technology organisations. The decision was a governance correction, not a governance failure.
LinkedIn’s engineering response to inference cost pressure — domain-adapted models, change-detection-optimised embedding pipelines, nearline-first serving architectures achieving up to 3x lower embedding cost — reflects the decisions of a research-trained leader, Deepak Agarwal, who holds a doctorate in Statistics, a fellowship from the American Statistical Association, and company-wide AI accountability across Yahoo, LinkedIn, and Pinterest. When inference economics became a design constraint, LinkedIn solved the engineering problem.
What a Categorical Failure Looks Like
IBM invested more than $4 billion across four healthcare-data acquisitions to build Watson Health. According to reporting by STAT News, Watson for Oncology attracted sustained critical scrutiny from 2017 for unsafe and incorrect cancer-treatment recommendations — a significant gap between IBM’s marketing representations and actual clinical adoption. IBM’s Q2 2022 SEC filing disclosed a cash receipt of $1.065 billion from Francisco Partners for the Watson Health assets. The net loss on the programme: approximately $3 billion before internal operating costs.
Deborah DiSanzo, the General Manager of IBM Watson Health from 2015 to 2018, held an MBA from Babson College and had previously served as CEO of Philips Healthcare, a €10 billion healthcare-informatics business. IBM had selected a commercial executive from the healthcare industry, betting that the product’s primary challenge was sales and distribution. The actual failure mode — Watson’s natural language processing could not reliably interpret clinical narratives — was a fundamental technical limitation that DiSanzo was not positioned to diagnose or correct.
At Klarna, CEO Sebastian Siemiatkowski, an economics graduate with no engineering credentials on record, launched an OpenAI-powered AI assistant in January 2024 that he claimed was handling the equivalent work of 700 full-time agents. Headcount fell from approximately 5,527 to approximately 3,422. By May 2025, Siemiatkowski acknowledged publicly to Bloomberg that ‘cost unfortunately seems to have been too predominant an evaluation factor’ and that the all-AI model had produced ‘lower quality’ outcomes. Klarna began rehiring human agents. The company’s NYSE IPO in September 2025 was priced at $40 per share, implying a valuation substantially below its 2021 peak.
Commonwealth Bank of Australia announced 45 customer service redundancies in July 2025, citing an AI voice bot it claimed had reduced call volumes by 2,000 per week. The Finance Sector Union challenged the claim before the workplace-relations tribunal, adducing evidence that call volumes had in fact risen. On 21 August 2025, according to Bloomberg, CBA reversed the decision, reinstated the roles, and conceded they had not been genuinely redundant. The Group CIO accountable during this period, Gavin Munroe, departed on 22 December 2025. CBA subsequently appointed its first-ever dedicated Chief AI Officer — a structural acknowledgment that the prior governance model had been insufficient.
McDonald’s tested IBM’s Automated Order Taking voice AI at more than 100 US drive-thru locations between 2021 and 2024. The programme was terminated in July 2024 following sustained failures — including the system appending bacon to ice cream orders and generating 260-piece McNugget orders. McDonald’s Global CIO at the time, Brian Rice, held a Bachelor of Science in Computer Information Systems and a 30-year career spanning enterprise IT and CIO roles with no postgraduate credentials and no hands-on AI or machine learning experience on record. His prior AI exposure had been an IBM Watson-powered project at Kellogg. He then reprised the IBM-vendor pattern at McDonald’s.
A leader with hands-on experience in natural language processing would have stress-tested the system’s ambient-noise performance before committing to more than 100 locations. That evaluation requires technical depth that AI literacy training does not produce.
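One way to make that kind of evaluation concrete is to test a vendor’s stated accuracy against results measured on your own data, separately for each operating condition. The Python sketch below is an illustration only: the function names, the 95 per cent claimed accuracy, and the sample counts are hypothetical and are not drawn from the McDonald’s or IBM record. It uses a Wilson score interval, a standard method for a binomial proportion, and treats a claim as refuted when it sits above the whole interval.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return centre - half, centre + half


def claim_is_refuted(claimed_accuracy: float, successes: int, trials: int) -> bool:
    """True when the claimed accuracy lies above the entire measured interval."""
    _, upper = wilson_interval(successes, trials)
    return claimed_accuracy > upper


# Hypothetical pilot results: correct orders out of orders sampled, per condition.
pilot = {"quiet": (470, 500), "noisy": (390, 500)}
CLAIMED = 0.95  # hypothetical vendor claim

for condition, (correct, sampled) in pilot.items():
    low, high = wilson_interval(correct, sampled)
    verdict = "refuted" if claim_is_refuted(CLAIMED, correct, sampled) else "not refuted"
    print(f"{condition}: measured {low:.3f} to {high:.3f}, claim {verdict}")
```

Splitting the sample by condition matters because a figure that holds in quiet conditions can fail in noisy ones, and a pooled average would hide the difference.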
The Asymmetry That Matters
The table below consolidates the thirteen cases. The depth ratings are qualitative, drawn from primary sources — corporate biographies, academic profiles, patent records, published research, and regulatory filings. Where sources were uncertain, I noted it explicitly.
Leader profiles during the failure period:
Technical High / Business High: Agarwal (LinkedIn) · Mekel-Bobrov (eBay) · Scott (Microsoft) · Elshenawy (GM Cruise, inherited)
Technical High / Business Low–Med: Humphries (Zillow) · Kumar (Netflix) · Neppalli Naga (Uber)
Technical Low–Med / Business High: Munroe (CBA) · Voris (Disney) · Heitsenrether (JPMorgan) · DiSanzo (IBM Watson) · Rice (McDonald’s) · Siemiatkowski (Klarna)
The direction of the pattern is clear. Dual-depth leaders, those with documented credentials in both technical and commercial domains, either avoided categorical failure or produced recoverable corrections. Single-depth technical leaders produced bounded financial failures: the Zillow write-down was a bounded loss caused by concept drift in a housing market that moved faster than the model’s monthly refresh cycle; the Uber overrun was a sprint-cycle miscalculation; Netflix excluded reasoning models from production when they increased costs without proportional quality gain.
Single-depth business leaders produced structural failures: categorical mis-purchases of AI capability that could not be diagnosed or corrected without replacing the leader. IBM’s net loss on Watson Health was approximately $3 billion. Klarna began rehiring human agents within eighteen months of announcing the AI assistant that accompanied a 38 per cent headcount reduction. CBA reversed redundancies its AI system had not actually justified only after a union challenge before the workplace-relations tribunal. McDonald’s terminated a three-year partnership with IBM after viral customer failures.
Approximately $3 billion: the net loss on IBM Watson Health after more than $4 billion in acquisitions and a sale for $1.065 billion. Source: IBM 10-Q, Q2 2022; Francisco Partners transaction disclosure.
The categorical failures share a structural explanation: a leader who lacked the technical depth to evaluate the original decision also lacked the depth to diagnose what went wrong. The same credential gap that enabled the mistake prevented the correction.
Why the Reverse Path Is Harder
I want to address the obvious counterargument, because I hear it in boardrooms regularly. The argument goes: business and financial rigour are equally hard to acquire late in a career. Why is the upskilling direction asymmetric?
The answer is structural, not personal.
Formal business education (financial modelling, corporate strategy, capital allocation, stakeholder management, P&L accountability) is designed to be taught. Executive MBA programmes, Master of Engineering Management degrees, and rotational operating roles are specifically built to translate quantitative expertise into commercial frameworks. A technically credentialed leader who spends two to four years in structured business formation can genuinely acquire the vocabulary and the discipline of commercial decision-making. Programme material from Rice University and Johns Hopkins Engineering describes this pathway as well established and effective.
The reverse pathway faces a different constraint. Acquiring technical AI knowledge sufficient to challenge a vendor on model limitations, evaluate concept drift risk, understand token economics at the level of inference architecture, or assess whether a proprietary natural language processing system can reliably parse clinical text is not achievable through AI literacy programmes. According to McKinsey’s research on digital skill building, literacy training builds vocabulary and capacity for use-case identification. It does not build evaluative depth.
AI literacy training tells you what a large language model is. It does not give you the instinct to ask whether this particular model will behave this way in this environment at this scale — and to know what the answer means.
There is also a structural visibility argument. In modern enterprises, the technology division is frequently the only function with an end-to-end view of how data flows across the organisation. Technically credentialed leaders tend to arrive at the executive level with an embedded operational understanding of the business — they have had to understand the business to build systems that serve it. The converse is rarely true.
I want to acknowledge the qualification that Dartmouth Tuck School research introduces: technically trained leaders can exhibit a bias toward research sophistication over commercial execution, and markets have responded positively when commercially focused successors have taken over from STEM-focused leaders. This does not refute the asymmetry. It refines it. The question is not whether technical leaders are always superior executives. It is whether they are better positioned to govern AI specifically, where the failure modes are predominantly technical in origin, even when they manifest as financial outcomes.
The Dyadic Workaround — and Its Limits
Several cases illustrate what I have come to call the dyadic model: rather than expecting a single executive to hold both domains, the organisation pairs a commercially experienced AI leader with a technically credentialed deputy.
CBA’s concurrent appointments of Ranil Boteju as inaugural Chief AI Officer and Mary-Anne Williams as inaugural Chief AI Scientist are the most explicit recent examples. According to CBA’s newsroom announcement, Boteju brings commercial and governance leadership; Williams brings research and technical credibility. Together, they are designed to cover what one leader could not.
The dyadic model is a structurally sound workaround. But it is a workaround. The risk is governance ambiguity when technical and commercial authorities disagree — and in AI governance, they will. The organisation with a single dual-depth leader is more robust than one that requires two specialists to achieve the same coverage. And the dyadic model is expensive: it requires two executive salaries, two reporting lines, and a clear protocol for resolving disagreements, which most organisations do not establish up front.
CBA is building that governance architecture now, after the failure that revealed the gap. The more tractable path is building it before the failure.
What Boards Should Actually Do
I am going to be specific here, because I am tired of governance recommendations that generate process without changing decisions.
First: set a credentialing floor for AI executive appointments. Any executive holding the title of Chief AI Officer, Head of AI, Chief Analytics Officer, or equivalent should demonstrate at least one of: a graduate-level qualification in a quantitative discipline, or documented hands-on AI or machine learning engineering experience of seven or more years, evidenced by publications, patents, or significant open-source contributions. This minimum should not be waived on the grounds that the leader will be paired with a technical deputy. Pair them if you want. But do not use the pairing to avoid the floor.
Second: require independent technical sign-off on vendor AI capability claims before operational commitment at scale. A suggested threshold is the lower of $25 million in total contract value or 5 per cent of the annual AI budget. IBM Watson Health, CBA, and the McDonald’s-IBM programme would each have triggered this review. A technical reviewer who is neither the vendor nor the AI leader makes a material difference.
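The suggested threshold can be written as a simple rule. The Python sketch below is one possible encoding, not an established standard: the class and function names are hypothetical, the example figures are invented, and treating a contract that sits exactly at the threshold as in scope is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorCommitment:
    """A proposed vendor AI contract (field names are illustrative)."""
    name: str
    total_contract_value_usd: float
    annual_ai_budget_usd: float


def review_threshold_usd(annual_ai_budget_usd: float,
                         absolute_cap_usd: float = 25_000_000,
                         budget_share: float = 0.05) -> float:
    """Return the lower of the absolute cap and the share-of-budget amount."""
    return min(absolute_cap_usd, budget_share * annual_ai_budget_usd)


def needs_independent_signoff(c: VendorCommitment) -> bool:
    """True when contract value meets or exceeds the threshold (inclusive by assumption)."""
    return c.total_contract_value_usd >= review_threshold_usd(c.annual_ai_budget_usd)


# Hypothetical examples: a mid-sized programme and a very large one.
small_budget = VendorCommitment("voice-bot pilot", 6_000_000, 100_000_000)
large_budget = VendorCommitment("analytics platform", 20_000_000, 1_000_000_000)

print(needs_independent_signoff(small_budget))  # threshold is 5% of budget, $5 million
print(needs_independent_signoff(large_budget))  # threshold is the $25 million cap
```

Taking the lower of the two figures means smaller AI programmes are not exempted simply because no single contract reaches $25 million.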
Third: add an AI budget realism review to the annual audit cycle. Not a compliance checkbox. A genuine test of whether the AI leader can articulate the per-token or per-inference unit economics of deployed systems, whether concept drift monitoring is in place on models with capital exposure, and whether adoption-rate forecasts are realistic given S-curve dynamics rather than linear projections. Uber’s budget overrun was foreseeable if someone had modelled what 95 per cent adoption of an agentic coding tool actually costs.
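To illustrate why linear projections understate spend, the Python sketch below compares cumulative cost under a straight-line adoption ramp with a logistic S-curve. Every parameter is an assumption chosen for illustration: the headcount, the curve’s midpoint and steepness, and a budget sized to the linear plan. Only the 95 per cent ceiling and the $250 monthly figure echo numbers quoted earlier, and the curve is not fitted to Uber’s data.

```python
import math
from typing import Callable, Optional


def logistic_adoption(month: float, midpoint: float, steepness: float,
                      ceiling: float = 0.95) -> float:
    """Share of engineers using the tool at `month` under an S-curve."""
    return ceiling / (1.0 + math.exp(-steepness * (month - midpoint)))


def linear_adoption(month: float, months_to_ceiling: float,
                    ceiling: float = 0.95) -> float:
    """Share of engineers using the tool at `month` under a straight-line ramp."""
    return ceiling * min(month / months_to_ceiling, 1.0)


def cumulative_spend(adoption: Callable[[int], float], months: int,
                     engineers: int, cost_per_active_engineer: float) -> list[float]:
    """Running total of tool spend at the end of each month, 1..months."""
    total, out = 0.0, []
    for m in range(1, months + 1):
        total += adoption(m) * engineers * cost_per_active_engineer
        out.append(total)
    return out


def month_budget_exhausted(spend_by_month: list[float], budget: float) -> Optional[int]:
    """First month (1-indexed) in which cumulative spend exceeds budget, else None."""
    for i, spent in enumerate(spend_by_month, start=1):
        if spent > budget:
            return i
    return None


ENGINEERS = 5_000   # hypothetical headcount
COST = 250.0        # assumed monthly cost per active engineer, in dollars

planned = cumulative_spend(lambda m: linear_adoption(m, 12), 12, ENGINEERS, COST)
budget = planned[-1]  # annual budget sized to the linear plan
actual = cumulative_spend(lambda m: logistic_adoption(m, 2.5, 2.0), 12, ENGINEERS, COST)

# Under these assumptions the linear-plan budget runs out in month 9.
print(month_budget_exhausted(actual, budget))
```

The gap widens further if cost per active engineer rises with usage, as the heavy-user figures suggest; this sketch holds that cost constant to isolate the effect of the adoption curve alone.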
The board’s question is not: do we have an AI strategy? It is: does the person responsible for executing that strategy understand the mechanics of the thing they are governing well enough to know when it is going wrong?
According to Altrata’s 2024 census, Chief AI Officer (CAIO) appointments grew 70 per cent year-on-year, with most appointees lacking senior experience outside technology. According to IBM’s Institute for Business Value CEO survey published in 2026, 76 per cent of 2,000 surveyed global CEOs reported having a CAIO, up from 26 per cent in 2025. The title is inflating faster than the capability behind it. That is a fiduciary problem.
Concluding Remarks
Every organisation I work with has an AI strategy. Very few have a clear answer to a simpler question: is the person responsible for that strategy equipped to know when it is going wrong before the write-down, the reversal, or the regulator makes it visible?
The evidence from thirteen enterprise cases points in a consistent direction. The credential gap that matters most is the technical one. Not because technical leaders are inherently better executives — they are not — but because the failure modes of AI governance are predominantly technical in origin, and the correction of a technical failure requires someone who can diagnose it.
The wrong person in the room does not always generate the wrong answer immediately. Sometimes the answer looks right for eighteen months. Sometimes it generates headlines about transformation and innovation. But when the model fails, when the vendor’s claims meet production reality, when the cost curve bends in a direction the budget did not anticipate — the person in that room needs to understand what they are looking at.
Literacy training will not get them there. Title inflation will not get them there. A well-structured dyadic appointment might — if the governance protocol is right and the authority is clear.
The most tractable path remains what it has always been: hire for technical depth in the AI function, and invest in structured business formation for technically credentialed candidates. This is not complicated. It is just uncomfortable for organisations that have already made the hire.
A Note on the Evidence
I have applied consistent evidentiary standards throughout this analysis. Where exact costs were not disclosed in primary sources, I have noted amounts as unspecified rather than inferring figures. Leader credentials are drawn from corporate biographies, academic secondary sources, and verified business press — not LinkedIn profiles alone. The Netflix VP identification carries lower confidence than other profiles and is noted as such. Where sources conflicted, I have been explicit about the conflict rather than resolving it by assertion.
The sample is, by construction, the publicly visible failures. There exist large enterprises with business-credentialed AI leaders who have not produced publicly documented failures, and with technical AI leaders who have produced failures not covered in available sources. The thesis is supported by the pattern in this sample. It is not a universal empirical law, and I do not claim otherwise.
The 2026 10-K disclosures, EU AI Act compliance filings, and the continued IBM IBV CEO survey data will all provide material evidence for or against these findings. I will update the analysis as that evidence arrives.
References
All sources verified as of June 2026. Where access is paywalled, citation is to the public record of the article’s existence and disclosed content.
Altrata (2024). 2024 Executive Insight: Chief AI Officers. Cited via IBM Institute for Business Value (2026) CEO Survey 2026.
Bloomberg (2025). Commonwealth Bank of Australia Reverses Move to Replace 45 Jobs With AI, 21 August 2025.
Bloomberg (2025). Klarna CEO Admits AI Job Cuts Went Too Far (Sebastian Siemiatkowski, interview). May 2025.
British Columbia Civil Resolution Tribunal (2024). Moffatt v. Air Canada, 2024 BCCRT 149.
CommBank Newsroom (2026). Done Right, AI Is ‘Invisible’: Meet Ranil Boteju, CommBank Chief AI Officer. May 2026.
CX Today (2024). McDonald’s AOT programme accuracy and termination. Multiple reports, July 2024.
Dolfing, H. (2022). Case Study 20: The $4 Billion AI Failure of IBM Watson for Oncology. henricodolfing.ch.
eBay Innovation Blog (2022). eBay’s Blazingly Fast Billion-Scale Vector Similarity Engine. innovation.ebayinc.com.
eBay Innovation Blog (2023). How eBay Created a Language Model With Three Billion Item Titles. innovation.ebayinc.com.
Fortune (2026). Uber Burned Through Its Entire 2026 AI Budget in Four Months. Kline, D., 26 May 2026.
Heath, A. (2026). Uber President Says AI Spending Is Getting ‘Harder to Justify.’ The Verge, 26 May 2026.
IBM Corporation (2022). Form 10-Q, Q2 2022. U.S. Securities and Exchange Commission EDGAR.
IBM Institute for Business Value (2026). CEO Survey 2026.
Johns Hopkins Engineering for Professionals (n.d.). Master’s in Engineering Management vs MBA. ep.jhu.edu.
Kombib, M. (2026). Uber Burned Its 2026 AI Budget in Four Months. Medium, April 2026.
Laamanen, T. and Wallin, J. (n.d.). Engineers vs. Business People: Who Should Manage High-Tech Firms? Tuck School of Business at Dartmouth.
LinkedIn Engineering Blog (2024–2026). How We Built Domain-Adapted Foundation GenAI Models to Power Our Platform.
LinkedIn Engineering Blog (2025). JUDE: LLM-Based Representation Learning for LinkedIn Job Recommendations.
LinkedIn News (2025). LinkedIn Welcomes Deepak Agarwal as New Chief AI Officer.
McKinsey & Company (2023). We Are All Techies Now: Digital Skill Building for the Future.
McKinsey & Company (2024). Redefine AI Upskilling as a Change Imperative.
MIT Technology Review / EmTech Digital (2025). Speaker: Nitzan Mekel-Bobrov.
Netflix Research (2026). Towards Generalizable and Efficient Large-Scale Generative Recommenders. arXiv preprint arXiv:2605.23312.
NPR (2024). GM to Retreat from Robotaxis and Stop Funding Its Cruise Autonomous Vehicle Unit, 11 December 2024.
PYMNTS (2026). Uber Caps AI Coding Costs After Exhausting Annual Budget, 2 June 2026.
Reuters (2026). Australia’s CBA Flags Surging AI Costs as Tasks Grow Complex, 2 June 2026.
Rice University MEML (2026). Engineering Management vs. MBA. engineering.rice.edu.
Ross, C. and Swetlitz, I. (2017–2022). Multiple reports on IBM Watson for Oncology. STAT News.
The Information (2026). Uber CTO discloses 2026 AI budget exhausted within four months. April 2026.
Warren, T. (2026). Microsoft Cancels Claude Code Licenses, Shifting Developers to GitHub Copilot CLI. Windows Central, 14 May 2026.
Wikipedia (2025). Cruise (Autonomous Vehicle). en.wikipedia.org.


