We Keep Calling It the Wrong Thing
Digital transformation. AI transformation. We have been naming the symptom and ignoring the disease. What organisations actually need — and almost none have built — is something harder, quieter, and far more generative: a transformation of thinking.
It was a strategy offsite. One of those two-day affairs in a regional retreat where the real decisions get made in the margins of the formal programme. I was there in an advisory capacity, and for most of the first morning, the conversation tracked exactly as you would expect: digital transformation roadmap, AI use cases, data platforms, vendor selection.
Then a quiet executive — head of operations, twenty-two years with the organisation — asked a question that stopped the room.
“We spent four years on digital transformation. Then we spent two years on AI transformation. We changed the systems. We changed the tools. We changed the org chart. But I still see the same decisions being made in the same ways for the same reasons. What exactly did we transform?”
Nobody had a clean answer.
I have thought about that question many times since. Because it is not just the right question for that organisation. It is the right question for the whole industry. And the honest answer — the one the data actually supports — is uncomfortable: most organisations have transformed their technology stack while leaving their thinking almost entirely untouched.
That is not a cultural problem. It is not a change management problem. It is a cognitive and epistemological problem. And until we name it correctly, we will keep designing programmes that address the wrong layer.
The Failure Rate Nobody Interrogates
Let me start with a statistic that almost everyone in this space has quoted at some point and almost no one has examined.
“Seventy per cent of transformation programmes fail.” You have probably said it in a presentation. I have said it in presentations. It appears in McKinsey decks, BCG reports, Kotter keynotes, and roughly ten thousand LinkedIn posts per quarter. It has the ring of settled empirical fact.
It is not a settled empirical fact.
The figure traces back to a 1993 book — Reengineering the Corporation, by Michael Hammer and James Champy — in which Hammer described it as, in his own words, “our unscientific estimate” of the rate at which reengineering efforts (not all transformations, specifically reengineering) failed to meet their targets. He disowned it two years later. John Kotter, the other most-cited source, did not actually state “seventy per cent” in his famous 1995 HBR essay. He generalised from observation of a hundred-odd companies and arrived at the number later, in a 2008 book. Nobody has ever done the rigorous peer-reviewed study that would make the figure defensible.
A field that advocates data-driven decision-making has recycled a contested statistic for thirty years without examining its provenance. That is not an irony. That is a demonstration of the exact problem.
What we do have is BCG’s transparent survey of nearly 900 transformation programmes, which found that only around 30 per cent met or exceeded their target value and produced sustainable change. That is the honest number. And it has not improved over three decades of technological advancement, billions spent on consulting, and an entire industry of transformation methodologies.
The tools have transformed. The failure rate has not moved.
A failure rate that holds steady across three technological waves does not point to a technology problem. If it were a technology problem, better technology would have fixed it. The problem is underneath the technology.
What the AI Numbers Actually Show
The AI transformation data is younger but follows the same trajectory.
According to McKinsey’s 2025 State of AI survey of nearly two thousand organisations across a hundred and five countries, eighty-eight per cent were using AI in at least one function. Impressive adoption numbers. Then the follow-through: only thirty-nine per cent reported any measurable enterprise-level EBIT impact. And the organisations capturing significant value — five per cent or more of EBIT attributed to AI — accounted for roughly six per cent of the sample.
MIT’s Project NANDA published an analysis in mid-2025 finding that approximately 95 per cent of enterprise generative AI pilots produced no measurable P&L return. The study was based on three hundred publicly disclosed AI initiatives, fifty-two structured interviews, and over a hundred and fifty senior-leader survey responses. Lead researcher Aditya Challapally was direct about the cause: it was not model quality, not regulation, not talent scarcity. He called it a learning gap, in which organisations deploy AI on top of unchanged mental models and unchanged workflows.
~95% of enterprise GenAI pilots produced no measurable P&L impact (MIT Project NANDA, 2025).
In a July 2024 press release, Gartner predicted that at least 30 per cent of generative AI projects would be abandoned after proof of concept by the end of 2025. Later data suggested the actual abandonment rate was closer to fifty per cent.
I want to be careful here, as I always try to be with industry statistics. The MIT figure has been contested — critics called the study’s sample too small to be representative. The Gartner forecast was a prediction, not a measurement. BCG and McKinsey have structural incentives to frame AI adoption as a crisis requiring their intervention. I have written before about the importance of reading research against its source.
But here is what is difficult to argue with: the best-resourced organisations in the world, with access to the best AI models, consultants, and data, are largely failing to capture the value they expected. Not in every case. Not uniformly. But persistently, at scale, across industries and geographies.
That is not a technology problem. The technology has never been better. Something else is not working.
The Layer Nobody Is Working On
In the late 1970s, a Harvard professor named Chris Argyris spent years trying to understand why intelligent, capable professionals failed to learn from experience. His answer was both precise and devastating.
He called it single-loop learning: the correction of errors within an existing set of governing assumptions. The analogy is a thermostat. It detects that the room is too cold and turns up the heating. It is correcting an error. But it never asks whether seventy degrees is the right target. It never examines the assumption that generated the target in the first place.
Most transformation programmes, Argyris would have observed, are thermostats.
They measure adoption rates. They track tool usage. They report on deployment progress. They correct errors in implementation. They do all of this within a set of governing assumptions — about how decisions should be made, where human judgment sits relative to algorithmic recommendation, what expertise means, what accountability looks like — that they never examine, never surface, and never challenge.
Double-loop learning is when you question the assumption that generated the problem, not just the problem itself. Almost no transformation programme is designed to do this.
Argyris’s most uncomfortable finding was that skilled professionals are often the hardest to reach with double-loop learning. Success insulates them. When their self-image is challenged, they resort to what he called defensive reasoning — arguing, reframing, blaming external circumstances. Not because they are bad people, but because it is structurally what expertise-as-identity produces.
Read that against the McKinsey finding that the single strongest predictor of EBIT impact from AI is workflow redesign — not model selection, not data quality, not deployment speed. Workflow redesign. And yet only twenty-one per cent of organisations using generative AI had redesigned at least some workflows. Nearly eighty per cent were layering AI on top of existing processes.
Layering AI on top of existing processes is the organisational expression of single-loop learning. You are correcting the speed of a decision without examining whether the decision itself is right. You are automating a process without first asking whether it should exist. You are deploying capability into a cognitive architecture built for a different era and expecting it to drive transformation.
It will not.
The Cognitive Offloading Trap
There is a distinction I have been using in advisory conversations recently that I want to put on the record because I think it names something happening at scale that organisations are not equipped to see.
It is the distinction between cognitive offloading and cognitive outsourcing.
Cognitive offloading is healthy. It is using technology to free up mental bandwidth — to reduce the cognitive load of routine informational tasks so that attention can be directed to what matters most. Writing tools that handle formatting. Search tools that surface relevant context. Data pipelines that automate cleaning. These amplify human judgment without replacing it.
Cognitive outsourcing is different. It is the passive abdication of the cognitive process itself — accepting AI output without analytical filter, without verification, without the friction of genuine evaluation. The convenience is real. The seduction is understandable. And the consequence, over time, is the gradual erosion of the very faculties that make human oversight of AI meaningful.
A 2025 study by researchers at Microsoft and Carnegie Mellon University, published in the proceedings of CHI, the premier human-computer interaction conference, surveyed three hundred and nineteen knowledge workers. It found that higher confidence in generative AI was associated with less critical thinking, while higher self-confidence was associated with more critical thinking. AI use was shifting the nature of cognitive engagement from original reasoning toward verification, integration, and what the researchers called task stewardship: checking, not thinking.
This is not a new phenomenon. Since at least the late 1990s, the engineering literature has been documenting automation bias: the tendency to over-rely on automated systems and to discount contradictory information from other sources, even when those sources are correct. According to Parasuraman and Manzey’s major integrative review, published in Human Factors in 2010, automation complacency degrades failure detection, is found in both naive and expert participants, and cannot be overcome with simple practice.
The same quality that makes generative AI commercially attractive — fluent, confident outputs at speed — is what makes epistemic vigilance harder to maintain. A wrong answer delivered with confidence is more dangerous than a wrong answer delivered with hesitation.
There is a deeper version of this concern that Michael Polanyi named in 1966, long before anyone was worried about AI. Genuine expertise, he argued, is substantially non-codifiable — rooted in embodied practice, accumulated pattern recognition, and the kind of contextual judgment that resists explicit articulation. He called it tacit knowledge. “We know more than we can tell.”
When professionals over-rely on AI systems, they are not merely accepting specific wrong answers. They are systematically trading the development of tacit, experiential knowledge for the consumption of explicitly generated text. Over time, this degrades the cognitive substrate that makes human oversight meaningful. Governance becomes theatre when the humans reviewing AI outputs have atrophied the domain expertise required to evaluate them.
I have seen this in practice. I sat in a risk review meeting where an AI-generated analysis was presented to the committee. It was well-structured, internally consistent, and completely wrong on the key assumption. Nobody caught it, not because the people in the room were incompetent, but because nobody had done the underlying reasoning independently. They had outsourced the thinking and retained only the checking. And checking without thinking is not oversight. It is the appearance of oversight.
Keeping the Guard Is Not What You Think It Means
When I say we need to keep the guard, I am not arguing for scepticism of AI. I am not arguing for slowing down. I am not describing the posture of a cautious laggard who needs to be convinced.
I am describing something much more specific and much more active: the disciplined exercise of human judgment in the presence of systems that are enormously useful and fundamentally untrustworthy in specific, consequential ways.
Hannah Arendt was not writing about artificial intelligence when she developed her analysis of the banality of evil — her conclusion that the most catastrophic failures of institutional decision-making arise not from malice but from thoughtlessness, the absence of the reflective pause that separates execution from judgment. But the structural argument is identical. An organisation that deploys AI systems without cultivating the capacity to question, verify, and override them has not avoided the abdication of judgment. It has automated it.
The cognitive scientist Gary Klein developed what he calls a pre-mortem: before a decision is executed, the team imagines that it has already failed and works backward to understand what went wrong. It is a structured method for forcing slow, effortful thinking, which Daniel Kahneman calls System 2 in Thinking, Fast and Slow, into a process that would otherwise default to the fast, automatic, bias-prone System 1. The point is not pessimism. The point is that epistemic vigilance does not arise naturally under time pressure. It must be designed in.
Epistemic vigilance is not a constant state of suspicion. It is the embedding of structured opportunities for critical engagement at the moments when the cost of getting it wrong is highest.
Philip Tetlock spent decades studying forecasting and found something remarkable, documented in Superforecasting, his 2015 book with Dan Gardner: the people who consistently outperformed professional intelligence analysts — even those with access to classified information — shared a specific cognitive profile. They were actively open-minded. They were comfortable with uncertainty. They made granular probability estimates rather than confident predictions. And they changed their minds when evidence required it. These are not innate traits. They are habits. They are learnable. They are measurable.
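To make "measurable" concrete: forecasting tournaments of the kind Tetlock ran score probability estimates with the Brier score. The sketch below uses the common single-outcome form (mean squared error between the forecast and what happened, from 0 to 1), which is a simplification chosen for illustration. The questions, forecasters, and probabilities are all hypothetical.

```python
from statistics import mean


def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is perfect, and always answering 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    return mean((p - o) ** 2 for p, o in zip(forecasts, outcomes))


# Hypothetical data: four yes/no questions and how they resolved (1 = yes).
outcomes = [1, 0, 1, 1]

granular = [0.85, 0.20, 0.70, 0.60]   # graded estimates, never fully certain
overconfident = [1.0, 0.0, 1.0, 0.0]  # certain every time, wrong once
hedger = [0.5, 0.5, 0.5, 0.5]         # never commits to anything

for name, forecasts in [
    ("granular", granular),
    ("overconfident", overconfident),
    ("hedger", hedger),
]:
    print(f"{name}: {brier_score(forecasts, outcomes):.3f}")
```

In this made-up example the granular forecaster scores better than both the overconfident one and the perpetual hedger. The score rewards calibration, not bravado, and because it is a number, it can be tracked over time like any other habit worth building.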
The organisations capturing significant value from AI are not the ones with the best models. According to McKinsey’s data, the high performers are more than three times more likely to have defined processes for determining when model outputs require human validation. They have built the epistemic vigilance into the workflow rather than leaving it to individual discretion at the moment of decision, when cognitive depletion and time pressure make it least likely to occur.
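What might a "defined process" for human validation look like? The sketch below is one possible shape, not McKinsey's method or any organisation's actual policy. It assumes each recommendation arrives with an estimated probability of being wrong, a cost if it is wrong, and a cost of independent review. The `Recommendation` type, the threshold, and every figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    description: str
    p_wrong: float        # estimated probability the output is wrong (assumed known)
    cost_if_wrong: float  # cost of acting on a wrong output
    review_cost: float    # cost of an independent human review


def expected_loss(rec: Recommendation) -> float:
    """Probability-weighted cost of acting on the output without review."""
    return rec.p_wrong * rec.cost_if_wrong


def requires_human_validation(
    rec: Recommendation, always_review_above: float = 1_000_000
) -> bool:
    """Decide in advance, not in the moment, whether a human must review.

    Review is required when the stakes exceed a fixed ceiling, or when the
    expected loss of skipping review is greater than the cost of the review.
    """
    if not 0.0 <= rec.p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability between 0 and 1")
    if rec.cost_if_wrong >= always_review_above:
        return True
    return expected_loss(rec) > rec.review_cost


# Hypothetical cases with illustrative numbers only.
routine = Recommendation("reorder office supplies", 0.05, 2_000, 500)
credit = Recommendation("approve credit line", 0.02, 400_000, 1_500)
contract = Recommendation("restructure supplier contract", 0.001, 5_000_000, 10_000)

for rec in (routine, credit, contract):
    print(rec.description, "->", requires_human_validation(rec))
```

The design choice that matters is where the rule lives. It is written down before the decision arrives, so it does not depend on someone remembering to be vigilant at the end of a long day. The fixed ceiling covers the cases where a low estimated error rate should not be enough to waive review.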
The Agentic Inflection Point
There is a governance dimension to this argument that has become acute in the past twelve months and that I have written about in previous issues of this publication: the shift from generative AI to agentic AI.
In the generative era, the primary governance concern was systems saying the wrong thing — hallucinating, generating harmful content, leaking intellectual property. Serious risks, but risks that could largely be addressed within existing compliance frameworks.
In the agentic era, the concern shifts categorically. The risk is no longer about systems saying the wrong thing. It is about systems doing the wrong thing — executing unintended transactions, misusing credentials, operating autonomously beyond the boundaries of their sanctioned authority. According to McKinsey’s 2026 AI Trust Maturity Survey, knowledge and training gaps were the single greatest barrier to responsible AI governance, and this gap was widening rather than closing.
The same survey found that approximately 58 per cent of office workers use unapproved AI tools — shadow AI — bypassing corporate governance controls, compromising data security, and generating operational risk that the organisation cannot see, quantify, or address.
58% of office workers use unapproved AI tools, bypassing governance controls (McKinsey AI Trust Maturity Survey, 2026).
As I have written before, you cannot govern what you cannot see. Shadow AI is what happens when official governance structures fail to accommodate the cognitive needs of the workforce. It is not primarily a security problem. It is what happens when an organisation asks people to do double-loop thinking within a single-loop framework — and they find a way around it.
The organisations that will govern agentic AI effectively are not the ones that tighten access controls and issue more policy documents. They are the ones that have built the cultural and cognitive conditions under which people feel safe enough to use sanctioned tools, smart enough to evaluate AI outputs critically, and empowered enough to raise concerns when something is wrong.
That is a cognitive transformation. It does not come from a compliance programme.
The Third Transformation
Digital transformation rewires how work gets done. AI transformation rewires what gets done and who — or what — does it.
The transformation almost nobody is undertaking rewires how people think about what they are doing and why. It is neither digital nor artificial. It is human. And it is the one without which the other two cannot sustain their value.
I want to be precise about what this looks like in practice, because “transformation of thinking” is the kind of phrase that can mean everything and therefore nothing.
It means building probabilistic reasoning into decision culture — not asking “is this AI recommendation right?” but “under what conditions is this recommendation likely to be right, and what is the cost of the cases where it is not?”
It means developing what Karl Weick, in his study of the Mann Gulch disaster, calls the attitude of wisdom: confident action combined with genuine, non-performative doubt. Not the paralysis of constant second-guessing, and not the recklessness of unconditional trust, but the calibrated engagement of a person who knows both the power and the limits of the tool in their hands.
It means designing what cognitive researchers call the quality of struggle into learning and development — not optimising solely for convenience and speed, but ensuring that professionals are genuinely building the tacit knowledge that makes their judgment worth having.
And it means leaders modelling this publicly. Not just endorsing AI in the town hall. Using it, being visible about its limitations, changing their minds in front of their teams when the evidence requires it, and creating the psychological safety within which people can say “the model got this wrong” without it being career-limiting information.
The most dangerous sentence in most AI transformation programmes is not wrong. It is silent. It is the absence of anyone asking: what are we assuming here, and should we be assuming it?
According to Carol Dweck’s research on organisational mindset cultures, published in the Harvard Business Review in 2014, employees in growth-mindset organisations were forty-nine per cent more likely to say their organisation fostered innovation and sixty-five per cent more likely to say it supported risk-taking. The mindset is set by leaders. Not by the values statement on the wall. By what leaders do when they are wrong, how they respond when their assumptions are challenged, and whether they treat uncertainty as a threat to manage or a condition to navigate.
That is the transformation that creates the conditions for everything else.
What I Tell Boards
When I sit down with a board that is genuinely trying to understand its AI governance obligations, and I am having more and more of those conversations, I eventually get to a question that I find changes the texture of the discussion.
It is not: what AI tools have you deployed?
It is not: have you met the regulatory requirements?
It is not: what is your AI strategy?
It is this: when was the last time someone in this organisation changed their mind about something important because of what an AI system showed them, rather than despite it?
The question sounds philosophical. It is not. It is a diagnostic. An organisation where AI changes minds has built the cognitive integration required for transformation. People are engaging with the outputs analytically, not performatively. The technology is affecting the thinking, not just the doing.
An organisation where AI generates outputs that everyone works around, producing the work required to satisfy the requirement and then making the decision they were going to make anyway, has not transformed anything. It has added a step.
Most organisations, when pressed honestly, are in the second category.
The fix is not a new AI platform. It is not a governance framework — though governance matters enormously, and I have spent significant time on it in previous issues of this publication. It is not a training programme, though targeted development is part of the answer.
The fix is a sustained, deliberate, and somewhat uncomfortable process of making assumptions visible, questioning them at the level that matters, and building the institutional habits that allow the organisation to keep doing this as the technology continues to evolve.
It is, in a word, thinking.
Not smarter tools. Not faster processing. Not better data. Thinking.
The organisations that will look back in five years and say that AI transformation worked are not the ones that deployed it first. They are the ones that thought about it well, and kept thinking even when the thinking was difficult.
A Note on the Evidence
As I noted in the section on the BCG and McKinsey statistics, when I cite industry data, I try to name the source and its commercial context. The McKinsey 2025 State of AI figure (survey n=1,993 across 105 nations) and the BCG transformation analysis (approximately nine hundred programmes) are among the more methodologically transparent data points in this literature. The MIT NANDA 95% figure is directionally significant but should be read as an indicative finding from a limited sample — 300 disclosed initiatives, 52 interviews, 153 survey responses — not as a precise market measurement.
References
Argyris, C. (1977). Double loop learning in organizations. Harvard Business Review, 55(5), 115–125.
Argyris, C. (1991). Teaching smart people how to learn. Harvard Business Review, May–June.
BCG (2020). Flipping the Odds of Digital Transformation Success. Boston Consulting Group.
Dweck, C.S. (2014). How companies can profit from a ‘growth mindset.’ Harvard Business Review, November.
Gartner (2024, July 29). Gartner Predicts 30% of Generative AI Projects Will Be Abandoned After Proof of Concept By End of 2025. Press release. Analyst: Rita Sallam.
Hammer, M., & Champy, J. (1993). Reengineering the Corporation: A Manifesto for Business Revolution. HarperBusiness.
Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
Kotter, J.P. (1995). Leading change: Why transformation efforts fail. Harvard Business Review, March–April.
Lee, H.-P., Sarkar, A., Tankelevitch, L., Drosos, I., Rintel, S., Banks, R., & Wilson, N. (2025). The impact of generative AI on critical thinking: Self-reported reductions in cognitive effort and confidence effects from a survey of knowledge workers. Proceedings of CHI 2025, Article 1121. https://doi.org/10.1145/3706598.3713778
McKinsey & Company (2025). The State of AI in 2025: Agents, Innovation, and Transformation. Survey: n=1,993, 105 nations, June–July 2025.
McKinsey & Company (2026). State of AI Trust in 2026: Shifting to the Agentic Era. AI Trust Maturity Survey.
MIT Project NANDA — Challapally, A., Pease, C., Raskar, R., & Chari, P. (2025). The GenAI Divide: State of AI in Business 2025. Massachusetts Institute of Technology. Reported in Fortune, 18 August 2025.
Parasuraman, R., & Manzey, D.H. (2010). Complacency and bias in human use of automation: An attentional integration. Human Factors, 52(3), 381–410.
Polanyi, M. (1966). The Tacit Dimension. Doubleday.
Snowden, D.J., & Boone, M.E. (2007). A leader’s framework for decision making. Harvard Business Review, November.
Tetlock, P.E., & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown Publishers.
Weick, K.E. (1993). The collapse of sensemaking in organizations: The Mann Gulch disaster. Administrative Science Quarterly, 38(4), 628–652.


