Why so many companies are “dropping AI,” and how to actually make it work.
A lot of executives are quietly freezing or canceling their AI pilots. The headline reasons sound dramatic ("the tech isn’t ready," "we tried it and didn’t see ROI"), but underneath, most failures come down to fuzzy problems, vague prompts, thin training, and zero changes to how people actually work.
AI hasn’t failed; we’ve failed our people.
What the Data Actually Says (spoiler: context matters)
- MIT’s 2025 industry scan found 95% of enterprise gen-AI pilots produced no measurable impact on P&L. The 5% that did succeed homed in on specific, bounded problems and paired AI with process redesign. (Estrada, 2025)
- METR’s randomized trial with experienced open-source developers showed AI made tasks 19% slower on average, even though participants believed they were 20–24% faster. Translation: without workflow fit and guardrails, AI can add review, cleanup, and orchestration overhead that eats the gains. (METR, 2025)
- CSIRO’s synthesis put it plainly: even if AI speeds up inbox triage, that doesn’t automatically turn into org-level productivity unless jobs, incentives, and processes change accordingly. Evidence of broad productivity lift is still questionable. (Whittle, 2025)
The Core Mistake: “Ship licenses, hope for magic.”
Too many rollouts mirror how companies adopted Teams or O365: flip the switch, run a lunch-and-learn, and assume value will appear. With AI, that playbook backfires.
- Prompts are vague: Users ask general questions with little context, then blame the tool.
- No process change: People bolt AI onto yesterday’s workflow; the handoffs, metrics, and approvals remain the same, so time “saved” just reappears as rework.
- Shallow enablement: Training covers what buttons to press, not how to communicate and think with AI (task slicing, constraints, verification, and traceability).
- Wrong problems: Leaders chase shiny demos (chatbots, copy) instead of gritty back-office drudgery where AI shines (classification, routing, extraction, summarization). MIT explicitly calls out this issue. (Morales, 2025)
What AI Does Well Today
Beyond creating epic song lyrics, which AI does very well, think of AI as a cognitive speed assist: it helps you scan, structure, and propose faster. It can summarize, cluster, label, draft, compare, reconcile, transform formats, and keep lightweight memory of threads. It can’t do your job’s tacit parts: politics, risk trade-offs, exceptions, ethics, and the “unknown knowns” you’ve built over years.
Put bluntly: AI won’t replace you; it will (and should) magnify your workflow. If you don’t have one, it magnifies the chaos. If you do, it compresses the boring parts so you can spend time on judgment and creativity.
A Simple Litmus Test: Enhance, Don’t Replace
If a pitch centers on “replace humans,” it’s probably a cost-cutting fantasy. If it centers on enhancing specific steps in an already-defined process, and you can measure those steps, now we’re talking.
A Practical Example: Marketing Team
Old way (human-only):
- Manually scrape social comments, Reddit threads, and tickets.
- Triage what needs response vs. ignore.
- Hand-summarize sentiment and themes.
- Cross-check planned campaigns against breaking news or sensitive events.
Total: hours per day, with lots of repetitive reading.
AI-enhanced way (two lightweight agents + human owner):
Listening Agent (nightly):
- Crawl defined sources and queries.
- Auto-classify: “urgent response,” “testimonial candidate,” “FYI.”
- Extract fields: product, feature, sentiment, persona, geography.
- Summarize yesterday with counts and representative quotes and links.
Planning Companion (morning):
- Load the marketing brief and calendar.
- Check for conflicts with news cycles or sensitivities (e.g., events that make copy tone or creative risky).
- Propose edits, swaps, or delays with rationale and impact.
Human owner reviews, decides, and logs outcomes (what changed and why).
What changed: the human still owns judgment, but the reading, sorting, and synthesis are compressed. That’s “enhance,” not “replace.”
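To make the Listening Agent’s triage step concrete, here’s a minimal sketch in Python. The categories match the example above, but the keyword rules, class names, and sample mentions are all hypothetical placeholders; a real agent would put an LLM or trained classifier behind this same function signature.

```python
from dataclasses import dataclass

# Hypothetical triage rules; a production agent would call an LLM
# or a trained classifier behind the same triage() interface.
URGENT_TERMS = {"refund", "outage", "lawsuit", "charged twice"}
TESTIMONIAL_TERMS = {"love", "amazing", "saved me hours"}

@dataclass
class Mention:
    source: str  # e.g. "reddit", "tickets"
    text: str

def triage(mention: Mention) -> str:
    """Classify a mention as 'urgent response', 'testimonial candidate', or 'FYI'."""
    lowered = mention.text.lower()
    if any(term in lowered for term in URGENT_TERMS):
        return "urgent response"
    if any(term in lowered for term in TESTIMONIAL_TERMS):
        return "testimonial candidate"
    return "FYI"

# Nightly run: bucket mentions so the human owner reviews a sorted list,
# not a raw feed.
mentions = [
    Mention("tickets", "I was charged twice and need a refund now."),
    Mention("reddit", "Honestly love the new export feature."),
    Mention("x", "Anyone else notice the new logo?"),
]
buckets = {}
for m in mentions:
    buckets.setdefault(triage(m), []).append(m)
```

The point isn’t the classifier; it’s the shape: bounded sources in, labeled buckets out, human decision on top.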
A Five-Step Playbook that Separates the 5% from the 95%
- Start with a Process Map, Not a Model
Write the current workflow in 5–9 boxes. Circle steps that are repetitive, text-heavy, or classification-heavy. Those are AI-friendly.
- Define a Single Measurable Outcome per Pilot
Examples:
– Reduce time-to-first-draft incident report from 45 → 20 minutes.
– Auto-tag 95% of inbound emails with >98% precision.
– Cut backlog triage cycles from weekly to daily without extra headcount.
If you can’t measure the before/after at the step level, don’t start.
- Constrain the Sandbox
Limit data sources, tasks, and users. Curate 20–50 golden examples (inputs + desired outputs) to tune prompts or evaluators. This beats dumping your entire drive into an agent and praying. Even a single good example makes answers dramatically better.
- Teach “How to Communicate and Think with AI”
Real enablement = four skills.
1. Context packing: include role, goal, constraints, format, examples.
2. Task slicing: break into label → extract → transform → draft → verify.
3. Verification: checklists, citations, spot-checks, and “red team” prompts.
4. Traceability: save prompts/outputs tied to cases for audit and learning.
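Task slicing can be sketched as a chain of small, verifiable stages rather than one mega-prompt. Below is a hypothetical, condensed version of the label → extract → draft → verify chain (transform is folded into extract here); in practice each stage would call a model, but the structure is the point.

```python
# Hypothetical pipeline skeleton: each stage is small enough to check
# on its own, which is what keeps review/cleanup overhead down.
def label(items):
    """Stage 1: tag each raw item with a coarse category (stub rule)."""
    return [{"text": t, "label": "complaint" if "slow" in t else "other"} for t in items]

def extract(labeled):
    """Stage 2: keep only the fields later stages need."""
    return [{"label": d["label"], "summary": d["text"][:40]} for d in labeled]

def draft(extracted):
    """Stage 3: produce a short brief from the structured records."""
    complaints = [d for d in extracted if d["label"] == "complaint"]
    return f"{len(complaints)} complaint(s) flagged for review."

def verify(brief, extracted):
    """Stage 4: cheap sanity check before the brief goes anywhere."""
    claimed = int(brief.split()[0])
    return claimed == sum(d["label"] == "complaint" for d in extracted)

items = ["Checkout is slow again on mobile", "Great webinar yesterday"]
structured = extract(label(items))
brief = draft(structured)
assert verify(brief, structured)
```

Because each stage’s output is inspectable, a bad label gets caught at stage 1 instead of surfacing as a wrong number in the final brief.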
(The METR study’s slowdown came largely from review and cleanup overhead; good slicing and verification cut that overhead. It’s also why “AI cleanup dev” is emerging as a new title.) (Hale, 2025)
- Instrument the Workflow
Track step-level time, accept/reject rates, and error types. If AI adds minutes of review for every minute it “saves,” you’ll see it quickly (and fix it or kill it). MIT’s 5% winners did exactly this. (Morales, 2025)
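A minimal sketch of that step-level instrumentation, assuming you log one record per AI-assisted step; the field names and sample numbers are hypothetical.

```python
from statistics import mean

# Hypothetical step log: one record per AI-assisted step.
# "accepted" means the human owner kept the AI output as-is or with minor edits.
log = [
    {"step": "triage", "minutes_saved": 12, "review_minutes": 3, "accepted": True},
    {"step": "triage", "minutes_saved": 10, "review_minutes": 4, "accepted": True},
    {"step": "draft",  "minutes_saved": 8,  "review_minutes": 11, "accepted": False},
]

def step_report(records):
    """Per-step net minutes and accept rate: the go/no-go numbers."""
    steps = {}
    for r in records:
        s = steps.setdefault(r["step"], {"net": [], "accepted": []})
        s["net"].append(r["minutes_saved"] - r["review_minutes"])
        s["accepted"].append(r["accepted"])
    return {
        name: {"net_minutes_avg": mean(v["net"]), "accept_rate": mean(v["accepted"])}
        for name, v in steps.items()
    }

report = step_report(log)
# A step with negative net minutes means AI adds more review than it saves —
# exactly the METR-style overhead you want to catch early and fix or kill.
```

Two numbers per step is enough to run the go/no-go rule; anything fancier can wait until a pilot survives.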
Prompting Isn’t Magic: Give the Machine a Job, Not a Vibe
Bad: “Summarize these complaints.”
Better:
— Role: You are an escalations analyst.
— Goal: Produce a 120-word daily brief for the VP CX.
— Scope: Only items related to Billing or Payments from the last 24 hours.
— Output: Headlines, top risks, and customers to contact; under each, the top three items with title, name (if applicable), channel, sentiment, link, and confidence level.
— Constraints: Cite message IDs for each headline. No speculation. If you don’t know, don’t include it.
Don’t “chat” with the AI; write it a mini business-requirements document or a spec. Tell it exactly what you want so it can meet your expectations.
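One way to enforce “spec, not vibe” is to assemble prompts from required fields, so a vague request fails before it ever reaches the model. The field names mirror the example above; the builder function itself is a hypothetical helper, not any particular library’s API.

```python
REQUIRED_FIELDS = ("role", "goal", "scope", "output", "constraints")

def build_prompt(spec: dict) -> str:
    """Assemble a structured prompt; refuse to build one with missing fields."""
    missing = [f for f in REQUIRED_FIELDS if not spec.get(f)]
    if missing:
        raise ValueError(f"Prompt spec incomplete, missing: {', '.join(missing)}")
    return "\n".join(f"{field.capitalize()}: {spec[field]}" for field in REQUIRED_FIELDS)

prompt = build_prompt({
    "role": "You are an escalations analyst.",
    "goal": "Produce a 120-word daily brief for the VP of CX.",
    "scope": "Only Billing or Payments items from the last 24 hours.",
    "output": "Headlines, top risks, customers to contact, with channel, sentiment, link, and confidence.",
    "constraints": "Cite message IDs for each headline. No speculation.",
})
```

A `ValueError` on a half-filled spec is cheap; an AI answer built on a half-filled spec costs review time every single day.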
“But aren’t companies laying people off because AI can replace them?”
Some are cutting roles, often the most routine or least connected to core value creation. But that’s more a statement about how those roles were scoped than proof of broad replaceability.
Even the sobering METR result didn’t argue that AI is useless; it argued that without the right workflow, experienced people can go slower. (Tong, 2025) Meanwhile, CSIRO’s point stands: org-level productivity emerges only when work and incentives are redesigned to exploit the time saved.
If You Do Just Three Things Next Quarter
- Pick one process and apply the five-step playbook; publish the baseline and target to the team.
- Run a 4-week pilot with golden examples, step-level metrics, and a go/no-go rule.
- Invest in skills, not seats: train a handful of “AI workflow leads” who own process maps, prompts, QA, and measurement.
AI is a force multiplier for the right tasks in the right workflow, owned by people who know what “good” looks like. The bubble isn’t that AI “doesn’t work.” The bubble is thinking you can buy licenses, skip the process work, and get transformation anyway. The 5% who win treat AI like any other operations upgrade: specify, instrument, iterate, or don’t ship.
Need help, or not sure how AI can help you or your organization? That’s where I excel. I offer a unique outsider’s view of your role, product, and company, and by learning how you work, I can see what you can’t.
Let’s make your life easier; connect on LinkedIn or email me.
References:
Estrada, S. (2025, August 18). MIT report: 95% of generative AI pilots at companies are failing. Fortune. https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/ (paywall)
Hale, C. (2025, July 11). Using AI might actually slow down experienced devs. TechRadar. https://www.techradar.com/pro/using-ai-might-actually-slow-down-experienced-devs
METR. (2025, July 10). Measuring the impact of early-2025 AI on experienced open-source developer productivity. METR. https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
Morales, J. (2025, August 20). 5% of generative AI implementations in enterprise ‘have no measurable impact on P&L’, says MIT — flawed integration cited as why AI projects underperform. Tom’s Hardware. https://www.tomshardware.com/tech-industry/artificial-intelligence/95-percent-of-generative-ai-implementations-in-enterprise-have-no-measurable-impact-on-p-and-l-says-mit-flawed-integration-key-reason-why-ai-projects-underperform
Tong, A. (2025, July 10). AI slows down some experienced software developers, study finds. Reuters. https://www.reuters.com/business/ai-slows-down-some-experienced-software-developers-study-finds-2025-07-10/
Whittle, J. (2025, July 15). Does AI actually boost productivity? The evidence is murky. CSIRO. https://www.csiro.au/en/news/All/Articles/2025/July/Does-AI-actually-boost-productivity-the-evidence-is-murky
AI Disclosure: AI was used to check grammar and reading level, and to find additional articles I was not already aware of during my research (specifically those not behind paywalls).