A rigorous METR study found that experienced developers are 19% slower when using AI coding tools. The twist: they believed they were 20% faster.
Track actual productivity, not perceived productivity. The METR study shows experienced developers are 19% slower with AI tools but believe they're 20% faster. Measure, don't assume.
The logic sounds right on paper: AI assistance should make experienced developers faster, freeing them to focus on architecture while the AI handles boilerplate.
The AI coding revolution, though, has an inconvenient data point. METR, a respected AI research organization, ran a randomized controlled trial with experienced open-source developers working on their own codebases. The results contradict nearly everything the industry says about AI-assisted development.
The Review Latency Curve
Writing code scales roughly linearly with its size: O(n). Reviewing AI-generated code scales closer to O(n²), because every new line has to be checked against the code around it. The more the tool generates, the worse your economics get.
Here is a rough illustration of why the METR study found experienced developers 19% slower:
- Writing 100 lines manually: 1 hour. You know what you wrote.
- Reviewing 100 AI-generated lines: 1.5 hours. You must verify every assumption.
- Debugging 100 AI-generated lines: 2+ hours. You are debugging someone else's logic without access to their reasoning.
The productivity promise assumes you can trust the output. The reality is that AI shifts you from creator to janitor. You stop writing code and start reviewing code. The cognitive load does not decrease—it changes form. Instead of thinking "what should this do?", you think "did this do what I needed?" That second question is harder because you are reverse-engineering intent from output.
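To make the curve concrete, here is a minimal Python sketch. The per-line constants are assumptions calibrated to the 1-hour and 1.5-hour figures above, not measurements from the METR study; the quadratic term stands in for the cost of checking each generated line against its surroundings.

```python
# Illustrative model of the Review Latency Curve. The constants are
# assumptions, calibrated so that 100 lines take ~1 hour to write and
# ~1.5 hours to review; they are not measurements from the METR study.

def write_time(lines: int, minutes_per_line: float = 0.6) -> float:
    """Writing your own code scales roughly linearly with its size."""
    return lines * minutes_per_line

def review_time(lines: int, minutes_per_line: float = 0.6,
                interaction_penalty: float = 0.003) -> float:
    """Reviewing generated code adds a superlinear term, because each
    new line must be checked against the code around it."""
    return lines * minutes_per_line + interaction_penalty * lines ** 2

for n in (100, 200, 400):
    w, r = write_time(n), review_time(n)
    print(f"{n:>3} lines: write {w / 60:.1f} h, review {r / 60:.1f} h")
# 100 lines: write 1.0 h, review 1.5 h
# 200 lines: write 2.0 h, review 4.0 h
# 400 lines: write 4.0 h, review 12.0 h
```

Under these assumed constants the review cost pulls away from the writing cost as the amount of generated code grows, which is the economics the curve describes.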
The Study That Changed the Conversation
METR recruited 16 experienced developers from large open-source projects. These projects averaged 22,000+ GitHub stars and over a million lines of code. These weren't junior developers. They were maintainers who had contributed to their repositories for years.
Developers provided 246 real issues from their own projects. Bug fixes, features, and refactors they would have done anyway. Issues were randomly assigned to either allow or disallow AI tools. When AI was permitted, developers used Cursor Pro with Claude 3.5/3.7 Sonnet.
The result: developers using AI took 19% longer to complete tasks. Not faster. Slower.
METR study: Experienced developers took 19% longer with AI tools
The Perception Gap Is Striking
Here's what makes this study remarkable. Before starting, developers expected AI to speed them up by 24%. After completing tasks with AI, they still believed it helped. They estimated a 20% speedup. This 39-percentage-point perception gap between reality and belief is one of the study's most significant findings.
39-point perception gap: Developers believed +20% faster, reality was -19% slower
They experienced the slowdown but didn't perceive it. This gap between measurement and perception suggests something deeper. We're bad at evaluating our own productivity.
I've observed this pattern throughout my career. Developers often conflate "feeling productive" with "being productive." The truth is, a tool that generates lots of output feels like progress. But the output requires extensive review and correction.
There's a psychological component at play. Watching an AI generate code feels like work is happening. You're engaged, making decisions about what to accept or reject, iterating on prompts. That engagement creates a sense of accomplishment, even when the clock shows you spent more time than if you'd written the code yourself.
The placebo effect extends to team dynamics. If everyone believes they're more productive with AI tools, questioning that belief feels like resistance. The actual data gets ignored because it contradicts the shared narrative. Entire organizations adopt tools that make them slower while believing they're moving faster.
Perception Gap Calculator
Measure the gap between what you think AI saves and what it actually costs:
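In place of the page's interactive widget, here is a minimal stand-in that does the same arithmetic. The example inputs mirror the METR figures; the function names and defaults are my own.

```python
# Perception-gap calculator: compare the speedup you believe you're getting
# with the change you actually measured. A stand-in sketch for the page's
# interactive widget; all inputs are your own estimates and timings.

def actual_change_pct(minutes_without_ai: float, minutes_with_ai: float) -> float:
    """Positive means faster with AI, negative means slower."""
    return (minutes_without_ai - minutes_with_ai) / minutes_without_ai * 100

def perception_gap_points(perceived_speedup_pct: float,
                          minutes_without_ai: float,
                          minutes_with_ai: float) -> float:
    """Gap, in percentage points, between belief and measurement."""
    actual = actual_change_pct(minutes_without_ai, minutes_with_ai)
    return perceived_speedup_pct - actual

# METR-like example: tasks took 19% longer with AI, yet developers believed +20%.
print(perception_gap_points(perceived_speedup_pct=20,
                            minutes_without_ai=100,
                            minutes_with_ai=119))   # -> 39.0
```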
Why Experts Get Slower
The study found something counterintuitive: developers were more likely to be slowed down on tasks where they had deep expertise. Researchers had participants rate their prior exposure to each task on a scale of 1-5.
This makes sense once you think about it. An expert already knows the codebase intimately. They navigate directly to the problem and implement a solution. Adding AI introduces context-switching, prompt engineering, and code review overhead the expert doesn't need.
Many developers in the study reported spending significant time cleaning up AI-generated code. When you already know the answer, having a machine propose a different answer isn't help. It's distraction.
The Cognitive Load Problem
METR identified "extra cognitive load and context-switching" as a key factor. This aligns with decades of research on developer productivity: interruptions and context switches have measurable costs.
Using an AI assistant isn't free. You have to:
- Formulate the prompt. Translating intent into effective prompts is a skill that takes time.
- Review the output. Every suggestion must be evaluated for correctness, style, and fit.
- Integrate the code. AI-generated code rarely drops in perfectly. It needs adaptation.
- Debug the result. When AI code fails, you're debugging logic generated by a system that doesn't actually understand your codebase.
For routine tasks where you already know what to type, this overhead exceeds the time savings. The tool designed to accelerate you becomes friction that slows you down.
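A back-of-the-envelope version of that accounting, with every number an assumption to be replaced by your own measurements:

```python
# Per-task overhead ledger for a routine task you already know how to write.
# All numbers below are assumptions; substitute your own measurements.

baseline_minutes = 30          # typing the code yourself, no AI

overhead_minutes = {
    "formulate_prompt": 5,     # translating intent into an effective prompt
    "review_output": 12,       # checking correctness, style, and fit
    "integrate_code": 6,       # adapting the suggestion to the codebase
    "debug_result": 10,        # expected debugging cost for generated logic
}
generation_savings = 18        # typing the AI did for you

with_ai_minutes = baseline_minutes - generation_savings + sum(overhead_minutes.values())
print(f"Manual: {baseline_minutes} min, with AI: {with_ai_minutes} min")
# -> Manual: 30 min, with AI: 45 min. With these assumed numbers the
#    overhead exceeds the savings, the routine-task pattern described above.
```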
Earlier Studies Showed the Opposite
This contradicts earlier, more optimistic research. MIT, Princeton, and the University of Pennsylvania found developers completed 26% more tasks with GitHub Copilot. A separate controlled experiment showed 55.8% faster completion.
The difference might be in who was studied. Earlier studies often used isolated coding tasks with developers unfamiliar with the codebase. METR studied experts working on their own repositories. They did real work they would have done anyway.
When you don't know the codebase, AI suggestions are valuable. When you know it intimately, those same suggestions become noise you must filter.
Methodological differences matter in how we interpret these results. Studies showing large gains often measured task completion in isolated environments: greenfield coding exercises without existing constraints, clean-room experiments that don't reflect how most code is actually written. METR measured real maintenance work with all its complexity: existing conventions, historical decisions, and the implicit requirements that accumulate in every long-lived codebase. That's where developers spend the bulk of their time in professional environments.
What This Means for Teams
The implications are significant for engineering organizations making tooling decisions. According to Stack Overflow's December 2025 survey, developer satisfaction with AI tools dropped to 60% from 70%+ in 2023-2024, with only 3% "highly trusting" AI output. If your most experienced developers are slowed down by AI tools, a mandate to use AI everywhere could hurt team velocity rather than help it.
The pattern I've observed: AI tools help most where knowledge is scarce. New team members onboarding. Developers working in unfamiliar languages. Anyone exploring a codebase for the first time.
But for maintainers who live in a codebase daily? The tools might subtract value. This doesn't mean AI coding assistants are useless. Their value isn't uniform. The industry's blanket productivity claims aren't holding up to scrutiny.
Consider making AI tools available but not mandatory. Let developers choose based on the task. For boilerplate or unfamiliar territory, AI helps. For complex refactoring in familiar code, it slows you down. Treating AI as one option rather than a universal solution produces better outcomes.
The Vendor Studies Problem
Most productivity studies showing massive gains come from vendors themselves. GitHub, Microsoft, and Google all published research showing their tools make developers faster. This is the same pattern I've seen with AI vendor claims across every domain.
Independent research tells a more nuanced and sobering story. GitClear's analysis of 211 million changed lines of code shows engineers producing roughly 10% more durable code since 2022. Not the 50%+ claims in vendor marketing materials.
When someone with a direct financial interest tells you their product doubles productivity, healthy skepticism is warranted.
The more interesting question: do AI tools help where it matters most? If they accelerate greenfield development but slow maintenance, and maintenance is 70% of most codebases, the net impact could be negative even if the tool works exactly as advertised on the easy 30%. Context matters more than averages.
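A quick weighted-average check makes the point. The 70/30 split comes from the figure above and the maintenance slowdown from METR; the greenfield speedup is an assumption.

```python
# Weighted-average sketch. The maintenance share is the 70% figure above,
# the -19% maintenance change is the METR result, and the greenfield
# speedup is an assumption for illustration.

greenfield_share, greenfield_speedup = 0.30, 0.25       # assumed 25% faster
maintenance_share, maintenance_change = 0.70, -0.19     # 19% slower (METR)

net = greenfield_share * greenfield_speedup + maintenance_share * maintenance_change
print(f"Net productivity change: {net:+.1%}")   # -> about -5.8%
```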
When AI Tools Actually Speed You Up
I'm not saying AI tools are always counterproductive. The key is matching the tool to the task.
The pattern: AI helps most where your knowledge is weakest. It hurts most where your expertise is strongest. For experienced developers maintaining codebases they know intimately—which describes most professional programming—the overhead often exceeds the benefit.
AI Tool Productivity Assessment Scorecard
Score each AI coding task honestly. High scores suggest net productivity loss.
| Dimension | Score 0 (AI Helps) | Score 1 (Neutral) | Score 2 (AI Hurts) |
|---|---|---|---|
| Codebase Familiarity | New to me / onboarding | Some context | I know it deeply |
| Task Complexity | Boilerplate / repetitive | Standard feature | Complex refactoring |
| Review Burden | Output obvious to verify | Some checking needed | Must review every line |
| Context Switching | Single focused task | Some prompt iteration | Heavy back-and-forth |
| Debug Risk | Errors easy to spot | Moderate complexity | Subtle bugs likely |
| Domain Expertise | Learning this area | Competent | Deep expert |
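A minimal helper for totaling the scorecard. The interpretation bands are my assumptions, not part of the original rubric.

```python
# Totals the six scorecard dimensions (0 = AI helps, 1 = neutral, 2 = AI hurts).
# The interpretation thresholds below are assumptions, not part of the rubric.

DIMENSIONS = (
    "codebase_familiarity", "task_complexity", "review_burden",
    "context_switching", "debug_risk", "domain_expertise",
)

def assess(scores: dict) -> str:
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    total = sum(scores[d] for d in DIMENSIONS)   # 0-12
    if total <= 4:
        return f"{total}/12: AI likely helps on this task"
    if total <= 8:
        return f"{total}/12: marginal; measure before assuming a speedup"
    return f"{total}/12: AI likely costs you time here"

# Example: a maintainer doing a complex refactor in a codebase they know deeply.
print(assess({
    "codebase_familiarity": 2, "task_complexity": 2, "review_burden": 2,
    "context_switching": 1, "debug_risk": 2, "domain_expertise": 2,
}))   # -> "11/12: AI likely costs you time here"
```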
Should You Use AI for This Task?
| Your Situation | AI Likely Helps | AI Likely Hurts |
|---|---|---|
| Codebase familiarity | New to the repo, learning patterns | Maintainer who knows every corner |
| Task type | Boilerplate, config, test setup | Complex refactor, subtle bug fix |
| Solution clarity | Exploring options, prototyping | You already know exactly what to write |
| Language/framework | Unfamiliar syntax, new library | Your strongest language |
| Time pressure | Quick spike, throwaway code | Production code requiring review |
The Bottom Line
AI coding assistants aren't magic productivity multipliers. They're tools with tradeoffs. For some developers, some tasks, some codebases, they help. For others, they hurt.
The METR study is important. It's the first rigorous measurement of AI impact on experienced developers doing real work. The 19% slowdown should prompt organizations to evaluate actual productivity rather than assuming vendor claims are true.
The gap between what developers believe and what measurements show is the most important finding. We're collectively bad at evaluating our own productivity. That blind spot is being exploited by marketing. For patterns that actually work, see When AI Coding Actually Helps.
"Developers often conflate "feeling productive" with "being productive."
Sources
- METR: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity — The original randomized controlled trial
- GitClear: AI Copilot Code Quality Research 2025 — Analysis of 211 million lines of code showing 10% productivity gains but declining code quality
- MIT Technology Review: AI coding is now everywhere. But not everyone is convinced. — Industry overview and competing studies
- InfoWorld: AI coding tools can slow down seasoned developers by 19% — Analysis of study implications