As Joel Spolsky famously argued, rewriting software from scratch is almost always a mistake - Netscape's 3-year browser rewrite cost them the market to Internet Explorer. But sometimes rewrites are unavoidable. Having watched both outcomes, here's what the 10% that succeed do differently.
Limit scope to 6 months. Freeze the old system. Keep the same team. Three or more warning signs mean don't start. Score your rewrite on the viability scorecard before committing resources.
If you've read The Rewrite Trap, you know I'm skeptical of "burn it down" projects. The fantasy of the clean slate usually ends in tears. But sometimes there's no alternative. Technology truly obsolete. Architecture fundamentally broken. Ecosystem dead.
When a rewrite is unavoidable, how do you join the minority that succeed? The Standish Group's CHAOS research shows only 29% of IT projects finish on time, on budget, and deliver expected value—and large rewrites fare worse. After observing dozens of these projects, I've seen patterns emerge.
The Scope Discipline
The single biggest predictor of rewrite success is scope. Not team size. Not budget. Scope.
Successful rewrites are smaller than you'd expect. The teams that win limit their ambition aggressively. They don't rewrite the whole system. They rewrite the critical 20% that causes 80% of the pain.
Questions that indicate healthy scope discipline:
- Can we ship in 6 months or less? Rewrites longer than 6 months accumulate scope creep, team turnover, and shifting requirements. If your honest estimate is 18 months, you're planning for failure.
- Can we freeze the old system? If you must maintain both systems in parallel, you've doubled your burden. Successful rewrites either freeze the old system or have a clear cutover date.
- What are we explicitly NOT rewriting? The question isn't what to include. It's what to exclude. A rewrite without a "not doing" list is a rewrite without boundaries.
The User Migration Strategy
Most failed rewrites obsess over technical architecture. Successful ones obsess over user migration.
How do users get from old to new? This question should drive your architecture, not the other way around. The best rewrites maintain parallel paths longer than engineers want but shorter than the business fears.
Patterns I've seen work:
Shadow mode first. Run the new system alongside the old, comparing outputs without affecting users. This is what Martin Fowler's Strangler Fig pattern looks like in practice—gradually replacing functionality while the old system still runs. This catches bugs before they become incidents and builds confidence that the new system actually works.
Here's what shadow mode looks like in code:
```python
from datetime import datetime

class ComparisonEngine:
    def __init__(self, legacy_system, new_system):
        self.legacy = legacy_system
        self.new = new_system
        self.divergences = []

    def verify_output(self, input_data):
        legacy_result = self.legacy.process(input_data)
        new_result = self.new.process(input_data)
        if not self.outputs_match(legacy_result, new_result):
            self.log_divergence(input_data, legacy_result, new_result)
            return legacy_result  # Always return the stable result
        return new_result

    def outputs_match(self, a, b):
        # Define your equality criteria here
        return a == b

    def log_divergence(self, input_data, legacy, new):
        self.divergences.append({
            'input': input_data,
            'legacy': legacy,
            'new': new,
            'timestamp': datetime.now(),
        })
```

The key insight: always return the legacy result when outputs diverge. Shadow mode is for gathering data, not for risking production.
Gradual traffic shifting. 1% of users, then 5%, then 20%, then full cutover. Each stage is a checkpoint. Problems at 1% are recoverable. Problems at 100% are disasters.
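One way to sketch gradual traffic shifting is deterministic hash-based bucketing, so each user lands in the same bucket on every request and a rollout bump only adds users rather than reshuffling them. The function names here are illustrative, not from any particular routing library:

```python
import hashlib

# Rollout stages from the text: 1% -> 5% -> 20% -> full cutover.
ROLLOUT_STAGES = [1, 5, 20, 100]

def bucket_for(user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % 100

def use_new_system(user_id: str, rollout_percent: int) -> bool:
    """True if this user falls inside the current rollout percentage."""
    return bucket_for(user_id) < rollout_percent
```

Because the bucket is a pure function of the user ID, a user included at the 5% stage is still included at 20%: each checkpoint grows the audience without flipping anyone back to the old system.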
Rollback always ready. Until the old system is decommissioned, keep the ability to return. This sounds obvious. It's frequently abandoned under schedule pressure. Don't abandon it.
The Team Continuity Factor
In my experience advising on these projects, team stability is the second-biggest predictor of success after scope.
The people who start the rewrite must finish it. When core team members leave mid-project, they take context that can't be documented. The replacement ramps up for months. Decisions get revisited. Timelines slip.
What this means practically:
- Small teams are better. Three senior engineers who stay are worth more than ten rotating through.
- Retention incentives matter. If key engineers might leave, address that before starting. A rewrite with 50% turnover is a rewrite that fails.
- Knowledge concentration is risky. If one person understands everything and they leave, the project dies. Ensure at least two people deeply understand every critical component.
The "Good Enough" Architecture
Failed rewrites often aim for perfection. Successful ones aim for "good enough."
The layer tax applies here: every abstraction has a cost. Rewrites that succeed resist the urge to build "the right way" when "good enough" is sufficient.
Avoid the second-system effect. As Fred Brooks warned in The Mythical Man-Month, architects freed from the constraints of the old system tend to over-engineer. Every wish-list feature gets included. Every theoretical best practice gets implemented. The result is more complex than what you replaced.
The Feature Parity Trap
Here's the math that kills most rewrites:
The old system has 10 years of bug fixes. Those bug fixes are Chesterton's Fence—each one exists because someone hit a wall. The new system has zero.
The Math: To rewrite a 10-year-old system in 1 year, you must move 10x faster than the original team. You can't. Nobody can. The original team wasn't stupid—they just had 10 years to discover all the edge cases.
The Result: You will cut scope. You will ship an "MVP" that does 80% of what the old system did. The users will revolt because they needed that missing 20%. You just spent $2M to build a worse product.
This is why the Strangler Fig pattern works and Big Bang rewrites fail. Incremental replacement preserves the institutional knowledge encoded in bug fixes. Complete rewrites discard it.
Signs you're over-engineering:
- You're building for scale you don't have. The new system handles a billion users when you have ten thousand. This isn't wisdom; it's premature optimization.
- You're debating abstractions endlessly. Architecture discussions that run for weeks are architecture decisions that aren't shipping.
- You're adding "while we're at it" features. Every feature added is time subtracted from the core mission. The rewrite is not the time for wish-list items.
The Political Air Cover
Technical execution isn't enough. Rewrites die political deaths as often as technical ones.
Executive sponsorship must survive pressure. When the rewrite hits its first delay—and it will—someone must protect it from cancellation. That person needs organizational power and genuine commitment.
I've seen rewrites cancelled not because they were failing technically, but because:
- The sponsor changed jobs. New executive, new priorities. Suddenly the rewrite is "previous leadership's project."
- A competitor shipped something. Panic sets in. "We need features NOW, not in six months." Resources get pulled.
- Finance noticed the cost. Rewrites are expensive. If the business case wasn't solid, budget pressure kills the project.
Before starting, ensure your sponsor understands: this will be hard, it will take longer than hoped, there will be moments of doubt. Do they have the appetite for that fight?
When to Pull the Plug
Even with good planning, rewrites fail. Recognizing failure early saves more than pressing on.
Kill signals to watch for:
- Timeline has doubled. If your 6-month estimate is now 12 months, you've lost control of scope. The further you slip, the worse it gets.
- Core team members are leaving. One departure is manageable. Two is worrying. Three is fatal. Don't pretend you can replace institutional knowledge.
- Requirements keep changing. The business can't freeze what they want. Every change extends the timeline. A moving target can't be hit.
- The new system has its own debt. You started accumulating hacks and shortcuts. Congratulations: you're building the next legacy system.
I've watched teams ignore these signals and lose 18 months to rewrites that never shipped. The sunk cost fallacy is brutal—the more you've invested, the harder it is to walk away.
Kill Signal Framework
| Signal | Yellow Threshold | Red Threshold | Action |
|---|---|---|---|
| Scope creep | >15% growth | >25% growth | Terminate or re-scope from scratch |
| Timeline slip | >25% delay | >50% delay | Terminate project |
| Core team turnover | 1 departure | >30% departed | Pause and reassess viability |
| New system bugs | >5% of old system | >10% of old system | Pause migration, fix quality |
| Shadow mode divergence | >1% of requests | >5% of requests | Halt rollout, investigate |
Two yellow signals warrant an emergency review. One red signal means stop and seriously consider cancellation.
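The thresholds in the table translate directly into a health check you can run at every review. This is a hypothetical sketch; the metric names and the `Thresholds` shape are invented for illustration, not a real tool's API:

```python
from dataclasses import dataclass

@dataclass
class Thresholds:
    yellow: float
    red: float

# Thresholds from the kill-signal table, expressed as fractions.
SIGNALS = {
    "scope_growth":      Thresholds(yellow=0.15, red=0.25),
    "timeline_slip":     Thresholds(yellow=0.25, red=0.50),
    "team_turnover":     Thresholds(yellow=0.001, red=0.30),  # any departure is yellow
    "bug_ratio_vs_old":  Thresholds(yellow=0.05, red=0.10),
    "shadow_divergence": Thresholds(yellow=0.01, red=0.05),
}

def classify(metrics: dict) -> dict:
    """Sort each observed metric into red or yellow against its thresholds."""
    status = {"red": [], "yellow": []}
    for name, value in metrics.items():
        t = SIGNALS[name]
        if value > t.red:
            status["red"].append(name)
        elif value > t.yellow:
            status["yellow"].append(name)
    return status

def review_action(status: dict) -> str:
    """One red: stop and consider cancellation. Two yellows: emergency review."""
    if status["red"]:
        return "stop-and-consider-cancellation"
    if len(status["yellow"]) >= 2:
        return "emergency-review"
    return "continue"
```

Codifying the thresholds removes the wiggle room: when the dashboard says red, the conversation is about cancellation, not about whether the number is really that bad.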
Admitting failure is painful. But a cancelled rewrite is recoverable. A completed rewrite that doesn't work is a catastrophe. This is why understanding what users actually need matters more than architectural purity.
The Hidden Success Factor: Documentation Discipline
The rewrites that succeed share one trait that rarely gets mentioned: obsessive documentation of decisions.
Why we chose this, not just what we chose. Six months into a rewrite, someone will question a foundational decision. If the reasoning isn't written down, you'll relitigate it. That relitigating costs weeks. The team that documents "we chose X because of constraints A, B, C" can answer questions without derailing progress.
What we explicitly didn't build. The "not doing" list is as important as the roadmap. When someone suggests a feature, you can point to the documented decision rather than having the same argument repeatedly. This protects scope better than any project management methodology.
Architecture decision records. Simple documents: context, decision, consequences. They take twenty minutes to write and save hundreds of hours of confusion. Every successful rewrite I've observed had these. Every failed rewrite I've observed relied on tribal knowledge that walked out the door.
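A minimal ADR really is that small. Here's the shape, with invented content purely for illustration:

```markdown
# ADR-007: Keep the legacy billing schema unchanged

## Context
Migrating the schema now would couple the rewrite to a data
migration and blow the 6-month window.

## Decision
The new service reads and writes the existing schema. Schema
changes are explicitly out of scope (see the "not doing" list).

## Consequences
- Some awkward column names survive into the new code.
- Cutover needs no data migration, so rollback stays trivial.
```

Three headings, a few sentences each. The Consequences section is the part teams skip and the part they need most six months later.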
The documentation isn't bureaucracy. It's institutional memory that survives team changes, protects scope, and accelerates onboarding. Rewrites are marathons, and marathons require the kind of discipline that doesn't feel heroic but determines outcomes.
Rewrite Viability Scorecard
Score each criterion 0-3 before approving any rewrite:
| Dimension | 0 (Fatal) | 1 (Risky) | 2 (Acceptable) | 3 (Ideal) |
|---|---|---|---|---|
| Scope clarity | "Rewrite everything" | Vague boundaries | Defined modules | Critical 20% with a "not doing" list |
| Timeline | >18 months | 12-18 months | 6-12 months | ≤6 months |
| Team stability | New team | Mixed team | Core team + 1-2 new | Original core team intact |
| Executive sponsor | None identified | Sympathetic manager | Director-level with budget | Executive committed through delays |
| Old system state | Must maintain both | Partial freeze | Full freeze agreed | Frozen with a firm cutover date |
| Rollback plan | None | "We'll figure it out" | Documented | Documented and tested |
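Scoring is mechanical once the table is filled in. A sketch of the rule: any 0 blocks the rewrite (that's what the "Fatal" column label means); the total-score cutoff of 12 out of a possible 18 is my assumption, not something the scorecard prescribes:

```python
DIMENSIONS = [
    "scope_clarity", "timeline", "team_stability",
    "executive_sponsor", "old_system_state", "rollback_plan",
]

def assess(scores: dict) -> str:
    """Turn per-dimension 0-3 scores into a go/no-go verdict."""
    values = [scores[d] for d in DIMENSIONS]
    if any(v == 0 for v in values):
        return "do-not-start"  # any fatal dimension blocks the rewrite
    if sum(values) < 12:       # assumed threshold: two-thirds of the 18-point max
        return "high-risk"
    return "viable"
```

The point of the exercise isn't the exact cutoff; it's forcing each dimension to be scored honestly before money is committed.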
The Bottom Line
Successful rewrites aren't about superior engineering. They're about superior discipline: tight scope, stable teams, good-enough architecture, and political protection.
If you can't achieve these conditions, don't start. Incremental improvement beats a failed rewrite every time. But if you must rewrite, the 10% that succeed share these characteristics. Join them or join the 90%.
The question isn't "can we build something better?" You can. The question is "can we build something better under conditions that allow us to succeed?" That's harder. That's what separates the 10% from everyone else.
Sources
- Joel on Software: Things You Should Never Do, Part I — Joel Spolsky's foundational essay on the Netscape disaster and rewrite risks
- Martin Fowler: Strangler Fig Application — The incremental migration pattern that successful rewrites often incorporate
- CHAOS Report 2015: The State of Project Success — Comprehensive research on IT project success rates showing that only 29% of projects are successful. Project size and complexity are key determinants of outcome.