According to Sackett et al.'s 2022 meta-analysis in the Journal of Applied Psychology, structured interviews have the highest predictive validity for job performance (r=.42). Unstructured interviews and LeetCode-style tests? Barely better than flipping a coin. Here's what actually predicts engineering success.
Use structured interviews (same questions, scoring rubrics, multiple interviewers). Add work samples that mirror actual job tasks. Automate the easy stuff so humans focus on judgment.
The 2022 Sackett study overturned decades of hiring orthodoxy. It found that structured interviews beat cognitive ability tests, job knowledge tests, and unstructured interviews for predicting job performance. The implications for technical hiring are significant.
Having hired engineers for 30 years, I've watched teams obsess over algorithm puzzles while missing candidates who would have been excellent. The best engineer I ever hired couldn't whiteboard a binary tree. But he could debug production at 3am, design systems that scaled, and mentor junior developers. The worst hire I made aced every technical question but couldn't work with anyone. This pattern (optimizing for the wrong signals) is something I've written about in Hiring Senior Engineers.
If you've read Technical Interviews Are Broken, you know the problem. LeetCode scores correlate with job performance at just 0.27, which is barely useful. But complaining about bad interviews is easy. Building better ones is harder. Here's what the research says actually works.
The LeetCode Divergence
Here's the statistical reality that makes algorithm interviews counterproductive:
LeetCode performance correlates with "Time Since Graduation," not "Engineering Capability."
The Pattern: As years of experience go up, LeetCode scores typically go down. Senior engineers spend their time solving production problems, not memorizing dynamic programming patterns. The skills that make someone excellent at shipping software are different from the skills that make someone excellent at coding puzzles.
The Reality: By filtering for algorithm performance, you are systematically filtering out your senior engineers. You're optimizing for people who are good at homework, not people who are good at shipping software. This is a False Negative machine.
The inverse correlation is real: the candidates who ace your LeetCode rounds often struggle most with ambiguous production problems. The candidates who struggle with LeetCode often have the judgment and experience you actually need.
The Validity Hierarchy
The Sackett meta-analysis ranked hiring methods by predictive validity:
| Method | Validity (r) | Notes |
|---|---|---|
| Structured interviews | .42 | Highest predictor |
| Job knowledge tests | .40 | Domain-specific knowledge |
| Work sample tests | .33 | Actual work tasks |
| Cognitive ability | .31 | Lower than previously thought |
| Unstructured interviews | .19 | Nearly useless |
This contradicts the 1998 Schmidt & Hunter findings that had positioned cognitive ability tests (like algorithmic puzzles) as top predictors. The Sackett meta-analysis demonstrates that structured interviews consistently outperform.
Interpretation: r = correlation with job performance. The gap between structured (.42) and unstructured (.19) interviews is the difference between useful signal and expensive theater.
What Makes Interviews Structured
Structure means consistency and job relevance. Every candidate answers the same questions, evaluated against the same criteria. No improvisation, no "vibe checks."
Key elements of structured interviews:
- Standardized questions. Same questions for every candidate. Questions derived from actual job requirements.
- Behavioral anchors. Define what "good" looks like before interviewing. Rate responses against specific criteria, not gut feel.
- Multiple interviewers. Different perspectives reduce individual bias. Calibrate ratings across interviewers.
- Documented scoring. Write down ratings and reasoning immediately. Don't rely on memory or "overall impression."
The discipline feels bureaucratic. It works anyway. Structure removes the variability that makes unstructured interviews unreliable.
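To make "documented scoring" concrete, here's a minimal sketch of what a rubric with behavioral anchors might look like in code. The criterion names and anchor wording are hypothetical; derive yours from the actual job requirements.

```python
# A toy structured-interview rubric: each criterion has behavioral
# anchors defined BEFORE anyone is interviewed. Criteria and wording
# here are illustrative, not prescriptive.

RUBRIC = {
    "debugging_process": {
        1: "No systematic approach; guesses at fixes",
        2: "Some structure, but skips verifying hypotheses",
        3: "Forms hypotheses, checks evidence methodically",
        4: "Systematic, explains tradeoffs, verifies the fix",
    },
    "communication": {
        1: "Hard to follow; no structure",
        2: "Clear only with heavy prompting",
        3: "Clear, organized explanations",
        4: "Adapts depth to the audience; concise and precise",
    },
}

def score_candidate(ratings):
    """Average the per-criterion ratings; reject incomplete scorecards."""
    missing = set(RUBRIC) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(ratings.values()) / len(ratings)

print(score_candidate({"debugging_process": 3, "communication": 4}))  # 3.5
```

The point of encoding it, even in a spreadsheet, is that the anchors exist before the interview and every interviewer rates against the same scale.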
Work Samples Over Whiteboard
Research from NC State and Microsoft found that whiteboard interviews have "uncanny resemblance to the Trier Social Stress Test"—a psychological procedure for inducing maximum stress. Participants performed 50% worse on whiteboards than coding privately.
Work samples (actual job tasks in realistic conditions) predict better.
What makes good work samples:
- Real problems. Use actual bugs from your codebase (anonymized). Ask candidates to debug them.
- Realistic environment. Let them use their preferred IDE, Google, documentation. That's how real work happens.
- Reasonable time. 2-4 hours maximum. Longer "homework" assignments exploit candidates.
- Clear evaluation criteria. Define what you're looking for before the exercise. Evaluate against those criteria.
Work sample ideas that test real skills:
- Code review exercise. Give them a PR with issues. Can they identify problems? Give constructive feedback?
- Debug a production issue. Here's a failing service and logs. Walk me through how you'd investigate.
- Design discussion. We need to build X. How would you approach it? What tradeoffs would you consider?
- Pair programming. Work together on a real task. How do they collaborate?
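As a concrete (and entirely hypothetical) example of the code review exercise: hand the candidate a short function that works on the happy path but has planted issues, and see which ones they catch and how constructively they explain them.

```python
# Hypothetical review-exercise snippet with planted issues. A strong
# reviewer should flag: (1) the mutable default argument `seen=[]`
# persists state across calls, (2) `item not in seen` is an O(n) list
# scan where a set would be O(1), (3) mutating `seen` surprises any
# caller who passes their own list.

def dedupe(items, seen=[]):
    result = []
    for item in items:
        if item not in seen:
            seen.append(item)
            result.append(item)
    return result

print(dedupe([1, 2, 2, 3]))  # [1, 2, 3]
print(dedupe([1, 2, 4]))     # [4] -- stale state leaked from the first call
```

The evaluation criterion isn't "found all three bugs"; it's whether the feedback is specific, prioritized, and delivered the way you'd want a teammate to deliver it.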
Behavioral Questions That Predict
Behavioral interviewing asks about past experiences as predictors of future performance. "Tell me about a time when..." questions work because past behavior predicts future behavior better than hypotheticals.
Effective behavioral questions for engineers:
- Debugging under pressure. "Tell me about a time you had to debug a critical production issue. Walk me through your process."
- Technical disagreement. "Describe a time you disagreed with a technical decision. How did you handle it?"
- Learning something new. "Tell me about a time you had to quickly learn a new technology or domain. How did you approach it?"
- Delivering under constraints. "Describe a project where you had to make significant tradeoffs. What did you choose and why?"
- Collaboration challenges. "Tell me about a difficult collaboration. What made it hard? How did you work through it?"
Listen for specificity. Vague answers ("I always try to communicate well") predict less than specific examples with concrete details.
The STAR method (Situation, Task, Action, Result) helps candidates structure answers, but don't be rigid about format. What matters is concrete detail: specific technologies, actual numbers, real outcomes. A candidate who says "we reduced latency by 40%" and can explain how is more credible than one who says "we made things faster." The best answers include what went wrong and what they learned—perfection narratives suggest either inexperience or dishonesty.
What Google Found
Google's HR research, led by Laszlo Bock, found that work sample tests (29%) and structured interviews (26%) were the best predictors of job performance. Brainteasers (once a Google interview staple) predicted nothing.
They also discovered their own false negative problem. Senior engineers reported that Google's interview process "sometimes turns away qualified people." The filter was too aggressive, rejecting good candidates who didn't perform well on arbitrary puzzles.
The fix wasn't easier puzzles. It was better assessments: structured interviews, work-relevant problems, and calibrated evaluation.
Practical Implementation
Transitioning from LeetCode to structured interviews requires discipline. Here's a phased approach:
Phase 1: Define what you're actually hiring for.
- List the specific skills needed for this role
- Prioritize: which skills matter most?
- For each skill, define what "good" looks like
Phase 2: Design your interview loop.
- Map each skill to an interview stage
- Create standardized questions for each stage
- Build scoring rubrics before interviewing anyone
Phase 3: Train your interviewers.
- Practice using the rubrics on example answers
- Calibrate: do interviewers rate the same answer similarly?
- Document common failure modes
Phase 4: Iterate based on data.
- Track which interview signals predict on-the-job performance
- Adjust your process based on what you learn
- Be willing to cut stages that don't predict
The Time Investment
Structured interviews take more preparation time than "let's whiteboard some algorithms." You need to design questions, train interviewers, calibrate scoring.
But the ROI is clear. A bad hire costs 1.5-2x their annual salary to replace. Better hiring accuracy saves money, even if each interview costs more to run.
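A back-of-envelope version of that ROI argument, with every number an assumption you should replace with your own:

```python
# Assumed figures -- swap in your own salary, volume, and rates.
salary = 150_000
bad_hire_cost = 1.5 * salary       # low end of the 1.5-2x replacement cost
hires_per_year = 10
extra_cost_per_hire = 2_000        # assumed added interviewer time per candidate hired

bad_hire_rate_unstructured = 0.25  # assumed
bad_hire_rate_structured = 0.15    # assumed

savings = hires_per_year * (bad_hire_rate_unstructured - bad_hire_rate_structured) * bad_hire_cost
extra_cost = hires_per_year * extra_cost_per_hire
print(f"net annual benefit: ${savings - extra_cost:,.0f}")
```

Even with modest assumptions, a small reduction in bad-hire rate dwarfs the extra process cost.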
Teams that resist structure usually argue "we can tell a good engineer when we see one." The data says otherwise. Unstructured interviews are barely better than random selection. Your intuition is less reliable than you think.
The real obstacle isn't time; it's ego. Admitting that your gut feel is unreliable feels like admitting incompetence. It's not. It's recognizing what Sackett's meta-analysis confirms: humans are bad at prediction under uncertainty. Structure compensates for our limitations. This same pattern of ego-driven decision making shows up everywhere in startups, as I explored in Founder Ego Kills Startups.
The Bottom Line
The research is clear: structured interviews and work samples predict job performance. LeetCode and whiteboard puzzles don't. The gap isn't small—it's the difference between useful signal and noise.
Switching to evidence-based hiring takes effort. You need standardized questions, scoring rubrics, interviewer training. It's more work than "let's see if they can invert a binary tree."
But hiring is the most important thing you do. Getting it right compounds. Getting it wrong compounds too, just in the wrong direction. Invest in better interviews or accept that you're selecting for the wrong skills.
Sources
- Sackett et al. 2022: Revisiting Meta-Analytic Estimates of Validity — Journal of Applied Psychology meta-analysis showing structured interviews outperform cognitive ability tests
- NC State/Microsoft: Technical Interviews Assess Anxiety, Not Skills — Research showing whiteboard interviews induce stress that degrades performance
- Google's HR Research on Interview Effectiveness — Findings on work samples and structured interviews vs brainteasers
- Interviewing.io: LeetCode Ratings and Interview Performance — Analysis showing 0.27 correlation between LeetCode scores and job performance