
I’ve been thinking about a process I’ve used for resolving regressions. I’ve never explicitly written the steps down before, but figured it was worth capturing—both for myself and for others.
When a regression shows up, there are three questions that I’ve found you have to answer. Skipping any of them usually leads to confusion, wasted time, or a fix that doesn’t actually solve the real problem. But if you take the time to work through them, you can usually land on the right answer with confidence.
1. Can you reproduce the problem?
A lot of engineers want to jump straight into the code. That’s the fun part, right? Digging through logic, inspecting diffs, reasoning your way to a fix. But if you can’t reliably reproduce the issue, studying the code is usually a waste of time. You don’t even know where to look yet.
Reproducing the problem is the first real step. It’s not glamorous, and it can feel a little silly—especially when you’re trying the same steps over and over with tiny variations. But this is one of the most valuable things you can do when a bug shows up.
As engineers, we have a special vantage point. We know how the code works, and we often have a gut instinct about what kinds of conditions might trigger strange behavior. That gives us a real edge in uncovering subtle issues—so don’t think you’re above tapping on an iPad for hours or running the same test over and over. It’s our duty to chase it down.
Once you have a reliable repro, everything gets easier. You can try fixes, stress other paths, and most importantly, build real confidence that your solution works.
Some useful tricks:
- Adjust timing, inputs, or state to help provoke the bug
- Script setup steps or test data to save time
- Loop the behavior or stress threads to make edge cases more likely
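The looping trick can be sketched as a tiny harness that runs a suspect behavior over and over, perturbing timing between attempts, until the failure shows up. Everything here is illustrative: `try_repro` and `flaky_action` are hypothetical names, and the delay and failure rate are made up.

```python
import random
import time

def try_repro(action, attempts=100):
    """Run a suspect behavior repeatedly; return the attempt number on
    which it first fails, or None if the bug never shows up."""
    for attempt in range(1, attempts + 1):
        if not action(attempt):
            return attempt
    return None

# Hypothetical stand-in for a real reproduction step. Varying a small
# delay per attempt mirrors the "adjust timing" trick above.
def flaky_action(attempt):
    time.sleep(0.001 * (attempt % 5))  # perturb timing between attempts
    return random.random() > 0.05      # pretend ~5% of runs hit the bug
```

In practice, `action` would script the real steps—launch the app, load the fixture, hit the endpoint—so the harness just removes the tedium of looping by hand.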
2. What changed?
This step is often skipped. People jump into debugging without first understanding what changed. But the fastest way to track down a regression is to compare working code to broken code and see what’s different.
This question can feel sensitive. It gets close to specific contributions that may have introduced instability. I’ve seen plenty of cases where the discussion gets deflected into vague root causes or long-term issues—anything but the specific change. That’s understandable. We’ve all been there. But avoiding the question doesn’t help. It puts the fix at risk and slows everyone down.
Some go-to techniques:
- Review pull requests and diffs
- Trace from crash logs or error messages to recently changed code
- Use git bisect to find the breaking commit
- Try reverting the suspected change and see if the issue disappears
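Under the hood, git bisect is a binary search over the commit history. Here's a sketch of that idea, assuming commits can be tested in isolation; the `commits` list and `is_broken` predicate are hypothetical stand-ins for checking out a commit and running your repro.

```python
def find_breaking_commit(commits, is_broken):
    """Binary search over an ordered commit list, where commits[0] is
    known good and commits[-1] is known bad. Returns the first broken
    commit. This mirrors what `git bisect` automates."""
    lo, hi = 0, len(commits) - 1       # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(commits[mid]):
            hi = mid                   # broken at mid: first bad is <= mid
        else:
            lo = mid                   # still good at mid: first bad is > mid
    return commits[hi]
```

With a thousand commits between the last good build and today, that's roughly ten checks instead of a linear hunt—which is exactly why a reliable repro from step one pays off here.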
Once you find the change, test your theory. If undoing it makes the problem go away, you’re on the right track.
3. Why does it happen?
Knowing what changed isn’t enough. You need to understand why that change caused the issue. Otherwise, you’re just fixing symptoms, and might miss deeper problems.
This is where the real problem solving happens:
- Read documentation for the APIs or system behavior involved
- Think through the interaction between components or timing
- Build a mental model and prove it with experiments or targeted tests
You don’t want to ship a fix that works by accident. You want one that works because you actually understand the problem. That’s what prevents repeat issues and keeps edge cases from slipping through.
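As a toy illustration of proving a mental model with a targeted experiment, consider Python's classic mutable-default pitfall—chosen here only as a stand-in for whatever bug you're actually chasing:

```python
def add_item(item, items=[]):        # bug: the default list is shared across calls
    items.append(item)
    return items

# Symptom: a second "fresh" call unexpectedly carries over earlier state.
# Model: the default [] is created once, at function definition time.
# Targeted experiment: if the model is right, both calls should return
# the very same list object.
first = add_item("a")
second = add_item("b")
shares_state = first is second       # True confirms the model

def add_item_fixed(item, items=None):  # root-cause fix, not a workaround
    if items is None:
        items = []                     # a fresh list per call
    items.append(item)
    return items
```

Once the experiment confirms the model, the fix follows from understanding rather than guesswork—and the same experiment doubles as the regression test.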
Wrapping up
These three questions — Can you reproduce it? What changed? Why does it happen? — have helped me find and fix bugs more reliably than anything else.
It’s easy to skip them under pressure. It’s tempting to merge the first thing that seems to work. But without answering all three, you’re flying blind. You might get lucky. Or you might end up wasting hours chasing your tail or shipping the wrong fix.