Binary Search for Bugs: Cutting Down Bisection Time with git bisect run

Automating git bisect with a script turns days of manual hunting into minutes. Here's how to build a reliable bisection harness for your codebase.

Tags: debugging, git bisect, binary search, regression, automation, CI

You have a bug. You know it wasn't there last week. Somewhere in the 200 commits since Tuesday, something broke. You could spend hours — or days — manually checking each likely suspect. Or you could let binary search do the work.

I've been on both sides. The manual side is misery: staring at diffs, cherry-picking candidates, rebuilding, testing, repeating. The automated side is beautiful: one command, a cup of coffee, and a commit hash. This is how you get from the first scenario to the second.

The War Story: A 7-Day Regression

Last year, our payment processing pipeline started throwing sporadic 500s on high-value transactions. Not every time — maybe 1 in 50. Enough to hurt revenue, not enough to reproduce easily. The last known good deploy was a week ago. Seven days, 150 commits, 12 authors.

I tried the obvious: look at recent changes to payment code. Nothing. Then I spent a day adding logging, running simulations. Nothing conclusive. That's when I remembered git bisect exists. I wrote a script that created a test transaction with the right parameters and checked the response code. It took 4 minutes to find the bad commit. The fix was a one-character typo in a currency conversion constant.

Warning: git bisect is not a magic wand. It needs one known-good commit, one known-bad commit, and a test that reliably tells them apart. If your test is flaky (it sometimes fails on good commits), bisection can converge on the wrong commit.

Building the Bisection Harness

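The core of automated bisection is a script whose exit code classifies the current commit: 0 means good, any code from 1 to 127 except 125 means bad, and 125 means the commit can't be tested and should be skipped. Any other exit code aborts the bisection. The script must be deterministic. Here's the harness I used for the payment bug: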

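bisect-test.sh: a minimal bisection harness
#!/bin/bash
# Exit codes read by git bisect run: 0 = good, 125 = skip, 1-127 (except 125) = bad
set -euo pipefail

# Build the project; a commit that doesn't build is untestable, so skip it
# (without this, a build failure would be reported as "bad")
make build || exit 125

# Set up the test database with a fixed seed; skip if setup itself fails
./scripts/setup_test_db --seed=42 || exit 125

# Run the specific test for high-value transactions.
# phpunit exits non-zero on failure, which marks the commit as bad.
./vendor/bin/phpunit tests/Payment/HighValueTest.php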

Then invoke it, giving the bad commit first and the good commit second:

`git bisect start HEAD HEAD~150`

`git bisect run ./bisect-test.sh`

Git checks out the midpoint, runs the script, and narrows the range based on the exit code. When it finishes, it prints the first bad commit. Run `git bisect reset` afterwards to return to the branch you started from.

When Things Go Wrong: Flaky Tests and Build Failures

The first time I ran that script, it took 30 seconds per test. But the test was flaky: it depended on a random shuffle of test data, and some commits would pass only 9 times out of 10. Bisection gave me the wrong commit twice.

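Solution: freeze the randomness. Use a fixed seed, and if some flakiness remains, run the test multiple times and require a clear majority. Voting helps when good commits fail occasionally. It does not help when a bad commit fails only occasionally, because that commit would usually win the vote and be marked good. Here's an improved version: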

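bisect-test.sh with majority voting for flaky tests
#!/bin/bash
set -euo pipefail

# A commit that can't be built or set up is untestable: exit 125 to skip it
make build || exit 125
./scripts/setup_test_db --seed=42 || exit 125

# Run the test 3 times; the commit counts as good if at least 2 runs pass
pass=0
for i in 1 2 3; do
    if ./vendor/bin/phpunit tests/Payment/HighValueTest.php; then
        pass=$((pass + 1))
    fi
done

if [ "$pass" -ge 2 ]; then
    exit 0   # good
else
    exit 1   # bad
fi

Tip: If a commit can't be tested (e.g., compilation error), `git bisect skip` marks it as untestable, and Git picks a nearby commit instead. Inside a `git bisect run` script, exiting with code 125 has the same effect. Use this sparingly: if the skipped commits sit next to the real culprit, Git can only report a range of candidates rather than a single commit.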

Speeding Up: Parallel Bisection with git bisect replay

For very large ranges (thousands of commits), sequential bisection can take a while. One trick: run multiple bisections in parallel on different machines, each starting from a different initial range, then combine results.

Use `git bisect log > bisect.log` to save the current state to a file, and `git bisect replay bisect.log` to restore it. That lets you checkpoint a session or hand it to another machine. But honestly, for most teams, a single machine with a fast test is good enough. If your test takes more than 5 minutes, optimize the test first.
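Here is a small sketch of the checkpoint-and-restore cycle, again in a throwaway repository so it is safe to run anywhere. The repository contents, the temporary log file, and the single hand-entered "good" verdict are all illustrative assumptions.

```shell
#!/bin/bash
set -euo pipefail

# Throwaway repository with 16 commits (all names here are illustrative).
repo=$(mktemp -d)
log_file=$(mktemp)
cd "$repo"
git init -q
git config user.email "bisect-demo@example.com"
git config user.name "Bisect Demo"
for i in $(seq 1 16); do
    echo "$i" > n.txt
    git add n.txt
    git commit -q -m "commit $i"
done

# Start a session and record one verdict by hand.
git bisect start HEAD HEAD~15 >/dev/null
git bisect good >/dev/null
checkpoint=$(git rev-parse HEAD)     # the commit Git wants tested next

# Save the session, throw it away, then restore it from the log.
git bisect log > "$log_file"
git bisect reset >/dev/null 2>&1
git bisect replay "$log_file" >/dev/null
restored=$(git rev-parse HEAD)

git bisect reset >/dev/null 2>&1
if [ "$checkpoint" = "$restored" ]; then
    echo "replay resumed at the same commit"
fi
```

The log is plain text, so it can also be edited before replaying, for example to remove a verdict you no longer trust.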

Practical Tips for Reliable Bisection

  • Keep the test focused: test only the behavior that changed. Narrow the scope to a single function or API endpoint.
  • Use a dedicated test database or fixture that resets between runs. No shared state.
  • If your project requires external services, mock them. Network calls introduce latency and flakiness.
  • Set a timeout on the test. If it hangs, kill it and exit non-zero.
  • Always clean up between builds: `git clean -fdx` or use a fresh clone. Note that `git clean -fdx` deletes every untracked and ignored file, so keep the bisect script itself outside the working tree.
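The timeout advice can be sketched as a small wrapper. This assumes GNU `timeout` from coreutils is installed (it is not on a stock macOS), and the function name `run_test_with_timeout` and the 300-second limit in the usage comment are placeholders. The wrapper also flattens every failure to exit code 1, because `git bisect run` aborts on exit codes above 127, which is what the shell reports when a test is killed by a signal.

```shell
# run_test_with_timeout LIMIT COMMAND [ARGS...]
# Runs COMMAND under a time limit and returns an exit code that is safe
# to hand to git bisect run: 0 (good) or 1 (bad).
run_test_with_timeout() {
    local limit=$1
    shift
    local status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        return 0
    fi
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when it had to stop the command
        echo "test exceeded the time limit of $limit" >&2
    fi
    return 1
}

# Usage inside a bisect script (test path taken from the harness above):
# run_test_with_timeout 300 ./vendor/bin/phpunit tests/Payment/HighValueTest.php
```

Treating a hang as bad is a judgment call. If hangs are unrelated to the bug you are chasing, returning 125 for the timeout case would skip those commits instead.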

One more thing: if the bug is intermittent, binary search still works, but a single passing run no longer proves a commit is good. You need to run the test many times per commit and reason about statistical confidence. That's a more advanced topic; for now, aim for deterministic reproduction.
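For the intermittent case, one common approach is to rerun the test until it fails once, and to call a commit good only after enough consecutive passes. If a bad commit fails with probability p on each run, the chance that it slips through n runs is (1 - p)^n, so keeping that risk below α takes n ≥ ln(α) / ln(1 - p) runs. The sketch below assumes the runs are independent and that `awk` is available; `runs_needed` and `passes_every_time` are hypothetical helper names.

```shell
# runs_needed P ALPHA: smallest n such that (1 - P)^n <= ALPHA
runs_needed() {
    awk -v p="$1" -v alpha="$2" 'BEGIN {
        n = log(alpha) / log(1 - p)
        r = int(n)
        if (r < n) r++          # round up to a whole number of runs
        print r
    }'
}

# passes_every_time N COMMAND [ARGS...]
# Returns 1 as soon as COMMAND fails once, 0 if it passes N times in a row.
passes_every_time() {
    local n=$1 i=0
    shift
    while [ "$i" -lt "$n" ]; do
        "$@" || return 1
        i=$((i + 1))
    done
    return 0
}

# A 1-in-50 failure rate with a 1% risk of mislabeling a bad commit as good:
runs_needed 0.02 0.01   # prints 228
```

At a failure rate like that, each bisection step costs a couple of hundred test runs, which is why deterministic reproduction is worth the effort first.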

The first time I used git bisect run, I felt stupid for not using it years earlier. It's like realizing you've been digging with a spoon when a backhoe is parked right next to you.

4 minutes: time to find a regression across 150 commits using automated bisection.

The Bigger Picture: Bisection as a Debugging Discipline

Binary search isn't just for git. You can bisect on any linear dimension: time (last week vs now), input size (large file vs small), configuration flags (feature on vs off). The principle is always the same: find the dividing line where behavior changes.
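The same idea works on a plain integer axis. Below is a rough sketch of a generic bisector; `first_bad` and the `handles_size` predicate (which pretends a program copes with inputs up to 4096 and breaks beyond that) are invented for illustration. It assumes the behavior flips exactly once between the known-good and known-bad values.

```shell
# first_bad GOOD BAD PREDICATE [ARGS...]
# GOOD is a value known to pass, BAD a value known to fail.
# PREDICATE is called with a candidate value as its last argument and
# must exit 0 for "good". Prints the smallest failing value.
first_bad() {
    local good=$1 bad=$2 mid
    shift 2
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if "$@" "$mid"; then
            good=$mid    # midpoint passes: the flip is above it
        else
            bad=$mid     # midpoint fails: the flip is at or below it
        fi
    done
    echo "$bad"
}

# Hypothetical predicate standing in for "run the program on an input of this size".
handles_size() {
    [ "$1" -le 4096 ]
}

first_bad 1 100000 handles_size   # prints 4097
```

Swapping the predicate for a real check (a build with a given flag value, a run on a file of a given size) turns this into a bisection over that dimension.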

But git bisect is the most common application because commit history is the natural axis for regressions. Make it a habit: when you hear "this used to work", reach for bisect before you reach for blame.

Write the test once. Automate it. Then bisect with confidence.

Frequently asked questions

How do I use git bisect run with a script?

Write a script that exits with 0 for good commits and with a code from 1 to 127 for bad ones, except 125, which means "skip this commit". Then run `git bisect start <bad> <good>` and `git bisect run ./script.sh`. Git automatically checks out commits and runs the script until it finds the first bad commit.

What if my test is flaky?

Flaky tests break bisection. Fix flakiness first: use fixed random seeds, disable parallel execution, and remove external dependencies such as network calls. If the test is still flaky, run it three times and take the majority vote.

Can I bisect across hundreds of commits?

Yes. Bisection takes roughly log2(N) steps, so 1000 commits need about 10 tests. If each test takes 1 minute, you're done in about 10 minutes. Use `git bisect skip` (or exit code 125 in a script) if a commit can't be tested, e.g., because of a build failure.
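As a quick sanity check on that arithmetic, here is a small sketch that counts halvings. The helper name `bisect_steps` is made up, and the result is an estimate: Git's actual step count can differ slightly, especially with merges or skipped commits.

```shell
# bisect_steps N: approximate number of tests needed to search N commits
bisect_steps() {
    local n=$1 steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))   # each test discards about half the candidates
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 1000   # prints 10
bisect_steps 150    # prints 8
```

At 30 seconds per test, 8 steps comes to roughly 4 minutes, which lines up with the 150-commit story above.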

What's the difference between git bisect and binary search?

They're the same concept: git bisect is a binary search implementation for commit history. It repeatedly narrows the range by half, checking out the midpoint and asking you (or a script) to classify it as good or bad.