Go Test Race Condition Debugging

Q: Why does `go test -race` report a race only sometimes?

The race detector is a dynamic tool: it only reports races that actually occur during a given execution. Because goroutine scheduling is nondeterministic, repeated runs of the same test may or may not trigger the race. To increase the chance of detection, run the test many times with `-count=100` and vary the scheduling with the `-parallel` flag or `GOMAXPROCS`.

Q: Is it safe to use `sync.Mutex` in test helpers called from parallel subtests?

Yes, with a caveat: a mutex makes access to shared state safe, but if your test helper modifies global state under a mutex, that section is serialized, so only one subtest can run it at a time. That might be fine if the helper is fast. A better design is to avoid sharing state altogether: make the helper return a fresh copy and let each subtest have its own data.

Q: My test passes with `-race` locally but fails on CI. What should I do?

CI runners often have a different CPU count, load, or container CPU limits, all of which affect goroutine scheduling. First, mimic CI's environment: set `GOMAXPROCS=2` (or pass `-cpu 2`), or run under Docker with the same CPU limit. Then run `go test -race -count=100` to stress the race. If that still doesn't reproduce it, log the shared state to see whether it is being corrupted. The root cause is usually a race that simply doesn't trigger under your local scheduling.

Q: Can I use `t.Cleanup` safely with `t.Parallel()`?

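Yes, and it is usually the right tool. Cleanup functions are called after the test function returns and after all of its subtests, including parallel ones, have completed, in last-added, first-called order. That makes `t.Cleanup` safer than `defer` in a parent test whose subtests call `t.Parallel()`: a deferred call runs as soon as the parent function returns, which is before those paused subtests resume. However, cleanup functions of different parallel tests can still run concurrently with each other, so avoid sharing mutable state between them; that would still be a race.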

What this usually means

t.Parallel() lets tests run concurrently, but if they share any mutable state (package-level variables, global config, test data structures, or even test-scoped variables passed by pointer), you get data races. The Go race detector catches these only if the race actually happens during the test run. Because scheduling is nondeterministic, the race may not trigger on every execution. CI environments with different CPU counts, load, or container limits often expose races that stay hidden on a developer's laptop.

( 01 ) Fast diagnosis

The first ten minutes: establish facts before touching code.

1. Run `go test -race -count=1 ./...`. If it reports a race, fix that immediately.
2. Run `go test -race -count=10 -failfast ./...` to stress the race; note the failing test names.
3. Isolate the flaky test: `go test -race -run '^TestName$' -count=100 .` to reproduce it reliably.
4. Add `-v` to see the order in which t.Parallel() tests pause (`=== PAUSE`) and resume (`=== CONT`); look for shared variables printed in logs.
5. Check for package-level variables that tests mutate, and init() functions that set up such shared state.
6. Review test helpers that write to shared slices or maps without synchronization.

( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

- All files in the test package with `t.Parallel()` calls
- Package-level `var` declarations that tests modify
- init() functions that set global state
- Test helpers that append to or modify shared slices/maps
- The race detector output: `go test -race -v 2>&1 | grep -A 30 "WARNING: DATA RACE"` (each warning is followed by the stack traces of the conflicting accesses, so print the lines after it)
- Benchmark or integration test files that share a common setup function
- CI logs: compare the test output of successful vs failing runs for ordering differences

( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

- Package-level variables used as test fixtures without reset between parallel tests
- Shared test config struct passed by pointer to multiple parallel subtests
- t.Parallel() inside a for-loop whose closure captures the loop variable (pre-Go 1.22)
- Test helper that writes to a global counter or map for tracking test progress
- Database/HTTP mock that is stateful and reused across parallel tests
- Subtests that modify the same file or environment variable
- Test cleanup (t.Cleanup) that races with other tests still running

( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

- Use t.Run() with t.Parallel(), but give each subtest its own copy of the data rather than a pointer to shared data; on Go versions before 1.22, also copy the loop variable (`tc := tc`) before the closure
- Guard state that must stay shared with a `sync.Mutex`, or use `sync.Map` for concurrency-safe accumulation
- Refactor shared state into per-test setup: move globals into test-local variables inside `t.Run()`
- Avoid `t.Parallel()` in subtests that mutate the same file or external resource; run those serially
- Reserve `TestMain` for one-time setup and teardown of package-wide state that stays read-only during the tests; it runs once per package, not per test, so it cannot give each test its own state
- If a test starts its own goroutines, wait for them (for example with a `sync.WaitGroup`) before the test ends, and register teardown with `t.Cleanup` rather than a manual `defer`

( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

- Run `go test -race -count=100 ./flaky-package`: zero failures after the fix
- Toggle the `-parallel` flag: `go test -race -parallel 1` should always pass, and `-parallel 8` should pass too
- Add `t.Parallel()` to every subtest that is meant to be independent and run with `-race`: no data race warnings
- Compare test logs before and after: no unexpected zero values or stale state
- Push to CI and observe 10 consecutive green runs on the flaky test suite

( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

- Blindly removing t.Parallel(): you lose test speed and hide the real bug
- Adding `time.Sleep()` to work around races: it masks the issue and makes tests slower
- Running with `-race` only once and assuming clean output means there is no race
- Ignoring loop variable capture before Go 1.22: always copy the variable (the per-iteration semantics apply only to modules whose go.mod declares `go 1.22` or later)
- Sharing a database transaction or connection between parallel tests without proper isolation
- Mixing `sync/atomic` operations with plain reads or writes of the same variable, or building check-then-act logic out of separate atomic calls: the first is still a data race, and the second is a logic race that `-race` cannot see

( 07 ) War story

The Flaky CI Race in a Payment Gateway Test Suite

Senior Backend Engineer · Go 1.18, PostgreSQL, Docker, GitHub Actions

Timeline

09:15 CI fails on TestPaymentRetry, but only on the second run of the day.
09:30 Local `go test -race ./...` passes 10 times in a row.
09:45 I notice the failing test shares a package-level variable `processedTransactions` with other tests.
10:00 Run `go test -race -count=100 -run TestPaymentRetry` locally; it finally catches a race after 47 runs.
10:15 Race detector output shows concurrent write to `processedTransactions` map from two parallel subtests.
10:30 I check git blame: `processedTransactions` was added 6 months ago for logging, never meant to be thread-safe.
10:45 Fix: replace the global map with a test-local variable created inside `t.Run()`.
11:00 Re-run CI 10 times: all green. The race is gone.

The CI failure was intermittent: TestPaymentRetry would fail only on the second run of the day. Locally, I couldn't reproduce it. The test suite had 200+ tests, many using t.Parallel(). I spent two hours adding debug prints and running subsets, but the failure never showed up.

I finally ran `go test -race -count=100 -run TestPaymentRetry` and saw the race after 47 runs. The race detector pointed to a package-level map `processedTransactions` that was being written by multiple parallel subtests. This map was introduced months ago to track retries for monitoring—never intended for concurrent access.

The fix was simple: move the map inside each subtest's `t.Run()` closure so every subtest gets its own instance. No mutex, no atomic, just no sharing. After the change, CI passed consistently. The lesson: package-level variables in parallel tests are landmines. If it doesn't need to be shared, don't share it.

Root cause

Package-level variable `processedTransactions` was written by multiple parallel subtests without synchronization.

The fix

Moved the map declaration inside each subtest's `t.Run()` closure, making it local to each subtest execution.

The lesson

Never use package-level mutable state in test suites with t.Parallel(). Always allocate test-scoped data inside t.Run().

( 08 ) Why the Race Detector Doesn't Always Catch It

The Go race detector is a dynamic analysis tool: it only reports races on code paths that actually execute during the run. If the conflicting accesses never both happen in a given run, or a particular schedule happens to order them through unrelated synchronization, the detector stays silent. This is why running tests with `-race` many times (e.g., `-count=100`) is essential for exposing flaky races.

CI environments often have different CPU counts, load, or container CPU limits, which change the scheduler behavior. A race that never triggers on a 4-core laptop might trigger every time on a 2-core CI runner. Always run `-race` with `-count=10` at minimum on the CI build to reduce false negatives.

( 09 ) The Loop Variable Capture Gotcha (Go <1.22)

A classic source of bugs in parallel subtests is capturing the loop variable in a closure. In Go versions before 1.22, a `for ... range` loop declares the loop variable once and reuses it across iterations. So `for _, tc := range tests { t.Run(tc.name, func(t *testing.T) { t.Parallel(); fmt.Println(tc.input) }) }` typically runs every subtest against the last element of `tests`: each parallel subtest pauses at `t.Parallel()` until the loop has finished, then reads the same, final `tc`.

The fix is to shadow the variable with `tc := tc` inside the loop, before the closure. Go 1.22 gave each iteration its own variable, but only for modules whose go.mod declares `go 1.22` or later; if your project targets older versions, always copy. Since Go 1.20, plain `go vet ./...` (its `loopclosure` analyzer) reports references to loop variables that follow a call to `t.Parallel()` in a subtest.

( 10 ) Shared Test Helpers and Fixtures

Test helpers that write to shared slices or maps are another common source. For instance, a helper that appends to a global `var testErrors []string` will race when called from parallel subtests. The fix is to make the helper return its result instead of storing it globally, or pass a thread-safe accumulator (e.g., a channel or sync.Map).

Database fixtures that create records in a shared table without unique identifiers cause a related failure: two parallel tests might create records with the same primary key, so one fails with a duplicate key error. This is not a memory data race, so `-race` will not report it, but it is just as flaky. Isolate test data by wrapping each test in a transaction that rolls back, or use unique identifiers per test.

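( 11 ) How to Stress-Test for Races Systematically

A systematic approach:

1. Run `go test -race -count=200 -failfast ./package` to find any race quickly.
2. If a race is found, isolate the failing test with `-run TestName`.
3. Change the scheduling: raise parallelism with `-parallel 8`, or vary the number of OS threads running Go code with `GOMAXPROCS=2` or `-cpu 1,2,4`.
4. Use the `stress` tool from `golang.org/x/tools/cmd/stress` to run the test under heavy load. It reruns a command in a loop, so compile the test binary once and stress the binary rather than `go test`, which can return cached results: `go test -race -c -o pkg.test`, then `stress -p 4 ./pkg.test -test.run TestName`.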

Also, consider using `go test -exec 'stress -p 4'` to run tests under stress. This can surface races that only appear under high concurrency. Document the exact command that reproduces the race so it can be used for regression testing.

( 12 ) When to Remove t.Parallel() vs Fix the Race

Some engineers advocate removing t.Parallel() to avoid races entirely. That's a bad trade-off: you lose test speed and the race may still exist in production code. The goal should be to fix the race, not hide it. Parallel tests expose concurrency bugs that could also affect production. Use t.Parallel() as a tool to catch these bugs early.

However, if a test is inherently sequential (e.g., testing a global rate limiter), it's fine to omit t.Parallel(). But don't remove it just because it's convenient—fix the underlying data race. Your production code will thank you.
