Flaky Tests in CI — Debugging Guide | Buglyst Learn

What this usually means

Flaky tests are tests that sometimes pass and sometimes fail without any code change. They are caused by non-deterministic behaviour: race conditions between async operations, reliance on wall-clock time, shared mutable state between tests, test execution order dependencies, or external service availability. CI environments make flakiness worse because they are slower, have different timing characteristics, and run tests in different orders than local machines.

( 01 ) Fast diagnosis

The first ten minutes — establish facts before touching code.

1. Identify the flaky test and run it in isolation 20 times. If it fails at least once, it is flaky on its own. If it never fails, suspect order dependence or state shared with other tests.
2. Check if the test involves time (`setTimeout`, `setInterval`, `Date.now()`). Time-based tests are among the most common sources of flakiness.
3. Check if the test makes network calls. External dependencies (APIs, databases) introduce latency variance and transient failures.
4. Check if the test shares state with other tests. Global variables, database rows, or file system state that is not cleaned up between tests cause order dependence.
5. Check the CI timing. Flaky tests often fail more in CI because CI machines are slower, so race conditions that are invisible locally become visible.
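
Step 1 can be sketched as a small repeat-runner. This is a minimal illustration, not part of any test framework: `runRepeatedly` and `sampleTest` are hypothetical names, and the sample test fails on every 7th attempt only so that the demo is reproducible. In a real project you would normally repeat the test through your runner's own filtering options instead.

```javascript
// Hypothetical helper: run one test function many times and tally the outcomes.
async function runRepeatedly(testFn, times = 20) {
  const failures = [];
  for (let attempt = 1; attempt <= times; attempt++) {
    try {
      await testFn(attempt);
    } catch (error) {
      failures.push({ attempt, message: error.message });
    }
  }
  return { times, failed: failures.length, failures };
}

// Stand-in for a flaky test: fails on every 7th attempt so the demo is reproducible.
async function sampleTest(attempt) {
  if (attempt % 7 === 0) {
    throw new Error(`expected 200, got 503 on attempt ${attempt}`);
  }
}

runRepeatedly(sampleTest, 20).then((summary) => {
  // Attempts 7 and 14 fail, so this logs "2/20 runs failed".
  console.log(`${summary.failed}/${summary.times} runs failed`);
  for (const failure of summary.failures) {
    console.log(`  attempt ${failure.attempt}: ${failure.message}`);
  }
});
```

Recording which attempts failed, and with what message, gives you something concrete to compare against the CI logs.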

( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

- The flaky test file: read the test logic and look for async gaps, time dependencies, and shared state
- Test framework configuration: random test ordering, parallel execution settings, timeout values
- CI job logs: compare passing and failing runs of the same test, look for timing differences
- Test setup and teardown (`beforeEach`/`afterEach`): is state properly reset between tests?
- Mock configurations: are mocks reset between tests? Are they simulating time correctly?
- CI runner specs: CPU, memory, and disk compared to the local development machine

( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

- Race condition: the test asserts before an async operation completes
- Time dependency: the test uses `new Date()` or `Date.now()` and expects a specific value
- Order dependency: test B passes only if test A runs first and leaves the system in a specific state
- Shared mutable state: a global variable or singleton is modified by one test and affects another
- External service: an API call, database query, or file system operation fails intermittently
- Resource exhaustion: CI runs tests in parallel and hits file descriptor or memory limits
- Clock drift: the CI machine's clock differs slightly from the one the test was written against, causing time-based assertions to fail
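
The order-dependency and shared-state causes above often appear together. The sketch below reproduces them without a test runner; `cache`, `testA`, `testB`, and `runInOrder` are hypothetical names chosen for illustration.

```javascript
// Shared mutable state: a module-level cache that both "tests" touch.
const cache = new Map();

function testA() {
  cache.set("user", { name: "Ada" }); // leaves state behind for later tests
  return cache.get("user").name === "Ada";
}

function testB() {
  // Implicitly relies on testA having populated the cache.
  const user = cache.get("user");
  return user !== undefined && user.name === "Ada";
}

// Start from empty state, run the tests in the given order, report pass/fail per test.
function runInOrder(tests) {
  cache.clear();
  return tests.map((test) => `${test.name}:${test() ? "pass" : "fail"}`);
}

console.log(runInOrder([testA, testB]).join(", ")); // testA:pass, testB:pass
console.log(runInOrder([testB, testA]).join(", ")); // testB:fail, testA:pass
```

Neither test is wrong in isolation from the author's point of view, which is why this kind of failure tends to show up only when the runner changes the order or splits tests across workers.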

( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

- Use fake timers (`jest.useFakeTimers()`, `vi.useFakeTimers()`) to control time in tests deterministically
- Mock external services instead of calling real APIs in tests: use MSW, nock, or similar
- Run tests in random order locally to surface order dependencies: `jest --randomize` or `vitest --sequence.shuffle`
- Clean up all shared state in `afterEach`: database rows, files, global variables, module caches
- Wait for async operations properly: use `waitFor`, `findBy`, or explicit await on promises
- If you retry at all, use the test runner's built-in retry (e.g. Jest `jest.retryTimes(2)`), not custom retry logic inside tests
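
The idea behind the fake-timer pattern can be shown without a runner by injecting the clock. This sketch uses plain dependency injection rather than `jest.useFakeTimers()` or `vi.useFakeTimers()`; `isExpired` and `createFakeClock` are hypothetical names, and the token shape is an assumption made for the example.

```javascript
// Code under test: accepts a `now` function instead of calling Date.now() directly.
function isExpired(token, now = Date.now) {
  return now() >= token.expiresAt;
}

// Hypothetical fake clock: time only moves when the test advances it.
function createFakeClock(startMs) {
  let current = startMs;
  return {
    now: () => current,
    advance: (ms) => {
      current += ms;
    },
  };
}

const clock = createFakeClock(Date.UTC(2024, 0, 1));
const token = { expiresAt: clock.now() + 60000 };

console.log(isExpired(token, clock.now)); // false: no time has passed
clock.advance(59999);
console.log(isExpired(token, clock.now)); // false: 1 ms before expiry
clock.advance(1);
console.log(isExpired(token, clock.now)); // true: exactly at expiry
```

Because the test owns the clock, the boundary cases (1 ms before and exactly at expiry) can be asserted directly instead of being approximated with real delays.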


( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

- Run the test 100 times in a loop locally. It should pass all 100 times.
- Run the full test suite in random order 5 times. No tests should fail.
- Run tests in CI 3 consecutive times without code changes. All runs should pass.
- Check that tests do not depend on system time by running them with the system clock set to a different value.
- Monitor the flaky test rate over the next week. It should trend to zero.
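
Random-order runs are most useful when a failing order can be replayed. The sketch below shows one way to get that property: a seeded shuffle, so the same seed always yields the same order. It illustrates the idea only and is not how any particular runner implements it; `mulberry32` is a small, widely used seeded generator, and `shuffle` and the test names are assumptions for the example.

```javascript
// Small seeded PRNG (mulberry32-style): the same seed always gives the same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded generator; the input array is not modified.
function shuffle(items, seed) {
  const random = mulberry32(seed);
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

const tests = ["login", "signup", "checkout", "logout"];
for (const seed of [1, 2, 3, 4, 5]) {
  // Log the seed with each order so a failing run can be reproduced later.
  console.log(`seed=${seed}`, shuffle(tests, seed).join(" > "));
}
```

Whatever tool produces the random order, record the seed alongside the result so that a failure on run 4 of 5 can be rerun in exactly the same order.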

( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

- Adding `await sleep(1000)` instead of waiting for the actual condition
- Disabling or skipping the flaky test instead of fixing it
- Running tests in a fixed order and relying on that order
- Using real network calls in unit tests
- Not investigating flaky tests because "it passed on retry": a flaky test usually hides a real bug, in the test or in the code under test
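
The first mistake above, a fixed sleep, can be replaced by polling for the condition with a timeout. The sketch below is a hand-rolled stand-in for that idea, not Testing Library's `waitFor` implementation; the option names `timeoutMs` and `intervalMs` and the `startJob` helper are assumptions made for the example.

```javascript
// Hypothetical polling helper: resolves as soon as `condition()` is truthy,
// rejects if it is still falsy once `timeoutMs` has elapsed.
function waitFor(condition, { timeoutMs = 1000, intervalMs = 10 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const check = () => {
      const result = condition();
      if (result) {
        resolve(result);
      } else if (Date.now() >= deadline) {
        reject(new Error(`condition not met within ${timeoutMs} ms`));
      } else {
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}

// Simulated async work: the job flips `done` after `delayMs`.
function startJob(delayMs) {
  const job = { done: false };
  setTimeout(() => {
    job.done = true;
  }, delayMs);
  return job;
}

const job = startJob(50);
waitFor(() => job.done).then(() => {
  console.log("job finished, safe to assert");
});
```

Unlike `sleep(1000)`, this returns as soon as the work is done on a fast machine and still tolerates a slow CI runner, up to the timeout, and a timeout produces an explicit error instead of a misleading assertion failure.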
