Jonas Weber·

Killed a 15% flaky test by swapping real sleeps for fake timers and clearing shared state

Diagnose and fix flaky tests by identifying timing issues, state leaks, external dependencies, and non-deterministic behaviors with root cause analysis.

Flaky Test Stabilizer

You are a test reliability engineer specializing in eliminating flaky tests. Diagnose and fix the following unstable test.

**Flaky Test Code:**

```
{{test_code}}
```

**Code Under Test:**

```
{{code_under_test}}
```

**Failure Pattern:** {{failure_pattern}}

**Testing Framework:** {{testing_framework}}

**Failure Rate:** {{failure_rate}}

Perform systematic diagnosis:

1. **Flakiness Category**: Classify the root cause:
   - Async/timing issues
   - Shared mutable state
   - External dependency flakiness
   - Non-deterministic ordering
   - Time/date dependencies
   - Randomness
   - Concurrency/synchronization
   - Resource leaks
2. **Root Cause Analysis**: Pinpoint exactly what causes intermittent failure
3. **State Isolation Audit**: Check for test pollution between test runs
4. **Async Safety**: Identify missing awaits, race conditions in async code
5. **Fixed Test**: Rewrite the test to be deterministic and reliable
6. **Testability Improvements**: Refactor the code under test if needed for better testability
7. **Retry Strategy**: When (if ever) test retries are appropriate vs fixing the root cause
8. **Detection Script**: A script to run the test 100 times and confirm flakiness
9. **CI Configuration**: How to handle flaky tests in CI (quarantine, retry policy)
10. **Prevention Guidelines**: Team practices to prevent introducing new flaky tests

Output the stabilized test code and a prevention checklist.

Results

This test is flaky because it asserts on wall-clock timing and shares state through a module-level singleton. The failure rate is about 15%, and it gets worse on a loaded CI runner.

**Flakiness category:** async/timing + shared mutable state.

**Root cause:** the test does `await sleep(100)` and then expects the debounced callback to have fired. On a slow runner the timer has not elapsed yet, so the assertion races the event loop. Separately, a previous test leaves a stale entry in the shared `eventBus`, so the result depends on test order.

```typescript
// BEFORE: racy, real timers plus sleep guesswork
it("debounces the search", async () => {
  input.type("hello");
  await sleep(100); // flaky: may be too short under load
  expect(onSearch).toHaveBeenCalledOnce();
});
```

**Fixed: fake timers make time deterministic; state is reset per test:**

```typescript
import { vi, beforeEach, afterEach, it, expect } from "vitest";

beforeEach(() => {
  vi.useFakeTimers();
  eventBus.clear(); // kill cross-test pollution
});

afterEach(() => {
  vi.useRealTimers(); // restore real timers even when an assertion fails
});

it("debounces the search", () => {
  input.type("hello");
  vi.advanceTimersByTime(300); // advance virtual clock past the debounce
  expect(onSearch).toHaveBeenCalledOnce();
});
```

**Why it's stable now:** `advanceTimersByTime` moves the clock synchronously, so there is no race with the runner's speed, and `eventBus.clear()` removes the ordering dependency. Restoring real timers in `afterEach` rather than at the end of the test body means a failed assertion cannot leave fake timers installed for the next test.

**Detection:** run the file 100 times, for example with a shell loop such as `for i in $(seq 100); do npx vitest run search.test.ts || break; done`. Pre-fix it fails intermittently; post-fix it is green 100/100.

**Policy:** ban `sleep()` in tests, require fake timers for anything time-dependent, and isolate shared singletons in `beforeEach`.
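To make the reasoning behind the fake-timer fix concrete, here is a minimal sketch of the same idea without any test framework: a debounce that receives its timer source as a parameter, driven by a hand-rolled virtual clock. All names here (`TimerSource`, `ManualTimerSource`, `debounce`, the 300 ms wait) are hypothetical and chosen for illustration; they are not taken from the code under test or from Vitest, whose fake timers are considerably more complete.

```typescript
// Illustrative sketch only: every name below is hypothetical.

type ScheduledTask = { id: number; runAt: number; fn: () => void };

interface TimerSource {
  setTimeout(fn: () => void, delayMs: number): number;
  clearTimeout(id: number): void;
}

// A virtual clock: time moves only when advance() is called.
class ManualTimerSource implements TimerSource {
  private now = 0;
  private nextId = 1;
  private tasks: ScheduledTask[] = [];

  setTimeout(fn: () => void, delayMs: number): number {
    const id = this.nextId++;
    this.tasks.push({ id, runAt: this.now + delayMs, fn });
    return id;
  }

  clearTimeout(id: number): void {
    this.tasks = this.tasks.filter((t) => t.id !== id);
  }

  // Move the clock forward and run every task that became due, earliest first.
  advance(ms: number): void {
    const target = this.now + ms;
    for (;;) {
      const due = this.tasks
        .filter((t) => t.runAt <= target)
        .sort((a, b) => a.runAt - b.runAt || a.id - b.id)[0];
      if (!due) break;
      this.tasks = this.tasks.filter((t) => t.id !== due.id);
      this.now = due.runAt;
      due.fn();
    }
    this.now = target;
  }
}

// Debounce that takes its timer source as an argument instead of using globals.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
  timers: TimerSource,
): (...args: A) => void {
  let pending: number | undefined;
  return (...args: A) => {
    if (pending !== undefined) timers.clearTimeout(pending);
    pending = timers.setTimeout(() => {
      pending = undefined;
      fn(...args);
    }, waitMs);
  };
}

// Usage: no real time passes, so the outcome cannot depend on runner speed.
const clock = new ManualTimerSource();
const calls: string[] = [];
const search = debounce((q: string) => { calls.push(q); }, 300, clock);

search("h");
clock.advance(100); // still inside the debounce window
search("he");       // restarts the 300 ms window
clock.advance(299); // one millisecond short: callback has not fired
clock.advance(1);   // window elapses: callback fires once with "he"
```

Injecting the timer source is one way to get the "Testability Improvements" the prompt asks for; when the code under test cannot be changed, framework-level fake timers as in the fixed test achieve the same effect by replacing the global timer functions.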

Model: Claude Sonnet 4

30 Likes · 11 Saves · Score: 22

1 comment

Luca Brunner·

The rollback path is what makes this safe to actually run. Good call.