22 Jun 2026 • 3 min read

One Claude Code Tip a Day: Reproduce the Failure Before the Fix

Use Claude Code for test-driven debugging: make it reproduce the failure first, inspect the exact failing assertion or log, apply a narrow fix, then run the same test again before broadening scope.

This post is part of the “One Claude Code Tip a Day” series — a daily guide to using Claude Code more effectively.

The failure: Claude wants to fix what it has not reproduced

A common Claude Code failure mode is surprisingly human: it reads an error, recognizes the pattern, and starts editing before proving the bug exists locally. That can work for trivial syntax errors. It is dangerous for real application bugs, especially frontend regressions, async race conditions, and test failures hidden behind fixtures. Today’s habit is to turn Claude Code into a test-driven debugger: first reproduce the failure, then fix only the thing the failing test proves.

The workflow

No single slash command does this. What matters is the discipline of running tests and shell checks inside the Claude Code loop. Start with a failing symptom, then ask Claude to inspect the relevant files without editing. After that, make it run the smallest command that reproduces the bug. In practice the session starts like this: “Do not edit yet. Read the failing test, the component it covers, and the test setup. Explain what behavior the test expects. Then run the narrowest test command and quote the failing assertion in your reply.”

Use the narrowest command first

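For a React app tested with Jest, that command might be `!pnpm test src/components/PriceBadge.test.tsx --runInBand`. For a Python API, it might be `!pytest tests/test_billing.py::test_trial_expires_at_midnight -q`. For a deployment bug, it may be `!railway logs --service worker --tail 200` before touching code (check `railway logs --help` for the flags your CLI version accepts). The leading `!` runs the command in Claude Code’s shell mode, so its output lands in the session where Claude can read it. The point is to replace “Claude thinks the bug is probably X” with “the failure says X, in this file, at this assertion.”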

A realistic session

Imagine a dashboard where discounted prices disappear only in dark mode. A shallow prompt says, “Fix the price rendering bug,” and Claude may rewrite the component, rename props, or adjust theme tokens blindly. A better prompt is: “Read `PriceBadge.tsx`, `PriceBadge.test.tsx`, and the theme helper. Do not edit. Tell me which test should fail for invisible discounted text. If no test covers it, propose the smallest failing test first.” Now Claude has to discover whether the bug is business logic, CSS, or missing coverage.

Make Claude write the failing test before the fix

If no test exists, ask for a red test before implementation: “Add one focused test that fails because the discounted price is present in the DOM but visually hidden by the dark theme class. Do not fix the component yet. Run only that test.” This is where the workflow earns its keep. Sometimes Claude will write a test that checks text content but not visibility, producing a false green. Push back: “This test does not prove visibility. Use the class/style contract we actually rely on, or explain why the DOM testing library cannot assert it and choose a safer component-level assertion.”

Inspect the evidence, not the confidence

After Claude runs the test, inspect three things before allowing edits. First, does the test fail for the intended reason? A failure because the fixture is missing a currency symbol is noise. Second, is the reproduction narrow enough? A full suite failure with thirty unrelated snapshots is not a debugging signal. Third, did Claude summarize the failing line and the suspected code path? If the answer is vague, ask it to read again. The TonBisa-style loop is prompt, result, inspect, correction, verify—not prompt, trust, merge.
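
The first two checks can be made mechanical. Below is a minimal sketch, assuming pytest's short summary lines of the form `FAILED <node id> - <reason>`; the helper names `parse_failures` and `is_useful_reproduction` are made up for illustration, and the regex does not handle parametrized node IDs that contain spaces.

```python
import re

# Matches pytest short-summary lines such as:
# FAILED tests/test_billing.py::test_name - AssertionError: message
FAILED_LINE = re.compile(r"^FAILED (?P<node>\S+)(?: - (?P<reason>.*))?$")


def parse_failures(output: str) -> dict[str, str]:
    """Map each failed test node ID to its reported reason."""
    failures = {}
    for line in output.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            failures[match.group("node")] = match.group("reason") or ""
    return failures


def is_useful_reproduction(output: str, target: str, expected_fragment: str) -> bool:
    """True only if the target is the sole failure and fails for the intended reason."""
    failures = parse_failures(output)
    if list(failures) != [target]:
        return False  # nothing failed, the wrong test failed, or there is extra noise
    return expected_fragment in failures[target]
```

The third check, whether Claude's summary of the failing line and code path is specific enough, still needs a human reader.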

Second-pass correction

Now let Claude patch narrowly: “Only change the minimum code needed to make this failing test pass. Do not refactor the component. After editing, run the same test, then show `git diff` and explain why the diff addresses the failing assertion.” This wording discourages the classic over-fix, but it does not rule it out. Claude might still change a shared theme helper and accidentally alter every price badge. The second pass is where you catch that. If the diff touches unrelated files, say: “Back up. Revert the broad helper change and solve this at the component boundary unless the test proves the helper is wrong.”
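
Spotting an over-broad diff can also be scripted. This is a rough sketch that takes the text printed by `git diff --name-only` and reports any path outside the set you expected to change. The function names and the `src/theme/helpers.ts` path are hypothetical, chosen only to mirror the price badge example.

```python
import subprocess


def changed_paths(name_only_output: str) -> list[str]:
    """Split the output of `git diff --name-only` into clean paths."""
    return [line.strip() for line in name_only_output.splitlines() if line.strip()]


def out_of_scope_files(changed: list[str], allowed: set[str]) -> list[str]:
    """Return changed paths that are not in the allowed set, sorted."""
    return sorted(path for path in changed if path not in allowed)


def current_diff_paths() -> list[str]:
    """Read the working-tree diff; must be called from inside a git repository."""
    result = subprocess.run(
        ["git", "diff", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return changed_paths(result.stdout)


# Example with a hard-coded diff listing (hypothetical paths):
allowed = {"src/components/PriceBadge.tsx", "src/components/PriceBadge.test.tsx"}
listing = "src/components/PriceBadge.tsx\nsrc/theme/helpers.ts\n"
surprises = out_of_scope_files(changed_paths(listing), allowed)
```

A non-empty `surprises` list is the cue to ask Claude why each extra file had to change before accepting the patch.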

Failure modes

The first failure mode is fake green: Claude updates the test to match broken behavior. Guard against it by asking, “What user-visible behavior does this test protect?” The second is broad green: the test passes, but only because a shared mock or snapshot was weakened. Ask Claude to list any test files it modified and why. The third is unverified green: Claude claims success without rerunning the exact failing command. Make the final checkpoint mechanical: same command, same test, now green, followed by one relevant broader check such as `!pnpm test PriceBadge --runInBand` or `!pytest tests/test_billing.py -q`.
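
The final checkpoint can be written down so that it cannot be skipped. This sketch reruns the exact reproducing command first and widens only once that command is green; `run_ok` and `final_checkpoint` are illustrative names, and the commands you pass in are whatever you used to reproduce the bug.

```python
import subprocess


def run_ok(command: list[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(command).returncode == 0


def final_checkpoint(narrow: list[str], broader: list[str]) -> str:
    """Rerun the exact reproducing command first; widen only once it is green."""
    if not run_ok(narrow):
        return "narrow-red"
    if not run_ok(broader):
        return "broader-red"
    return "green"
```

For the billing example, a call might look like `final_checkpoint(["pytest", "tests/test_billing.py::test_trial_expires_at_midnight", "-q"], ["pytest", "tests/test_billing.py", "-q"])`. Anything other than `"green"` means the fix is not verified yet.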

Rule of thumb

Do not ask Claude Code to “debug this” as the first move. Ask it to prove the bug with the smallest failing command. Then ask for a narrow fix. Then rerun the same command and review the diff. If you cannot name the failing assertion or log line, you are not debugging yet—you are giving an autocomplete engine permission to guess.