Mocking External APIs in Tests: Lessons from a Real Outage

I once deployed a service that failed for 45 minutes because a third‑party API added a single field to its JSON response. Our test suite passed with flying colors — we mocked the API perfectly, but the mock returned the old schema. The real API returned an extra field that our deserializer couldn't handle, and the whole pipeline crashed.

That was my introduction to stub drift. Since then I've learned that mocking external APIs is not just about intercepting HTTP calls — it's about maintaining a faithful representation of a system you don't control. Let me walk you through the non‑obvious techniques I now use.

The problem with hand‑written mocks

Most developers start by hand‑writing a mock response. In Node.js with nock, it looks like this:

A typical hand‑written mock with nock

nock('https://api.example.com')
  .get('/users/42')
  .reply(200, {
    id: 42,
    name: 'Alice'
  });

The test passes. But what if the real API now returns a `created_at` field? Or what if the `name` field is sometimes null? Your mock is a snapshot of a moment in time, and unless you actively verify it against the real API, you're flying blind.

Hand‑written mocks are brittle and rot quickly. A reliable way to keep them honest is to run a contract test that periodically fetches the real API and compares its response schema with the one your mock returns.
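
To make "compares the schema" concrete, here is a minimal sketch in plain Node.js. The helper names `shapeOf` and `diffShapes` are my own, not from any library, and `liveBody` is a made-up sample standing in for what a real fetch would return. The idea is to reduce both bodies to type names and list what differs.

```javascript
// Reduce a JSON value to its "shape": type names instead of values.
function shapeOf(value) {
  if (value === null) return 'null';
  if (Array.isArray(value)) return value.length ? [shapeOf(value[0])] : [];
  if (typeof value === 'object') {
    const shape = {};
    for (const key of Object.keys(value)) shape[key] = shapeOf(value[key]);
    return shape;
  }
  return typeof value;
}

// List the differences between the mocked shape and the live shape.
function diffShapes(mock, live, path = '') {
  const diffs = [];
  const isObj = (v) => Boolean(v) && typeof v === 'object' && !Array.isArray(v);
  if (isObj(mock) && isObj(live)) {
    for (const key of new Set([...Object.keys(mock), ...Object.keys(live)])) {
      const here = path ? `${path}.${key}` : key;
      if (!(key in live)) diffs.push(`removed: ${here}`);
      else if (!(key in mock)) diffs.push(`added: ${here}`);
      else diffs.push(...diffShapes(mock[key], live[key], here));
    }
  } else if (JSON.stringify(mock) !== JSON.stringify(live)) {
    diffs.push(`changed: ${path} (${JSON.stringify(mock)} -> ${JSON.stringify(live)})`);
  }
  return diffs;
}

const mockBody = { id: 42, name: 'Alice' };
// Made-up sample; in a real contract test this would come from the live API.
const liveBody = { id: 42, name: null, created_at: '2024-01-01T00:00:00Z' };

const drift = diffShapes(shapeOf(mockBody), shapeOf(liveBody));
console.log(drift);
// [ 'changed: name ("string" -> "null")', 'added: created_at' ]
```

A non-empty list is the signal to update the mock. Whether an added field should fail the check or only warn is a policy decision; this sketch just reports it.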

Warning: Never rely solely on hand‑written mocks for critical paths. They are better than nothing, but they give a false sense of security if not validated externally.

Record and replay: the pragmatic middle ground

Instead of guessing the response shape, record real traffic once and replay it in tests. Tools like Polly.JS (for Node and the browser) and VCR (for Ruby) capture HTTP interactions and store them as files. VCR calls these files cassettes, and I'll use that word for Polly.JS recordings too. Your test uses the cassette, not a hand‑crafted stub.

Here's an example with Polly.JS:

Using Polly.JS to record and replay API responses

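const { Polly } = require('@pollyjs/core');
const NodeHttpAdapter = require('@pollyjs/adapter-node-http');
const FSPersister = require('@pollyjs/persister-fs');

// Register the adapter and persister so they can be referenced by id
Polly.register(NodeHttpAdapter);
Polly.register(FSPersister);

const polly = new Polly('get-user', {
  adapters: ['node-http'],
  persister: 'fs',
  // Write recordings to ./cassettes instead of the persister's default directory
  persisterOptions: { fs: { recordingsDir: './cassettes' } }
});

// First run: records the real response into ./cassettes/
// Subsequent runs: replay from the cassette
// Assumes this runs inside an async function, and that `fetch` is built on
// Node's http/https modules (e.g. node-fetch), which is what this adapter intercepts.
const res = await fetch('https://api.example.com/users/42');
await polly.stop();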

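Now your test uses the actual response from the API. When the API changes, you delete the cassette and re‑record. Keep in mind that a cassette is still a snapshot: replay alone won't notice drift. Drift shows up when you re‑record, so do it on a schedule (Polly.JS has an `expiresIn` option that marks old recordings as expired) and review the diff.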

The trade‑off: cassettes can be large and you must commit them to version control. Also, you need to run a recording session against the real API, which may require credentials or a sandbox environment.

Contract testing with Pact

If you control both the consumer and the provider, or the provider's team is willing to cooperate, Pact is the gold standard. The consumer's tests (your service) generate a contract, a Pact file that describes the interactions it expects. The provider then verifies that contract against itself in its own CI. If the provider changes something that breaks the contract, the provider's build fails.

This shifts the responsibility: the API provider knows exactly what your service expects. But it only works if the external team is willing to run your Pact tests in their pipeline. For third‑party APIs, that's rarely an option.

The 45‑minute outage

14:00  Third‑party payment API adds a new optional field `processor_fee` to the charge response.
14:02  Our service receives a charge webhook and tries to deserialize the JSON. The model does not have a `processor_fee` field, but the deserializer is strict and throws an error on unknown fields.
14:05  All subsequent webhook events fail. Alert triggers.
14:30  On‑call engineer identifies the issue: the mock in tests still returns the old schema.
14:45  Deploy a fix that ignores unknown fields. Service recovers.

Lesson

Our mocks were never validated against the real API. If we had had a periodic smoke test that called the actual endpoint, we could have caught the new field on our own schedule instead of through a production alert.
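
The fix at 14:45 was to ignore unknown fields. As a rough illustration of the difference, here is a sketch of a strict and a tolerant parser side by side. The charge model (`id`, `amount`, `currency`) and both function names are hypothetical, not our production code; only `processor_fee` comes from the incident.

```javascript
const KNOWN_FIELDS = ['id', 'amount', 'currency']; // hypothetical charge model

// Strict: mirrors the behavior that caused the outage.
function parseChargeStrict(json) {
  const data = JSON.parse(json);
  for (const key of Object.keys(data)) {
    if (!KNOWN_FIELDS.includes(key)) throw new Error(`Unknown field: ${key}`);
  }
  return data;
}

// Tolerant: copy the fields we know, ignore the rest,
// and still fail loudly if a field we depend on is missing.
function parseChargeTolerant(json) {
  const data = JSON.parse(json);
  const charge = {};
  for (const key of KNOWN_FIELDS) {
    if (!(key in data)) throw new Error(`Missing field: ${key}`);
    charge[key] = data[key];
  }
  return charge;
}

// The payload shape after the provider added the optional field.
const payload = JSON.stringify({
  id: 'ch_123',
  amount: 1000,
  currency: 'usd',
  processor_fee: 30
});

console.log(parseChargeTolerant(payload));
// { id: 'ch_123', amount: 1000, currency: 'usd' }
```

A unit test that feeds the parser one extra field costs a few lines and exercises exactly the failure mode a static mock hides.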

Strict matching vs. loose matching

Many mocking setups match loosely unless told otherwise. nock, for example, ignores request headers and the request body unless you specify them. That's dangerous. A test might pass even though your code sends a malformed request, because the mock doesn't check the payload.

Always configure your mocks to match strictly. In WireMock, use `withRequestBody` and `withHeader` to constrain the request. In nock, pass the expected body as the second argument to `.post()`, use `.query()` for query parameters, and use `reqheaders` for headers.

Strict matching in nock: the mock only matches if the listed header and the JSON body match what is specified

nock('https://api.example.com', {
  reqheaders: {
    'Authorization': 'Bearer valid_token'
  }
})
  .post('/charges', {
    amount: 1000,
    currency: 'usd'
  })
  .reply(200, { id: 'ch_123' });

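If your code sends a different header or body than the mock specifies, your test will (correctly) fail. This forces you to keep your code and mock in sync. It does not prove the mock matches the real API; that still needs a check against the real thing.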

Tip: Use a custom matcher that rejects unknown fields in the request body. This way, if your code sends extra data that the real API would ignore, the mock still fails, alerting you to potential issues.
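
Here is one way such a matcher might look. `rejectUnknownFields` is a name I made up, and the block runs without nock installed. nock accepts a function in the body position of `.post()`, and the commented usage at the end assumes the request body is JSON so the function receives a parsed object.

```javascript
// Build a body predicate that fails on any field outside the allowed list.
function rejectUnknownFields(allowed) {
  const allowedSet = new Set(allowed);
  return (body) =>
    body !== null &&
    typeof body === 'object' &&
    !Array.isArray(body) &&
    Object.keys(body).every((key) => allowedSet.has(key));
}

const chargeBody = rejectUnknownFields(['amount', 'currency']);

console.log(chargeBody({ amount: 1000, currency: 'usd' }));           // true
console.log(chargeBody({ amount: 1000, currency: 'usd', debug: 1 })); // false

// With nock installed, the predicate goes where the body object would:
//   nock('https://api.example.com')
//     .post('/charges', chargeBody)
//     .reply(200, { id: 'ch_123' });
```

This predicate only checks field names. If values matter too, compare them inside the same function so a single matcher covers both.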

Smoke tests: the safety net

No matter how good your mocks are, you need a test that calls the real API. Not in every CI run — that would be slow and flaky — but as a periodic smoke test (e.g., nightly) or a manual sanity check before deployment.

A smoke test is a simple script that hits the external endpoint and validates the response schema. If it fails, you know your mocks are stale.

A simple smoke test that validates the API response structure

// smoke-test.mjs (an ES module, so top-level await works)
const response = await fetch('https://api.example.com/users/42');
// fetch itself throws if the host is unreachable; this catches non-2xx responses
if (!response.ok) throw new Error(`API returned HTTP ${response.status}`);
const body = await response.json();
if (typeof body.id !== 'number') throw new Error('id field missing or wrong type');
console.log('Smoke test passed');

Run this in a scheduled job or as a pre‑deploy step. It's not a replacement for unit tests — it's a reality check.

Handling errors and edge cases

Most tutorials show the happy path. But external APIs fail in many ways: 429 rate limits, 503 service unavailable, timeouts, or malformed responses. Your mocks must cover these too.

For each error scenario, write a test that mocks the error and asserts your code behaves correctly (retries, falls back, logs, etc.). Use features like WireMock's `withFixedDelay` or nock's `.delay()` to simulate slow responses.

Mocking a slow response to test timeout handling

nock('https://api.example.com')
  .get('/users/42')
  .delay(5000)  // simulate slow response
  .reply(200, {});

// Your code should time out and retry
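
For the "retries" part, it helps to keep the retry decision in small pure functions so the mocked-error tests have something precise to assert on. The sketch below is one possible policy, not a recommendation: the status list, the backoff numbers, and all three function names are assumptions, and `requestWithRetry` expects a hypothetical response of the form `{ status, headers }` with lowercase header keys.

```javascript
// Statuses worth retrying (an assumption; adjust to the API's documentation).
const RETRYABLE_STATUSES = new Set([429, 502, 503, 504]);

function shouldRetry(status, attempt, maxAttempts = 3) {
  return attempt < maxAttempts && RETRYABLE_STATUSES.has(status);
}

// Delay before the next attempt: honor a numeric Retry-After header (seconds)
// when present, otherwise use exponential backoff, both capped at maxMs.
function retryDelayMs(attempt, retryAfter, baseMs = 200, maxMs = 5000) {
  const seconds = Number(retryAfter);
  if (retryAfter != null && retryAfter !== '' && Number.isFinite(seconds) && seconds >= 0) {
    return Math.min(seconds * 1000, maxMs);
  }
  return Math.min(baseMs * 2 ** (attempt - 1), maxMs);
}

// doRequest and sleep are injected so a test can pass fakes
// instead of hitting the network or waiting for real time.
async function requestWithRetry(doRequest, sleep, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    const res = await doRequest();
    if (!shouldRetry(res.status, attempt, maxAttempts)) return res;
    await sleep(retryDelayMs(attempt, res.headers['retry-after']));
  }
}

console.log(shouldRetry(503, 1));    // true
console.log(shouldRetry(404, 1));    // false
console.log(retryDelayMs(3));        // 800
console.log(retryDelayMs(1, '2'));   // 2000
```

In a test, a mocked 429 with a `Retry-After` header followed by a mocked 200 should produce exactly two calls to `doRequest` and one call to `sleep`.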

The test that never fails is the test that lies to you.

Tooling recommendations

- WireMock: Java, runs embedded or as a standalone server, supports recording, strict matching, and fault injection.
- nock: Node.js, intercepts HTTP inside the process, easy to use. Its built‑in recording (`nock.recorder`, `nock.back`) is basic, so Polly.JS is a common companion.
- Polly.JS: Node.js and browser, records and replays HTTP interactions through adapters (node-http, fetch, XHR).
- Pact: contract testing between services, supports many languages.
- Mountebank: multi‑protocol, supports TCP and SMTP mocks as well as HTTP.

If you're starting fresh on Node, I'd use nock for hand‑written stubs and Polly.JS for recording, or WireMock if you want a standalone server that services in any language can talk to. Don't forget the smoke test.

61% of teams using mocks do not validate them against real APIs (2023 State of Testing Report).

Final thoughts

Mocking external APIs is necessary for fast, reliable unit tests. But it's also a source of brittleness if not managed carefully. The key takeaways: record real responses when possible, use strict matching, validate your mocks with a smoke test, and never assume the mock is correct forever.

That 45‑minute outage taught me a lesson I won't forget. Since then, every service I build has a contract test or a nightly smoke test that calls the real API. It's a small investment that pays for itself the first time an API changes without notice.

Frequently asked questions

What is stub drift?

Stub drift occurs when your mock or stub no longer matches the actual external API behavior (e.g., new fields, changed endpoints, different error codes). Tests still pass because the mock is static, but production breaks.

Should I mock external APIs in unit tests?

Yes, for unit tests you should mock all external dependencies to keep tests fast and isolated. But you must also have integration tests that call the real API (or a sandbox) to catch drift.

Which is better: WireMock or nock?

WireMock (Java) and nock (Node.js) are both excellent. WireMock runs as a separate server, so your code makes real HTTP calls to it, which is closer to production. nock patches Node's http/https modules inside the test process, so no server is needed. Choose based on your stack and need for recording/playback.

How do I test error responses from an external API?

Mock the error responses explicitly: 4xx, 5xx, timeouts, malformed JSON, etc. WireMock lets you stub responses with specific status codes and bodies. nock can reply with an error or delay. Always test that your code handles these gracefully.
