What this usually means
Apollo Federation's gateway composes a unified schema from multiple subgraphs, but at runtime each subgraph is responsible for resolving only its own fields. When a subgraph fails to resolve an entity reference (via `__resolveReference`), the gateway cannot stitch the data together. This is almost never a network issue alone. It is usually a contract mismatch: the `@key` directive in the supergraph schema doesn't match the subgraph's `__resolveReference` implementation, or the subgraph doesn't know about the entity keys it's supposed to extend. Other common causes include authentication context not being forwarded (the gateway does not pass client headers on to subgraphs unless explicitly configured) and version skew between subgraph and gateway libraries.
The first ten minutes — establish facts before touching code.
1. Enable Apollo Federation tracing on the gateway: set `APOLLO_GRAPH_REF` and `APOLLO_KEY` environment variables, then check `Apollo Studio` > `Traces` for the exact subgraph query that failed.
2. Run `rover subgraph check <graph-ref> --name <subgraph-name> --schema ./schema.graphql` to validate schema composition. If this passes, the error is runtime, not schema.
3. Inspect what the gateway actually sends to the subgraph: entity fetches arrive as an `_entities` query whose `representations` variable carries the key fields. Verify the subgraph receives them.
4. Add a health check endpoint on the subgraph that echoes back the `@key` fields it expects. Compare with what the gateway sends.
5. Check subgraph logs for `__resolveReference` calls: if there are none, the gateway isn't calling it, which likely points to a schema composition issue.
The specific files, logs, configs, and dashboards that usually own this bug.
- Gateway logs: `grep 'RESOLUTION_FAILURE' /var/log/gateway/access.log`
- Subgraph logs: `journalctl -u subgraph-service -f | grep 'resolveReference'`
- Apollo Studio performance tab for subgraph latency and error rates
- Schema registry diff: `rover supergraph fetch <graph-ref> > current.graphql` and compare with the composed schema in CI
- Gateway config file (YAML/JSON) for `@apollo/gateway` or `@apollo/subgraph` version and `experimental_didResolveReference` callback
- Subgraph's `package.json` or `Gemfile` for the federation library version (e.g., `@apollo/subgraph`, `apollo-federation`)
- Network proxy logs (e.g., Envoy, Nginx) for request/response body size, since large entity lists can trigger timeouts
Practical causes, not theory. These are the things you will actually find.
- `__resolveReference` not implemented or returns null for valid keys
- Schema composition succeeds but `@key` fields differ between subgraph and gateway (e.g., `@key(fields: "id")` vs `@key(fields: "id tenantId")`)
- Gateway doesn't forward authentication context (e.g., `Authorization` header) to subgraph entity requests
- Subgraph library version mismatch: e.g., `@apollo/subgraph` v2.4 in one service and v2.5 in another, with behavior differences between releases
- Entity keys contain null values or are serialized incorrectly (e.g., JSON number vs string)
- Subgraph returns an error in a list entity resolution, and the gateway treats the entire list as failed
- Request timeout too short: subgraph takes >5s to resolve a batch of entities, gateway aborts
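The number-versus-string cause in the list above is easy to reproduce in isolation. The sketch below assumes a hypothetical in-memory `usersById` store keyed by string IDs; `normalizeId` and `lookupUser` are illustrative helpers, not part of any Apollo package.

```typescript
// Hypothetical in-memory store keyed by string IDs.
const usersById = new Map<string, { id: string; name: string }>([
  ["1", { id: "1", name: "Ada" }],
]);

// A reference may arrive with a numeric id if an upstream service
// serialized the key as a JSON number instead of a string.
type UserReference = { __typename: "User"; id: string | number | null };

// Illustrative helper: coerce the key to the store's type, rejecting nulls.
function normalizeId(id: string | number | null): string | null {
  if (id === null) return null;
  return String(id);
}

function lookupUser(ref: UserReference) {
  const id = normalizeId(ref.id);
  if (id === null) return null; // null key: nothing to look up
  return usersById.get(id) ?? null;
}

// Without normalization the numeric key misses, because Map lookups
// do not coerce types: the number 1 is not the string "1".
const rawMiss = usersById.get(1 as unknown as string);
const normalizedHit = lookupUser({ __typename: "User", id: 1 });
```

Here `rawMiss` is `undefined` while `normalizedHit` is the stored user, which is the same symptom a resolver shows when key types drift between services.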
Concrete fix directions. Pick the one that matches your root cause.
- Implement `__resolveReference` with proper null-checking: `if (!reference.id) throw new GraphQLError('Missing key')`
- Align `@key` directives across all subgraphs that extend a type, and use `rover subgraph check` to enforce it
- Configure the gateway to forward headers through the `buildService` option: `buildService({ url }) { return new RemoteGraphQLDataSource({ url, willSendRequest({ request, context }) { request.http.headers.set('authorization', context.authToken); } }); }`
- Upgrade all subgraph and gateway packages to the same Apollo Federation version (e.g., 2.5.x)
- Build the schema with `buildSubgraphSchema` from `@apollo/subgraph` so that the `__resolveReference` functions in your resolver map are wired into the `_entities` field
- Add a `@shareable` directive to fields that multiple subgraphs can resolve, avoiding conflict
- Increase the gateway's timeout for subgraph requests (e.g., to 10 s); check where your gateway version configures it (typically the fetcher used by `RemoteGraphQLDataSource`) rather than assuming a top-level `requestTimeout` option
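The header-forwarding fix is easier to read laid out as a function. This is a minimal sketch of the logic only: `OutgoingRequest` and `GatewayContext` are stand-in types defined here so the example runs without dependencies, and `authToken` is assumed to be placed on the context by your own gateway setup. In a real gateway the same body would live in the `willSendRequest` method of a `RemoteGraphQLDataSource` returned from `buildService`, where the headers object is fetch-style rather than a `Map`.

```typescript
// Stand-ins for the shapes handed to willSendRequest. They are
// redefined here only so the sketch is self-contained.
interface OutgoingRequest {
  http: { headers: Map<string, string> };
}
interface GatewayContext {
  authToken?: string; // assumed to be set by the gateway's context function
}

// Forward the client's token on every subgraph request, including the
// _entities fetches used for entity resolution.
function willSendRequest(args: { request: OutgoingRequest; context: GatewayContext }): void {
  const { request, context } = args;
  // The context can be empty (for example, on requests the gateway
  // issues on its own behalf), so guard before setting the header.
  if (context.authToken) {
    request.http.headers.set("authorization", context.authToken);
  }
}

const entityFetch: OutgoingRequest = { http: { headers: new Map() } };
willSendRequest({ request: entityFetch, context: { authToken: "Bearer abc" } });

const anonymousFetch: OutgoingRequest = { http: { headers: new Map() } };
willSendRequest({ request: anonymousFetch, context: {} });
```

The guard matters because setting the header to `undefined` would send a malformed value rather than no header at all.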
A fix you cannot prove is a guess. Close the loop.
- Run a query that crosses subgraphs: `{ user(id: "1") { name reviews { body } } }`. If it returns data, the fix works
- Check Apollo Studio for zero `RESOLUTION_FAILURE` errors in the last hour
- Deploy the fix to a canary subgraph and simulate load with k6: `k6 run --vus 10 --duration 30s script.js`, then confirm no errors
- Unit test the `__resolveReference` function: mock the reference and assert it returns the entity
- Integration test with the gateway: start subgraph and gateway locally, send a federated query using `gq` or `curl`
- Verify schema composition still passes: `rover subgraph check` returns `PASS`
Things that make this bug worse or harder to find.
- Don't ignore `@external` directives: they tell the subgraph which fields come from other subgraphs, and missing them causes silent nulls
- Don't assume local gateway matches production: test with the exact same `@apollo/gateway` version
- Don't patch `__resolveReference` in a hotfix without updating the corresponding `@key` in the schema. They must be in sync
- Don't batch entity requests over HTTP/1.1 without connection pooling, because it will cause socket exhaustion
- Don't set the subgraph request timeout too high (e.g., 60s): it masks underlying performance issues and degrades user experience
Production Outage: User Profile Shows Null Reviews
Timeline
- 14:32 PagerDuty alerts: User profile page shows null for reviews and ratings. P1 incident declared.
- 14:35 Check Apollo Studio: `user.reviews` field has 100% error rate with `RESOLUTION_FAILURE`
- 14:40 Reviews subgraph health check passes. Gateway logs show `UNAUTHENTICATED` for entity resolution requests.
- 14:45 Check gateway config: `willSendRequest` doesn't forward `Authorization` header to subgraphs
- 14:50 Hotfix: add header forwarding to gateway. Deploy to canary. Error rate drops to 0% for canary users.
- 15:00 Full rollout to production. Error rate returns to normal. Postmortem scheduled.
- 15:15 Root cause confirmed: a previous deployment added authentication to the gateway but missed the header forwarding config.
I was on-call when the alert came in: user profiles showed null for reviews. Our React app displayed 'No reviews yet' even for users with hundreds. The product manager was furious. I opened Apollo Studio and saw `user.reviews` had a 100% error rate — all `RESOLUTION_FAILURE`. The gateway was failing to resolve the `User` entity in the reviews subgraph.
First, I checked the reviews subgraph health — it was green. I looked at the gateway logs and found `UNAUTHENTICATED` errors from the reviews subgraph. That was strange because the public API didn't require auth for reading reviews. I realized the subgraph was now enforcing auth on all endpoints because of a recent security update. But the gateway wasn't sending the auth header.
I opened the gateway config and saw that the `willSendRequest` function only forwarded headers for non-entity requests. Entity resolution requests (internal calls to `__resolveReference`) were missing the `Authorization` header. I added `request.http.headers.set('authorization', context.authToken)` to the entity request path. Deployed to canary — errors vanished. Full rollout fixed it. The lesson: always audit header forwarding when adding auth to subgraphs.
Root cause
Gateway's `willSendRequest` did not forward the `Authorization` header to subgraph entity resolution requests. The reviews subgraph rejected unauthenticated requests with `UNAUTHENTICATED`, causing the gateway to return `RESOLUTION_FAILURE`.
The fix
Updated the `ApolloGateway` config so that `willSendRequest` sets the `authorization` header from the context on every subgraph request, including entity resolution.
The lesson
Apollo Federation's entity resolution requests are internal and don't automatically inherit client headers. You must explicitly configure header forwarding. Always test auth scenarios with cross-subgraph queries.
When a gateway receives a query that spans subgraphs, it decomposes the query into per-subgraph queries. For the reviews subgraph to resolve `User.reviews`, it first needs a `User` entity. The gateway fetches the `User` from the users subgraph with an ordinary query that also selects the `@key` fields, then sends those key fields to the reviews subgraph as a representation in an `_entities` query. The reviews subgraph passes each representation to its `__resolveReference` for `User`. If either fetch fails, the dependent field returns null.
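The second hop of that flow can be sketched as a plain function that turns the users subgraph's result into the request the reviews subgraph receives. The `_entities` field and the `_Any` scalar come from the federation subgraph spec; `buildEntitiesRequest` and the sample data are hypothetical, and a real gateway's query planner produces this request for you.

```typescript
// A representation is the entity's __typename plus its @key fields.
type Representation = { __typename: string } & Record<string, unknown>;

interface SubgraphRequest {
  query: string;
  variables: { representations: Representation[] };
}

// Step 1 (simulated): what the users subgraph returned for the root field.
const usersSubgraphResult = { user: { __typename: "User", id: "1", name: "Ada" } };

// Step 2: build the entity fetch for the reviews subgraph. Only the key
// field travels in the representation; `name` stays behind.
function buildEntitiesRequest(
  entity: { __typename: string; id: string },
  selection: string,
): SubgraphRequest {
  return {
    query: `query ($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on ${entity.__typename} { ${selection} }
  }
}`,
    variables: {
      representations: [{ __typename: entity.__typename, id: entity.id }],
    },
  };
}

const reviewsRequest = buildEntitiesRequest(usersSubgraphResult.user, "reviews { body }");
```

Sending `reviewsRequest` as the JSON body of a POST to the reviews subgraph reproduces the entity fetch without a gateway in the way.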
The `__resolveReference` function receives a `reference` object that contains the `@key` fields. For example, if `User` has `@key(fields: "id")`, the reference is `{ __typename: "User", id: "123" }`. The function must return the entity or null. Common mistakes include missing `__typename`, incorrect key field names, or not handling composite keys (e.g., `@key(fields: "id organizationId")`). The gateway silently treats null returns as 'not found', not as an error.
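A minimal sketch of a resolver for the composite key `@key(fields: "id organizationId")` follows. The `users` array is a hypothetical stand-in for a real data source, and throwing a plain `Error` on a malformed reference is one possible policy, chosen here so that a broken contract is not mistaken for "not found".

```typescript
interface User {
  id: string;
  organizationId: string;
  name: string;
}

// Hypothetical data source; a real subgraph would query a database.
const users: User[] = [{ id: "123", organizationId: "org-1", name: "Ada" }];

// Shape of the reference for @key(fields: "id organizationId").
// Fields are optional because a misconfigured caller may omit them.
interface UserReference {
  __typename: "User";
  id?: string;
  organizationId?: string;
}

const resolvers = {
  User: {
    __resolveReference(reference: UserReference): User | null {
      // A missing key field is a contract violation, so fail loudly
      // instead of returning null (which reads as "not found").
      if (!reference.id || !reference.organizationId) {
        throw new Error("User reference is missing id or organizationId");
      }
      const match = users.find(
        (u) => u.id === reference.id && u.organizationId === reference.organizationId,
      );
      return match ?? null; // null means the entity does not exist
    },
  },
};

const found = resolvers.User.__resolveReference({
  __typename: "User",
  id: "123",
  organizationId: "org-1",
});
const notFound = resolvers.User.__resolveReference({
  __typename: "User",
  id: "999",
  organizationId: "org-1",
});
```

Calling the resolver directly like this doubles as a unit test: `found` is the stored user, `notFound` is `null`, and a reference without `organizationId` throws.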
Schema composition (done by `rover` or the gateway) checks that `@key` directives are consistent across subgraphs, but it does not validate the resolver logic. A common pitfall: the schema declares `@key(fields: "id")`, but the subgraph's `__resolveReference` reads both `id` and `tenantId` from the reference. Composition passes because it only sees the declared `@key`. At runtime the reference carries only `id`, so the resolver looks up with an undefined `tenantId` and finds nothing.
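Because composition will not catch this drift, a small check in the subgraph's own test suite can. The sketch below is an assumption about how you might do it, not an Apollo feature: `parseKeyFields` and `missingFromKey` are hypothetical helpers, the list of fields the resolver reads is maintained by hand, and only flat keys (no nested selections) are handled.

```typescript
// Split a flat @key selection such as "id tenantId" into field names.
// Nested selections like "org { id }" are out of scope for this sketch.
function parseKeyFields(fields: string): string[] {
  return fields
    .trim()
    .split(/\s+/)
    .filter((f) => f.length > 0);
}

// Report the fields a resolver reads that the @key does not guarantee.
function missingFromKey(keyFields: string, resolverNeeds: string[]): string[] {
  const declared = new Set(parseKeyFields(keyFields));
  return resolverNeeds.filter((field) => !declared.has(field));
}

// The mismatch described above: the schema declares only "id",
// but the resolver reads id and tenantId.
const gaps = missingFromKey("id", ["id", "tenantId"]);

// After aligning the @key with the resolver there is nothing missing.
const noGaps = missingFromKey("id tenantId", ["id", "tenantId"]);
```

A test that fails whenever `gaps` is non-empty keeps the `@key` and the resolver from drifting apart in a hotfix.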
Another runtime issue is the `@external` directive. If a subgraph references a field from another subgraph without declaring it `@external`, the gateway may still compose, but the subgraph won't have the field in its schema, causing reference resolution to fail. Run `rover subgraph check` in CI to catch composition problems, and exercise the runtime path with a real cross-subgraph query, because a schema check does not execute resolvers.
Apollo Federation is a rapidly evolving spec, so keep the `@apollo/gateway` and `@apollo/subgraph` packages on compatible versions. The gateway must support the federation version the supergraph was composed with; a gateway older than the composition version can fail to load the supergraph. Federation 2 composition accepts Federation 1 subgraphs, but a subgraph has to opt in to Federation 2 with `@link` before it can use v2-only directives such as `@shareable`. Check the release notes for breaking changes before upgrading any one service.
Use `rover supergraph compose --config ./supergraph.yaml` to validate that the subgraphs compose together. If you see 'Unknown directive `@key`' or 'Field type mismatch' in composition, the versions are likely mismatched. Run `npm list @apollo/subgraph` on each subgraph and `npm list @apollo/gateway` on the gateway. Keep them in sync across all services.
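Once the versions are collected, comparing them is mechanical. The sketch below assumes you have already gathered each service's installed version (for example from the `npm list` output) into a list; `findSkew` is a hypothetical helper, and treating a matching major.minor as "in sync" is an assumed policy you may want to tighten or relax.

```typescript
interface ServiceVersion {
  name: string;
  version: string; // e.g. "2.5.1"
}

// Reduce "2.5.1" to "2.5" so patch releases are not flagged.
function majorMinor(version: string): string {
  const parts = version.split(".");
  return `${parts[0] ?? "0"}.${parts[1] ?? "0"}`;
}

// Return the names of services whose major.minor differs from the gateway's.
function findSkew(gatewayVersion: string, subgraphs: ServiceVersion[]): string[] {
  const expected = majorMinor(gatewayVersion);
  return subgraphs
    .filter((s) => majorMinor(s.version) !== expected)
    .map((s) => s.name);
}

const skewed = findSkew("2.5.1", [
  { name: "users", version: "2.5.0" },
  { name: "reviews", version: "2.4.7" },
  { name: "products", version: "2.5.3" },
]);
```

With this sample input only `reviews` is reported, since it is the one service on a different minor version.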
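The `ApolloGateway` constructor accepts `experimental_*` callbacks that let you inspect gateway behavior. With an `experimental_didResolveReference` callback you can log each reference and its result, for example: `experimental_didResolveReference({ reference, subgraphName, result }) { console.log('Resolved', reference.__typename, reference.id, 'via', subgraphName, JSON.stringify(result)); }`. Experimental hook names vary between gateway releases, so confirm this one exists in your installed `@apollo/gateway` version before relying on it; `experimental_didResolveQueryPlan` is a documented hook that exposes the query plan, including the entity fetches.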
This is invaluable for diagnosing intermittent failures. You can see if the reference is malformed (e.g., missing fields) or if the subgraph returns an error. Warning: the callback is experimental and may change. Also, be aware of performance impact — don't log every resolution in production. Use a sampling rate or only enable it during debugging.
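Sampling can be added without touching the logging code itself. The wrapper below is a generic sketch that works with any callback; `sampled` and the `ResolutionEvent` shape are hypothetical names, and the random source is injectable only so that the behavior can be demonstrated deterministically.

```typescript
// Wrap a logging callback so it only fires for a fraction of calls.
// `random` is injectable so the behavior can be tested deterministically.
function sampled<T>(
  rate: number,
  log: (event: T) => void,
  random: () => number = Math.random,
): (event: T) => void {
  return (event) => {
    if (random() < rate) log(event);
  };
}

// Hypothetical event shape for an entity resolution attempt.
interface ResolutionEvent {
  typename: string;
  subgraphName: string;
  ok: boolean;
}

const seen: ResolutionEvent[] = [];
const draws = [0.05, 0.5, 0.09]; // fixed "random" values for the demo
let drawIndex = 0;

// Log roughly 10% of events; with the fixed draws above, the first and
// third events fall under the threshold and the second does not.
const logSome = sampled<ResolutionEvent>(
  0.1,
  (e) => {
    seen.push(e);
  },
  () => draws[drawIndex++] ?? 1,
);

logSome({ typename: "User", subgraphName: "users", ok: true });
logSome({ typename: "User", subgraphName: "reviews", ok: true });
logSome({ typename: "User", subgraphName: "reviews", ok: false });
```

In production you would drop the third argument so that `Math.random` decides, and set `rate` to `1` only while actively debugging.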
Frequently asked questions
Why does my subgraph return a valid entity but the gateway still shows `RESOLUTION_FAILURE`?
The gateway needs to know each returned entity's concrete type, so the `_entities` response must resolve `__typename` to the type name in the schema. `@apollo/subgraph` sets this for objects returned from `__resolveReference`, but with other libraries or a hand-written `_entities` resolver you may have to set it yourself. Also check that the subgraph is not returning errors alongside the data. The quickest way to see exactly what the subgraph returns is to send it an `_entities` query directly, bypassing the gateway.
How do I test entity resolution locally without a gateway?
You can construct a reference object manually and call your `__resolveReference` resolver directly. In a test file, create a mock reference: `{ __typename: 'User', id: '1' }` and invoke the resolver. Also, use `rover subgraph introspect` to see the subgraph's schema, then simulate a gateway query by sending a GraphQL request to the subgraph with the `_entities` query: `query { _entities(representations: [{ __typename: "User", id: "1" }]) { ... on User { name } } }`. This bypasses the gateway.
What does `UNAUTHENTICATED` mean in entity resolution?
Entity resolution requests from the gateway to the subgraph are regular HTTP requests. If the subgraph requires authentication (e.g., via an `Authorization` header), it will reject the request with `UNAUTHENTICATED`. This is almost always due to the gateway not forwarding the auth token. Configure `willSendRequest` to forward headers from the gateway context to subgraph requests.
My schema composes fine locally but fails in CI. What's different?
Likely version mismatch. CI might be using a different version of `rover` (for example, a different version of the `@apollo/rover` npm package). Also, CI may have different environment variables (e.g., `APOLLO_KEY`, `APOLLO_GRAPH_REF`). Run `rover supergraph compose --config ./supergraph.yaml` both locally and in CI, then compare the output schemas. If they differ, check for `@tag` or `@inaccessible` directives that might be filtering fields in production.
Can I use federation with non-Node.js subgraphs?
Yes, Apollo Federation is language-agnostic. The subgraph only needs to expose a GraphQL endpoint that follows the federation subgraph spec, which means implementing the `_service` and `_entities` root fields. Many languages have libraries that do this, for example `federation-jvm` (Java/Kotlin), `async-graphql` (Rust), and Strawberry or Ariadne (Python). The tricky part is making sure the library's equivalent of `__resolveReference` works correctly, so verify it with the `_entities` query test mentioned above.