What this usually means
Node.js resolves names through two separate paths. `dns.lookup()`, which `net`, `http`, and most libraries use by default, calls the operating system's resolver (libc's `getaddrinfo`) on the libuv thread pool. `dns.resolve*()` and `dns.Resolver` bypass the OS resolver and query DNS servers directly through the bundled c-ares library. When you see ENOTFOUND, EAI_AGAIN, or timeouts, the root cause is typically one of: (1) the DNS server is unreachable or misconfigured in /etc/resolv.conf, (2) the two paths (c-ares vs libc) behave differently under load or with certain record types, (3) network filters or firewall rules block UDP/53, (4) a DNS cache in front of the process is poisoned or stale, or (5) the hostname simply doesn't exist (typo or DNS record not propagated). In containerized environments, a common pitfall is that /etc/resolv.conf points to a stub resolver (e.g., systemd-resolved on 127.0.0.53) that is not reachable from inside the container's network namespace.
The first ten minutes — establish facts before touching code.
1. Check the exact error from both resolution paths: `node -e "require('dns').resolve4('example.com', console.log)"` (c-ares) and `node -e "require('dns').lookup('example.com', console.log)"` (getaddrinfo)
2. Compare with OS tools: `dig example.com @8.8.8.8` vs `getent hosts example.com`
3. Inspect /etc/resolv.conf: `cat /etc/resolv.conf`, looking for 'nameserver' lines and any 'search' domains
4. Test with Node.js's `--dns-result-order=ipv4first` flag to rule out IPv6 issues: `node --dns-result-order=ipv4first app.js`
5. To apply the same setting without changing the start command, set the environment variable `NODE_OPTIONS="--dns-result-order=ipv4first"` and restart the process
6. Use `strace -f -e trace=network -p <PID>` to see actual syscalls if the process is running (`-f` includes the thread-pool threads that run `getaddrinfo`)
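The first check can be scripted so that both resolution paths are timed side by side. The following is a minimal sketch, not a definitive diagnostic: the helper names `summarize`, `timed`, and `compareResolvers` are hypothetical, and the 2-second timeout and 2 tries are arbitrary values chosen to keep the c-ares side from hanging.

```javascript
const dns = require('node:dns');

// Turn a settled promise into a one-line summary (pure, so easy to test).
function summarize(label, settled, elapsedMs) {
  if (settled.status === 'fulfilled') {
    return `${label}: ok ${JSON.stringify(settled.value)} (${elapsedMs} ms)`;
  }
  const { code, syscall } = settled.reason;
  return `${label}: ${code} from ${syscall} (${elapsedMs} ms)`;
}

// Time one async call and capture success or failure without throwing.
async function timed(label, fn) {
  const start = Date.now();
  const [settled] = await Promise.allSettled([Promise.resolve().then(fn)]);
  return summarize(label, settled, Date.now() - start);
}

// Resolve the same name through getaddrinfo and through c-ares.
async function compareResolvers(hostname) {
  // Assumed limits: 2000 ms per try, 2 tries.
  const resolver = new dns.promises.Resolver({ timeout: 2000, tries: 2 });
  return Promise.all([
    timed('dns.lookup (getaddrinfo)', () => dns.promises.lookup(hostname, { all: true })),
    timed('dns.resolve4 (c-ares)', () => resolver.resolve4(hostname)),
  ]);
}

// Usage:
// compareResolvers('example.com').then((lines) => console.log(lines.join('\n')));
```

If one line reports success and the other an error, the difference between the two paths is the lead to follow; if both fail, start with the OS-level DNS configuration.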
The specific files, logs, configs, and dashboards that usually own this bug.
- /etc/resolv.conf: nameserver order and search domains
- /etc/nsswitch.conf: the hosts line shows the resolution order (e.g., 'files dns')
- systemd-resolved status: `resolvectl status` or `systemd-resolve --status`
- Node.js process environment: `cat /proc/<PID>/environ | tr '\0' '\n' | grep -i -e node_options -e dns`
- Application logs for stack traces with 'getaddrinfo' or 'ENOTFOUND'
- Network packet capture: `tcpdump -i any port 53 -vv -X` to see DNS queries and responses
- Container DNS config: `docker inspect <container> | jq '.[0].HostConfig.Dns'` or the Kubernetes Pod's /etc/resolv.conf
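Reading /etc/resolv.conf by eye works for one host, but comparing many hosts or containers is easier with a small parser. This is a rough sketch under stated assumptions: `parseResolvConf` and `findWarnings` are hypothetical helpers, the parser handles only the `nameserver`, `search`, `domain`, and `options` keywords, and the "more than three search domains" threshold is an arbitrary choice.

```javascript
const fs = require('node:fs');

// Parse the subset of resolv.conf that matters for this kind of debugging.
function parseResolvConf(text) {
  const result = { nameservers: [], search: [], options: [] };
  for (const rawLine of text.split('\n')) {
    // Drop comments (introduced by '#' or ';') and surrounding whitespace.
    const line = rawLine.replace(/[#;].*$/, '').trim();
    if (!line) continue;
    const [keyword, ...values] = line.split(/\s+/);
    if (keyword === 'nameserver' && values[0]) result.nameservers.push(values[0]);
    else if (keyword === 'search' || keyword === 'domain') result.search = values;
    else if (keyword === 'options') result.options.push(...values);
  }
  return result;
}

// Flag the patterns this guide calls out as common causes.
function findWarnings(conf) {
  const warnings = [];
  if (conf.nameservers.length === 0) warnings.push('no nameserver lines');
  if (conf.nameservers.some((ns) => ns.startsWith('127.'))) {
    warnings.push('loopback nameserver (possible stub resolver)');
  }
  if (conf.search.length > 3) warnings.push('long search list'); // arbitrary threshold
  return warnings;
}

// Usage:
// const conf = parseResolvConf(fs.readFileSync('/etc/resolv.conf', 'utf8'));
// console.log(conf, findWarnings(conf));
```

Running it both on the host and inside the container makes differences between the two files easy to spot.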
Practical causes, not theory. These are the things you will actually find.
- /etc/resolv.conf points to a stub resolver (127.0.0.53) that the process cannot reach, typically inside a container where that loopback address has no resolver listening: Node.js sends queries but gets no response
- IPv6 DNS resolution is attempted first and fails, causing a timeout before falling back to IPv4
- The DNS server is unreachable due to firewall rules blocking UDP/53 or a misconfigured security group
- A `search` domain in resolv.conf causes the resolver to try multiple domain combinations, delaying the actual lookup
- A caching layer (a local caching resolver, or a caching package in application code) serves stale results; the `dns` module itself does not cache, so every lookup otherwise goes out again
- The hostname is a private/internal name not resolvable by public DNS, and the system is not using the correct internal DNS server
Concrete fix directions. Pick the one that matches your root cause.
- Route lookups through the libc resolver by using `dns.lookup()` (the default for `net` and `http`) instead of `dns.resolve*()`, or keep c-ares and use `dns.setServers()` to point it directly at a known working DNS server. The choice is made per API call; Node.js has no environment variable or flag that switches resolvers
- If using systemd-resolved, make sure /etc/resolv.conf is a symlink to /run/systemd/resolve/stub-resolv.conf (not /usr/lib/systemd/resolv.conf); for clients that cannot reach the stub, /run/systemd/resolve/resolv.conf lists the upstream servers directly
- Explicitly set DNS servers in the application using `dns.setServers(['8.8.8.8', '1.1.1.1'])` at startup (this affects `dns.resolve*()` only, not `dns.lookup()`)
- In Docker, set the `--dns` flag to a reliable public DNS like 8.8.8.8, the host's DNS, or your internal DNS server for internal names
- Prefer IPv4 results from `dns.lookup()` with `--dns-result-order=ipv4first`; this reorders results and does not disable IPv6 resolution
- Bound c-ares query time by creating a dedicated resolver with `new dns.Resolver({ timeout: 5000, tries: 2 })`; `dns.resolve4()` itself takes no timeout option
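Pinning DNS servers does not have to be process-wide. A minimal sketch, assuming the servers shown are reachable from your network: a dedicated `dns.Resolver` instance carries its own server list and leaves the global `dns` settings untouched. The helper name `createPinnedResolver` is hypothetical.

```javascript
const { Resolver } = require('node:dns');

// Build a private c-ares resolver pinned to specific DNS servers.
// The global dns.getServers() list is not changed by this.
function createPinnedResolver(servers) {
  const resolver = new Resolver();
  resolver.setServers(servers);
  return resolver;
}

const pinned = createPinnedResolver(['8.8.8.8', '1.1.1.1']);
console.log(pinned.getServers());

// Usage:
// pinned.resolve4('example.com', (err, addresses) => console.log(err || addresses));
```

Like `dns.setServers()`, this only affects `resolve*()` calls made on that resolver; connections opened by `net` or `http` still go through `dns.lookup()` unless given a custom `lookup` function.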
A fix you cannot prove is a guess. Close the loop.
- Run `node -e "require('dns').resolve4('google.com', (e,a)=>console.log(e||a))"` and confirm it returns an IP in under 100ms
- Check that `dig google.com` from the same host resolves correctly; if not, fix OS DNS first
- For containerized apps, exec into the container and run the same Node.js DNS test as above
- Monitor DNS query timings: `curl -o /dev/null -s -w "%{time_namelookup}\n" https://google.com`
- Deploy the fix to a canary instance and watch error rates for ENOTFOUND drop
- Use `node --trace-events-enabled --trace-event-categories node.dns.native -e "..."` and analyze the trace for dns events
Things that make this bug worse or harder to find.
- Don't blindly restart the Node.js process without checking /etc/resolv.conf first; it often reloads the same bad config
- Don't assume that a working `nslookup` means Node.js will work: `nslookup` and `dig` query DNS servers directly and skip /etc/hosts and nsswitch.conf, while Node's default `dns.lookup()` goes through `getaddrinfo`. Compare with `getent hosts` as well
- Don't call `dns.setServers()` inside a hot code path; the Node.js documentation states it must not be called while a DNS query is in progress
- Don't ignore the search domain in resolv.conf; a long search list can cause cascading timeouts
- Don't apply a blanket IPv4 fix without understanding whether IPv6 is actually needed; some services require IPv6
- Don't patch DNS in application code if the real issue is infrastructure; fix the DNS server or network config
Production Outage: Node.js Microservices Can't Resolve Internal Service Names After Container Restart
Timeline
- 09:15 PagerDuty alert: 'High error rate on payment-service', 5xx responses spiking
- 09:17 Check logs: ENOTFOUND for 'user-service.internal.example.com' from payment-service
- 09:20 SSH into an ECS host: `docker exec <payment-container> nslookup user-service.internal.example.com` works!
- 09:25 Run `node -e "require('dns').resolve4('user-service.internal.example.com', console.log)"` inside container: hangs for 10 seconds then ENOTFOUND
- 09:30 Check /etc/resolv.conf inside container: nameserver 127.0.0.53 (systemd-resolved stub)
- 09:35 Check host's resolvectl: stub is working, but container's c-ares can't handle it
- 09:40 Restart container with --dns=8.8.8.8: error persists. Hardcoded DNS servers in the application? No.
- 09:45 Switch the service's lookups to `dns.lookup()` (libc `getaddrinfo`) and restart container: DNS works immediately
- 09:50 Errors drop to zero. Root cause confirmed.
The alert came in at 9:15 AM. Payment-service was returning 500 errors for every request that involved looking up user-service. I checked the logs and saw a wall of 'getaddrinfo ENOTFOUND user-service.internal.example.com' stack traces. This was a microservice DNS resolution failure — classic.
I SSH'd into the ECS host and exec'd into the payment container. Running `nslookup` worked fine. That's when I knew Node.js's DNS resolver was the difference. I ran the same lookup using Node's dns module: it hung for 10 seconds then failed. The container's /etc/resolv.conf pointed to 127.0.0.53, the systemd-resolved stub resolver, and Node's c-ares path was not getting usable answers from it.
I switched the service's lookups to `dns.lookup()`, which goes through the OS's libc resolver (`getaddrinfo`) instead of c-ares, and restarted the container. Resolution worked instantly. We then applied the same change to all our Node.js containers. The incident lasted 35 minutes total, and the fix was a change of resolver path.
Root cause
In Node.js 18, `dns.resolve4()` goes through c-ares rather than the OS resolver. The container's /etc/resolv.conf pointed to systemd-resolved's stub resolver (127.0.0.53), and c-ares could not get usable answers from it. This caused DNS queries to hang and eventually return ENOTFOUND.
The fix
Resolve through `dns.lookup()` so that Node.js uses the system's libc resolver (`getaddrinfo`) instead of c-ares. Alternatively, configure the container to use a standard DNS server (e.g., 8.8.8.8) directly.
The lesson
Never assume Node.js DNS works exactly like the shell's resolver. Always test with the exact Node.js runtime. In containerized environments, be aware of the resolver chain — the stub resolver (systemd-resolved) can cause issues with non-glibc programs like Node's c-ares. When in doubt, switch to libc resolver or point to a standard DNS server.
Node.js has two DNS resolution paths. `dns.lookup()` calls libc's `getaddrinfo` on the libuv thread pool; it is what `net.connect()` and `http.request()` use by default, and it honors /etc/hosts and nsswitch.conf. `dns.resolve*()` and `dns.Resolver` use the c-ares library, which sends DNS queries over the network asynchronously and does not use the thread pool. c-ares has its own quirks: it parses /etc/resolv.conf directly, it does not go through NSS (Name Service Switch), so mechanisms such as the systemd-resolved NSS module or LDAP host databases do not apply, and it has different timeout/retry behavior than glibc.
To see which c-ares version your Node.js binary bundles, run `node -e "console.log(process.versions.ares)"`. The resolver is chosen per call, not per process: `dns.lookup()` always uses libc and `dns.resolve*()` always uses c-ares, and there is no environment variable or flag that switches between them. Knowing which function your code (or your HTTP client) calls is critical in environments with non-standard DNS configurations.
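Because `net` and `http` accept a custom `lookup` function, the resolution path for outgoing connections can be swapped without touching the rest of the application. The sketch below is one possible adapter, not an official API: `makeResolve4Lookup` is a hypothetical name, it handles IPv4 only, and it assumes the resolver passed in exposes a callback-style `resolve4(hostname, callback)`.

```javascript
const dns = require('node:dns');
const net = require('node:net');

// Version of the c-ares library bundled with this Node.js binary.
console.log('bundled c-ares:', process.versions.ares);

// Build a dns.lookup-compatible function backed by a c-ares resolver.
function makeResolve4Lookup(resolver) {
  return function lookup(hostname, options, callback) {
    if (typeof options === 'function') {
      callback = options;
      options = {};
    }
    const wantAll = Boolean(options && options.all);
    // Answer in the shape the caller asked for: a list, or a single address.
    const reply = (addresses, family) => {
      if (wantAll) callback(null, addresses.map((address) => ({ address, family })));
      else callback(null, addresses[0], family);
    };
    const literalFamily = net.isIP(hostname); // 0 when not an IP literal
    if (literalFamily !== 0) {
      // IP literals need no DNS query; answer asynchronously like dns.lookup.
      process.nextTick(reply, [hostname], literalFamily);
      return;
    }
    resolver.resolve4(hostname, (err, addresses) => {
      if (err) return callback(err);
      reply(addresses, 4);
    });
  };
}

// Usage:
// const http = require('node:http');
// http.get({ host: 'example.com', lookup: makeResolve4Lookup(new dns.Resolver()) }, (res) => res.resume());
```

Going the other way requires no adapter: code that calls `dns.resolve4()` can call `dns.lookup()` instead to use `getaddrinfo`.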
Many modern Linux distributions (recent Ubuntu releases, Fedora, etc.) use systemd-resolved to manage DNS. It listens on 127.0.0.53 and provides a stub resolver. The stub can behave differently from a conventional DNS server as seen by a client that talks to it directly, such as c-ares: it is only reachable from the network namespace it runs in, so a container that inherits `nameserver 127.0.0.53` usually has nothing listening at that address, and it applies its own rules to some queries (single-label names, for example). The result can be timeouts or ENOTFOUND in Node.js even though the hostname is resolvable via `dig` or `getent` on the host.
The fix is to either resolve through `dns.lookup()` (which calls `getaddrinfo` and follows the system's resolver configuration) or to change the container's /etc/resolv.conf to point directly to a standard DNS server (e.g., 8.8.8.8). In Docker, you can specify `--dns 8.8.8.8` at container runtime. In Kubernetes, you can set the Pod's `dnsPolicy` (for example `ClusterFirstWithHostNet` for host-network Pods) or override `dnsConfig`.
When a DNS query times out, the resolver retries. For glibc, resolv.conf(5) documents a default `timeout` of 5 seconds and 2 `attempts`, so a single failed lookup through `getaddrinfo` can take around 10 seconds before returning an error, and longer with several nameservers or search domains. c-ares applies its own defaults: Node.js documents a default of 4 tries for `dns.Resolver`, and the default per-try timeout depends on the bundled c-ares version. If your application has a short request timeout (e.g., 2 seconds), it will time out before the DNS resolver even finishes.
To reduce DNS timeout impact on the c-ares path, create a resolver with explicit limits, `new dns.Resolver({ timeout: 2000, tries: 2 })`, and call `resolve4()` on it; `dns.resolve4()` itself accepts no timeout option. `dns.lookup()` has no timeout option either; it follows the system's resolver configuration, so tune `options timeout:` and `options attempts:` in /etc/resolv.conf or enforce a deadline in application code. You can also set the `NODE_OPTIONS` environment variable to include `--dns-result-order=ipv4first` so that connections try IPv4 addresses first.
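Both kinds of bound can be sketched in a few lines. This is an illustration under stated assumptions, not a drop-in fix: `createBoundedResolver` and `lookupWithDeadline` are hypothetical helpers, the error code `ELOOKUPDEADLINE` is made up for this example, and the deadline only stops the caller from waiting, since a `getaddrinfo` call already running on the thread pool cannot be cancelled.

```javascript
const dns = require('node:dns');

// c-ares path: per-try timeout (ms) and number of tries are Resolver options.
function createBoundedResolver(timeoutMs = 2000, tries = 2) {
  return new dns.promises.Resolver({ timeout: timeoutMs, tries });
}

// getaddrinfo path: dns.lookup() has no timeout option, so race it
// against a timer and stop waiting when the deadline passes.
function lookupWithDeadline(hostname, deadlineMs) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`lookup for ${hostname} exceeded ${deadlineMs} ms`);
      err.code = 'ELOOKUPDEADLINE'; // hypothetical code, not a Node.js constant
      reject(err);
    }, deadlineMs);
  });
  return Promise.race([dns.promises.lookup(hostname), deadline])
    .finally(() => clearTimeout(timer));
}

// Usage:
// createBoundedResolver().resolve4('example.com').then(console.log, console.error);
// lookupWithDeadline('example.com', 2000).then(console.log, console.error);
```

The deadline wrapper protects request latency but still occupies a thread-pool slot until the underlying lookup finishes, so fixing the slow resolver remains the real remedy.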
Node.js's dns module does not cache results. Every `dns.lookup()` call goes to `getaddrinfo` and every `dns.resolve*()` call sends a query, unless something outside Node.js (a local caching resolver such as systemd-resolved or `dnsmasq`) answers from its own cache. Any cache you add inside the application is per-process and is not shared: if you have multiple Node.js processes, each has its own. Without a cache, a burst of requests for the same hostname becomes a burst of lookups that queue on the libuv thread pool (four threads by default); with a cache, stale or poisoned entries are served until they expire.
Because there is no built-in cache, there is no API to inspect or clear one; monitor the number of DNS queries via OS tools like `tcpdump` instead. `dns.setServers()` changes the servers used by `dns.resolve*()` and has no effect on caching. For high-performance applications, consider an in-process caching package or a local caching DNS proxy (e.g., `dnsmasq`) to reduce lookup latency.
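An in-process cache that honors record TTLs can be sketched on top of `dns.promises.resolve4()` with the `ttl: true` option. This is a minimal illustration, not a production cache: the class name `TtlDnsCache` is hypothetical, it caches A records only, does no negative caching, and does not coalesce concurrent lookups for the same name.

```javascript
const dns = require('node:dns');

class TtlDnsCache {
  // `resolve` and `now` are injectable so the cache can be tested offline.
  constructor(
    resolve = (hostname) => dns.promises.resolve4(hostname, { ttl: true }),
    now = Date.now
  ) {
    this.resolve = resolve;
    this.now = now;
    this.entries = new Map();
  }

  // Remember the addresses until the shortest record TTL (seconds) runs out.
  store(hostname, records) {
    if (records.length === 0) return;
    const minTtl = Math.min(...records.map((record) => record.ttl));
    this.entries.set(hostname, {
      addresses: records.map((record) => record.address),
      expiresAt: this.now() + minTtl * 1000,
    });
  }

  // Return cached addresses, or undefined when missing or expired.
  peek(hostname) {
    const entry = this.entries.get(hostname);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(hostname);
      return undefined;
    }
    return entry.addresses;
  }

  // Serve from cache when possible, otherwise query and cache the answer.
  async get(hostname) {
    const cached = this.peek(hostname);
    if (cached) return cached;
    const records = await this.resolve(hostname);
    this.store(hostname, records);
    return records.map((record) => record.address);
  }
}

// Usage:
// const cache = new TtlDnsCache();
// cache.get('example.com').then(console.log, console.error);
```

Expiring on the shortest TTL keeps the cache from outliving any record in the answer, which limits how long a stale entry can be served.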
Frequently asked questions
Why does `nslookup` work but Node.js `dns.resolve4` fails?
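Both `nslookup` and `dns.resolve4` query DNS servers directly, so when they disagree the usual causes are different servers, different timeout/retry behavior, or different handling of the `search` list. `nslookup` is its own DNS client, while `dns.resolve4` uses c-ares, which parses /etc/resolv.conf directly and may not get usable answers from a stub resolver (like systemd-resolved on 127.0.0.53) that is unreachable from the process. Neither consults /etc/hosts or nsswitch.conf; `getent hosts` and Node's `dns.lookup()` do. To fix, either resolve through `dns.lookup()` so Node uses the system resolver, or configure a standard DNS server in /etc/resolv.conf.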
What does ENOTFOUND mean exactly?
ENOTFOUND means the hostname could not be resolved to an IP address: the resolver finished without a positive answer. On the `dns.lookup()` path it is a coarse signal that does not tell you why the name was not found, and temporary failures such as an unreachable DNS server often surface as EAI_AGAIN instead. On the c-ares path, timeouts and refused connections are usually reported with their own codes (ETIMEOUT, ECONNREFUSED). Check the error's `syscall` property: 'getaddrinfo' means it came from libc via `dns.lookup()`; a value such as 'queryA' means it came from c-ares via `dns.resolve*()`.
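The `code` and `syscall` properties can be turned into a readable log line. The mapping below is a sketch: `describeDnsError` is a hypothetical helper, and the one-line descriptions are informal summaries rather than official definitions.

```javascript
// Informal descriptions of common DNS error codes seen in Node.js.
const MEANINGS = {
  ENOTFOUND: 'no address found for the name',
  EAI_AGAIN: 'temporary failure, often an unreachable or slow DNS server',
  ETIMEOUT: 'DNS servers did not answer in time',
  ECONNREFUSED: 'could not contact a DNS server',
  ESERVFAIL: 'DNS server reported a general failure',
  ENODATA: 'the name exists but has no record of the requested type',
};

// Identify which resolution path produced the error and what it suggests.
function describeDnsError(err) {
  let source = 'an unknown source';
  if (err.syscall === 'getaddrinfo') {
    source = 'dns.lookup() (libc getaddrinfo)';
  } else if (typeof err.syscall === 'string' && err.syscall.startsWith('query')) {
    source = 'dns.resolve*() (c-ares)';
  }
  const meaning = MEANINGS[err.code] || 'no description available';
  return `${err.code} from ${source}: ${meaning}`;
}

// Usage:
// require('node:dns').resolve4('no-such-host.invalid', (err) => {
//   if (err) console.error(describeDnsError(err));
// });
```

Logging the source alongside the code makes it clear whether to look at the OS resolver configuration or at the DNS servers that c-ares is querying.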
How do I change the DNS servers Node.js uses without touching /etc/resolv.conf?
You can call `dns.setServers(['8.8.8.8', '1.1.1.1'])` at the start of your application. This overrides the system's DNS servers for all subsequent `dns.resolve*` calls. Note that this does not affect `dns.lookup()`, which always uses libc. If you need per-request control, create a separate `dns.Resolver` instance and call `setServers()` on it; `dns.resolve4()` has no per-call server option.
Does Node.js support DNS over HTTPS (DoH)?
Node.js does not natively support DNS over HTTPS in the built-in dns module as of Node 20. However, you can use third-party packages like `dohjs` to implement DoH. Alternatively, run a local DNS proxy that supports DoH (like `cloudflared` or `dnscrypt-proxy`) and point the system resolver, or `dns.setServers()`, at that proxy. Lookups made through `dns.lookup()` follow the system resolver, so a proxy configured there covers them as well.
Why does DNS resolution work on one machine but not another in the same cluster?
Differences in /etc/resolv.conf are the most common cause. Check the nameserver entries — one machine might use a stub resolver (127.0.0.53) while another uses a corporate DNS server. Also check the `search` domain list, which can cause additional lookups and delays. OS updates or container orchestration tools (like Docker or Kubernetes) can modify resolv.conf. Ensure all machines have consistent DNS configuration, and consider using a dedicated DNS service like CoreDNS or Route53.