AWS RDS Connection Refused: Security Group Debug Guide

When your app can't connect to RDS and you get 'connection refused', it's almost never a database crash—it's a security group or network ACL blocking the port. Here's how to confirm and fix it in minutes.

Intermediate · Cloud · 9 min read

What this usually means

A failed connection while RDS shows 'Available' means either a network layer is dropping the SYN packet or nothing is listening on the port and address you reached. Strictly speaking, a dropped packet surfaces as a hang followed by a timeout, while an immediate 'connection refused' means a TCP reset came back; applications often log both as a refused connection, so reproduce the failure yourself before picking a layer. The most common cause is a missing or incorrect security group inbound rule: the source CIDR is wrong, the port is misconfigured, or the rule references a security group that is not attached to the client. It can also be a network ACL (NACL) blocking traffic, a missing VPC peering route, or the RDS instance being in a different VPC than the client. Sometimes the issue is the client itself, such as a local firewall or a hostname resolving to the wrong IP. The key is to systematically isolate which layer is rejecting the connection.

( 01 ) Fast diagnosis

The first ten minutes — establish facts before touching code.

  1. Run 'telnet <rds-endpoint> 3306' (or 5432 for PostgreSQL) from the client. If it hangs, packets are being dropped somewhere on the path. If it fails immediately with 'Connection refused', something answered with a TCP reset, which usually means nothing is listening on that port at the address you reached.
  2. Check the RDS console: ensure the instance is 'Available' and that the endpoint and port match what your app uses. Copy the endpoint and test from a different client in the same VPC if possible.
  3. Inspect the RDS security group inbound rules: look for a rule allowing traffic on the DB port from the client's IP or security group. Common mistakes: the source is 0.0.0.0/0 but the port is wrong, or the source is a security group ID that is not attached to the client.
  4. VPC Flow Logs: enable flow logs for the RDS ENI (or subnet). Look for 'REJECT' records for traffic from the client IP to the RDS IP on the DB port. If you see 'ACCEPT', the security group and NACL let the packet through, so look at return traffic, routing, or the DB engine.
  5. Check NACL rules for the RDS subnet: NACLs are stateless, so inbound must allow the DB port and outbound must allow the client's ephemeral ports (1024-65535) for return traffic. A missing outbound rule breaks the connection even if inbound is open.
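
Step 1's check (does the connection hang, or is it refused?) can also be run from code on the client. The following is a minimal Python sketch using only the standard library; `probe_tcp` is an illustrative name, not part of any AWS tooling, and the five-second timeout is an arbitrary assumption.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <detail>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # A RST came back: something answered, but nothing accepts on this port.
        return "refused"
    except socket.timeout:
        # No answer at all: the SYN or its reply was silently dropped.
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar.
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
```

Calling `probe_tcp("<rds-endpoint>", 5432)` from the app server gives one of the four strings. A "timeout" points toward security groups, NACLs, or routes; a "refused" points toward the port or the host you actually reached.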

( 02 ) Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • AWS Console → RDS → Databases → {instance} → Connectivity & security → Security groups (note the security group ID)
  • AWS Console → VPC → Security Groups → {rds-sg} → Inbound rules (check source CIDR/group, port, protocol)
  • AWS Console → VPC → Network ACLs → {subnet's NACL} → Inbound/Outbound rules
  • VPC Flow Logs (CloudWatch Logs group) → query for 'REJECT' between client IP and RDS IP
  • Client machine: /var/log/syslog or /var/log/messages (for local firewall blocks like iptables)
  • RDS error logs: in RDS console under Logs & events → check for 'connection refused' or 'no incoming connections' messages
  • Application configuration: check the hostname/endpoint string, port number, and SSL mode (if using TLS, ensure RDS certificate is trusted)

( 03 ) Common root causes

Practical causes, not theory. These are the things you will actually find.

  • Security group inbound rule missing or misconfigured: wrong port (e.g., 3306 vs 5432), wrong source (0.0.0.0/0 vs specific IP/group), or a source security group that is no longer attached to the client. Rule order is not a factor here: security groups have no ordering, only an implicit deny when no rule matches.
  • Network ACL blocking inbound or outbound traffic: NACL inbound allows port 5432 but outbound denies ephemeral ports (return traffic), so the handshake never completes
  • Client not in the same VPC or no VPC peering/transit gateway route: RDS is in VPC-A, client in VPC-B, and routes don't point to the correct target
  • RDS instance stopped or in a state such as 'inaccessible-encryption-credentials' while a stale console view still shows 'Available'; refresh the page or confirm the status with 'aws rds describe-db-instances'
  • Client's local firewall (iptables, firewalld, Windows Firewall) blocking outbound connections or inbound responses
  • DNS resolution pointing to an outdated IP (e.g., after a failover or modification), especially if using a read replica endpoint

( 04 ) Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • Add the correct inbound rule to the RDS security group: source = client's security group ID (or /32 CIDR), port = DB port, protocol = TCP. Remove any overly permissive rules and rely on security group references.
  • Update NACL rules: ensure inbound allows the DB port from the client subnet, and outbound allows ephemeral ports (1024-65535) to the client subnet. Remember NACLs are stateless.
  • If the client is in a peered VPC, verify the VPC peering route tables: add routes in both VPCs pointing to the peering connection. Also check that the RDS security group allows traffic from the client's security group in the peer VPC (security group cross-reference works across peered VPCs if both are in the same region).
  • Restart the client application or flush the DNS cache after verifying the RDS endpoint resolves correctly: use 'nslookup <rds-endpoint>' to check.
  • For local firewall issues: run 'sudo iptables -L -n' and look for DROP/REJECT rules in the OUTPUT or INPUT chains. Insert an allow rule ahead of them: 'sudo iptables -I OUTPUT -d <rds-ip> -p tcp --dport 5432 -j ACCEPT'. Appending with -A would place the rule after an existing DROP, where it never matches.
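
The first fix pattern (one TCP rule on the DB port, sourced from a security group or a CIDR) can be expressed as the JSON structure that 'aws ec2 authorize-security-group-ingress --ip-permissions' accepts. This is a sketch under assumptions: `build_ingress_permission` is a hypothetical helper name and the group ID below is a placeholder, not a value from this guide.

```python
import json


def build_ingress_permission(port, source_sg=None, source_cidr=None):
    """Build an IpPermissions list for a single TCP inbound rule.

    Exactly one of source_sg (a security group ID) or source_cidr must be given.
    """
    if (source_sg is None) == (source_cidr is None):
        raise ValueError("pass exactly one of source_sg or source_cidr")
    permission = {"IpProtocol": "tcp", "FromPort": port, "ToPort": port}
    if source_sg is not None:
        # Reference the client's security group instead of an IP range.
        permission["UserIdGroupPairs"] = [{"GroupId": source_sg}]
    else:
        permission["IpRanges"] = [{"CidrIp": source_cidr}]
    return [permission]


def to_cli_json(permissions):
    """Serialize the rule so it can be pasted after --ip-permissions."""
    return json.dumps(permissions)
```

For example, `to_cli_json(build_ingress_permission(5432, source_sg="sg-0123456789abcdef0"))` produces a string to pass along with '--group-id <rds-sg>'. Preferring `source_sg` over `source_cidr` follows the guide's advice to rely on security group references.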

( 05 ) How to verify

A fix you cannot prove is a guess. Close the loop.

  • Run 'telnet <rds-endpoint> 5432' from the client; a 'Connected to ...' line followed by a blank screen means the TCP connection succeeded. PostgreSQL sends nothing until the client speaks, while MySQL on 3306 prints a handshake banner containing its version string.
  • Check VPC Flow Logs: after the fix, you should see 'ACCEPT' records in both directions between the client IP and the RDS IP on the DB port. The default log format does not include TCP flags, so the individual SYN, SYN-ACK, and ACK packets are not shown.
  • Test from a different client in the same subnet: if it works, the issue is specific to the original client (local firewall, DNS, etc.).
  • Use 'nc -zv <rds-endpoint> 5432' (netcat) to test connectivity; exit code 0 indicates success.
  • From the RDS side, check the engine log under Logs & events for connection attempts rejected by the DB engine itself (e.g., PostgreSQL's 'no pg_hba.conf entry' errors). Seeing such entries proves the network path is open.

( 06 ) Mistakes to avoid

Things that make this bug worse or harder to find.

  • Don't assume 'connection refused' means the DB is down; always check network first. Restarting RDS unnecessarily causes downtime.
  • Don't use 0.0.0.0/0 as source in security groups unless absolutely necessary; it's a security risk and often masks the real issue. Prefer specific security group IDs.
  • Don't forget that NACLs are stateless: if you add an inbound allow, you must add an outbound allow for return traffic. Many engineers miss this.
  • Don't rely solely on ping: ICMP is not TCP; a ping success doesn't guarantee DB port connectivity.
  • Don't overlook the listener itself: on a self-managed database, a listen address of localhost (PostgreSQL's listen_addresses, MySQL's bind-address) refuses remote connections even when security groups are correct. On RDS the listen address is managed for you, so confirm instead that the port your client uses matches the port shown in the console.

( 07 ) War story

Security Group Cross-Reference Fail After VPC Peering

Senior Backend Engineer · AWS RDS PostgreSQL, EC2, VPC Peering, Node.js

Timeline

  1. 09:15 PagerDuty alert: production API returns 503 errors. Users can't log in.
  2. 09:17 Check RDS console: instance 'prod-db' is 'Available'. CPU 5%, connections 0.
  3. 09:20 SSH into EC2 app server, run 'telnet prod-db.xxx.us-east-1.rds.amazonaws.com 5432' → hangs then 'Connection refused'.
  4. 09:25 Review RDS security group inbound rules: allow TCP 5432 from sg-app (the app server's security group). Looks correct.
  5. 09:30 Check NACL for RDS subnet: inbound allows 5432 from 10.0.0.0/16, outbound allows all traffic. Seems fine.
  6. 09:35 VPC Flow Logs: REJECT records for traffic from app server IP to RDS IP on port 5432. Looking up the source ENI shows a security group ID that isn't sg-app.
  7. 09:40 Realize that the EC2 instance was moved to a new security group 'sg-app-v2' during a deployment earlier, but the RDS inbound rule still references the old 'sg-app'.
  8. 09:42 Update RDS inbound rule: change source to 'sg-app-v2'. Immediately see telnet succeed.
  9. 09:45 Application recovers. All requests succeed.

Monday morning, 9:15 AM. Our Node.js API starts throwing 503 errors. Users can't log in. My first instinct—check RDS. The database shows 'Available', no spikes. But the app logs are full of 'connection refused' to the PostgreSQL writer endpoint. I SSH into an app server and try telnet: hangs and dies. Classic network block.

I pull up the RDS security group. There's an inbound rule for TCP 5432 from 'sg-app', the security group we've always used for app servers. Should work. NACLs look fine. But VPC Flow Logs show REJECT. Something is off. Then I look up the network interface named in the flow log record and notice its security group doesn't match 'sg-app'. I check the EC2 instance: it's using 'sg-app-v2'. A deployment earlier this morning swapped the security group, but nobody updated the RDS rule.

I update the RDS inbound rule to reference 'sg-app-v2'. Telnet now connects instantly. The app recovers within minutes. The lesson: security group references are not dynamic—if you change the client's security group, you must update all dependent rules. Also, VPC Flow Logs saved us from chasing the wrong layer.

Root cause

RDS security group inbound rule referenced an outdated security group ID (sg-app) that no longer matched the EC2 instance's security group (sg-app-v2) after a deployment change.

The fix

Updated the RDS security group inbound rule source from 'sg-app' to 'sg-app-v2'.

The lesson

Always use consistent security group naming and automate updates via IaC (Terraform/CloudFormation) to prevent drift. VPC Flow Logs are the fastest way to confirm that traffic is being rejected at the network layer, which tells you to go compare security groups and NACLs rather than debug the database.

( 08 ) Why Security Group References Break and How to Debug

Security group references in inbound rules evaluate the source security group's current member instances at the time of the connection. If you change the security group on a client instance, the old rule still exists but no longer matches any instances. This is a common source of 'connection refused' after infrastructure changes.

To debug: compare the security group ID in the RDS inbound rule with the actual security group attached to the client instance. Use 'aws ec2 describe-instances --instance-ids <id> --query "Reservations[0].Instances[0].SecurityGroups"' to verify. If they differ, update the rule. Also check whether the client instance is in a different VPC: a rule can reference a security group in another VPC only across a VPC peering connection in the same region. Otherwise, use the client's CIDR as the source.
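
The comparison described above can be modeled in a few lines. This Python sketch uses a simplified, hypothetical rule shape (`from_port`, `to_port`, and either `source_sg` or `source_cidr`), not the exact AWS API response, and it assumes TCP throughout.

```python
import ipaddress


def rule_admits(rule, client_ip, client_sg_ids, port):
    """Return True if one inbound rule allows this client on this port."""
    if not (rule["from_port"] <= port <= rule["to_port"]):
        return False
    if rule.get("source_sg") is not None:
        # A group reference matches only clients that have that group attached.
        return rule["source_sg"] in client_sg_ids
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(rule["source_cidr"])


def any_rule_admits(rules, client_ip, client_sg_ids, port):
    """Security groups have no ordering: one matching allow rule is enough."""
    return any(rule_admits(r, client_ip, client_sg_ids, port) for r in rules)


# The situation from the war story: the rule still names the old group.
rds_rules = [{"from_port": 5432, "to_port": 5432, "source_sg": "sg-app"}]
blocked = any_rule_admits(rds_rules, "10.0.1.25", {"sg-app-v2"}, 5432)   # False
rds_rules.append({"from_port": 5432, "to_port": 5432, "source_sg": "sg-app-v2"})
allowed = any_rule_admits(rds_rules, "10.0.1.25", {"sg-app-v2"}, 5432)   # True
```

The IP address 10.0.1.25 is a made-up example. The point of the model is that the old rule is still present and still valid; it simply no longer matches any client.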

( 09 ) NACL Stateless Gotchas: The Ephemeral Port Trap

Network ACLs are stateless, meaning you must explicitly allow return traffic. When a client connects to RDS on port 5432, it does so from an ephemeral source port (1024-65535), and the response leaves RDS from port 5432 addressed to that ephemeral port on the client. If your NACL outbound rules for the RDS subnet only allow specific destination ports (e.g., 80, 443), the SYN-ACK packet gets dropped and the client hangs until it times out.

Fix: ensure the NACL outbound rule for the RDS subnet allows TCP traffic on ephemeral ports (1024-65535) to the client subnet. Similarly, the client subnet's NACL inbound must allow ephemeral ports from RDS. Use 'aws ec2 describe-network-acls' to inspect rules and add a rule with rule number lower than the deny-all default (e.g., 100) for outbound ephemeral traffic.
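
To see why the outbound rule matters, here is a small Python model of NACL evaluation: rules are checked in ascending rule number, the first match wins, and anything unmatched is denied. The rule dictionaries use a hypothetical simplified shape, the addresses are made up, and protocol is assumed to be TCP.

```python
import ipaddress


def evaluate_nacl(rules, ip, port):
    """Return 'allow' or 'deny' for a packet to or from `ip` on `port`.

    For outbound rules, `ip` and `port` are the packet's destination.
    """
    for rule in sorted(rules, key=lambda r: r["rule_number"]):
        in_cidr = ipaddress.ip_address(ip) in ipaddress.ip_network(rule["cidr"])
        in_ports = rule["from_port"] <= port <= rule["to_port"]
        if in_cidr and in_ports:
            return rule["action"]  # first matching rule decides
    return "deny"  # the final catch-all rule


# Outbound rules on the RDS subnet that only cover HTTPS.
rds_outbound = [
    {"rule_number": 100, "cidr": "0.0.0.0/0", "from_port": 443, "to_port": 443, "action": "allow"},
]
# The SYN-ACK goes back to the client's ephemeral port, e.g. 49152.
before_fix = evaluate_nacl(rds_outbound, "10.0.1.25", 49152)  # 'deny'

rds_outbound.append(
    {"rule_number": 110, "cidr": "10.0.1.0/24", "from_port": 1024, "to_port": 65535, "action": "allow"}
)
after_fix = evaluate_nacl(rds_outbound, "10.0.1.25", 49152)   # 'allow'
```

Because the first match wins, a lower-numbered deny covering the same range would still block the traffic even after the allow rule is added.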

( 10 ) VPC Peering and Route Table Debugging

When clients are in a different VPC (e.g., via peering), the RDS security group must allow traffic from the client's security group or CIDR. Additionally, the route tables in both VPCs must have routes pointing to the peering connection. A missing route in either direction means packets never arrive or replies never return, which the client sees as a hang followed by a timeout and the application typically logs as a failed connection.

Check route tables: 'aws ec2 describe-route-tables --filters Name=vpc-id,Values=<vpc-id>' and look for a route with target 'pcx-*' to the other VPC's CIDR. Also ensure that the RDS security group inbound rule includes the client's security group ID (cross-account/VPC referencing works if both are in the same region and you have permissions).
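
The route check can be automated against the 'Routes' list from 'aws ec2 describe-route-tables' output. The Python sketch below assumes IPv4 routes only and picks the most specific active route for a destination; the function names and the sample routes are illustrative.

```python
import ipaddress


def find_route(routes, dest_ip):
    """Return the most specific active IPv4 route covering dest_ip, or None."""
    dest = ipaddress.ip_address(dest_ip)
    best_net, best_route = None, None
    for route in routes:
        cidr = route.get("DestinationCidrBlock")
        if cidr is None or route.get("State") != "active":
            continue  # skip IPv6/prefix-list routes and blackholed routes
        net = ipaddress.ip_network(cidr)
        if dest in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_route = net, route
    return best_route


def routes_via_peering(routes, dest_ip):
    """True if traffic to dest_ip would leave through a peering connection."""
    route = find_route(routes, dest_ip)
    return bool(route and route.get("VpcPeeringConnectionId", "").startswith("pcx-"))


client_vpc_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc", "State": "active"},
]
missing = routes_via_peering(client_vpc_routes, "10.1.2.40")  # False: falls to the default route
client_vpc_routes.append(
    {"DestinationCidrBlock": "10.1.0.0/16", "VpcPeeringConnectionId": "pcx-0abc", "State": "active"}
)
present = routes_via_peering(client_vpc_routes, "10.1.2.40")  # True
```

Run the same check against the RDS VPC's route table with the client's IP, since the return path needs its own route.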

( 11 ) Client-Side Firewalls and DNS Misresolution

Sometimes the problem isn't AWS but the client itself. A local firewall (iptables, firewalld, Windows Firewall) can block outbound connections to the RDS port. On Linux, run 'sudo iptables -L -n' and check the OUTPUT chain for DROP rules. Also, the RDS endpoint DNS might resolve to an IP that's no longer valid (e.g., after a failover).

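Flush DNS cache: 'sudo resolvectl flush-caches' (on older systems, 'sudo systemd-resolve --flush-caches') or 'ipconfig /flushdns' on Windows. Then verify with 'nslookup <rds-endpoint>'. The RDS console shows the endpoint hostname rather than an IP, so compare the resolved IP with a lookup from another host in the same VPC. If they differ, the client is holding a stale address: restart the application, and make sure its configuration uses the endpoint hostname rather than a hard-coded IP.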
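
The same lookup can be done from the application's own runtime, which is useful when the process may be caching an old address. This is a minimal Python sketch using the standard resolver; `resolve_ipv4` and `is_stale` are hypothetical helper names, and the "cached" IP is whatever address you saw the application connect to.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the OS resolver gives."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def is_stale(hostname, cached_ip):
    """True if the address the app is using is no longer in the DNS answer."""
    return cached_ip not in resolve_ipv4(hostname)
```

If `is_stale("<rds-endpoint>", "<ip-from-app-logs>")` returns True, the application is connecting to an address the endpoint no longer resolves to.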

( 12 ) Using VPC Flow Logs for Pinpoint Diagnosis

VPC Flow Logs capture metadata about traffic reaching the RDS ENI. To diagnose a failed connection, look for records where 'action' is 'REJECT' for traffic from the client IP to the RDS IP on the DB port. This indicates a security group or NACL block. If you see 'ACCEPT', the security group and NACL permitted the packet, so the failure lies elsewhere: return traffic, routing, or the DB engine itself (e.g., pg_hba.conf, user authentication).

Enable flow logs on the RDS subnet or ENI: 'aws ec2 create-flow-logs --resource-type Subnet --resource-ids <subnet-id> --traffic-type ALL --log-group-name /aws/vpc/flow-logs --deliver-logs-permission-arn <role-arn>'. Then run this query in CloudWatch Logs Insights: 'fields @timestamp, srcAddr, dstAddr, srcPort, dstPort, action | filter dstAddr like "<rds-ip>" and dstPort = 5432 and action = "REJECT" | sort @timestamp desc | limit 20'.
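
If you export the raw log lines instead of querying them, the same filter is easy to apply offline. The Python sketch below assumes the default (version 2) flow log format with its 14 space-separated fields; the function names are illustrative and the sample records in the usage note are made up.

```python
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_record(line):
    """Parse one default-format flow log line into a dict, or None if unusable."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, parts))
    if record["log_status"] != "OK":
        return None  # NODATA and SKIPDATA records carry '-' placeholders
    return record


def rejected_to(lines, dst_ip, dst_port):
    """Return REJECT records for TCP traffic (protocol 6) to dst_ip:dst_port."""
    hits = []
    for line in lines:
        rec = parse_record(line)
        if (
            rec
            and rec["action"] == "REJECT"
            and rec["protocol"] == "6"
            and rec["dstaddr"] == dst_ip
            and rec["dstport"] == str(dst_port)
        ):
            hits.append(rec)
    return hits
```

Given a line such as '2 123456789012 eni-0a1b2c3d 10.0.1.25 10.0.2.40 49152 5432 6 3 180 1700000000 1700000060 REJECT OK', `rejected_to(lines, "10.0.2.40", 5432)` returns that record. Its `interface_id` tells you which ENI's security groups and subnet NACL to inspect.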

Frequently asked questions

Why does telnet to RDS say 'Connection refused' even though the security group looks correct?

Possible reasons: (1) The security group inbound rule references a source security group that doesn't match the client's current security group. (2) A network ACL is blocking return traffic (stateless). (3) The client is in a different VPC without proper routing. (4) The client is connecting to the wrong port or to a stale IP where nothing is listening; on a self-managed database, also check the listen address ('listen_addresses' for PostgreSQL, 'bind-address' for MySQL), which RDS manages for you. (5) A local firewall on the client is blocking outbound traffic. Use VPC Flow Logs to confirm where the packet is dropped.

How do I check if a security group rule is actually being applied?

The best way is to use VPC Flow Logs: enable them for the RDS subnet and look for REJECT actions. Alternatively, you can temporarily add a rule allowing all traffic (0.0.0.0/0) on the DB port; if connections succeed, your original rule is wrong. But be careful: on a publicly accessible instance this opens the database to the internet. A safer method is to launch a temporary EC2 instance attached to the security group that the inbound rule names as its source and test connectivity from there.

Can a security group rule be correct but still block traffic due to rule order?

Security group rules are evaluated holistically—there is no priority order; all rules are considered and the most permissive applies. However, the implicit deny at the end means if no rule matches, traffic is denied. So rule order doesn't matter. NACLs, on the other hand, are evaluated in ascending rule number order, with the first matching rule applied. So for NACLs, rule order matters.

How do I fix 'Connection refused' when the RDS is in a different VPC (peered)?

Ensure: (1) VPC peering connection is in 'active' state. (2) Route tables in both VPCs have routes to the other VPC's CIDR via the peering connection. (3) RDS security group inbound rule allows traffic from the client's security group (or CIDR). Note: security group cross-references across peered VPCs are supported only if both VPCs are in the same region and you have the appropriate permissions. If not, use the client's CIDR as source. (4) NACLs on both sides allow the traffic.

What does 'Connection refused' vs 'Connection timed out' indicate?

'Connection refused' means the TCP SYN reached a host that actively rejected it with a RST packet. This usually means nothing is listening on that port at the address you reached (wrong port, a stale IP from DNS, or a DB engine that is not accepting connections), or a host firewall configured to REJECT. 'Connection timed out' means the SYN or its reply was silently dropped. Security groups and NACLs never send a RST, so a block at either one, or a missing route, shows up as a timeout. If your application log only says the connection failed, reproduce it with telnet or nc to see which of the two you actually have.