LEARN · DEBUGGING GUIDE

MySQL 'Too Many Connections' Error — Root Cause Diagnosis & Fix

MySQL's 'Too many connections' error isn't just about max_connections being too low. The real culprits are often leaked connections, long-running queries, or misconfigured connection pools that exhaust the thread stack.

IntermediateDatabase9 min read

What this usually means

The error means MySQL's thread pool has no available threads to accept a new connection. Each connection consumes a thread and memory (typically 256KB–2MB per thread). When the count reaches max_connections, the server refuses new connections outright. But the root cause isn't always too many legitimate users. Common hidden causes include: connection leaks where application code fails to close connections; connection pool misconfiguration (e.g., pool size > max_connections); long-running queries that hold connections open; or a DDoS-like burst of rapid connections that exhausts the thread stack before authentication completes. Additionally, if the thread_cache_size is small, threads are destroyed after disconnect, causing high connection churn that spikes the count.

( 01 )Fast diagnosis

The first ten minutes — establish facts before touching code.

  • 1ssh to the MySQL host and run: mysql -u root -p -e "SHOW STATUS LIKE 'Threads_connected';" — this tells you the current connection count immediately.
  • 2Check max_connections setting: mysql -e "SHOW VARIABLES LIKE 'max_connections';" — default is 151, often too low.
  • 3Identify who is connected: mysql -e "SELECT user, host, count(*) FROM information_schema.processlist GROUP BY user, host ORDER BY count(*) DESC LIMIT 10;" — find top consumers.
  • 4Check for idle transactions: mysql -e "SELECT * FROM information_schema.innodb_trx WHERE TIME_TO_SEC(TIMEDIFF(NOW(), trx_started)) > 60;" — long-running transactions hold connections.
  • 5Review MySQL error log: tail -100 /var/log/mysql/error.log | grep -i "too many connections" — confirms the error and may show offending IPs.
  • 6Use tcpdump to capture connection rate: sudo tcpdump -i any port 3306 -c 1000 | wc -l — if >100 connections/sec, it's a burst.
( 02 )Where to look

The specific files, logs, configs, and dashboards that usually own this bug.

  • search/var/log/mysql/error.log — MySQL error log for 'Too many connections' warnings and connection abort messages.
  • searchinformation_schema.processlist — real-time list of all active connections, user, host, command, state, and time.
  • searchinformation_schema.innodb_trx — shows any long-running transactions that might be holding connections open.
  • searchApplication server logs — look for connection pool metrics (e.g., HikariCP pool active/idle count) that could indicate leaks.
  • searchMySQL configuration: /etc/mysql/my.cnf or /etc/my.cnf — check max_connections, thread_cache_size, connect_timeout, wait_timeout.
  • searchMonitoring dashboards (e.g., Prometheus, Datadog) — graph of Threads_connected over time to see patterns.
( 03 )Common root causes

Practical causes, not theory. These are the things you will actually find.

  • warningConnection pool size in application (e.g., HikariCP maximumPoolSize) set higher than MySQL max_connections, causing exhaustion during traffic spikes.
  • warningApplication code fails to close connections properly (leak) — e.g., missing try-with-resources or finally block that calls connection.close().
  • warningLong-running queries or transactions (hours) that hold connections open, reducing effective pool capacity.
  • warningAbnormal connection burst from a misconfigured health checker or a DDoS attack that exhausts the thread stack.
  • warningMySQL max_connections set too low for the expected number of concurrent users (default 151 is often insufficient).
  • warningThread_cache_size set to 0, causing threads to be destroyed after each disconnect, increasing overhead and connection count fluctuations.
( 04 )Fix patterns

Concrete fix directions. Pick the one that matches your root cause.

  • buildImmediate relief: mysql -e "SET GLOBAL max_connections=500;" — increase the limit without restart (but temporary; update config file permanently).
  • buildKill idle connections: mysql -e "SELECT CONCAT('KILL ', id, ';') FROM information_schema.processlist WHERE command='Sleep' AND time > 300;" — then execute the generated KILL statements.
  • buildFix connection pool config: In application YAML/properties, ensure maximumPoolSize ≤ max_connections minus a buffer (e.g., 10 connections for admin).
  • buildAdd connection leak detection: Enable HikariCP leakDetectionThreshold (e.g., 30000ms) in application config to log stack traces of leaked connections.
  • buildOptimize long-running queries: Add indexes, rewrite queries, or break large transactions into smaller chunks to reduce connection hold time.
  • buildImplement connection throttling: Use a proxy like ProxySQL or HAProxy to queue connections instead of hitting MySQL directly.
( 05 )How to verify

A fix you cannot prove is a guess. Close the loop.

  • verifiedAfter changes, monitor Threads_connected: mysql -e "SHOW STATUS LIKE 'Threads_connected';" — should stay well below max_connections.
  • verifiedRun load test with expected peak traffic: e.g., ab -n 10000 -c 200 http://app/endpoint — verify no connection errors.
  • verifiedCheck application logs for connection acquisition timeouts: they should disappear.
  • verifiedVerify connection pool metrics: HikariCP exposes /actuator/health — pool active should not reach maximumPoolSize consistently.
  • verifiedSet up alerting on Threads_connected > 80% of max_connections to catch future issues early.
  • verifiedRun a leak detection test: use JMeter to simulate traffic and check Threads_connected doesn't increase indefinitely after traffic stops.
( 06 )Mistakes to avoid

Things that make this bug worse or harder to find.

  • warningBouncing the MySQL server as a fix — it resets connections temporarily but doesn't address root cause; connections will spike again.
  • warningBlindly increasing max_connections to a huge number (e.g., 10000) — each connection consumes RAM; on a 8GB server, 10000 connections can cause OOM.
  • warningSetting wait_timeout too low (e.g., 10 seconds) — may kill legitimate idle connections prematurely, causing application errors.
  • warningIgnoring the thread_cache_size — setting it to 0 forces thread recreation, adding latency and connection churn.
  • warningNot adding connection leak detection in application — without it, leaks go unnoticed until production outage.
  • warningAssuming the error is always a capacity issue — check for connection bursts from automated scripts or monitoring tools that reconnect too fast.
( 07 )War story

Midnight Connection Meltdown from a Misconfigured Pool

Senior Backend Engineer (on-call)MySQL 8.0, Spring Boot 2.7 with HikariCP, AWS RDS db.r5.large, Kubernetes (EKS)

Timeline

  1. 02:17PagerDuty alert: 'Too many connections' error from payment-service
  2. 02:20SSH to bastion, run SHOW STATUS LIKE 'Threads_connected' → 450 (max_connections=500)
  3. 02:22Run processlist query: top consumer is 'payment-worker' with 300 connections, all idle 'Sleep' state
  4. 02:25Check payment-worker logs: HikariPool-1 is at maximumPoolSize=300, but pool is idle
  5. 02:28Run SHOW VARIABLES LIKE 'wait_timeout' → 28800 (8 hours)
  6. 02:30Immediate fix: SET GLOBAL max_connections=1000; kill idle connections older than 1 hour
  7. 02:35Threads_connected drops to 150; service recovers.
  8. 09:00Root cause identified: code path missing connection.close() in catch block for payment retry logic.
  9. 09:30Deploy fix with leakDetectionThreshold=30s and max_connections increased to 800 in RDS parameter group.

I got the alert at 2:17 AM — payment service was throwing MySQL 'Too many connections' errors. The first thing I did was SSH into the bastion and run SHOW STATUS LIKE 'Threads_connected'. It was 450 out of 500 max_connections. I ran the processlist query and saw that the 'payment-worker' service had 300 connections, all in 'Sleep' state with zero time. That told me these were idle connections that never got closed. The application logs confirmed: HikariPool-1 was reporting pool Active: 0, Idle: 300 — the pool was full but all connections were idle. The wait_timeout was 28800 seconds (8 hours), so these leaked connections would hang around for hours.

I immediately increased max_connections to 1000 and killed all idle connections older than 1 hour using a KILL command batch. That dropped Threads_connected to 150, and the service recovered within a minute. I also temporarily set wait_timeout to 300 seconds to flush remaining idle connections faster. The next morning, I dug into the code and found a code path in the payment retry logic where a connection was opened inside a try block but the finally block had a condition that skipped close() if an exception occurred. That was a classic leak. We added HikariCP's leakDetectionThreshold=30000ms to catch future leaks in logs.

The permanent fix was threefold: increase max_connections to 800 in the RDS parameter group, add the leak detection threshold in the application config, and fix the missing close() in the catch block. We also set up a CloudWatch alarm on Threads_connected at 80% of max_connections. The lesson: always use connection leak detection in production, and never assume default pool sizes are safe.

Root cause

Connection leak in payment-worker: a code path in retry logic opened a connection but did not close it in the catch block, causing connections to accumulate as idle.

The fix

Fixed the missing close() in the catch block; increased max_connections to 800; added HikariCP leakDetectionThreshold=30s.

The lesson

Always enable connection leak detection in production connection pools. Monitor Threads_connected proactively with alerts at 80% of max_connections.

( 08 )Thread Stack Exhaustion vs. Connection Limit

MySQL uses one thread per connection. The 'Too many connections' error can also be triggered by thread stack exhaustion if the stack size is large and max_connections is set high. On Linux, default stack size is 256KB per thread. For 1000 connections, that's 256MB of stack space alone, plus other memory. If ulimit -s is 8MB (common), 1000 threads would need 8GB of stack — that can exceed available memory and cause the server to refuse connections even before hitting max_connections. Check thread_stack variable: SHOW VARIABLES LIKE 'thread_stack' — default 262144 (256KB). If you increase max_connections, also consider reducing thread_stack or upgrading RAM.

To diagnose thread exhaustion, check the MySQL error log for 'Out of memory' or 'Can't create thread' messages. Also check OS limits: cat /proc/$(pidof mysqld)/limits | grep 'max user processes'. If max user processes is low, the OS may prevent new threads. The fix: increase ulimit -u in /etc/security/limits.conf for the mysql user.

( 09 )Connection Pool Leak Detection in Practice

HikariCP's leakDetectionThreshold logs a stack trace when a connection is held longer than the threshold. Enable it in application.properties: spring.datasource.hikari.leak-detection-threshold=30000 (30 seconds). This logs a warning like 'Connection leak detection triggered' with the stack trace of the thread that borrowed the connection. It does NOT close the connection — it only logs. For production, set it to a value that allows long-running queries but catches leaks (e.g., 10x the slowest query).

But beware: if you set it too low (e.g., 1 second), it will spam logs with false positives from legitimate slow queries. Best practice: start with 30 seconds, monitor logs, then adjust. Also, enable pool metrics via Micrometer: spring.datasource.hikari.metrics=true — then graph hikaricp_connections_active, hikaricp_connections_idle, and hikaricp_connections_pending. A rising active count with falling idle count under no load indicates a leak.

( 10 )Burst Connection Rate and the connect_timeout Trap

A rapid connection burst (e.g., from a misconfigured health check that reconnects every 100ms) can exhaust the thread stack before max_connections is reached. Each connection attempt consumes a thread during the handshake phase. If the handshake is slow (e.g., DNS resolution or SSL), threads can pile up. To diagnose, use tcpdump to see SYN rate: sudo tcpdump -i any port 3306 -c 1000 and check for many SYN packets without ACK.

Mitigation: reduce connect_timeout (default 10 seconds) to 2–5 seconds to drop slow handshakes faster. Also set back_log (default 80) to a higher value (e.g., 500) to queue pending connections. But back_log is limited by the OS net.core.somaxconn (default 128). Increase both: sysctl -w net.core.somaxconn=1024 and add to /etc/sysctl.conf. In MySQL config: back_log=500. This allows MySQL to queue connections instead of refusing them during a burst.

( 11 )Long-Running Transactions as Connection Holders

A long-running transaction keeps the connection active even if the application is idle. This reduces effective pool capacity. Use information_schema.innodb_trx to find transactions older than 60 seconds: SELECT * FROM information_schema.innodb_trx WHERE TIME_TO_SEC(TIMEDIFF(NOW(), trx_started)) > 60. Kill them with KILL QUERY or KILL CONNECTION (KILL CONNECTION kills the connection, KILL QUERY kills only the query but the transaction may continue).

To prevent, set innodb_lock_wait_timeout (default 50 seconds) to a lower value (e.g., 10 seconds) to abort long waits. Also set session wait_timeout and interactive_timeout to reasonable values (e.g., 300 seconds). But be careful: setting them too low can break long-running batch jobs. Use separate connection pools for batch and OLTP workloads.

( 12 )Using ProxySQL to Throttle Connections

ProxySQL is a MySQL proxy that can queue connections when the backend reaches max_connections. Install ProxySQL, configure mysql_servers with max_connections=500 (backend limit), and set mysql_users. ProxySQL will hold connections in its own queue (mysql_connection_max_age) and refuse new connections with a friendly error instead of a MySQL crash. It also provides connection multiplexing: multiple frontend connections can share backend connections, reducing total backend connections.

Configuration: in ProxySQL admin, set mysql-max_connections to a value lower than MySQL's max_connections (e.g., 450). Then set mysql-connection_max_age_ms to 30000 (30 seconds) to recycle connections. ProxySQL also logs connection errors, making it easier to spot patterns. This is especially useful for legacy apps that cannot fix connection leaks quickly.

Frequently asked questions

What's the difference between 'Too many connections' and 'Connection refused'?

'Too many connections' (errno 1040) means MySQL has reached its max_connections limit and refuses new connections gracefully. 'Connection refused' (errno 111) means the MySQL process is not listening on the port (e.g., service down, or firewall blocking). Check with 'mysqladmin ping' and 'netstat -tlnp | grep 3306'.

Can I recover from 'Too many connections' without restarting MySQL?

Yes, you can connect as root (even when max_connections is reached) because root has a reserved connection slot (extra_connections). Run 'mysql -u root -p' and either increase max_connections dynamically: 'SET GLOBAL max_connections=1000;' or kill idle connections: 'SHOW PROCESSLIST;' then 'KILL id;'. After that, update the config file permanently.

What is the default max_connections in MySQL 8.0?

The default is 151. However, on some cloud instances (e.g., AWS RDS), the default may be higher. Check with 'SHOW VARIABLES LIKE 'max_connections';'. If you have many concurrent users, increase it — but monitor memory usage as each connection uses about 2MB.

How do I find which application is causing the most connections?

Run 'SELECT user, host, count(*) FROM information_schema.processlist GROUP BY user, host ORDER BY count(*) DESC;' This shows the count per user and host. If you have multiple apps using the same MySQL user, differentiate by host (each app server has a different IP). You can also check the 'info' column for running queries.

What is thread_cache_size and does it affect 'Too many connections'?

thread_cache_size determines how many threads MySQL caches for reuse after a connection closes. Default is 8. If set to 0, threads are destroyed after each disconnect, causing overhead. A higher cache reduces the need to create new threads, which can help prevent spikes in thread count during connection churn. Set it to a value like 100 to smooth out connection bursts.