CRITICAL | AWS | Cloud

Application hit RDS max_connections during traffic spike

aws rds database connections lambda

Symptoms

Application logs show 'FATAL: too many connections for role app'
RDS DatabaseConnections metric is flat at the maximum
Users see 5xx responses during peak hours

Root Cause

Each pod opens its own connection pool, so total connections grow with pod count; traffic doubled
Lambda concurrency spikes opened new connections faster than RDS could recycle old ones
No connection pooler (RDS Proxy or PgBouncer) in front of the database

Diagnosis

Check Performance Insights → Top SQL for hanging sessions
SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;
Look at CloudWatch DatabaseConnections and CPUUtilization
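
To read the per-state counts against the limit, a small helper can total them and flag how many sessions sit in 'idle in transaction'. This is a minimal sketch, assuming the rows arrive as (state, count) pairs from a query that selects both columns; `summarize_sessions` and the sample numbers are hypothetical, and rows with a NULL state are skipped on the simplifying assumption that they are background processes rather than client connections.

```python
def summarize_sessions(rows, max_connections):
    """Total client sessions per state and compare with max_connections.

    rows: iterable of (state, count) pairs; state may be None.
    """
    by_state = {}
    for state, count in rows:
        if state is None:
            # Assumption: NULL-state rows are background processes, not clients.
            continue
        by_state[state] = by_state.get(state, 0) + count
    total = sum(by_state.values())
    return {
        "by_state": by_state,
        "total": total,
        "utilization": total / max_connections,
        "idle_in_transaction": by_state.get("idle in transaction", 0),
    }


# Hypothetical sample, not data from the incident.
sample_rows = [
    ("idle", 120),
    ("active", 60),
    ("idle in transaction", 20),
    (None, 6),
]
report = summarize_sessions(sample_rows, max_connections=200)
print(report["total"], report["utilization"], report["idle_in_transaction"])
```

A utilization close to 1.0 together with a large 'idle in transaction' count points at leaked or stuck sessions rather than genuine query load.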

Fix

Introduce RDS Proxy in front of the database:

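# SUBNET_IDS (space-separated) and DB_INSTANCE_ID are placeholders for your environment
aws rds create-db-proxy \
  --db-proxy-name app-proxy \
  --engine-family POSTGRESQL \
  --auth AuthScheme=SECRETS,SecretArn=$SECRET \
  --role-arn $ROLE_ARN \
  --vpc-subnet-ids $SUBNET_IDS

aws rds register-db-proxy-targets --db-proxy-name app-proxy \
  --db-instance-identifiers $DB_INSTANCE_ID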

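Lower the application pool size per pod and raise RDS max_connections in the DB parameter group. max_connections is a static parameter, so the new value applies only after the instance reboots.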
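
One way to pick the per-pod pool size is to work backwards from max_connections. The sketch below is illustrative only: `max_pool_size_per_pod`, the `reserved` slots (admin, monitoring, Lambda) and the 20% headroom are assumptions, not values from this incident.

```python
def max_pool_size_per_pod(max_connections, pod_count, reserved=0, headroom_pct=20):
    """Largest per-pod pool size that keeps all pods within the budget.

    reserved: connections kept aside for non-pod clients (assumed value).
    headroom_pct: share of the remaining budget left unused as a safety margin.
    """
    if pod_count <= 0:
        raise ValueError("pod_count must be positive")
    # Integer arithmetic avoids floating-point rounding surprises.
    usable = (max_connections - reserved) * (100 - headroom_pct) // 100
    return max(usable // pod_count, 0)


# Hypothetical numbers: 500 slots, 20 reserved, 24 pods at peak scale.
pool_size = max_pool_size_per_pod(max_connections=500, pod_count=24, reserved=20)
print(pool_size)  # 16
```

Size against the pod count at peak autoscaling, not the steady-state count, since the limit is hit when the deployment is scaled out.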

Kill stuck sessions: `SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle in transaction' AND state_change < now() - interval '10 min';`

Prevention

Load test the pool size × replica count against max_connections
Adopt RDS Proxy for all serverless workloads
Alert at 70% of max_connections
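
The capacity check and the 70% alert above can be sketched as two small helpers. This is a hedged example: the function names, the instance identifier `app-db`, and the period and evaluation settings are assumptions. The dictionary uses the parameter names of the CloudWatch PutMetricAlarm API, so it could be passed to boto3 as `cloudwatch.put_metric_alarm(**params)`; the call itself is left out so the block runs without AWS credentials.

```python
def fits_connection_budget(pool_size, replica_count, max_connections, other_clients=0):
    """True if every replica filling its pool still fits under max_connections."""
    return pool_size * replica_count + other_clients <= max_connections


def connection_alarm_params(db_instance_id, max_connections, threshold_pct=70):
    """Build PutMetricAlarm parameters for the DatabaseConnections metric."""
    return {
        "AlarmName": f"{db_instance_id}-connections-{threshold_pct}pct",
        "Namespace": "AWS/RDS",
        "MetricName": "DatabaseConnections",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds; assumed value
        "EvaluationPeriods": 3,  # assumed value
        "Threshold": max_connections * threshold_pct // 100,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


# Hypothetical numbers: pool of 16, 24 replicas, 20 other clients, 500 slots.
print(fits_connection_budget(16, 24, 500, other_clients=20))  # True
print(fits_connection_budget(16, 48, 500, other_clients=20))  # False

params = connection_alarm_params("app-db", max_connections=500)
print(params["Threshold"])  # 350
```

Because the threshold is derived from max_connections, it needs recomputing whenever the parameter group or instance class changes.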