CRITICAL · AWS · Cloud

Application hit RDS max_connections during traffic spike

Tags: aws, rds, database, connections, lambda
Symptoms
  • Application logs show 'FATAL: too many connections for role app'
  • RDS DatabaseConnections metric is flat at the maximum
  • Users see 5xx responses during peak hours
Root Cause
  • Each pod opens its own connection pool, so total connections grew with the pod count when traffic doubled
  • Sustained Lambda concurrency spikes opened new connections faster than RDS could recycle idle ones
  • No connection pooler (RDS Proxy or PgBouncer) in front of the database
Diagnosis
  • Check Performance Insights → Top SQL for hanging sessions
  • Count sessions by state: `SELECT state, count(*) FROM pg_stat_activity GROUP BY state;`
  • Look at CloudWatch DatabaseConnections and CPUUtilization
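The diagnosis steps above can be sketched as a few shell helpers that turn the `pg_stat_activity` counts into a single utilization figure. This is a minimal sketch, not a definitive tool: the function names are hypothetical, `psql` is assumed to pick up its connection settings from the usual `PG*` environment variables, and the `backend_type` filter assumes PostgreSQL 10 or later.

```shell
# Sketch only: helper names are hypothetical, not part of any AWS or PostgreSQL tooling.
# Assumes psql reads PGHOST, PGUSER, PGDATABASE and PGPASSWORD from the environment.

# Print per-state client session counts as "state|count" lines.
# backend_type requires PostgreSQL 10+ (assumption).
connections_by_state() {
  psql -X -A -t -c "SELECT coalesce(state, 'unknown'), count(*)
                    FROM pg_stat_activity
                    WHERE backend_type = 'client backend'
                    GROUP BY 1 ORDER BY 2 DESC;"
}

# Sum the counts from "state|count" lines read on stdin.
total_connections() {
  awk -F'|' '{ total += $2 } END { print total + 0 }'
}

# Integer percentage of max_connections currently in use.
# usage: utilization_pct USED MAX
utilization_pct() {
  echo $(( $1 * 100 / $2 ))
}
```

One way to use it: capture `connections_by_state | total_connections` as the used count, read the limit with `psql -X -A -t -c "SHOW max_connections;"`, and pass both to `utilization_pct`. A large share of `idle in transaction` rows in the per-state output points at sessions worth terminating.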
Fix
  • Introduce RDS Proxy in front of the database:
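  • aws rds create-db-proxy \
      --db-proxy-name app-proxy \
      --engine-family POSTGRESQL \
      --auth AuthScheme=SECRETS,SecretArn=$SECRET \
      --role-arn $ROLE_ARN \
      --vpc-subnet-ids $SUBNET_IDS
    # --vpc-subnet-ids is required; $SUBNET_IDS is an assumed variable holding space-separated subnet IDs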
    
  • Lower the application pool size per pod and raise RDS max_connections in the parameter group (max_connections is a static parameter, so the change takes effect only after a reboot)
  • Kill stuck sessions: `SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle in transaction' AND state_change < now() - interval '10 min';`
Prevention
  • Load test the pool size × replica count against max_connections
  • Adopt RDS Proxy for all serverless workloads
  • Alert at 70% of max_connections
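The pool-size arithmetic and the 70% alert above can be sketched in shell as follows. This is an illustration under stated assumptions rather than a finished script: all function names and the alarm naming scheme are hypothetical, the alarm period and evaluation count are arbitrary starting points, and the SNS topic ARN must be supplied by the caller.

```shell
# Sketch only: function names and the alarm name suffix are hypothetical.

# Worst-case connections if every replica fills its pool.
# usage: peak_connections POOL_SIZE REPLICAS
peak_connections() {
  echo $(( $1 * $2 ))
}

# Connection count at which to alert (defaults to 70% of max_connections).
# usage: alert_threshold MAX_CONNECTIONS [PERCENT]
alert_threshold() {
  echo $(( $1 * ${2:-70} / 100 ))
}

# Succeeds only if the worst case stays at or below the alert threshold.
# usage: fits_budget POOL_SIZE REPLICAS MAX_CONNECTIONS
fits_budget() {
  [ "$(peak_connections "$1" "$2")" -le "$(alert_threshold "$3")" ]
}

# Create a CloudWatch alarm on the RDS DatabaseConnections metric.
# Period (60 s) and evaluation periods (3) are assumed values; tune as needed.
# usage: create_connections_alarm DB_INSTANCE_ID MAX_CONNECTIONS SNS_TOPIC_ARN
create_connections_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "$1-connections-high" \
    --namespace AWS/RDS \
    --metric-name DatabaseConnections \
    --dimensions "Name=DBInstanceIdentifier,Value=$1" \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 3 \
    --threshold "$(alert_threshold "$2")" \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions "$3"
}
```

For example, a pool of 20 across 12 replicas peaks at 240 connections, which fits under the 350-connection alert line for max_connections = 500; doubling to 24 replicas (480 connections) does not, so `fits_budget 20 24 500` fails and could gate a deploy or autoscaling change.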