Database Issues
Database Issues
Troubleshooting guide for PostgreSQL, Redis, and other database-related problems.
PostgreSQL Issues
Issue: Connection Refused
Symptoms
- "Connection refused" errors
- Applications cannot connect to database
- Health checks failing
- Port not listening
Cause
- PostgreSQL service not running
- Port conflict
- Firewall blocking connections
- pg_hba.conf misconfiguration
- Max connections exceeded
Solution
# Check if PostgreSQL is running docker ps | grep postgres docker logs postgres --tail 50 # Verify port is listening docker exec postgres pg_isready -h localhost -p 5432 # Check max connections docker exec postgres psql -c "SHOW max_connections;" docker exec postgres psql -c "SELECT count(*) FROM pg_stat_activity;" # Kill idle connections docker exec postgres psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND query_start < now() - interval '10 minutes';" # Check pg_hba.conf docker exec postgres cat /var/lib/postgresql/data/pg_hba.conf # Restart PostgreSQL docker restart postgres
Prevention
- Set appropriate max_connections
- Implement connection pooling (PgBouncer)
- Monitor connection count
- Configure connection timeouts
Issue: Slow Queries
Symptoms
- API response times increasing
- Database CPU high
- Lock wait timeouts
- Application timeouts
Cause
- Missing indexes
- N+1 query patterns
- Large table scans
- Lock contention
- Suboptimal query plans
Solution
# Enable slow query logging docker exec postgres psql -c "ALTER SYSTEM SET log_min_duration_statement = 1000;" docker exec postgres psql -c "SELECT pg_reload_conf();" # Find slow queries docker exec postgres psql -c "SELECT query, calls, mean_time, total_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10;" # Analyze query plan docker exec postgres psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE email = 'test@example.com';" # Check for missing indexes docker exec postgres psql -c "SELECT relname, seq_scan, idx_scan, n_live_tup FROM pg_stat_user_tables WHERE seq_scan > idx_scan ORDER BY seq_scan DESC LIMIT 10;" # Create missing index docker exec postgres psql -c "CREATE INDEX CONCURRENTLY idx_users_email ON users(email);" # Update statistics docker exec postgres psql -c "ANALYZE;"
Prevention
- Use query analyzers in development
- Implement database monitoring
- Regular ANALYZE runs
- Review query patterns regularly
Issue: Disk Space Exhaustion
Symptoms
- "No space left on device" errors
- Write operations failing
- WAL files growing
- Transaction log full
Cause
- Unvacuumed tables (bloat)
- WAL retention too high
- Large tables without partitioning
- Orphaned temp files
- Backup files accumulating
Solution
# Check disk usage docker exec postgres du -sh /var/lib/postgresql/data/* # Check table bloat docker exec postgres psql -c "SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS size FROM pg_tables WHERE schemaname = 'public' ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC LIMIT 10;" # Run VACUUM docker exec postgres psql -c "VACUUM (VERBOSE, ANALYZE);" # Full vacuum for severe bloat (locks table!) docker exec postgres psql -c "VACUUM FULL verbose;" # Check WAL size docker exec postgres psql -c "SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0'));" # Clean up WAL files docker exec postgres psql -c "CHECKPOINT;" # Remove old backups find /backups -name "*.sql.gz" -mtime +7 -delete
Prevention
- Configure autovacuum properly
- Set WAL retention limits
- Partition large tables
- Monitor disk space with alerts
Issue: Replication Lag
Symptoms
- Read replicas showing stale data
- Replication slots growing
- WAL accumulating on primary
- Standby disconnections
Cause
- Network latency
- Replica overwhelmed
- Long-running transactions
- WAL generation spike
- Disk I/O bottleneck on replica
Solution
# Check replication status docker exec postgres psql -c "SELECT client_addr, state, sent_lsn, write_lsn, replay_lsn, pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes FROM pg_stat_replication;" # Check replication slots docker exec postgres psql -c "SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes FROM pg_replication_slots;" # Remove inactive slots (careful!) docker exec postgres psql -c "SELECT pg_drop_replication_slot('inactive_slot');" # Check for long transactions docker exec postgres psql -c "SELECT pid, now() - xact_start AS duration, query FROM pg_stat_activity WHERE xact_start IS NOT NULL ORDER BY xact_start LIMIT 10;" # Terminate blocking transaction docker exec postgres psql -c "SELECT pg_terminate_backend(<pid>);"
Prevention
- Monitor replication lag
- Set up alerts for lag thresholds
- Use synchronous_commit wisely
- Ensure replica has adequate resources
Issue: Database Corruption
Symptoms
- Checksum verification failures
- "could not read block" errors
- Inconsistent query results
- Crash recovery loops
Cause
- Hardware failure (disk/memory)
- Power loss during write
- Kernel bugs
- Filesystem corruption
- Out-of-memory killer
Solution
# Check for corruption docker exec postgres pg_isready docker exec postgres psql -c "SELECT count(*) FROM pg_class;" # Verify checksums (if enabled) docker exec postgres pg_checksums --check -D /var/lib/postgresql/data # Attempt repair with pg_resetwal (DANGEROUS - last resort) # Backup first! docker exec postgres pg_resetwal -D /var/lib/postgresql/data # Restore from backup pg_restore -d dbname backup.dump # Point-in-time recovery # Configure recovery.conf with recovery_target_time
Prevention
- Enable data checksums
- Use ECC memory
- Configure proper fsync settings
- Regular backups with verification
Redis Issues
Issue: Redis Not Responding
Symptoms
- Connection timeouts
- Commands hanging
- BUSY error responses
- High latency
Cause
- Redis blocked by slow command
- Memory exhaustion
- Persistence I/O blocking
- Too many connections
- Network issues
Solution
# Check Redis status docker exec redis redis-cli PING # Get server info docker exec redis redis-cli INFO # Check slow log docker exec redis redis-cli SLOWLOG GET 10 # Check connected clients docker exec redis redis-cli CLIENT LIST # Kill problematic client docker exec redis redis-cli CLIENT KILL ID <client-id> # Check memory usage docker exec redis redis-cli INFO memory # Force background save (if AOF blocking) docker exec redis redis-cli BGSAVE
Prevention
- Avoid blocking commands (KEYS *)
- Use SCAN instead of KEYS
- Configure maxmemory and eviction policy
- Monitor slow log
Issue: Redis Memory Exhaustion
Symptoms
- "OOM command not allowed" errors
- Eviction happening
- Write operations failing
- Keys disappearing unexpectedly
Cause
- No maxmemory limit set
- Inappropriate eviction policy
- Memory leak in application
- Large key values
- Too many connections
Solution
# Check memory status docker exec redis redis-cli INFO memory # Find big keys docker exec redis redis-cli --bigkeys # Analyze memory docker exec redis redis-cli MEMORY DOCTOR # Set memory limit docker exec redis redis-cli CONFIG SET maxmemory 2gb docker exec redis redis-cli CONFIG SET maxmemory-policy allkeys-lru # Delete large/unused keys docker exec redis redis-cli DEL large_key # Clear all data (careful!) docker exec redis redis-cli FLUSHALL ASYNC
Prevention
- Set maxmemory limit
- Configure appropriate eviction policy
- Monitor memory usage
- Implement key expiration (TTL)
Issue: Redis Persistence Failures
Symptoms
- Data lost after restart
- "Background saving error"
- Disk space exhausted
- AOF corruption
Cause
- Disk full
- Fork failure (not enough memory)
- Permission issues
- Corrupted AOF file
- Slow disk I/O
Solution
# Check persistence status docker exec redis redis-cli INFO persistence # Check last save status docker exec redis redis-cli LASTSAVE # Force RDB save docker exec redis redis-cli BGSAVE # Check AOF status docker exec redis redis-cli INFO persistence | grep aof # Repair corrupted AOF docker exec redis redis-check-aof --fix /data/appendonly.aof # Verify RDB file docker exec redis redis-check-rdb /data/dump.rdb
Prevention
- Monitor disk space
- Set vm.overcommit_memory=1
- Use SSD for persistence
- Regular backup verification
Issue: Redis Cluster Split Brain
Symptoms
- Nodes showing different cluster views
- Writes succeeding on multiple masters
- Data inconsistency
- Failover loops
Cause
- Network partition
- Node failure
- Insufficient quorum configuration
- Clock drift
- Unreliable network
Solution
# Check cluster status docker exec redis redis-cli CLUSTER INFO # View cluster nodes docker exec redis redis-cli CLUSTER NODES # Check for failed nodes docker exec redis redis-cli CLUSTER NODES | grep fail # Force failover docker exec redis redis-cli CLUSTER FAILOVER # Reset node (remove from cluster) docker exec redis redis-cli CLUSTER RESET # Fix cluster slot issues docker exec redis redis-cli CLUSTER FIX
Prevention
- Configure proper quorum (min 3 masters)
- Use reliable network
- Monitor cluster health
- Set appropriate timeouts
Issue: Redis Key Expiration Not Working
Symptoms
- Expired keys still accessible
- Memory not being freed
- TTL showing unexpected values
- Keys accumulating
Cause
- Passive expiration only
- Not enough memory pressure
- Clock issues
- Key being accessed (resetting TTL)
- EXPIRE command failing silently
Solution
# Check key TTL docker exec redis redis-cli TTL key_name # Force active expiration docker exec redis redis-cli DEBUG SLEEP 0 # Check expiration configuration docker exec redis redis-cli CONFIG GET hz docker exec redis redis-cli CONFIG GET activerehashing # Manually expire keys docker exec redis redis-cli EXPIRE key_name 1 # Scan and delete expired patterns docker exec redis redis-cli --scan --pattern "temp:*" | xargs docker exec redis redis-cli DEL
Prevention
- Set appropriate hz value (10-100)
- Use volatile-ttl or volatile-lru eviction
- Monitor key count trends
- Implement key naming conventions with TTL
General Database Issues
Issue: Connection Pool Exhaustion
Symptoms
- "Too many connections" errors
- Connection timeouts
- Application hanging waiting for connection
- Database rejecting new connections
Cause
- Connection leak in application
- Pool size too small
- Long-running transactions
- No connection timeout
- Spike in traffic
Solution
# PostgreSQL - check connections docker exec postgres psql -c "SELECT usename, count(*) FROM pg_stat_activity GROUP BY usename;" # Kill idle connections docker exec postgres psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND query_start < now() - interval '5 minutes';" # Redis - check connections docker exec redis redis-cli CLIENT LIST | wc -l # Increase pool size # In application config: pool: max: 20 min: 5 acquireTimeout: 30000 idleTimeout: 10000 # Implement PgBouncer for PostgreSQL docker run -d --name pgbouncer \ -e DATABASE_URL="postgres://user:pass@postgres:5432/db" \ -p 6432:6432 \ pgbouncer/pgbouncer
Prevention
- Use connection pooling
- Set appropriate pool sizes
- Implement connection timeouts
- Monitor active connections
Issue: Backup/Restore Failures
Symptoms
- Backup jobs failing
- Restore taking too long
- Incomplete backup files
- Space exhaustion during backup
Cause
- Insufficient disk space
- Network timeout
- Lock contention
- Large database size
- Permission issues
Solution
# PostgreSQL backup docker exec postgres pg_dump -Fc dbname > backup.dump # Parallel backup for large DBs docker exec postgres pg_dump -Fc -j 4 dbname > backup.dump # Compressed backup docker exec postgres pg_dump dbname | gzip > backup.sql.gz # Verify backup pg_restore --list backup.dump # Restore with progress pv backup.sql.gz | gunzip | docker exec -i postgres psql dbname # Redis backup docker exec redis redis-cli BGSAVE docker cp redis:/data/dump.rdb ./redis-backup.rdb
Prevention
- Schedule backups during low-traffic periods
- Verify backups regularly
- Use incremental backups for large DBs
- Monitor backup duration and size