Skip to main content

Database Issues

Database Issues

Troubleshooting guide for PostgreSQL, Redis, and other database-related problems.


PostgreSQL Issues

Issue: Connection Refused

Symptoms

  • "Connection refused" errors
  • Applications cannot connect to database
  • Health checks failing
  • Port not listening

Cause

  1. PostgreSQL service not running
  2. Port conflict
  3. Firewall blocking connections
  4. pg_hba.conf misconfiguration
  5. Max connections exceeded

Solution

# Check if PostgreSQL is running docker ps | grep postgres docker logs postgres --tail 50 # Verify port is listening docker exec postgres pg_isready -h localhost -p 5432 # Check max connections docker exec postgres psql -c "SHOW max_connections;" docker exec postgres psql -c "SELECT count(*) FROM pg_stat_activity;" # Kill idle connections docker exec postgres psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND query_start < now() - interval '10 minutes';" # Check pg_hba.conf docker exec postgres cat /var/lib/postgresql/data/pg_hba.conf # Restart PostgreSQL docker restart postgres

Prevention

  • Set appropriate max_connections
  • Implement connection pooling (PgBouncer)
  • Monitor connection count
  • Configure connection timeouts

Issue: Slow Queries

Symptoms

  • API response times increasing
  • Database CPU high
  • Lock wait timeouts
  • Application timeouts

Cause

  1. Missing indexes
  2. N+1 query patterns
  3. Large table scans
  4. Lock contention
  5. Suboptimal query plans

Solution

# Enable slow query logging docker exec postgres psql -c "ALTER SYSTEM SET log_min_duration_statement = 1000;" docker exec postgres psql -c "SELECT pg_reload_conf();" # Find slow queries docker exec postgres psql -c "SELECT query, calls, mean_time, total_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10;" # Analyze query plan docker exec postgres psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE email = 'test@example.com';" # Check for missing indexes docker exec postgres psql -c "SELECT relname, seq_scan, idx_scan, n_live_tup FROM pg_stat_user_tables WHERE seq_scan > idx_scan ORDER BY seq_scan DESC LIMIT 10;" # Create missing index docker exec postgres psql -c "CREATE INDEX CONCURRENTLY idx_users_email ON users(email);" # Update statistics docker exec postgres psql -c "ANALYZE;"

Prevention

  • Use query analyzers in development
  • Implement database monitoring
  • Regular ANALYZE runs
  • Review query patterns regularly

Issue: Disk Space Exhaustion

Symptoms

  • "No space left on device" errors
  • Write operations failing
  • WAL files growing
  • Transaction log full

Cause

  1. Unvacuumed tables (bloat)
  2. WAL retention too high
  3. Large tables without partitioning
  4. Orphaned temp files
  5. Backup files accumulating

Solution

# Check disk usage docker exec postgres du -sh /var/lib/postgresql/data/* # Check table bloat docker exec postgres psql -c "SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS size FROM pg_tables WHERE schemaname = 'public' ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC LIMIT 10;" # Run VACUUM docker exec postgres psql -c "VACUUM (VERBOSE, ANALYZE);" # Full vacuum for severe bloat (locks table!) docker exec postgres psql -c "VACUUM FULL verbose;" # Check WAL size docker exec postgres psql -c "SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0'));" # Clean up WAL files docker exec postgres psql -c "CHECKPOINT;" # Remove old backups find /backups -name "*.sql.gz" -mtime +7 -delete

Prevention

  • Configure autovacuum properly
  • Set WAL retention limits
  • Partition large tables
  • Monitor disk space with alerts

Issue: Replication Lag

Symptoms

  • Read replicas showing stale data
  • Replication slots growing
  • WAL accumulating on primary
  • Standby disconnections

Cause

  1. Network latency
  2. Replica overwhelmed
  3. Long-running transactions
  4. WAL generation spike
  5. Disk I/O bottleneck on replica

Solution

# Check replication status docker exec postgres psql -c "SELECT client_addr, state, sent_lsn, write_lsn, replay_lsn, pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes FROM pg_stat_replication;" # Check replication slots docker exec postgres psql -c "SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes FROM pg_replication_slots;" # Remove inactive slots (careful!) docker exec postgres psql -c "SELECT pg_drop_replication_slot('inactive_slot');" # Check for long transactions docker exec postgres psql -c "SELECT pid, now() - xact_start AS duration, query FROM pg_stat_activity WHERE xact_start IS NOT NULL ORDER BY xact_start LIMIT 10;" # Terminate blocking transaction docker exec postgres psql -c "SELECT pg_terminate_backend(<pid>);"

Prevention

  • Monitor replication lag
  • Set up alerts for lag thresholds
  • Use synchronous_commit wisely
  • Ensure replica has adequate resources

Issue: Database Corruption

Symptoms

  • Checksum verification failures
  • "could not read block" errors
  • Inconsistent query results
  • Crash recovery loops

Cause

  1. Hardware failure (disk/memory)
  2. Power loss during write
  3. Kernel bugs
  4. Filesystem corruption
  5. Out-of-memory killer

Solution

# Check for corruption docker exec postgres pg_isready docker exec postgres psql -c "SELECT count(*) FROM pg_class;" # Verify checksums (if enabled) docker exec postgres pg_checksums --check -D /var/lib/postgresql/data # Attempt repair with pg_resetwal (DANGEROUS - last resort) # Backup first! docker exec postgres pg_resetwal -D /var/lib/postgresql/data # Restore from backup pg_restore -d dbname backup.dump # Point-in-time recovery # Configure recovery.conf with recovery_target_time

Prevention

  • Enable data checksums
  • Use ECC memory
  • Configure proper fsync settings
  • Regular backups with verification

Redis Issues

Issue: Redis Not Responding

Symptoms

  • Connection timeouts
  • Commands hanging
  • BUSY error responses
  • High latency

Cause

  1. Redis blocked by slow command
  2. Memory exhaustion
  3. Persistence I/O blocking
  4. Too many connections
  5. Network issues

Solution

# Check Redis status docker exec redis redis-cli PING # Get server info docker exec redis redis-cli INFO # Check slow log docker exec redis redis-cli SLOWLOG GET 10 # Check connected clients docker exec redis redis-cli CLIENT LIST # Kill problematic client docker exec redis redis-cli CLIENT KILL ID <client-id> # Check memory usage docker exec redis redis-cli INFO memory # Force background save (if AOF blocking) docker exec redis redis-cli BGSAVE

Prevention

  • Avoid blocking commands (KEYS *)
  • Use SCAN instead of KEYS
  • Configure maxmemory and eviction policy
  • Monitor slow log

Issue: Redis Memory Exhaustion

Symptoms

  • "OOM command not allowed" errors
  • Eviction happening
  • Write operations failing
  • Keys disappearing unexpectedly

Cause

  1. No maxmemory limit set
  2. Inappropriate eviction policy
  3. Memory leak in application
  4. Large key values
  5. Too many connections

Solution

# Check memory status docker exec redis redis-cli INFO memory # Find big keys docker exec redis redis-cli --bigkeys # Analyze memory docker exec redis redis-cli MEMORY DOCTOR # Set memory limit docker exec redis redis-cli CONFIG SET maxmemory 2gb docker exec redis redis-cli CONFIG SET maxmemory-policy allkeys-lru # Delete large/unused keys docker exec redis redis-cli DEL large_key # Clear all data (careful!) docker exec redis redis-cli FLUSHALL ASYNC

Prevention

  • Set maxmemory limit
  • Configure appropriate eviction policy
  • Monitor memory usage
  • Implement key expiration (TTL)

Issue: Redis Persistence Failures

Symptoms

  • Data lost after restart
  • "Background saving error"
  • Disk space exhausted
  • AOF corruption

Cause

  1. Disk full
  2. Fork failure (not enough memory)
  3. Permission issues
  4. Corrupted AOF file
  5. Slow disk I/O

Solution

# Check persistence status docker exec redis redis-cli INFO persistence # Check last save status docker exec redis redis-cli LASTSAVE # Force RDB save docker exec redis redis-cli BGSAVE # Check AOF status docker exec redis redis-cli INFO persistence | grep aof # Repair corrupted AOF docker exec redis redis-check-aof --fix /data/appendonly.aof # Verify RDB file docker exec redis redis-check-rdb /data/dump.rdb

Prevention

  • Monitor disk space
  • Set vm.overcommit_memory=1
  • Use SSD for persistence
  • Regular backup verification

Issue: Redis Cluster Split Brain

Symptoms

  • Nodes showing different cluster views
  • Writes succeeding on multiple masters
  • Data inconsistency
  • Failover loops

Cause

  1. Network partition
  2. Node failure
  3. Insufficient quorum configuration
  4. Clock drift
  5. Unreliable network

Solution

# Check cluster status docker exec redis redis-cli CLUSTER INFO # View cluster nodes docker exec redis redis-cli CLUSTER NODES # Check for failed nodes docker exec redis redis-cli CLUSTER NODES | grep fail # Force failover docker exec redis redis-cli CLUSTER FAILOVER # Reset node (remove from cluster) docker exec redis redis-cli CLUSTER RESET # Fix cluster slot issues docker exec redis redis-cli CLUSTER FIX

Prevention

  • Configure proper quorum (min 3 masters)
  • Use reliable network
  • Monitor cluster health
  • Set appropriate timeouts

Issue: Redis Key Expiration Not Working

Symptoms

  • Expired keys still accessible
  • Memory not being freed
  • TTL showing unexpected values
  • Keys accumulating

Cause

  1. Passive expiration only
  2. Not enough memory pressure
  3. Clock issues
  4. Key being accessed (resetting TTL)
  5. EXPIRE command failing silently

Solution

# Check key TTL docker exec redis redis-cli TTL key_name # Force active expiration docker exec redis redis-cli DEBUG SLEEP 0 # Check expiration configuration docker exec redis redis-cli CONFIG GET hz docker exec redis redis-cli CONFIG GET activerehashing # Manually expire keys docker exec redis redis-cli EXPIRE key_name 1 # Scan and delete expired patterns docker exec redis redis-cli --scan --pattern "temp:*" | xargs docker exec redis redis-cli DEL

Prevention

  • Set appropriate hz value (10-100)
  • Use volatile-ttl or volatile-lru eviction
  • Monitor key count trends
  • Implement key naming conventions with TTL

General Database Issues

Issue: Connection Pool Exhaustion

Symptoms

  • "Too many connections" errors
  • Connection timeouts
  • Application hanging waiting for connection
  • Database rejecting new connections

Cause

  1. Connection leak in application
  2. Pool size too small
  3. Long-running transactions
  4. No connection timeout
  5. Spike in traffic

Solution

# PostgreSQL - check connections docker exec postgres psql -c "SELECT usename, count(*) FROM pg_stat_activity GROUP BY usename;" # Kill idle connections docker exec postgres psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND query_start < now() - interval '5 minutes';" # Redis - check connections docker exec redis redis-cli CLIENT LIST | wc -l # Increase pool size # In application config: pool: max: 20 min: 5 acquireTimeout: 30000 idleTimeout: 10000 # Implement PgBouncer for PostgreSQL docker run -d --name pgbouncer \ -e DATABASE_URL="postgres://user:pass@postgres:5432/db" \ -p 6432:6432 \ pgbouncer/pgbouncer

Prevention

  • Use connection pooling
  • Set appropriate pool sizes
  • Implement connection timeouts
  • Monitor active connections

Issue: Backup/Restore Failures

Symptoms

  • Backup jobs failing
  • Restore taking too long
  • Incomplete backup files
  • Space exhaustion during backup

Cause

  1. Insufficient disk space
  2. Network timeout
  3. Lock contention
  4. Large database size
  5. Permission issues

Solution

# PostgreSQL backup docker exec postgres pg_dump -Fc dbname > backup.dump # Parallel backup for large DBs docker exec postgres pg_dump -Fc -j 4 dbname > backup.dump # Compressed backup docker exec postgres pg_dump dbname | gzip > backup.sql.gz # Verify backup pg_restore --list backup.dump # Restore with progress pv backup.sql.gz | gunzip | docker exec -i postgres psql dbname # Redis backup docker exec redis redis-cli BGSAVE docker cp redis:/data/dump.rdb ./redis-backup.rdb

Prevention

  • Schedule backups during low-traffic periods
  • Verify backups regularly
  • Use incremental backups for large DBs
  • Monitor backup duration and size


Back to Agent Issues | Troubleshooting Home