Database Issues

Troubleshooting guide for PostgreSQL, Redis, and other database-related problems.

PostgreSQL Issues

Issue: Connection Refused

Symptoms

"Connection refused" errors
Applications cannot connect to database
Health checks failing
Port not listening

Cause

PostgreSQL service not running
Port conflict
Firewall blocking connections
pg_hba.conf misconfiguration
Max connections exceeded

Solution

# Check if PostgreSQL is running
docker ps | grep postgres
docker logs postgres --tail 50

# Verify port is listening
docker exec postgres pg_isready -h localhost -p 5432

# Check max connections
docker exec postgres psql -c "SHOW max_connections;"
docker exec postgres psql -c "SELECT count(*) FROM pg_stat_activity;"

# Kill idle connections
docker exec postgres psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND query_start < now() - interval '10 minutes';"

# Check pg_hba.conf
docker exec postgres cat /var/lib/postgresql/data/pg_hba.conf

# Restart PostgreSQL
docker restart postgres

Prevention

Set appropriate max_connections
Implement connection pooling (PgBouncer)
Monitor connection count
Configure connection timeouts

Issue: Slow Queries

Symptoms

API response times increasing
Database CPU high
Lock wait timeouts
Application timeouts

Cause

Missing indexes
N+1 query patterns
Large table scans
Lock contention
Suboptimal query plans

Solution

# Enable slow query logging
docker exec postgres psql -c "ALTER SYSTEM SET log_min_duration_statement = 1000;"
docker exec postgres psql -c "SELECT pg_reload_conf();"

# Find slow queries
docker exec postgres psql -c "SELECT query, calls, mean_time, total_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10;"

# Analyze query plan
docker exec postgres psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE email = 'test@example.com';"

# Check for missing indexes
docker exec postgres psql -c "SELECT relname, seq_scan, idx_scan, n_live_tup FROM pg_stat_user_tables WHERE seq_scan > idx_scan ORDER BY seq_scan DESC LIMIT 10;"

# Create missing index
docker exec postgres psql -c "CREATE INDEX CONCURRENTLY idx_users_email ON users(email);"

# Update statistics
docker exec postgres psql -c "ANALYZE;"

Prevention

Use query analyzers in development
Implement database monitoring
Regular ANALYZE runs
Review query patterns regularly

Issue: Disk Space Exhaustion

Symptoms

"No space left on device" errors
Write operations failing
WAL files growing
Transaction log full

Cause

Unvacuumed tables (bloat)
WAL retention too high
Large tables without partitioning
Orphaned temp files
Backup files accumulating

Solution

# Check disk usage
docker exec postgres du -sh /var/lib/postgresql/data/*

# Check table bloat
docker exec postgres psql -c "SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS size FROM pg_tables WHERE schemaname = 'public' ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC LIMIT 10;"

# Run VACUUM
docker exec postgres psql -c "VACUUM (VERBOSE, ANALYZE);"

# Full vacuum for severe bloat (locks table!)
docker exec postgres psql -c "VACUUM FULL verbose;"

# Check WAL size
docker exec postgres psql -c "SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0'));"

# Clean up WAL files
docker exec postgres psql -c "CHECKPOINT;"

# Remove old backups
find /backups -name "*.sql.gz" -mtime +7 -delete

Prevention

Configure autovacuum properly
Set WAL retention limits
Partition large tables
Monitor disk space with alerts

Issue: Replication Lag

Symptoms

Read replicas showing stale data
Replication slots growing
WAL accumulating on primary
Standby disconnections

Cause

Network latency
Replica overwhelmed
Long-running transactions
WAL generation spike
Disk I/O bottleneck on replica

Solution

# Check replication status
docker exec postgres psql -c "SELECT client_addr, state, sent_lsn, write_lsn, replay_lsn, pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes FROM pg_stat_replication;"

# Check replication slots
docker exec postgres psql -c "SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes FROM pg_replication_slots;"

# Remove inactive slots (careful!)
docker exec postgres psql -c "SELECT pg_drop_replication_slot('inactive_slot');"

# Check for long transactions
docker exec postgres psql -c "SELECT pid, now() - xact_start AS duration, query FROM pg_stat_activity WHERE xact_start IS NOT NULL ORDER BY xact_start LIMIT 10;"

# Terminate blocking transaction
docker exec postgres psql -c "SELECT pg_terminate_backend(<pid>);"

Prevention

Monitor replication lag
Set up alerts for lag thresholds
Use synchronous_commit wisely
Ensure replica has adequate resources

Issue: Database Corruption

Symptoms

Checksum verification failures
"could not read block" errors
Inconsistent query results
Crash recovery loops

Cause

Hardware failure (disk/memory)
Power loss during write
Kernel bugs
Filesystem corruption
Out-of-memory killer

Solution

# Check for corruption
docker exec postgres pg_isready
docker exec postgres psql -c "SELECT count(*) FROM pg_class;"

# Verify checksums (if enabled)
docker exec postgres pg_checksums --check -D /var/lib/postgresql/data

# Attempt repair with pg_resetwal (DANGEROUS - last resort)
# Backup first!
docker exec postgres pg_resetwal -D /var/lib/postgresql/data

# Restore from backup
pg_restore -d dbname backup.dump

# Point-in-time recovery
# Configure recovery.conf with recovery_target_time

Prevention

Enable data checksums
Use ECC memory
Configure proper fsync settings
Regular backups with verification

Redis Issues

Issue: Redis Not Responding

Symptoms

Connection timeouts
Commands hanging
BUSY error responses
High latency

Cause

Redis blocked by slow command
Memory exhaustion
Persistence I/O blocking
Too many connections
Network issues

Solution

# Check Redis status
docker exec redis redis-cli PING

# Get server info
docker exec redis redis-cli INFO

# Check slow log
docker exec redis redis-cli SLOWLOG GET 10

# Check connected clients
docker exec redis redis-cli CLIENT LIST

# Kill problematic client
docker exec redis redis-cli CLIENT KILL ID <client-id>

# Check memory usage
docker exec redis redis-cli INFO memory

# Force background save (if AOF blocking)
docker exec redis redis-cli BGSAVE

Prevention

Avoid blocking commands (KEYS *)
Use SCAN instead of KEYS
Configure maxmemory and eviction policy
Monitor slow log

Issue: Redis Memory Exhaustion

Symptoms

"OOM command not allowed" errors
Eviction happening
Write operations failing
Keys disappearing unexpectedly

Cause

No maxmemory limit set
Inappropriate eviction policy
Memory leak in application
Large key values
Too many connections

Solution

# Check memory status
docker exec redis redis-cli INFO memory

# Find big keys
docker exec redis redis-cli --bigkeys

# Analyze memory
docker exec redis redis-cli MEMORY DOCTOR

# Set memory limit
docker exec redis redis-cli CONFIG SET maxmemory 2gb
docker exec redis redis-cli CONFIG SET maxmemory-policy allkeys-lru

# Delete large/unused keys
docker exec redis redis-cli DEL large_key

# Clear all data (careful!)
docker exec redis redis-cli FLUSHALL ASYNC

Prevention

Set maxmemory limit
Configure appropriate eviction policy
Monitor memory usage
Implement key expiration (TTL)

Issue: Redis Persistence Failures

Symptoms

Data lost after restart
"Background saving error"
Disk space exhausted
AOF corruption

Cause

Disk full
Fork failure (not enough memory)
Permission issues
Corrupted AOF file
Slow disk I/O

Solution

# Check persistence status
docker exec redis redis-cli INFO persistence

# Check last save status
docker exec redis redis-cli LASTSAVE

# Force RDB save
docker exec redis redis-cli BGSAVE

# Check AOF status
docker exec redis redis-cli INFO persistence | grep aof

# Repair corrupted AOF
docker exec redis redis-check-aof --fix /data/appendonly.aof

# Verify RDB file
docker exec redis redis-check-rdb /data/dump.rdb

Prevention

Monitor disk space
Set vm.overcommit_memory=1
Use SSD for persistence
Regular backup verification

Issue: Redis Cluster Split Brain

Symptoms

Nodes showing different cluster views
Writes succeeding on multiple masters
Data inconsistency
Failover loops

Cause

Network partition
Node failure
Insufficient quorum configuration
Clock drift
Unreliable network

Solution

# Check cluster status
docker exec redis redis-cli CLUSTER INFO

# View cluster nodes
docker exec redis redis-cli CLUSTER NODES

# Check for failed nodes
docker exec redis redis-cli CLUSTER NODES | grep fail

# Force failover
docker exec redis redis-cli CLUSTER FAILOVER

# Reset node (remove from cluster)
docker exec redis redis-cli CLUSTER RESET

# Fix cluster slot issues
docker exec redis redis-cli CLUSTER FIX

Prevention

Configure proper quorum (min 3 masters)
Use reliable network
Monitor cluster health
Set appropriate timeouts

Issue: Redis Key Expiration Not Working

Symptoms

Expired keys still accessible
Memory not being freed
TTL showing unexpected values
Keys accumulating

Cause

Passive expiration only
Not enough memory pressure
Clock issues
Key being accessed (resetting TTL)
EXPIRE command failing silently

Solution

# Check key TTL
docker exec redis redis-cli TTL key_name

# Force active expiration
docker exec redis redis-cli DEBUG SLEEP 0

# Check expiration configuration
docker exec redis redis-cli CONFIG GET hz
docker exec redis redis-cli CONFIG GET activerehashing

# Manually expire keys
docker exec redis redis-cli EXPIRE key_name 1

# Scan and delete expired patterns
docker exec redis redis-cli --scan --pattern "temp:*" | xargs docker exec redis redis-cli DEL

Prevention

Set appropriate hz value (10-100)
Use volatile-ttl or volatile-lru eviction
Monitor key count trends
Implement key naming conventions with TTL

General Database Issues

Issue: Connection Pool Exhaustion

Symptoms

"Too many connections" errors
Connection timeouts
Application hanging waiting for connection
Database rejecting new connections

Cause

Connection leak in application
Pool size too small
Long-running transactions
No connection timeout
Spike in traffic

Solution

# PostgreSQL - check connections
docker exec postgres psql -c "SELECT usename, count(*) FROM pg_stat_activity GROUP BY usename;"

# Kill idle connections
docker exec postgres psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND query_start < now() - interval '5 minutes';"

# Redis - check connections
docker exec redis redis-cli CLIENT LIST | wc -l

# Increase pool size
# In application config:
pool:
  max: 20
  min: 5
  acquireTimeout: 30000
  idleTimeout: 10000

# Implement PgBouncer for PostgreSQL
docker run -d --name pgbouncer \
  -e DATABASE_URL="postgres://user:pass@postgres:5432/db" \
  -p 6432:6432 \
  pgbouncer/pgbouncer

Prevention

Use connection pooling
Set appropriate pool sizes
Implement connection timeouts
Monitor active connections

Issue: Backup/Restore Failures

Symptoms

Backup jobs failing
Restore taking too long
Incomplete backup files
Space exhaustion during backup

Cause

Insufficient disk space
Network timeout
Lock contention
Large database size
Permission issues

Solution

# PostgreSQL backup
docker exec postgres pg_dump -Fc dbname > backup.dump

# Parallel backup for large DBs
docker exec postgres pg_dump -Fc -j 4 dbname > backup.dump

# Compressed backup
docker exec postgres pg_dump dbname | gzip > backup.sql.gz

# Verify backup
pg_restore --list backup.dump

# Restore with progress
pv backup.sql.gz | gunzip | docker exec -i postgres psql dbname

# Redis backup
docker exec redis redis-cli BGSAVE
docker cp redis:/data/dump.rdb ./redis-backup.rdb

Prevention

Schedule backups during low-traffic periods
Verify backups regularly
Use incremental backups for large DBs
Monitor backup duration and size

Back to Agent Issues | Troubleshooting Home

Database Issues

Database Issues

PostgreSQL Issues

Issue: Connection Refused

Symptoms

Cause

Solution

Prevention

Issue: Slow Queries

Symptoms

Cause

Solution

Prevention

Issue: Disk Space Exhaustion

Symptoms

Cause

Solution

Prevention

Issue: Replication Lag

Symptoms

Cause

Solution

Prevention

Issue: Database Corruption

Symptoms

Cause

Solution

Prevention

Redis Issues

Issue: Redis Not Responding

Symptoms

Cause

Solution

Prevention

Issue: Redis Memory Exhaustion

Symptoms

Cause

Solution

Prevention

Issue: Redis Persistence Failures

Symptoms

Cause

Solution

Prevention

Issue: Redis Cluster Split Brain

Symptoms

Cause

Solution

Prevention

Issue: Redis Key Expiration Not Working

Symptoms

Cause

Solution

Prevention

General Database Issues

Issue: Connection Pool Exhaustion

Symptoms

Cause

Solution

Prevention

Issue: Backup/Restore Failures

Symptoms

Cause

Solution

Prevention

Related Documentation