Kubernetes Troubleshooting
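Common Kubernetes issues and solutions for the LLM platform. The commands assume workloads run in the agents namespace; substitute your own namespace where it differs.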
Pod Issues
Pod Not Starting
Check pod status:
kubectl get pods -n agents
kubectl describe pod <pod-name> -n agents
kubectl logs <pod-name> -n agents
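For a quick first pass across the namespace, the sketch below prints every pod that is not Running or Succeeded, or that has a container stuck waiting, together with the waiting reason. Reasons such as ErrImagePull, ImagePullBackOff, or CrashLoopBackOff point directly at the usual causes. The function name pod_waiting_reasons is hypothetical, and the default namespace agents is taken from this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list problem pods with their phase and any container
# "waiting" reasons (ErrImagePull, ImagePullBackOff, CrashLoopBackOff, ...).
pod_waiting_reasons() {
  local ns="${1:-agents}"   # namespace; "agents" is the default used in this guide
  # One tab-separated line per pod: name, phase, waiting reasons (space-separated).
  kubectl get pods -n "$ns" -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' \
    | awk -F'\t' '($2 != "Running" && $2 != "Succeeded") || $3 != "" { print }'
}
```

Usage: source the file, then run pod_waiting_reasons agents. A CrashLoopBackOff pod still reports phase Running, which is why the waiting reason is checked separately.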
Common causes:
- Image pull error
# Check which image the pod is using
kubectl describe pod <pod-name> -n agents | grep Image
# Fix: point the container at a valid image
kubectl set image deployment/<name> <container>=<correct-image> -n agents
- Insufficient resources
# Check node capacity and currently allocated requests
kubectl describe nodes
# Fix: adjust resource requests
kubectl edit deployment <name> -n agents
- Failed health checks
# Check probe configuration
kubectl describe pod <pod-name> -n agents | grep -A 10 Liveness
# Fix: adjust probe timings (initialDelaySeconds, periodSeconds, failureThreshold)
kubectl edit deployment <name> -n agents
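The probe-timing fix can also be scripted. This is a minimal sketch, assuming the default strategic merge patch that kubectl patch uses for Deployments: the container entry is matched by its name and only the listed probe field changes. The function patch_liveness_delay and its argument order are hypothetical, and the probe must already define a handler (httpGet, exec, or tcpSocket), otherwise the API server rejects the patched spec.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set livenessProbe.initialDelaySeconds for one container
# via a strategic merge patch (containers are merged by "name").
patch_liveness_delay() {
  local deploy="$1" container="$2" delay="$3" ns="${4:-agents}"
  local patch
  # Build the patch body; %d rejects a non-numeric delay.
  patch=$(printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","livenessProbe":{"initialDelaySeconds":%d}}]}}}}' "$container" "$delay")
  kubectl patch deployment "$deploy" -n "$ns" -p "$patch"
}
```

Usage: patch_liveness_delay <name> <container> 60 agents. For slow-starting model servers a startupProbe is often a better fit than a long initialDelaySeconds, because liveness and readiness checks are held off until the startup probe succeeds.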
Service Issues
Service Not Accessible
# Check the service
kubectl get svc -n agents
kubectl describe svc <service-name> -n agents
# Check endpoints (an empty list means no ready pod matches the service selector)
kubectl get endpoints <service-name> -n agents
# Test from within the cluster
kubectl run test --rm -it --image=curlimages/curl -- sh
# Then, inside the test pod's shell:
curl http://<service-name>.<namespace>.svc.cluster.local:<port>
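The endpoints check can be wrapped in a small function. In this sketch (check_endpoints is hypothetical) the ready addresses are read from the Endpoints object; when there are none, the Service selector is printed so it can be compared with the output of kubectl get pods -n agents --show-labels.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report ready endpoint IPs for a Service, or print its
# selector when no ready pods back it. Returns 1 in the empty case.
check_endpoints() {
  local svc="$1" ns="${2:-agents}" ips
  ips=$(kubectl get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -z "$ips" ]; then
    echo "no ready endpoints for $svc; compare this selector with pod labels:"
    kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.selector}'
    echo
    return 1
  fi
  echo "ready endpoints for $svc: $ips"
}
```

On clusters that rely on EndpointSlices, kubectl get endpointslices -n agents -l kubernetes.io/service-name=<service-name> shows the same information.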
Deployment Issues
Deployment Stuck
# Check deployment status
kubectl rollout status deployment/<name> -n agents
# Check events
kubectl get events -n agents --sort-by='.lastTimestamp'
# Roll back if needed
kubectl rollout undo deployment/<name> -n agents
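A stuck rollout can be bounded in time. This is a minimal sketch, assuming automatic rollback is acceptable for the workload: kubectl rollout status exits non-zero when its --timeout expires or when the Deployment exceeds its progressDeadlineSeconds, and the hypothetical rollout_or_undo function then reverts to the previous revision. The 300s default is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for a rollout; if it does not complete in time,
# roll back to the previous revision and return 1.
rollout_or_undo() {
  local deploy="$1" ns="${2:-agents}" timeout="${3:-300s}"
  if ! kubectl rollout status "deployment/$deploy" -n "$ns" --timeout="$timeout"; then
    echo "rollout of $deploy did not finish within $timeout; rolling back" >&2
    kubectl rollout undo "deployment/$deploy" -n "$ns"
    return 1
  fi
}
```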
Network Issues
Pod to Pod Communication
# Test connectivity (-c 3 stops after three packets instead of pinging forever)
kubectl exec -it <pod-1> -n agents -- ping -c 3 <pod-2-ip>
# Check network policies
kubectl get networkpolicies -n agents
# Check CNI logs
kubectl logs -n kube-system <cni-pod>
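Since the ping test needs the target pod's IP, the lookup can be scripted. In this sketch (ping_pod is hypothetical) the IP comes from the target's .status.podIP, and the check assumes the source image includes a ping binary, which many minimal images do not. How ICMP is treated under NetworkPolicy varies by CNI plugin, so a failed ping does not by itself prove that application traffic is blocked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve the destination pod's IP, then ping it from the
# source pod. Assumes "ping" exists in the source container image.
ping_pod() {
  local src="$1" dst="$2" ns="${3:-agents}" ip
  ip=$(kubectl get pod "$dst" -n "$ns" -o jsonpath='{.status.podIP}')
  if [ -z "$ip" ]; then
    echo "pod $dst has no IP yet (not scheduled or not started)" >&2
    return 1
  fi
  # -c 3 sends three packets and exits.
  kubectl exec "$src" -n "$ns" -- ping -c 3 "$ip"
}
```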
Storage Issues
PVC Not Binding
# Check PVC status
kubectl get pvc -n agents
# Check PV availability
kubectl get pv
# Describe PVC for events
kubectl describe pvc <pvc-name> -n agents
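To narrow down which claims are stuck, the sketch below (unbound_pvcs is hypothetical) prints every PVC that is not Bound along with its requested StorageClass, which can then be compared with kubectl get storageclass. Note that a PVC staying Pending is expected when its StorageClass uses volumeBindingMode WaitForFirstConsumer and no pod using the claim has been scheduled yet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list PVCs whose phase is not "Bound", with the
# StorageClass they request (empty if none is set).
unbound_pvcs() {
  local ns="${1:-agents}"
  kubectl get pvc -n "$ns" -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.spec.storageClassName}{"\n"}{end}' \
    | awk -F'\t' '$2 != "Bound" { print }'
}
```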
Resource Limits
OOMKilled
Problem: A container was terminated for running out of memory, typically because it exceeded its memory limit.
# Check pod events and the container's Last State (Reason: OOMKilled)
kubectl describe pod <pod-name> -n agents
# Increase the memory limit for <container>
kubectl patch deployment <name> -n agents -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","resources":{"limits":{"memory":"2Gi"}}}]}}}}'
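To find which pods hit the limit without describing each one, this sketch (oomkilled_pods is hypothetical) reads the last termination reason of every container and prints the pods where any container reports OOMKilled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print names of pods with at least one container whose
# last termination reason was OOMKilled.
oomkilled_pods() {
  local ns="${1:-agents}"
  kubectl get pods -n "$ns" -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
    | awk -F'\t' '$2 ~ /OOMKilled/ { print $1 }'
}
```

As an alternative to the JSON patch, kubectl set resources deployment <name> -n agents -c <container> --limits=memory=2Gi changes the same limit.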