User:Jobo~labswiki/runbook test/Kubernetes/Troubleshooting
Decision Tree
kubectl get pod
Scenario 1: The pod is pending
STEP: kubectl describe pod <pod-name>
DECISION POINT - Are you hitting resource or quota limits?
- If yes, go to 1.1
- If no, but the pod is waiting for volumes, go to 1.2
- If something else, go to 1.3
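The decision above can be roughed out as a small shell helper that reads the kubectl describe pod output and prints the sub-scenario to follow. This is only a sketch: the function name classify_pending is hypothetical, and the grep patterns are assumptions based on commonly seen scheduler event wording ("Insufficient cpu", "exceeded quota", "unbound immediate PersistentVolumeClaims"), so they may need adjusting for a given cluster.

```shell
# Map the text of `kubectl describe pod` (read from stdin) to a branch of
# Scenario 1. The patterns are assumptions about typical event messages,
# not an exhaustive list.
classify_pending() {
    desc=$(cat)
    if printf '%s\n' "$desc" | grep -Eqi 'Insufficient |exceeded quota'; then
        echo "1.1"    # resource or quota limits
    elif printf '%s\n' "$desc" | grep -Eqi 'unbound.*PersistentVolumeClaim|persistentvolumeclaim.*not found'; then
        echo "1.2"    # waiting for volumes
    else
        echo "1.3"    # something else
    fi
}

# Usage (needs cluster access):
#   kubectl describe pod <pod-name> | classify_pending
```

The helper only suggests a branch; the Events section of the describe output should still be read by a person before acting on it.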
1.1 Hitting resource limits
STEP: Check and lower resource requests and limits
1.2 The pod is waiting for Persistent Volumes
STEP: Remove the PVCs; they are not supported in the production cluster yet
1.3 Something else
STEP: kubectl get pod <pod-name> -o wide
DECISION POINT - Is the Pod scheduled to a node?
- If yes, go to 1.3.1
- If no, go to 1.3.2
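Whether the Pod has been scheduled can also be read straight from the Pod spec instead of the wide table output. A minimal sketch, assuming a hypothetical helper named scheduling_branch that takes the node name as its argument: an empty value (or the literal "<none>" that the NODE column shows) means the Pod has not been placed on a node.

```shell
# Decide the branch from the pod's assigned node name.
# An empty argument or "<none>" means the pod is not scheduled yet.
scheduling_branch() {
    if [ -n "$1" ] && [ "$1" != "<none>" ]; then
        echo "1.3.1"    # scheduled to a node: suspect the Kubelet
    else
        echo "1.3.2"    # not scheduled: suspect the Scheduler
    fi
}

# Usage (needs cluster access):
#   node=$(kubectl get pod <pod-name> -o jsonpath='{.spec.nodeName}')
#   scheduling_branch "$node"
```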
1.3.1 Pod is scheduled to a node
STEP: There might be an issue with the Kubelet. Inform ServiceOps
1.3.2 Pod is not scheduled to a node
STEP: There might be an issue with the Scheduler. Inform ServiceOps
Scenario 2: The pod is running
DECISION POINT - The pod is running?
- If yes, go to 2.1
- If no, go to 2.2
2.1 Pod is running
STEP: kubectl logs <pod-name>
DECISION POINT - Can you see application logs?
- If yes, go to 2.1.1
- If no, go to 2.1.2
2.1.1 Application log available
STEP: Fix the application issue causing the errors in the log
2.1.2 Application log unavailable
STEP: kubectl get pod
DECISION POINT - Is the Pod in ImagePullBackOff?
- If yes, go to 2.1.2.1
- If no, go to 2.1.2.2
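The status check above can be sketched as a filter over the kubectl get pod output. The function name image_pull_branch is hypothetical, and treating ErrImagePull the same as ImagePullBackOff is an assumption (it is the state a Pod usually shows just before backing off). Statuses of init containers, which carry an "Init:" prefix, are not handled here.

```shell
# Read `kubectl get pod <pod-name>` output on stdin and print the branch to
# follow for each pod, based on the STATUS column (third field).
image_pull_branch() {
    awk '$1 == "NAME" { next }   # skip the header row
         $3 == "ImagePullBackOff" || $3 == "ErrImagePull" { print "2.1.2.1"; next }
         { print "2.1.2.2" }'
}

# Usage (needs cluster access):
#   kubectl get pod <pod-name> | image_pull_branch
```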
2.1.2.1 Application in ImagePullBackOff
STEP: Check that the image name is correct; if not, fix the image name
STEP: Check that the image tag is correct; if not, fix the image tag
STEP: Check that the image registry is correct; if not, fix the image registry
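To check name, tag and registry one at a time, it can help to split the image reference into its parts first. The following is a simplified sketch, not a full image reference parser: split_image is a hypothetical helper, the registry host in the usage line is made up, and references pinned by digest (@sha256:...) are not handled.

```shell
# Split an image reference into registry, name and tag so that each part can
# be checked separately. Simplified: digests (@sha256:...) are not handled.
split_image() {
    ref=$1
    registry=""
    tag=""
    first=${ref%%/*}
    # A first path component containing "." or ":" (or equal to "localhost")
    # is treated as the registry host.
    case "$ref" in
        */*)
            case "$first" in
                *.*|*:*|localhost) registry=$first; ref=${ref#*/} ;;
            esac ;;
    esac
    # A ":" after the last "/" separates the tag from the name.
    last=${ref##*/}
    case "$last" in
        *:*) tag=${last##*:}; ref=${ref%:*} ;;
    esac
    echo "registry=${registry:-<default>} name=$ref tag=${tag:-<none>}"
}

# Usage (needs cluster access for the first command):
#   kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].image}'
#   split_image docker-registry.example.org/myapp:1.2
#   # prints: registry=docker-registry.example.org name=myapp tag=1.2
```

A result of registry=<default> means the reference names no registry, so the container runtime falls back to whatever default registry it is configured with; tag=<none> means the runtime will assume the "latest" tag.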
2.1.2.2 The Pod is not in ImagePullBackOff
STEP: Check if the Pod is in CrashLoopBackOff
STEP: Check the application logs (kubectl logs <pod-name> --previous shows the output of the previous, crashed container) and fix the crashing application