User:Jobo~labswiki/runbook test/Kubernetes/Troubleshooting

From Wikitech

Decision Tree

kubectl get pod

Scenario 1: The pod is pending

STEP: kubectl describe pod <pod-name>

DECISION POINT - Are you hitting resource or quota limits?

  • If yes, go to 1.1
  • If no, but the pod is waiting for volumes, go to 1.2
  • If something else, go to 1.3
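The answer to this decision point is usually visible in the Events section of the describe output. The following is a minimal sketch, not an official tool: the function name `classify_pending` is hypothetical, and the matched phrases ("Insufficient cpu", "exceeded quota", "unbound immediate PersistentVolumeClaims") are assumptions about typical event messages that may differ between cluster versions.

```shell
# Hypothetical helper: read `kubectl describe pod` output on stdin and
# print the runbook section to continue with (1.1, 1.2 or 1.3).
classify_pending() {
  desc="$(cat)"
  if printf '%s\n' "$desc" | grep -Eqi 'insufficient (cpu|memory)|exceeded quota'; then
    echo "1.1"   # resource or quota limits
  elif printf '%s\n' "$desc" | grep -Eqi 'unbound .*persistentvolumeclaims|persistentvolumeclaim .* not found'; then
    echo "1.2"   # waiting for volumes
  else
    echo "1.3"   # something else
  fi
}
```

Possible usage: `kubectl describe pod <pod-name> | classify_pending`. Treat the result as a hint and still read the events yourself.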

1.1 Hitting resource limits

STEP: Check and lower resource requests and limits
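To see what the pod currently asks for and what the namespace allows, something like the following may help. It is a sketch under assumptions: the function name `show_resources` and the `default` namespace fallback are illustrative, and how the values are actually lowered depends on how the service is deployed.

```shell
# Hypothetical helper: print each container's requests/limits and the
# namespace's ResourceQuota so the two can be compared.
show_resources() {
  pod="$1"
  ns="${2:-default}"
  # One line per container: name, then its resources stanza.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
  # Quota usage versus hard limits for the namespace.
  kubectl describe resourcequota -n "$ns"
}
```

Possible usage: `show_resources <pod-name> <namespace>`.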

1.2 The pod is waiting for Persistent Volumes

STEP: Remove the PVC. PVCs are not supported in the production cluster yet
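To find which claims the pod references before removing them from its manifest, a sketch along these lines could be used. The function name `list_pvcs` is hypothetical, and the namespace defaults to `default` purely for illustration.

```shell
# Hypothetical helper: print the claim names of all PVC-backed volumes
# referenced by a pod's spec.
list_pvcs() {
  pod="$1"
  ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}'
}
```

Possible usage: `list_pvcs <pod-name> <namespace>`.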

1.3 Something else

STEP: kubectl get pod <pod-name> -o wide

DECISION POINT - Is the Pod scheduled to a node?

  • If yes, go to 1.3.1
  • If no, go to 1.3.2
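Whether a pod has been scheduled can also be read directly from its `spec.nodeName` field, which is empty until the scheduler assigns a node. A minimal sketch, assuming a hypothetical function name `scheduled_branch`:

```shell
# Hypothetical helper: print which branch of the tree applies, based on
# whether the pod has a node assigned.
scheduled_branch() {
  pod="$1"
  ns="${2:-default}"
  node="$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}')"
  if [ -n "$node" ]; then
    echo "1.3.1 (scheduled to $node)"
  else
    echo "1.3.2 (not scheduled)"
  fi
}
```

Possible usage: `scheduled_branch <pod-name> <namespace>`.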

1.3.1 Pod is scheduled to a node

STEP: There might be an issue with the Kubelet. Inform ServiceOps

1.3.2 Pod is not scheduled to a node

STEP: There might be an issue with the Scheduler. Inform ServiceOps

Scenario 2: The pod is running

DECISION POINT - Is the pod running?

  • If yes, go to 2.1
  • If no, go to 2.2

2.1 Pod is running

STEP: kubectl logs <pod-name>

DECISION POINT - Can you see application logs?

  • If yes, go to 2.1.1
  • If no, go to 2.1.2
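For pods with more than one container, plain `kubectl logs` may not show everything. The sketch below is one possible way to make this check; the function name `logs_branch` and the 50-line tail are arbitrary assumptions.

```shell
# Hypothetical helper: fetch recent logs from all containers and report
# which branch of the tree applies.
logs_branch() {
  pod="$1"
  ns="${2:-default}"
  # Last 50 lines per container; error output (for example, container not
  # started yet) is discarded so that "no logs" yields an empty string.
  logs="$(kubectl logs "$pod" -n "$ns" --all-containers --tail=50 2>/dev/null || true)"
  if [ -n "$logs" ]; then
    printf '%s\n' "$logs"
    echo "-> 2.1.1"
  else
    echo "-> 2.1.2"
  fi
}
```

Possible usage: `logs_branch <pod-name> <namespace>`.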

2.1.1 Application log available

STEP: Fix the application issue causing the error log

2.1.2 Application log unavailable

STEP: kubectl get pod

DECISION POINT - Is the Pod in ImagePullBackOff?

  • If yes, go to 2.1.2.1
  • If no, go to 2.1.2.2
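The STATUS column of `kubectl get pod` is derived from the containers' waiting reasons, which can also be queried directly. This is a sketch only: `waiting_branch` is a hypothetical name, and treating `ErrImagePull` (the state that typically precedes the back-off) the same as `ImagePullBackOff` is an assumption made here for convenience.

```shell
# Hypothetical helper: inspect the waiting reason of each container and
# print which branch of the tree applies.
waiting_branch() {
  pod="$1"
  ns="${2:-default}"
  reasons="$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[*].state.waiting.reason}')"
  case "$reasons" in
    *ImagePullBackOff*|*ErrImagePull*) echo "2.1.2.1" ;;
    *) echo "2.1.2.2" ;;
  esac
}
```

Possible usage: `waiting_branch <pod-name> <namespace>`.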

2.1.2.1 Application in ImagePullBackOff

STEP: Check if the image name is correct, if not fix the image name

STEP: Check if the image tag is correct, if not fix the image tag

STEP: Check if the image registry is correct, if not fix the image registry
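To check name, tag and registry in one place, it can help to print the full image reference of every container together with the pod's events, which normally include the pull error. A minimal sketch, assuming a hypothetical function name `pull_diagnostics`:

```shell
# Hypothetical helper: show the image references a pod is configured with
# and the events recorded for that pod.
pull_diagnostics() {
  pod="$1"
  ns="${2:-default}"
  # One line per container: name, then registry/name:tag as configured.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
  # Events for this pod only.
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod"
}
```

Possible usage: `pull_diagnostics <pod-name> <namespace>`.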

2.1.2.2 The Pod is not in ImagePullBackOff

STEP: Check if the Pod is in CrashLoopBackOff.

STEP: Check application logs and fix the crashing application
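When a container keeps restarting, the current instance often has no logs yet, so the logs of the previous instance are usually more useful. The following is a sketch under assumptions: `crash_details` is a hypothetical name, the 50-line tail is arbitrary, and for multi-container pods a `-c <container>` argument may be needed on the logs call.

```shell
# Hypothetical helper: show restart counts, last exit codes and the logs
# of the previously crashed container instance.
crash_details() {
  pod="$1"
  ns="${2:-default}"
  # One line per container: name, restart count, last exit code.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
  # Logs from the previous (terminated) instance of the container.
  kubectl logs "$pod" -n "$ns" --previous --tail=50
}
```

Possible usage: `crash_details <pod-name> <namespace>`.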