
Portal:Toolforge/Admin/Runbooks/IngressPodMisplaced

From Wikitech
The procedures in this runbook require admin permissions to complete.

The IngressPodMisplaced alert fires when a Toolforge ingress-nginx pod is running on a non-ingress worker.

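This usually happens when the ingress-nginx pods are replaced for any reason. The ingress pods are sized so that only one of them fits on a single ingress worker, so a replacement pod created while the old pod still occupies its worker may be scheduled onto a regular worker instead.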

Debugging

Check where the pods are running:

user@tools-bastion-NN:~ $ kubectl get pod -n ingress-nginx-gen2 -o wide
NAME                                             READY   STATUS    RESTARTS   AGE   IP                NODE                   NOMINATED NODE   READINESS GATES
ingress-nginx-gen2-controller-6967c4b878-9h9wv   1/1     Running   0          15d   192.168.166.63    tools-k8s-ingress-8    <none>           <none>
ingress-nginx-gen2-controller-6967c4b878-hc4zv   1/1     Running   0          15d   192.168.36.120    tools-k8s-worker-105   <none>           <none>
ingress-nginx-gen2-controller-6967c4b878-z5wtz   1/1     Running   0          15d   192.168.254.210   tools-k8s-ingress-9    <none>           <none>
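With many replicas it can help to pick out the misplaced pod mechanically. The sketch below rests on a few assumptions. The function names are hypothetical. Ingress workers are assumed to be named `tools-k8s-ingress-N`, which is inferred from the example output above. It asks `kubectl` for `custom-columns` instead of parsing the wide table, because the RESTARTS column in that table can contain spaces (for example `2 (3d ago)`) and would shift the fields.

```shell
# Hypothetical helper: print "NAME NODE" for every pod in the namespace.
# custom-columns prints <none> for a pod that has no node yet (Pending).
list_ingress_pod_nodes() {
  kubectl get pod -n ingress-nginx-gen2 \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName --no-headers
}

# Hypothetical filter: read "NAME NODE" lines on stdin and print the ones
# scheduled on a node whose name does not look like an ingress worker.
# Assumption: ingress workers are named tools-k8s-ingress-N.
# Pending pods (<none>) are skipped, since they are not misplaced.
filter_misplaced() {
  awk '$2 != "<none>" && $2 !~ /^tools-k8s-ingress-/ { print $1, $2 }'
}

# Usage on a bastion:
#   list_ingress_pod_nodes | filter_misplaced
```

Against the example output above, this would print only the pod running on `tools-k8s-worker-105`.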

Common issues

The simplest fix is to delete the pod that is running on the non-ingress worker. Kubernetes should then re-create it on an ingress worker:

user@tools-bastion-NN:~ $ kubectl sudo delete pod -n ingress-nginx-gen2 ingress-nginx-gen2-controller-6967c4b878-hc4zv

If the replacement pod is also scheduled on a non-ingress worker, further investigation is needed.
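When investigating, one possible cause to rule out is a shortage of schedulable ingress workers. Each ingress worker fits only one ingress pod. If there are fewer schedulable ingress workers than replicas, for example because one is cordoned, then at least one pod has to land elsewhere. The sketch below makes several assumptions. The function name is hypothetical. Ingress workers are assumed to be named `tools-k8s-ingress-N`. The Deployment is assumed to be called `ingress-nginx-gen2-controller`, which is inferred from the pod names and has not been confirmed.

```shell
# Hypothetical check: read "NAME UNSCHEDULABLE" node lines on stdin and
# compare the number of schedulable ingress workers with the replica count
# passed as $1. custom-columns prints "true" for a cordoned node and
# <none> when spec.unschedulable is unset.
check_ingress_capacity() {
  local replicas=$1
  local schedulable
  schedulable=$(awk '$1 ~ /^tools-k8s-ingress-/ && $2 != "true" { n++ } END { print n + 0 }')
  if [ "$schedulable" -lt "$replicas" ]; then
    echo "too few schedulable ingress workers: $schedulable for $replicas replicas"
    return 1
  fi
  echo "ok: $schedulable schedulable ingress workers for $replicas replicas"
}

# Usage on a bastion (prefix with "kubectl sudo" if your account cannot
# list nodes or read the deployment):
#   kubectl get nodes --no-headers \
#     -o custom-columns=NAME:.metadata.name,UNSCHEDULABLE:.spec.unschedulable \
#     | check_ingress_capacity "$(kubectl get deployment -n ingress-nginx-gen2 \
#         ingress-nginx-gen2-controller -o jsonpath='{.spec.replicas}')"
```

A passing check does not prove the scheduling is healthy. It only rules out this one cause.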

Support contacts

Old incidents