Portal:Toolforge/Admin/Runbooks/IstioGatewayPodMisplaced

From Wikitech
The procedures in this runbook require admin permissions to complete.

The IstioGatewayPodMisplaced alert fires when a Toolforge Istio gateway pod is running on a non-gateway worker.

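This generally happens when the gateway pods are replaced for any reason. The pods are sized so that only one of them fits on a single worker, so a replacement pod may not fit on a gateway worker while the old pod is still running there, and can be scheduled on a non-gateway worker instead.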

The related IngressPodMisplaced alert fires for the same reasons for the old ingress-nginx deployment, until that deployment is decommissioned.

Debugging

Check where the pods are running:

user@tools-bastion-NN:~ $ kubectl get pod -n istio-gateway -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP                NODE                           NOMINATED NODE   READINESS GATES
toolforge-istio-85f7bff487-54fjg   1/1     Running   0          35m   192.168.88.132    toolsbeta-test-k8s-gateway-1   <none>           <none>
toolforge-istio-85f7bff487-cdm96   1/1     Running   0          36m   192.168.179.3     toolsbeta-test-k8s-worker-12   <none>           <none>
toolforge-istio-85f7bff487-hbbf2   1/1     Running   0          35m   192.168.234.197   toolsbeta-test-k8s-gateway-2   <none>           <none>
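
With more pods, spotting the misplaced one by eye gets tedious. The following is a minimal sketch of a filter that prints only the pods that are not on a gateway worker. It assumes that gateway workers have "-gateway-" in their hostname, as in the sample output above; the function name misplaced_pods is made up for this example.

```shell
# Reads "POD NODE" pairs on stdin and prints those whose node does not look
# like a gateway worker.
# Assumption: gateway worker hostnames contain "-gateway-" (inferred from the
# sample output, not from any documented naming rule).
# A pod that is not scheduled yet shows "<none>" as its node and is printed too.
misplaced_pods() {
    awk '$2 !~ /-gateway-/ { print $1, $2 }'
}

# Usage on a bastion (needs cluster access, so it is left as a comment here):
#   kubectl get pod -n istio-gateway --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName | misplaced_pods
```

Using custom-columns rather than -o wide keeps the output at exactly two fields per line, so the filter does not depend on how the RESTARTS or AGE columns are rendered.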

Common issues

The simple fix is to delete the pod running on a non-gateway worker, at which point Kubernetes should re-create it on the correct node:

user@tools-bastion-NN:~ $ kubectl sudo delete pod -n istio-gateway toolforge-istio-85f7bff487-cdm96

If the replacement pod also gets scheduled on an incorrect node, you need to investigate further.
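
As a possible starting point for that investigation, the sketch below groups two standard kubectl commands: describing the pod (its Events section records scheduling decisions) and describing a gateway worker (taints and allocated resources). The function name inspect_placement and the KUBECTL override are hypothetical helpers for this example, not part of the Toolforge tooling, and some of these calls may need kubectl sudo depending on your permissions.

```shell
# Hypothetical helper: show scheduling-related details for a pod and for the
# gateway worker it was expected to land on.
# KUBECTL can be overridden (for example with "kubectl sudo"); it defaults to
# plain kubectl.
inspect_placement() {
    pod="$1"
    node="$2"
    # Events at the end of this output show why the scheduler chose a node.
    ${KUBECTL:-kubectl} describe pod -n istio-gateway "$pod"
    # Taints and allocated resources show whether the gateway worker has room.
    ${KUBECTL:-kubectl} describe node "$node"
}

# Usage (needs cluster access, so it is left as a comment here):
#   inspect_placement toolforge-istio-85f7bff487-cdm96 toolsbeta-test-k8s-gateway-1
```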

Support contacts

Old incidents