From Wikitech
Jump to navigation Jump to search

Kubernetes has a lot of moving parts and offers wide configuration options. So misconfiguration can happen. This page should help to troubleshoot some error cases.

Most of the time, errors in deployments are caught by helmfile. If a new deployment is unable to become ready, helmfile will roll back automatically after a timeout (currently 300 seconds). If your deployment takes a long time and fails after a timeout, then it is helpful to start another SSH session and troubleshoot the service components. Make sure to make yourself familiar with the usage of kubectl.

The following troubleshooting flowchart should give a rough guideline on how to find errors in Kubernetes service deployments. The flowchart is intended for the production Kubernetes platform, not Toolforge.

Production Kubernetes troubleshooting flowchart

Additional Resources