Portal:Toolforge/Admin/Runbooks/HarborDown
This alert fires when the Prometheus host cannot complete an HTTPS request to the Harbor instance.
The procedures in this runbook require admin permissions to complete.
Error / Incident
This usually comes in the form of an alert in alertmanager.
The alert tells you which project (tools, toolsbeta, ...) is affected and the URL of the Harbor instance that is failing.
Note that this request goes through the proxies, so the problem may lie there rather than in Harbor itself.
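To reproduce the failing request by hand, a minimal sketch with curl follows. The helper name probe_url and the 10-second timeout are assumptions made for illustration, not part of the Prometheus check itself; pass the URL shown in the alert.

```shell
# Sketch: print the HTTP status code returned for a URL.
# probe_url is a hypothetical helper name; pass the URL shown in the alert.
probe_url() {
    # -sS: no progress meter, but still print errors
    # -o /dev/null: discard the response body
    # -w: print only the HTTP status code
    # --max-time: give up after 10 seconds (arbitrary choice)
    curl -sS -o /dev/null -w '%{http_code}\n' --max-time 10 "$1"
}

# Usage (replace the placeholder with the URL from the alert):
#   probe_url "https://<harbor-url-from-the-alert>/"
```

curl prints 000 when it received no HTTP response at all (for example a connection, DNS or TLS failure). Running the same probe from your laptop and from a host inside the project may help tell a proxy problem apart from a Harbor problem.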
Debugging
- You can ssh to the harbor instance directly and check there how/if it's running (use the instance of the project the alert is from):
    me@local$ ssh tools-harbor-1.tools.eqiad1.wikimedia.cloud
    me@tools-harbor-1$ sudo -i
    root@tools-harbor-1:~# cd /srv/ops/harbor/
    root@tools-harbor-1:/srv/ops/harbor# docker-compose ps
    Name                Command                          State          Ports
    ----------------------------------------------------------------------------------------------------------------
    harbor-core         /harbor/entrypoint.sh            Up (healthy)
    harbor-exporter     /harbor/entrypoint.sh            Up
    harbor-jobservice   /harbor/entrypoint.sh            Up (healthy)
    harbor-log          /bin/sh -c /usr/local/bin/ ...   Up (healthy)   127.0.0.1:1514->10514/tcp
    harbor-portal       nginx -g daemon off;             Up (healthy)
    nginx               nginx -g daemon off;             Up (healthy)   0.0.0.0:80->8080/tcp, 0.0.0.0:9090->9090/tcp
    redis               redis-server /etc/redis.conf     Up (healthy)
    registry            /home/harbor/entrypoint.sh       Up (healthy)
    registryctl         /home/harbor/start.sh            Up (healthy)
- You can try to restart/start it again, with docker-compose restart and docker-compose up -d.
- You can also check the logs of each component with docker logs harbor-portal, where harbor-portal is the name of the component.
- Logs may also be found in /var/log/harbor/
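When the docker-compose ps listing is long, a small filter can point out the components that need attention. The following is a sketch only: the function name list_not_healthy is hypothetical, and it assumes the table layout shown in the docker-compose ps output on this page (a "Name" header row, a dashed separator, then one row per container).

```shell
# Sketch: read `docker-compose ps` output on stdin and print the names of
# containers that are not "Up", or that are "Up" but report "(unhealthy)".
# list_not_healthy is a hypothetical helper name.
list_not_healthy() {
    awk '
        # Skip the header row, the dashed separator, and blank lines.
        $1 == "Name" || /^-+$/ || NF == 0 { next }
        # Flag rows whose state is not "Up", or that are marked unhealthy.
        $0 !~ / Up( |$)/ || /\(unhealthy\)/ { print $1 }
    '
}

# Usage, as root on the Harbor instance:
#   cd /srv/ops/harbor/
#   docker-compose ps | list_not_healthy
```

An empty result means every listed container is Up and none reports as unhealthy. Each name that is printed can then be passed to docker logs to see why that component is failing.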
Common issues
Add new issues here when you encounter them!
Issue 1
...
Related information
Old incidents
- T354714 - Trove DB filled disk and caused toolforge-build to fail as a result