Portal:Toolforge/Admin/Harbor/maintain-harbor
maintain-harbor is a CLI application used to perform housekeeping tasks on our Harbor installation. Currently, these housekeeping tasks include:
- Deleting Harbor tool projects with no repositories
- Creating or updating the image retention policies of Harbor tool projects
- Cleaning up stale Harbor images for core Toolforge repositories like jobs-api and builds-api
The purpose of these tasks is to keep the size of our Harbor installation manageable. Without them, the installation would quickly outgrow the storage available on its host VM.
The code for maintain-harbor currently lives in Wikimedia GitLab.
Key concepts
Harbor Tool projects
These are Harbor projects that store the images generated by a buildservice build run while authenticated as a particular Toolforge tool. Each Harbor tool project is named after the Toolforge tool whose images it stores.
Harbor image retention policies
These are Harbor objects that Harbor uses to decide how to handle the images in a Harbor tool project's repositories. For example, an image retention policy can tell Harbor to delete all but the latest 5 images in each repository of a Harbor tool project.
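As a rough illustration of the "keep the latest 5" rule described above, the sketch below models it in plain Python. It does not call Harbor at all; the `Image` type and the `select_for_deletion` function are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Image:
    """Hypothetical stand-in for one image in a repository."""
    repository: str
    digest: str
    pushed_at: datetime


def select_for_deletion(images, keep=5):
    """Return the images a 'retain the latest `keep` per repository' rule would delete."""
    by_repo = {}
    for image in images:
        by_repo.setdefault(image.repository, []).append(image)

    to_delete = []
    for repo_images in by_repo.values():
        # Newest first; everything after the first `keep` entries is a deletion candidate.
        newest_first = sorted(repo_images, key=lambda i: i.pushed_at, reverse=True)
        to_delete.extend(newest_first[keep:])
    return to_delete
```

Note that the rule is applied per repository, so a repository holding fewer than `keep` images loses nothing.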
Harbor Image Immutability policies
Harbor image immutability policies are Harbor objects used to prevent the deletion of the images in a Harbor tool project's repositories. One important thing to remember is that they supersede image retention policies: when configured for a project, they ensure that none of the affected images can be deleted, not even by an image retention policy.
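The precedence described above can be pictured with a small model: whatever a retention policy selects for deletion is filtered again by the immutability policy, and anything it covers survives. This is only a sketch. The function name is hypothetical, and `fnmatchcase` glob patterns are an assumption standing in for Harbor's own pattern matching.

```python
from fnmatch import fnmatchcase


def apply_immutability(deletion_candidates, immutable_patterns):
    """Drop every candidate tag that is covered by an immutability pattern.

    `deletion_candidates` is the list of tags a retention policy wants to delete.
    `immutable_patterns` is a list of glob patterns (an assumption for this sketch).
    """
    return [
        tag
        for tag in deletion_candidates
        if not any(fnmatchcase(tag, pattern) for pattern in immutable_patterns)
    ]
```

With a catch-all pattern such as `*`, nothing can be deleted at all. That is the situation the stale image cleanup task for the toolforge project has to work around.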
Delete harbor tool projects with no repositories
This task deletes Harbor tool projects once all their repositories and images have been deleted. This prevents unnecessary bloat.
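A minimal sketch of the selection step might look like the following. It assumes each project is a dictionary with a `name` and a `repo_count` key, as in Harbor's project listings as far as we know; the function name and the `excluded_names` parameter are hypothetical and not taken from the maintain-harbor code.

```python
def empty_tool_projects(projects, excluded_names=frozenset({"toolforge"})):
    """Return the names of projects that have no repositories left.

    Projects listed in `excluded_names` (for example the core `toolforge`
    project) are never selected, whatever their repository count.
    """
    return [
        project["name"]
        for project in projects
        # A missing repo_count is treated as "unknown", not as "empty".
        if project["name"] not in excluded_names and project.get("repo_count") == 0
    ]
```

Treating a missing count as unknown is the cautious choice here, since the result feeds a delete operation.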
Create or update Harbor tool project's image retention policy
When a harbor tool project is first created, it has no image retention policy configured. It is the responsibility of this task to create such a policy for each project and to update the policies of already existing projects if we decide to make changes to existing image retention policies.
Stale harbor image cleanup for core Toolforge repositories
We use Harbor to store the images of core Toolforge repositories like builds-api for CI/CD purposes. These are grouped under the toolforge Harbor project, which has an image immutability policy configured, so they can't be cleaned up using a simple Harbor image retention policy. This task cleans up these images by disabling the toolforge Harbor project's image immutability policy, running its own custom image cleanup algorithm on each repository in the project, and then enabling the immutability policy again.
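The disable, clean up, re-enable sequence can be sketched as below. This is not the actual maintain-harbor algorithm, which is custom. The "keep the newest `limit` artifacts" rule is an assumption, and `set_enabled` and `delete_artifact` are hypothetical callbacks standing in for the real Harbor API calls.

```python
from contextlib import contextmanager


@contextmanager
def immutability_disabled(set_enabled):
    """Disable the immutability policy for the duration of the block."""
    set_enabled(False)
    try:
        yield
    finally:
        # Runs even if the cleanup raises, so the project is not left unprotected.
        set_enabled(True)


def cleanup_repositories(repositories, delete_artifact, set_enabled, limit=2):
    """Delete all but the newest `limit` artifacts in each repository.

    `repositories` maps a repository name to its artifact ids, newest first.
    """
    with immutability_disabled(set_enabled):
        for name, artifacts in repositories.items():
            for artifact in artifacts[limit:]:
                delete_artifact(name, artifact)
```

Wrapping the toggle in a context manager means the policy is switched back on whether the cleanup succeeds or fails partway through.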
How it's set up
To perform the housekeeping tasks implemented by maintain-harbor, you interact with it on the command line. maintain-harbor is currently deployed as three separate Kubernetes cronjobs in the maintain-harbor namespace, one for each of the three tasks listed above.
If you are a toolforge admin, you can view these cronjobs by doing the following:
user@ubuntu:~/Desktop/maintain-harbor$ ssh login.toolforge.org
...
user@tools-bastion-NN:~$ kubectl sudo get cronjobs -n maintain-harbor
NAME                                        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mh--delete-empty-tool-projects-cron         23 8 * * *    False     0        10h             33d
mh--delete-stale-toolforge-artifacts-cron   0 8 * * 3     False     0        5d10h           33d
mh--manage-image-retention-cron             23 10 * * *   False     0        8h              33d
user@tools-bastion-NN:~$
The maintain-harbor cronjobs are currently named as follows:
- mh--delete-empty-tool-projects-cron
- mh--delete-stale-toolforge-artifacts-cron
- mh--manage-image-retention-cron
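To read the schedules shown in the cronjob listing above, recall that a cron expression has five fields: minute, hour, day of month, month and day of week (0 is Sunday). The helper below is a small sketch for decoding them. Its names are hypothetical, and it only handles the simple shapes used here: a fixed minute and hour, run either every day or on a single weekday.

```python
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_schedule(schedule):
    """Split a five-field cron expression into a dict keyed by field name."""
    values = schedule.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def describe(schedule):
    """Describe a schedule with a fixed minute and hour in plain words."""
    fields = parse_schedule(schedule)
    when = f"{int(fields['hour']):02d}:{int(fields['minute']):02d}"
    if fields["day_of_week"] == "*":
        return f"every day at {when}"
    # Both 0 and 7 mean Sunday in cron, hence the modulo.
    return f"every {WEEKDAYS[int(fields['day_of_week']) % 7]} at {when}"


print(describe("23 8 * * *"))
print(describe("0 8 * * 3"))
```

The times are in whatever time zone the cluster applies to cronjobs, which is commonly UTC but is an assumption here.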
Common tasks
Deploy maintain-harbor
To deploy maintain-harbor:
- ssh into a cloudcumin host and deploy using the Toolforge component deploy cookbook:
user@ubuntu:~$ ssh cloudcumin1001.eqiad.wmnet
...
user@cloudcumin1001:~$ sudo cookbook wmcs.toolforge.component.deploy --cluster-name toolsbeta --component maintain-harbor
The above command deploys the latest code in the maintain-harbor repository as Kubernetes cronjobs in the toolsbeta cluster. To deploy to the tools cluster, use --cluster-name tools instead.
Checking the logs
We don't currently have a way to persist the logs, since we started using Kubernetes cronjobs directly to deploy maintain-harbor. You can follow task T383081 for updates.
So the only way to see the logs is to run kubectl logs ... on the cronjob pod while it exists (you can restart the cronjob to force a run).
Manual execution
If you wish to manually perform any of the maintain-harbor tasks without using kubernetes cronjobs to schedule it:
- Clone the GitLab repository:
user@ubuntu:~$ git clone https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor.git $HOME/maintain-harbor
- After this, create a Python virtual environment using the $HOME/maintain-harbor/requirements.txt file (the commands below expect it at $HOME/maintain-harbor/venv). For information about creating one, see https://docs.python.org/3/tutorial/venv.html
- The next step is to create a $HOME/maintain-harbor/.env config file. You can see the expected config file format by running:
user@ubuntu:~$ cd $HOME/maintain-harbor && $HOME/maintain-harbor/venv/bin/python3 -m src.maintain_harbor --show-config
maintain_harbor_environment=toolsbeta
maintain_harbor_auth_password=my_password
maintain_harbor_auth_username=my_username
maintain_harbor_base_harbor_api=https://harbor.domain.name/api/v2.0
maintain_harbor_do_retentions=true
maintain_harbor_toolforge_repo_artifact_limit=2
user@ubuntu:~/maintain-harbor$
- To run a single maintain-harbor task, execute the following:
user@ubuntu:~$ cd $HOME/maintain-harbor && $HOME/maintain-harbor/venv/bin/python3 -m src.maintain_harbor --config $HOME/maintain-harbor/.env delete-empty-tool-projects
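Before running a task by hand, it can help to check that the config file contains every key printed by --show-config. The sketch below parses the key=value format shown above and reports what is missing. It is a standalone helper, not part of maintain-harbor; the function names are hypothetical, and skipping blank and `#` lines is an assumption about the file format.

```python
EXPECTED_KEYS = {
    "maintain_harbor_environment",
    "maintain_harbor_auth_password",
    "maintain_harbor_auth_username",
    "maintain_harbor_base_harbor_api",
    "maintain_harbor_do_retentions",
    "maintain_harbor_toolforge_repo_artifact_limit",
}


def parse_env(text):
    """Parse key=value lines into a dict, skipping blank and comment lines."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        config[key.strip()] = value.strip()
    return config


def missing_keys(config):
    """Return the expected keys that are absent from the parsed config."""
    return sorted(EXPECTED_KEYS - config.keys())
```

For example, `missing_keys(parse_env(open(path).read()))` returns an empty list when nothing is missing, where `path` points at the config file. Whether maintain-harbor itself treats every one of these keys as mandatory is not confirmed here.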