Portal:Toolforge/Admin/Harbor/maintain-harbor

From Wikitech

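maintain-harbor is a CLI application used to perform housekeeping tasks on our Harbor installation. Currently, these housekeeping tasks include:

  • Deleting Harbor tool projects that have no repositories
  • Creating or updating each Harbor tool project's image retention policy
  • Cleaning up stale images in the core Toolforge repositories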

The purpose of the above tasks is to keep the size of our Harbor installation manageable. Without them, the installation would quickly outgrow the storage available on its host VM.

The code for maintain-harbor currently lives in Wikimedia GitLab.

Key concepts

Harbor tool projects

These are Harbor projects that store the images generated by a buildservice build run while authenticated as a particular Toolforge tool. Each Harbor tool project is named after the Toolforge tool whose images it stores.

Harbor image retention policies

These are Harbor objects that Harbor uses to decide how to handle the images in a Harbor tool project's repositories. For example, an image retention policy can tell Harbor to delete all but the latest 5 images in each repository of a Harbor tool project.
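
The "keep the latest 5" example can be sketched in a few lines of Python. This is only an illustration of the selection logic, not Harbor's implementation; the image dictionaries and the `push_time` key are assumptions made for the example.

```python
from datetime import datetime

def images_to_delete(images, keep=5):
    """Return the images that a 'keep the latest `keep`' rule would remove."""
    # Sort newest first by push time, then take everything after the first `keep`.
    newest_first = sorted(images, key=lambda img: img["push_time"], reverse=True)
    return newest_first[keep:]

# Hypothetical repository with seven images pushed on consecutive days.
images = [
    {"tag": f"build-{day}", "push_time": datetime(2025, 1, day)}
    for day in range(1, 8)
]
stale = images_to_delete(images, keep=5)
print([img["tag"] for img in stale])  # ['build-2', 'build-1']
```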

Harbor image immutability policies

Harbor image immutability policies are Harbor objects used to prevent the deletion of images in a Harbor tool project's repositories. An important point to remember is that they supersede image retention policies: when an immutability policy is configured for a project, the images it covers cannot be deleted, not even by an image retention policy.
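
The precedence of immutability over retention can be pictured as a filter applied to whatever the retention policy selects. The sketch below is an assumption-laden illustration: it uses Python's `fnmatch` as a stand-in for Harbor's own tag-matching rules, and the tag names and patterns are made up.

```python
from fnmatch import fnmatch

def deletable(retention_candidates, immutable_patterns):
    """Drop candidates whose tag matches any immutability pattern."""
    return [
        tag
        for tag in retention_candidates
        if not any(fnmatch(tag, pattern) for pattern in immutable_patterns)
    ]

# Tags that a retention policy would like to delete (hypothetical).
candidates = ["v1", "v2", "latest", "release-1.0"]

# Only tags not covered by an immutability pattern can actually be removed.
print(deletable(candidates, ["release-*", "latest"]))  # ['v1', 'v2']

# A catch-all pattern protects everything, so retention deletes nothing.
print(deletable(candidates, ["*"]))  # []
```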

Delete harbor tool projects with no repositories

This task deletes Harbor tool projects once all their repositories and images have been deleted. This prevents unnecessary bloat.
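
As a rough sketch of the selection step, assuming each project is represented as a dictionary with a `name` and a `repo_count` field, and assuming tool projects can be recognised by a name prefix (the `tool-` prefix here is hypothetical), the empty projects could be picked out like this. It does not reflect the actual maintain-harbor code.

```python
def empty_tool_projects(projects, prefix="tool-"):
    """Return names of tool projects that contain no repositories."""
    return [
        project["name"]
        for project in projects
        # A missing repo_count is treated as unknown, never as empty.
        if project["name"].startswith(prefix) and project.get("repo_count") == 0
    ]

# Hypothetical project listing.
projects = [
    {"name": "tool-alpha", "repo_count": 0},
    {"name": "tool-beta", "repo_count": 3},
    {"name": "toolforge", "repo_count": 0},  # core project, not a tool project
    {"name": "tool-gamma"},                  # count unknown, left alone
]
print(empty_tool_projects(projects))  # ['tool-alpha']
```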

Create or update Harbor tool project's image retention policy

When a harbor tool project is first created, it has no image retention policy configured. It is the responsibility of this task to create such a policy for each project and to update the policies of already existing projects if we decide to make changes to existing image retention policies.
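
The create-or-update decision can be sketched as a comparison between the policy a project currently has and the policy we want it to have. The policy dictionaries and the `keep_latest` key below are illustrative assumptions, not Harbor's retention policy schema.

```python
def plan_retention_change(existing, desired):
    """Decide what to do with a project's retention policy."""
    if existing is None:
        return "create"   # newly created project, no policy yet
    if existing != desired:
        return "update"   # policy exists but is out of date
    return "noop"         # already matches the desired policy

desired = {"keep_latest": 5}
print(plan_retention_change(None, desired))                # create
print(plan_retention_change({"keep_latest": 3}, desired))  # update
print(plan_retention_change({"keep_latest": 5}, desired))  # noop
```

Making the task a no-op when nothing has changed lets it run on a schedule without touching projects that are already up to date.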

Stale harbor image cleanup for core Toolforge repositories

We use Harbor to store the images of core Toolforge repositories, such as builds-api, for CI/CD purposes. These are grouped under the toolforge Harbor project, which has an image immutability policy configured, so they can't be cleaned up with a simple Harbor image retention policy. This task cleans up these images in three steps: it disables the toolforge Harbor project's image immutability policy, runs its own custom image cleanup algorithm on each repository in the project, and then enables the immutability policy again.
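
The disable, clean up, re-enable sequence can be sketched with a context manager so that the immutability policy is restored even if the cleanup fails partway. `FakeHarbor` is an in-memory stand-in invented for this example, and "keep the newest `limit` artifacts" is an assumed cleanup rule, not a description of the custom algorithm maintain-harbor actually uses.

```python
from contextlib import contextmanager

class FakeHarbor:
    """In-memory stand-in for a Harbor client; not the real API."""

    def __init__(self, repos):
        self.immutable = True
        self.repos = repos  # repo name -> list of artifacts, oldest first

    def delete_artifact(self, repo, artifact):
        if self.immutable:
            raise PermissionError("immutability policy is enabled")
        self.repos[repo].remove(artifact)

@contextmanager
def immutability_disabled(harbor):
    harbor.immutable = False
    try:
        yield
    finally:
        # Re-enable even if cleanup raises, so images are not left unprotected.
        harbor.immutable = True

def cleanup(harbor, limit):
    with immutability_disabled(harbor):
        for repo, artifacts in harbor.repos.items():
            # Everything except the newest `limit` artifacts is considered stale.
            stale = artifacts[: max(len(artifacts) - limit, 0)]
            for artifact in stale:
                harbor.delete_artifact(repo, artifact)

harbor = FakeHarbor({"builds-api": ["a1", "a2", "a3", "a4"]})
cleanup(harbor, limit=2)
print(harbor.repos)      # {'builds-api': ['a3', 'a4']}
print(harbor.immutable)  # True
```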

How it's set up

maintain-harbor is a CLI, so you perform its housekeeping tasks by invoking it on the command line. It is currently deployed as three separate Kubernetes cronjobs in the maintain-harbor namespace, one for each of the three tasks described above.

If you are a Toolforge admin, you can view these cronjobs as follows:

user@ubuntu:~/Desktop/maintain-harbor$ ssh login.toolforge.org
...
user@tools-bastion-NN:~$ kubectl sudo get cronjobs -n maintain-harbor
NAME                                        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mh--delete-empty-tool-projects-cron         23 8 * * *    False     0        10h             33d
mh--delete-stale-toolforge-artifacts-cron   0 8 * * 3     False     0        5d10h           33d
mh--manage-image-retention-cron             23 10 * * *   False     0        8h              33d
user@tools-bastion-NN:~$

The maintain-harbor cronjobs are currently named as follows:

  • mh--delete-empty-tool-projects-cron
  • mh--delete-stale-toolforge-artifacts-cron
  • mh--manage-image-retention-cron

Common tasks

Deploy maintain-harbor

To deploy maintain-harbor:

  • SSH into a cloudcumin host and deploy using the Toolforge component deploy cookbook:
    user@ubuntu:~$ ssh cloudcumin1001.eqiad.wmnet
    ...
    user@cloudcumin1001:~$ sudo cookbook wmcs.toolforge.component.deploy --cluster-name toolsbeta --component maintain-harbor
    
    The above command deploys the latest code in the maintain-harbor repository as Kubernetes cronjobs in the toolsbeta cluster. To deploy to the tools cluster, use --cluster-name tools.

Checking the logs

We don't currently have a way to persist the logs, since we started using Kubernetes cronjobs directly to deploy maintain-harbor. You can follow task T383081 for updates.

The only way to see the logs is therefore to run kubectl logs ... against the cronjob's pod while it still exists (you can restart the cronjob to force a run).

Manual execution

If you wish to perform any of the maintain-harbor tasks manually, without using Kubernetes cronjobs to schedule them:

  • Clone the GitLab repository:
    user@ubuntu:~$ git clone https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor.git $HOME/maintain-harbor
    
  • Next, create a Python virtual environment and install the dependencies listed in $HOME/maintain-harbor/requirements.txt. The commands below expect the virtual environment at $HOME/maintain-harbor/venv. For information about creating one, see https://docs.python.org/3/tutorial/venv.html
  • Then create a $HOME/maintain-harbor/.env config file. You can see the expected config file format by running:
    user@ubuntu:~$ cd $HOME/maintain-harbor && $HOME/maintain-harbor/venv/bin/python3 -m src.maintain_harbor --show-config
    maintain_harbor_environment=toolsbeta
    maintain_harbor_auth_password=my_password
    maintain_harbor_auth_username=my_username
    maintain_harbor_base_harbor_api=https://harbor.domain.name/api/v2.0
    maintain_harbor_do_retentions=true
    maintain_harbor_toolforge_repo_artifact_limit=2
    
    user@ubuntu:~/maintain-harbor$
    
  • To run a single maintain-harbor task, execute the following:
    user@ubuntu:~$ cd $HOME/maintain-harbor && $HOME/maintain-harbor/venv/bin/python3 -m src.maintain_harbor --config $HOME/maintain-harbor/.env delete-empty-tool-projects
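
Before running a task manually, it can help to sanity-check the .env file. The sketch below parses the KEY=value format printed by --show-config and reports missing keys. It is not how maintain-harbor itself loads its configuration, and treating every key from the --show-config output as required is an assumption.

```python
# Keys taken from the --show-config output; assumed here to all be required.
EXPECTED_KEYS = {
    "maintain_harbor_environment",
    "maintain_harbor_auth_password",
    "maintain_harbor_auth_username",
    "maintain_harbor_base_harbor_api",
    "maintain_harbor_do_retentions",
    "maintain_harbor_toolforge_repo_artifact_limit",
}

def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=value, got: {line!r}")
        config[key.strip()] = value.strip()
    return config

def missing_keys(config):
    """Return the expected keys absent from the parsed config, sorted."""
    return sorted(EXPECTED_KEYS - config.keys())

sample = """
maintain_harbor_environment=toolsbeta
maintain_harbor_auth_username=my_username
maintain_harbor_base_harbor_api=https://harbor.domain.name/api/v2.0
"""
config = parse_env(sample)
print(config["maintain_harbor_environment"])  # toolsbeta
print(missing_keys(config))
```

To check a real file, read it with `pathlib.Path(path).read_text()` and pass the result to `parse_env`.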