Portal:Toolforge/Admin/Harbor/maintain-harbor
maintain-harbor is a tool used to perform housekeeping tasks on our Harbor installation. Currently, these tasks include:
- Deleting harbor tool projects with no repositories
- Creating or updating harbor tool projects image retention policies
- Stale harbor image cleanup for core Toolforge repositories like jobs-api and builds-api
The purpose of the above tasks is to keep the size of our Harbor installation manageable. Without them, the installation would quickly outgrow the storage available to its host VM.
The code for maintain-harbor currently lives in Wikimedia GitLab.
Key concepts
Harbor Tool projects
These are Harbor projects used to store the images generated by a buildservice build run while authenticated as a particular Toolforge tool. Each Harbor tool project is named after the Toolforge tool whose images it stores.
Harbor image retention policies
These are Harbor objects that Harbor uses to decide how to handle the images in a Harbor tool project's repositories. For example, an image retention policy can tell Harbor to delete all except the latest 5 images in each of the repositories of a Harbor tool project.
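To make the "keep the latest 5" example concrete, here is a minimal sketch of the selection such a rule performs for a single repository. It is only an illustration: the function name and the (tag, push time) tuples are hypothetical, and in practice Harbor evaluates retention rules itself.

```python
from datetime import datetime, timezone


def select_images_to_delete(images, keep_latest=5):
    """Return the images a "keep the latest N" rule would remove.

    `images` is a list of (tag, push_time) tuples for ONE repository.
    This is an illustrative helper, not part of Harbor or maintain-harbor.
    """
    # Sort newest first, so the first `keep_latest` entries are retained.
    newest_first = sorted(images, key=lambda image: image[1], reverse=True)
    return newest_first[keep_latest:]


# Seven images pushed on consecutive days: v1 is the oldest, v7 the newest.
images = [(f"v{day}", datetime(2024, 8, day, tzinfo=timezone.utc)) for day in range(1, 8)]
to_delete = select_images_to_delete(images, keep_latest=5)
print([tag for tag, _ in to_delete])  # ['v2', 'v1']
```

A repository holding five or fewer images is left untouched, because the slice past the retained entries is empty.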
Harbor Image Immutability policies
Harbor image immutability policies are Harbor objects used to prevent the deletion of the images in a Harbor tool project's repositories. One important thing to remember is that they supersede image retention policies: when configured for a project, they ensure that the affected images cannot be deleted, not even by an image retention policy.
Delete harbor tool projects with no repositories
This task deletes Harbor tool projects once all their repositories and images have been deleted. This prevents unnecessary bloat.
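As a rough sketch of the selection step, the snippet below picks out projects that have no repositories. It assumes each project is a dictionary with `name` and `repo_count` keys, similar to what Harbor's project listing returns. The function name, the sample project names and the choice to skip the shared toolforge project are assumptions made for illustration, not the tool's actual logic.

```python
def find_empty_tool_projects(projects, excluded=("toolforge",)):
    """Return the names of projects that have no repositories.

    `projects` is a list of dicts with "name" and "repo_count" keys.
    Projects listed in `excluded` (an assumed safeguard for shared,
    non-tool projects) are never selected.
    """
    return [
        project["name"]
        for project in projects
        # A missing repo_count is treated as "unknown", never as "empty".
        if project["name"] not in excluded and project.get("repo_count") == 0
    ]


projects = [
    {"name": "tool-one", "repo_count": 0},
    {"name": "tool-two", "repo_count": 3},
    {"name": "toolforge", "repo_count": 0},
]
print(find_empty_tool_projects(projects))  # ['tool-one']
```

Treating a missing count as unknown keeps the sketch on the safe side: a project is only selected when it is positively known to be empty.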
Create or update Harbor tool project's image retention policy
When a harbor tool project is first created, it has no image retention policy configured. It is the responsibility of this task to create such a policy for each project and to update the policies of already existing projects if we decide to make changes to existing image retention policies.
Stale harbor image cleanup for core Toolforge repositories
We use Harbor to store the images of core Toolforge repositories like builds-api for CI/CD purposes. These are grouped under the toolforge Harbor project, which has an image immutability policy configured, so they can't be cleaned up using a simple Harbor image retention policy. This task cleans up these images by disabling the toolforge Harbor project's image immutability policy, running its own custom image cleanup algorithm for each repository in the project, and then enabling the immutability policy again.
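The disable, clean up, re-enable sequence can be sketched as follows. This is a self-contained illustration using an in-memory stand-in for the Harbor project: the class, the function names and the "keep the newest N artifacts" rule are all hypothetical, and the tool's real cleanup algorithm may differ. The point of the sketch is the `try`/`finally`, which restores immutability even if the cleanup fails part-way.

```python
from contextlib import contextmanager


class FakeHarborProject:
    """In-memory stand-in for a Harbor project (hypothetical, for illustration)."""

    def __init__(self, repositories):
        self.immutability_enabled = True
        # Maps repository name to its artifacts, ordered oldest to newest.
        self.repositories = repositories

    def delete_artifact(self, repo, artifact):
        if self.immutability_enabled:
            raise PermissionError(f"{repo}/{artifact} is immutable")
        self.repositories[repo].remove(artifact)


@contextmanager
def immutability_disabled(project):
    """Temporarily disable immutability, restoring it even on errors."""
    project.immutability_enabled = False
    try:
        yield project
    finally:
        project.immutability_enabled = True


def cleanup_stale_artifacts(project, artifact_limit=2):
    """Keep only the newest `artifact_limit` artifacts in each repository."""
    with immutability_disabled(project):
        for repo, artifacts in project.repositories.items():
            stale_count = max(len(artifacts) - artifact_limit, 0)
            # The slice is a copy, so removing from `artifacts` is safe.
            for stale in artifacts[:stale_count]:
                project.delete_artifact(repo, stale)


project = FakeHarborProject({"builds-api": ["a1", "a2", "a3", "a4"], "jobs-api": ["b1"]})
cleanup_stale_artifacts(project, artifact_limit=2)
print(project.repositories)          # {'builds-api': ['a3', 'a4'], 'jobs-api': ['b1']}
print(project.immutability_enabled)  # True
```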
How it's set up
maintain-harbor is a command-line tool (CLI): each housekeeping task is run by invoking it on the command line. It is currently deployed as three separate Toolforge jobs, one for each of the tasks listed above, running under the maintain-harbor Toolforge tool account.
If you are a maintainer of the maintain-harbor tool or a tools admin, you can view the Toolforge jobs created by maintain-harbor by doing the following:
user@ubuntu:~/Desktop/maintain-harbor$ ssh login.toolforge.org
...
user@tools-bastion-13:~$ become maintain-harbor
tools.maintain-harbor@tools-bastion-13:~$ toolforge jobs list
+-------------------------------------------+-----------------------+------------------------------------------+
| Job name: | Job type: | Status: |
+-------------------------------------------+-----------------------+------------------------------------------+
| mh--delete-empty-tool-projects-cron | schedule: 23 8 * * * | Last schedule time: 2024-08-19T08:23:00Z |
| mh--delete-stale-toolforge-artifacts-cron | schedule: 0 8 * * 3 | Last schedule time: 2024-08-14T08:00:00Z |
| mh--manage-image-retention-cron | schedule: 23 10 * * * | Last schedule time: 2024-08-19T10:23:00Z |
+-------------------------------------------+-----------------------+------------------------------------------+
tools.maintain-harbor@tools-bastion-13:~$
The maintain-harbor jobs are currently named as follows:
mh--delete-empty-tool-projects-cron
mh--delete-stale-toolforge-artifacts-cron
mh--manage-image-retention-cron
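The schedules shown in the job listing use the standard five-field cron syntax (minute, hour, day of month, month, day of week). As a reading aid, the hypothetical helper below turns the three schedules from the listing into plain English. It handles only daily and weekly schedules with numeric fields, which is enough for these jobs, and it labels times as UTC because the listing reports schedule times with a Z suffix.

```python
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def describe_schedule(expression):
    """Describe a simple daily or weekly five-field cron expression."""
    minute, hour, day_of_month, month, day_of_week = expression.split()
    if day_of_month != "*" or month != "*":
        raise ValueError("only daily and weekly schedules are handled in this sketch")
    time = f"{int(hour):02d}:{int(minute):02d} UTC"
    if day_of_week == "*":
        return f"every day at {time}"
    # In cron, both 0 and 7 mean Sunday.
    return f"every {WEEKDAYS[int(day_of_week) % 7]} at {time}"


for schedule in ("23 8 * * *", "0 8 * * 3", "23 10 * * *"):
    print(f"{schedule!r}: {describe_schedule(schedule)}")
```

With the schedules above, the two daily jobs run at 08:23 and 10:23, and the stale artifact cleanup runs weekly on Wednesdays at 08:00.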
Common tasks
Deploy maintain-harbor
To deploy maintain-harbor using a Toolforge tool account:
- You need to have it cloned to the home directory of the tool account. We are currently using the maintain-harbor Toolforge tool account, but you can use any tool:
tools.maintain-harbor@tools-bastion-13:~$ cd $HOME && git clone https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-harbor.git
- The next step is to create a $HOME/maintain-harbor.yaml config file. You can see the expected config file format by running:
tools.maintain-harbor@tools-bastion-13:~$ cd $HOME/maintain-harbor && $HOME/venv/bin/python3 -m src.maintain_harbor --show-config
environments:
  toolsbeta:
    auth_password: my_password
    auth_username: my_username
    base_harbor_api: https://harbor.domain.name/api/v2.0
    do_retentions: true
    toolforge_repo_artifact_limit: 2
tools.maintain-harbor@tools-bastion-13:~/maintain-harbor$
- Finally, to get the maintain-harbor jobs running, execute:
tools.maintain-harbor@tools-bastion-13:~$ $HOME/maintain-harbor/deploy.sh
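Before running the deploy script, it can help to sanity-check the config file. The sketch below checks a config that has already been parsed into a dictionary, for example by a YAML library. It assumes that every key printed by --show-config is required for each environment and has the type shown in that sample output. That is an assumption for illustration, and maintain-harbor may apply different rules.

```python
# Keys and types taken from the --show-config sample output; treating all of
# them as required is an assumption of this sketch.
EXPECTED_KEYS = {
    "auth_username": str,
    "auth_password": str,
    "base_harbor_api": str,
    "do_retentions": bool,
    "toolforge_repo_artifact_limit": int,
}


def validate_config(config):
    """Return a list of problems found in an already-parsed config dict."""
    problems = []
    environments = config.get("environments") or {}
    if not environments:
        problems.append("no environments defined")
    for env_name, settings in environments.items():
        for key, expected_type in EXPECTED_KEYS.items():
            if key not in settings:
                problems.append(f"{env_name}: missing {key}")
            # Exact type check, so that `true` is not accepted as an integer.
            elif type(settings[key]) is not expected_type:
                problems.append(f"{env_name}: {key} should be {expected_type.__name__}")
    return problems


config = {
    "environments": {
        "toolsbeta": {
            "auth_password": "my_password",
            "auth_username": "my_username",
            "base_harbor_api": "https://harbor.domain.name/api/v2.0",
            "do_retentions": True,
            "toolforge_repo_artifact_limit": 2,
        }
    }
}
print(validate_config(config))  # []
```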
Checking maintain-harbor status
To view the status of all the maintain-harbor jobs:
tools.maintain-harbor@tools-bastion-13:~$ toolforge jobs list
+-------------------------------------------+-----------------------+------------------------------------------+
| Job name: | Job type: | Status: |
+-------------------------------------------+-----------------------+------------------------------------------+
| mh--delete-empty-tool-projects-cron | schedule: 23 8 * * * | Last schedule time: 2024-08-19T08:23:00Z |
| mh--delete-stale-toolforge-artifacts-cron | schedule: 0 8 * * 3 | Last schedule time: 2024-08-14T08:00:00Z |
| mh--manage-image-retention-cron | schedule: 23 10 * * * | Last schedule time: 2024-08-19T10:23:00Z |
+-------------------------------------------+-----------------------+------------------------------------------+
tools.maintain-harbor@tools-bastion-13:~$
To view a detailed status of each job, you can run the toolforge-jobs command toolforge jobs show <job-name> as the maintain-harbor tool:
tools.maintain-harbor@tools-bastion-13:~$ toolforge jobs show mh--delete-empty-tool-projects-cron
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Job name: | mh--delete-empty-tool-projects-cron |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Command: | cd $HOME/maintain-harbor && $HOME/venv/bin/python3 -m src.maintain_harbor --config $HOME/maintain-harbor.yaml delete-empty-tool-projects |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Job type: | schedule: 23 8 * * * |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Image: | python3.11 |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Port: | none |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| File log: | yes |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Output log: | /data/project/maintain-harbor/mh--delete-empty-tool-projects-cron.out |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Error log: | /data/project/maintain-harbor/mh--delete-empty-tool-projects-cron.err |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Emails: | onfailure |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Resources: | default |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Mounts: | all |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Retry: | no |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Health check: | none |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Status: | Last schedule time: 2024-08-19T08:23:00Z |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Hints: | No pods were created for this job. |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------+
tools.maintain-harbor@tools-bastion-13:~$
Stopping and starting the maintain-harbor jobs
There is currently no way to stop and restart an existing toolforge job. If you wish to do this for the maintain-harbor jobs, you will need to delete the jobs and recreate them whenever you wish.
To delete a job, run:
tools.maintain-harbor@tools-bastion-13:~$ toolforge jobs delete mh--delete-empty-tool-projects-cron
or, to delete all of the tool's jobs at once:
tools.maintain-harbor@tools-bastion-13:~$ toolforge jobs flush
You can easily recreate all the deleted jobs by running:
tools.maintain-harbor@tools-bastion-13:~$ $HOME/maintain-harbor/deploy.sh
The above command will recreate all the maintain-harbor jobs if they don't exist already.
Checking the logs
The logs for each job can be found in the maintain-harbor tool's home directory as .out and .err files that are named after the jobs.
tools.maintain-harbor@tools-bastion-13:~$ cat $HOME/mh--delete-empty-tool-projects-cron.out
...
tools.maintain-harbor@tools-bastion-13:~$ cat $HOME/mh--delete-empty-tool-projects-cron.err
Manual execution
If you wish to manually perform any of the maintain-harbor tasks without using toolforge-jobs to schedule it, you might find the contents of the $HOME/maintain-harbor/jobs-schedule.yaml file useful.
- Clone the git repository. See #Deploy maintain-harbor
- After this, you need to create a Python virtual environment using the $HOME/maintain-harbor/requirements.txt file. For information about creating one, see https://docs.python.org/3/tutorial/venv.html
- Create a $HOME/maintain-harbor.yaml config file. See #Deploy maintain-harbor
- To run a single maintain-harbor task outside toolforge jobs, execute the following:
tools.maintain-harbor@tools-bastion-13:~$ cd $HOME/maintain-harbor && $HOME/venv/bin/python3 -m src.maintain_harbor --config $HOME/maintain-harbor.yaml delete-empty-tool-projects