
Portal:Toolforge/Admin/Build Service

Phabricator project: #tbs
From Wikitech

Documentation of components and common admin procedures for Toolforge/Build Service.

Components

Alerts

Dashboards

You can find all the current dashboards here: https://grafana.wmcloud.org/d/m9V1RQs4k/harbor-overview?orgId=1

Administrative tasks

Starting a service

Harbor

SSH to the Harbor instance (e.g. toolsbeta-harbor-1.toolsbeta.eqiad1.wikimedia.cloud):

dcaro@vulcanus$ wm-ssh toolsbeta-harbor-1.toolsbeta.eqiad1.wikimedia.cloud 
...
dcaro@toolsbeta-harbor-1:~$ sudo -i
root@toolsbeta-harbor-1:~# cd /srv/ops/harbor/

root@toolsbeta-harbor-1:/srv/ops/harbor# docker-compose up -d  # will start the containers that are down if any
harbor-log is up-to-date
registry is up-to-date
redis is up-to-date
harbor-portal is up-to-date
registryctl is up-to-date
harbor-core is up-to-date
harbor-jobservice is up-to-date
nginx is up-to-date
harbor-exporter is up-to-date

Buildservice API

This lives in Kubernetes, behind the API gateway. To start it, you can try redeploying it by following Portal:Toolforge/Admin/Kubernetes#Deploy_new_version (the component is toolforge-builds-api).

You can monitor if it's coming up with the usual k8s commands:

root@toolsbeta-test-k8s-control-4:~# kubectl get all -n builds-api
NAME                              READY   STATUS    RESTARTS   AGE
pod/builds-api-5bffd6b58f-9zg4s   2/2     Running   0          29h
pod/builds-api-5bffd6b58f-jk6sf   2/2     Running   0          29h

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
service/builds-api   ClusterIP   10.97.55.43   <none>        8443/TCP   18d

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/builds-api   2/2     2            2           18d

NAME                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/builds-api-5bffd6b58f   2         2         2       29h
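
If you would rather block until the redeploy has finished than poll the listing above, a minimal sketch using kubectl rollout status follows. The function name wait_for_rollout and the 120-second default timeout are assumptions picked for illustration; they are not part of the Toolforge tooling.

```shell
# Hypothetical helper (not part of the Toolforge tooling): block until a
# deployment has finished rolling out, or fail once the timeout expires.
# Usage: wait_for_rollout <namespace> <deployment> [timeout]
wait_for_rollout() {
    # $3 is optional; 120s is an arbitrary default chosen for this sketch.
    kubectl rollout status "deployment/$2" -n "$1" --timeout="${3:-120s}"
}

# Example, run on a control node:
#   wait_for_rollout builds-api builds-api
```

The function returns non-zero if the rollout does not complete in time, so it can be chained with && before any follow-up checks.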

Tekton

Like the builds API, Tekton is a k8s component, so you can try redeploying it too by following Portal:Toolforge/Admin/Kubernetes/Components#Deploy (the component is buildservice).

You can monitor if it's coming up with the usual k8s commands:

root@toolsbeta-test-k8s-control-4:~# kubectl get all -n tekton-pipelines
NAME                                               READY   STATUS    RESTARTS   AGE
pod/tekton-pipelines-controller-5c78ddd49b-dj4hz   1/1     Running   0          57d
pod/tekton-pipelines-webhook-5d899cc8c-zwf7p       1/1     Running   0          57d

NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                              AGE
service/tekton-pipelines-controller   ClusterIP   10.96.176.235    <none>        9090/TCP,8008/TCP,8080/TCP           447d
service/tekton-pipelines-webhook      ClusterIP   10.101.163.215   <none>        9090/TCP,8008/TCP,443/TCP,8080/TCP   447d

NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/tekton-pipelines-controller   1/1     1            1           87d
deployment.apps/tekton-pipelines-webhook      1/1     1            1           87d

NAME                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/tekton-pipelines-controller-5c78ddd49b   1         1         1       87d
replicaset.apps/tekton-pipelines-webhook-5d899cc8c       1         1         1       87d

NAME                                                           REFERENCE                             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/tekton-pipelines-webhook   Deployment/tekton-pipelines-webhook   4%/100%   1         5         1          447d

Stopping a service

Harbor

SSH to the Harbor instance (e.g. toolsbeta-harbor-1.toolsbeta.eqiad1.wikimedia.cloud):

dcaro@vulcanus$ wm-ssh toolsbeta-harbor-1.toolsbeta.eqiad1.wikimedia.cloud 
...
dcaro@toolsbeta-harbor-1:~$ sudo -i
root@toolsbeta-harbor-1:~# cd /srv/ops/harbor/

root@toolsbeta-harbor-1:/srv/ops/harbor# docker-compose stop
Stopping harbor-jobservice ... done
Stopping nginx             ... done
Stopping harbor-exporter   ... done
Stopping harbor-core       ... done
Stopping registry          ... done
Stopping harbor-portal     ... done
Stopping redis             ... done
Stopping registryctl       ... done
Stopping harbor-log        ... done

See also Portal:Toolforge/Admin/Harbor.

Buildservice API

Since this is a k8s deployment, the quickest way is probably to delete the deployment itself (you will need to redeploy it to start it again).

root@toolsbeta-test-k8s-control-4:~# kubectl get deployment -n builds-api builds-api -o yaml > backup.yaml  # in case you want to restore later with kubectl apply -f backup.yaml

root@toolsbeta-test-k8s-control-4:~# kubectl delete deployment -n builds-api builds-api

For a full removal (CAREFUL! Only if you know what you are doing) you can use helm:

root@toolsbeta-test-k8s-control-4:~# helm uninstall -n builds-api builds-api

Tekton

This one is a bit trickier: stopping it means removing the Tekton controller itself (the one that handles the PipelineRun and TaskRun resources).

root@toolsbeta-test-k8s-control-4:~# kubectl get deployment -n tekton-pipelines tekton-pipelines-controller -o yaml > backup.yaml  # in case you want to restore later with kubectl apply -f backup.yaml

root@toolsbeta-test-k8s-control-4:~# kubectl delete deployment -n tekton-pipelines tekton-pipelines-controller

NOTE: Tekton does not yet have a helm deployment associated with it.
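
The builds-api and Tekton procedures above share the same backup-then-delete shape, which can be sketched as one helper. The function name stop_deployment and the backup file naming scheme are assumptions made for this example; the kubectl commands are the ones shown above.

```shell
# Hypothetical helper: save a deployment's manifest, then delete it.
# Usage: stop_deployment <namespace> <deployment>
# Restore later with: kubectl apply -f <namespace>-<deployment>-backup.yaml
stop_deployment() {
    # The file naming scheme is an assumption made for this sketch.
    backup_file="$1-$2-backup.yaml"
    # Only delete the deployment if the backup command succeeded.
    kubectl get deployment -n "$1" "$2" -o yaml > "$backup_file" &&
        kubectl delete deployment -n "$1" "$2"
}

# Examples, run on a control node:
#   stop_deployment builds-api builds-api
#   stop_deployment tekton-pipelines tekton-pipelines-controller
```

Using a per-deployment file name avoids one backup.yaml overwriting another when both components are stopped from the same directory.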

Checking all components are alive

We don't have a unified dashboard yet, but for now you can check each component individually.

You can check the dashboards as a starting point. For the rest keep reading:


Harbor

SSH to the Harbor instance (e.g. toolsbeta-harbor-1.toolsbeta.eqiad1.wikimedia.cloud):

dcaro@vulcanus$ wm-ssh toolsbeta-harbor-1.toolsbeta.eqiad1.wikimedia.cloud 
...
dcaro@toolsbeta-harbor-1:~$ sudo -i
root@toolsbeta-harbor-1:~# cd /srv/ops/harbor/

root@toolsbeta-harbor-1:/srv/ops/harbor# docker-compose ps
      Name                     Command                  State                          Ports                    
----------------------------------------------------------------------------------------------------------------
harbor-core         /harbor/entrypoint.sh            Up (healthy)                                               
harbor-exporter     /harbor/entrypoint.sh            Up                                                         
harbor-jobservice   /harbor/entrypoint.sh            Up (healthy)                                               
harbor-log          /bin/sh -c /usr/local/bin/ ...   Up (healthy)   127.0.0.1:1514->10514/tcp                   
harbor-portal       nginx -g daemon off;             Up (healthy)                                               
nginx               nginx -g daemon off;             Up (healthy)   0.0.0.0:80->8080/tcp, 0.0.0.0:9090->9090/tcp
redis               redis-server /etc/redis.conf     Up (healthy)                                               
registry            /home/harbor/entrypoint.sh       Up (healthy)                                               
registryctl         /home/harbor/start.sh            Up (healthy)
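
To avoid scanning the table by eye, the State column can be filtered with a small script. This is a sketch that assumes the docker-compose v1 table layout shown above (two header lines, then one row per container); the function name harbor_unhealthy is made up for this example, and other docker-compose versions may print a different layout.

```shell
# Hypothetical helper: read `docker-compose ps` output (v1 table layout, as
# shown above) on stdin and print the name of every container whose State is
# not "Up", or is "Up (unhealthy)". Exits non-zero if any such row is found.
harbor_unhealthy() {
    awk '
        NR <= 2 || NF == 0 { next }   # skip header, separator and blank lines
        $0 !~ / Up( |$)/ || /\(unhealthy\)/ { print $1; bad = 1 }
        END { exit bad + 0 }
    '
}

# Example, run from /srv/ops/harbor on the Harbor instance:
#   docker-compose ps | harbor_unhealthy && echo "all containers are up"
```

Note that a container without a health check, such as harbor-exporter above, only ever reports "Up", so the filter treats plain "Up" as healthy.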

Buildservice API

You can check that it is running with the usual k8s commands:

root@toolsbeta-test-k8s-control-4:~# kubectl get all -n builds-api
NAME                              READY   STATUS    RESTARTS   AGE
pod/builds-api-5bffd6b58f-9zg4s   2/2     Running   0          29h
pod/builds-api-5bffd6b58f-jk6sf   2/2     Running   0          29h

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
service/builds-api   ClusterIP   10.97.55.43   <none>        8443/TCP   18d

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/builds-api   2/2     2            2           18d

NAME                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/builds-api-5bffd6b58f   2         2         2       29h

Tekton

Same as before, different namespace:

root@toolsbeta-test-k8s-control-4:~# kubectl get all -n tekton-pipelines
NAME                                               READY   STATUS    RESTARTS   AGE
pod/tekton-pipelines-controller-5c78ddd49b-dj4hz   1/1     Running   0          57d
pod/tekton-pipelines-webhook-5d899cc8c-zwf7p       1/1     Running   0          57d

NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                              AGE
service/tekton-pipelines-controller   ClusterIP   10.96.176.235    <none>        9090/TCP,8008/TCP,8080/TCP           447d
service/tekton-pipelines-webhook      ClusterIP   10.101.163.215   <none>        9090/TCP,8008/TCP,443/TCP,8080/TCP   447d

NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/tekton-pipelines-controller   1/1     1            1           87d
deployment.apps/tekton-pipelines-webhook      1/1     1            1           87d

NAME                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/tekton-pipelines-controller-5c78ddd49b   1         1         1       87d
replicaset.apps/tekton-pipelines-webhook-5d899cc8c       1         1         1       87d

NAME                                                           REFERENCE                             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/tekton-pipelines-webhook   Deployment/tekton-pipelines-webhook   4%/100%   1         5         1          447d
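
Until a unified dashboard exists, the two k8s checks above can be combined into a single pass. The sketch below compares ready and desired replicas for the three deployments that appear in the listings above; the function names deployment_ready and check_build_service are assumptions made for this example.

```shell
# Hypothetical helper: print "<ready>/<desired>" replicas for a deployment.
deployment_ready() {
    kubectl get deployment -n "$1" "$2" \
        -o jsonpath='{.status.readyReplicas}/{.spec.replicas}'
}

# Hypothetical helper: check the deployments listed in this section.
# Returns non-zero if any of them is not fully ready.
check_build_service() {
    status=0
    for target in \
        builds-api/builds-api \
        tekton-pipelines/tekton-pipelines-controller \
        tekton-pipelines/tekton-pipelines-webhook
    do
        ns="${target%%/*}"; name="${target#*/}"
        ready="$(deployment_ready "$ns" "$name")"
        # readyReplicas is omitted when it is zero, so an empty left-hand
        # side (for example "/2") also counts as not ready.
        if [ -n "${ready%%/*}" ] && [ "${ready%%/*}" = "${ready#*/}" ]; then
            echo "OK   $ns/$name ($ready)"
        else
            echo "FAIL $ns/$name ($ready)"; status=1
        fi
    done
    return "$status"
}
```

Running check_build_service on a control node prints one OK or FAIL line per deployment, which is easier to scan than three separate kubectl get all listings.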

Updating the builder image

Currently we are using the upstream heroku builder, and the image is hosted in harbor.

To update it, you just have to push a newer version to harbor and configure builds-api to use the new builder image.

user@mylaptop$ podman pull docker.io/heroku/builder:22

user@mylaptop$ podman tag docker.io/heroku/builder:22 toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22
user@mylaptop$ podman push toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22

user@mylaptop$ podman tag docker.io/heroku/builder:22 tools-harbor.wmcloud.org/toolforge/heroku-builder:22
user@mylaptop$ podman push tools-harbor.wmcloud.org/toolforge/heroku-builder:22

You can first add a date to the tag ("22_20240105" for example) so you can test it before releasing it. A new builder might change the list of buildpacks that come with it and/or the supported API versions, so you also have to cross-check it against the buildpacks we inject (see the list on gitlab) and the code that injects them (see the builds-builder) to make sure they work with the lifecycle image included in the new builder.
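
The dated-tag variant of the push can be sketched as a helper that prints the podman commands for both Harbor instances. The function name builder_push_commands is an assumption made for this example, and it deliberately only prints the commands so they can be reviewed before anything is pushed.

```shell
# Hypothetical helper: print the podman commands needed to publish a builder
# image under a dated test tag to both Harbor instances. This is a dry run:
# pipe the output to `sh` to actually execute the commands.
# Usage: builder_push_commands <upstream-tag> <date>
builder_push_commands() {
    src="docker.io/heroku/builder:${1}"
    tag="${1}_${2}"   # for example 22_20240105, as suggested above
    echo "podman pull $src"
    for registry in toolsbeta-harbor.wmcloud.org tools-harbor.wmcloud.org; do
        dest="$registry/toolforge/heroku-builder:$tag"
        echo "podman tag $src $dest"
        echo "podman push $dest"
    done
}

# Example:
#   builder_push_commands 22 20240105        # review the commands
#   builder_push_commands 22 20240105 | sh   # run them
```

Once the dated image has been tested, the same tag and push steps can be repeated with the plain version tag to release it.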

Adding a new buildpack

We keep a fork for the buildpacks we inject under https://gitlab.wikimedia.org/groups/repos/cloud/toolforge/buildpacks

Currently we are using the Heroku builder and the Heroku buildpacks, which are usually in the old Heroku structure (as opposed to cloud-native). So we have to add a shim layer to make them cloud-native compatible.

You can see the latest examples in the repository, but a good start is to pull the buildpack from the Heroku CNB URL, for example https://buildpack-registry.heroku.com/cnb/emk/rust, where the last two parts of the path are the author and the name of the buildpack.
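
As a small illustration of the URL scheme, the sketch below builds the registry URL from an author and a buildpack name and downloads it with curl. Both function names and the output file name are assumptions made for this example, and the format of the downloaded file is not stated here, so inspect it before unpacking.

```shell
# Hypothetical helper: build the Heroku CNB registry URL for a buildpack.
# Usage: heroku_cnb_url <author> <name>
heroku_cnb_url() {
    echo "https://buildpack-registry.heroku.com/cnb/$1/$2"
}

# Hypothetical helper: download the buildpack to a local file. The output
# file name is a placeholder; the archive format is an unknown here.
# Usage: fetch_heroku_cnb <author> <name>
fetch_heroku_cnb() {
    curl -fsSL -o "$1-$2.download" "$(heroku_cnb_url "$1" "$2")"
}

# Example:
#   fetch_heroku_cnb emk rust
```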

That will include a few scripts and files that are valid for cloud-native buildpack API 0.4, but we have to adapt them to at least API 0.6, as that's the minimum supported by the builder at the time of writing.

For that, we have to:

  • Add a project.toml file
  • Change the api entry in the buildpack.toml file
  • Update every place where a layer toml file is created to use the newer structure (see the existing examples).
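
The last two steps can be sketched as follows. This assumes that the api entry is a top-level key in buildpack.toml and that, as we understand the Buildpack API 0.6 change, the launch, build and cache flags of a layer toml file move under a [types] table; check the existing forks for the exact layout in use. The function names are made up for this example.

```shell
# Hypothetical helper: rewrite the top-level `api` entry of a buildpack.toml
# to the given Buildpack API version.
# Usage: set_buildpack_api <path/to/buildpack.toml> <version>
set_buildpack_api() {
    # Write to a temporary file first so this works with GNU and BSD sed.
    sed "s/^api[[:space:]]*=.*/api = \"$2\"/" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Hypothetical helper: write a layer toml file in the assumed API 0.6 layout,
# with the flags under a [types] table instead of at the top level.
# Usage: write_layer_toml <path/to/layer.toml> <launch> <build> <cache>
write_layer_toml() {
    cat > "$1" <<EOF
[types]
launch = $2
build = $3
cache = $4
EOF
}

# Example:
#   set_buildpack_api buildpack.toml 0.6
#   write_layer_toml "$layers_dir/rust.toml" true false true
```

In the example, layers_dir stands for whatever layers directory variable the buildpack scripts already use.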

You can create a new branch with the changes.

Once the fork is ready, we can add the buildpack to the list of buildpacks to inject in the builds-builder (see examples there).

Note that this might change soon for a nicer/easier flow, as we are just starting to discover how to manage these things.

History

See Help:Toolforge/Build Service#History.