Portal:Toolforge/Admin/Build Service
Documentation of components and common admin procedures for Toolforge/Build Service.
Components
- Builds client (source code): main entry point for users
- Builds API (source code): main entry point for clients (users interact with it through the CLI)
- Builds builder (source code): underlying build system
- Harbor (puppet code): hosts the images that users create
- Tools: https://tools-harbor.wmcloud.org
- Toolsbeta: https://toolsbeta-harbor.wmcloud.org
Alerts
- From the cloud UI: https://prometheus-alerts.wmcloud.org/?q=%40state%3Dactive&q=project%3D~^%28tools%7Ctoolsbeta%29
- From the prod UI: https://alerts.wikimedia.org/?q=team%3Dwmcs&q=project%3D~%28tools%7Ctoolsbeta%29
Dashboards
You can find all the current dashboards here: https://grafana.wmcloud.org/d/m9V1RQs4k/harbor-overview?orgId=1
Administrative tasks
Starting a service
Harbor
SSH to the Harbor instance (e.g. toolsbeta-harbor-1.toolsbeta.eqiad1.wikimedia.cloud):
dcaro@vulcanus$ wm-ssh toolsbeta-harbor-1.toolsbeta.eqiad1.wikimedia.cloud
...
dcaro@toolsbeta-harbor-1:~$ sudo -i
root@toolsbeta-harbor-1:~# cd /srv/ops/harbor/
root@toolsbeta-harbor-1:/srv/ops/harbor# docker-compose up -d # will start the containers that are down if any
harbor-log is up-to-date
registry is up-to-date
redis is up-to-date
harbor-portal is up-to-date
registryctl is up-to-date
harbor-core is up-to-date
harbor-jobservice is up-to-date
nginx is up-to-date
harbor-exporter is up-to-date
Buildservice API
This lives in kubernetes, behind the API gateway. To start it you can try redeploying it; to do so, follow Portal:Toolforge/Admin/Kubernetes#Deploy_new_version (the component is toolforge-builds-api).
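If the deployment already exists and you only need to restart the pods, a plain kubectl rollout restart may be enough (a minimal sketch, assuming you run it from a k8s control node and that the namespace and deployment are both named builds-api, as in the output below):
# Restart the existing deployment and wait for the new pods to become ready
kubectl -n builds-api rollout restart deployment/builds-api
kubectl -n builds-api rollout status deployment/builds-api --timeout=5m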
You can monitor if it's coming up with the usual k8s commands:
root@toolsbeta-test-k8s-control-4:~# kubectl get all -n builds-api
NAME READY STATUS RESTARTS AGE
pod/builds-api-5bffd6b58f-9zg4s 2/2 Running 0 29h
pod/builds-api-5bffd6b58f-jk6sf 2/2 Running 0 29h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/builds-api ClusterIP 10.97.55.43 <none> 8443/TCP 18d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/builds-api 2/2 2 2 18d
NAME DESIRED CURRENT READY AGE
replicaset.apps/builds-api-5bffd6b58f 2 2 2 29h
Tekton
Similar to the Builds API, Tekton is a k8s component; you can try redeploying it too by following Portal:Toolforge/Admin/Kubernetes/Components#Deploy (the component is buildservice).
You can monitor if it's coming up with the usual k8s commands:
root@toolsbeta-test-k8s-control-4:~# kubectl get all -n tekton-pipelines
NAME READY STATUS RESTARTS AGE
pod/tekton-pipelines-controller-5c78ddd49b-dj4hz 1/1 Running 0 57d
pod/tekton-pipelines-webhook-5d899cc8c-zwf7p 1/1 Running 0 57d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/tekton-pipelines-controller ClusterIP 10.96.176.235 <none> 9090/TCP,8008/TCP,8080/TCP 447d
service/tekton-pipelines-webhook ClusterIP 10.101.163.215 <none> 9090/TCP,8008/TCP,443/TCP,8080/TCP 447d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/tekton-pipelines-controller 1/1 1 1 87d
deployment.apps/tekton-pipelines-webhook 1/1 1 1 87d
NAME DESIRED CURRENT READY AGE
replicaset.apps/tekton-pipelines-controller-5c78ddd49b 1 1 1 87d
replicaset.apps/tekton-pipelines-webhook-5d899cc8c 1 1 1 87d
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/tekton-pipelines-webhook Deployment/tekton-pipelines-webhook 4%/100% 1 5 1 447d
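If the pods look healthy but builds do not progress, it can also help to confirm that the Tekton CRDs are registered and that PipelineRuns are being reconciled (a sketch using standard kubectl commands; the resource names assume the upstream Tekton CRDs):
# Check that the Tekton CRDs are installed
kubectl get crds | grep tekton.dev
# List PipelineRuns across all namespaces and see whether they progress
kubectl get pipelineruns --all-namespaces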
Stopping a service
Harbor
SSH to the Harbor instance (e.g. toolsbeta-harbor-1.toolsbeta.eqiad1.wikimedia.cloud):
dcaro@vulcanus$ wm-ssh toolsbeta-harbor-1.toolsbeta.eqiad1.wikimedia.cloud
...
dcaro@toolsbeta-harbor-1:~$ sudo -i
root@toolsbeta-harbor-1:~# cd /srv/ops/harbor/
root@toolsbeta-harbor-1:/srv/ops/harbor# docker-compose stop
Stopping harbor-jobservice ... done
Stopping nginx ... done
Stopping harbor-exporter ... done
Stopping harbor-core ... done
Stopping registry ... done
Stopping harbor-portal ... done
Stopping redis ... done
Stopping registryctl ... done
Stopping harbor-log ... done
See also Portal:Toolforge/Admin/Harbor.
Buildservice API
Being a k8s deployment, the quickest way might be to just remove the deployment itself (you will need to redeploy it to start it again).
root@toolsbeta-test-k8s-control-4:~# kubectl get deployment -n builds-api builds-api -o yaml > backup.yaml # in case you want to restore later with kubectl apply -f backup.yaml
root@toolsbeta-test-k8s-control-4:~# kubectl delete deployment -n builds-api builds-api
For a full removal (CAREFUL! Only if you know what you are doing) you can use helm:
root@toolsbeta-test-k8s-control-4:~# helm uninstall -n builds-api builds-api
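To double-check the release name and what Helm actually manages in that namespace before (or after) removing it, a quick sketch:
# List Helm releases in the namespace and inspect the release contents
helm list -n builds-api
helm get manifest -n builds-api builds-api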
Tekton
This one is a bit trickier, but it comes down to removing the Tekton controller itself (the one that handles the PipelineRun and TaskRun resources).
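Before deleting the controller it may be worth checking whether any builds are still in flight, as in-progress runs will stop being reconciled (a sketch, assuming the standard Tekton CRDs are installed):
# Look for PipelineRuns/TaskRuns that are still running
kubectl get pipelineruns --all-namespaces
kubectl get taskruns --all-namespaces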
root@toolsbeta-test-k8s-control-4:~# kubectl get deployment -n tekton-pipelines tekton-pipelines-controller -o yaml > backup.yaml # in case you want to restore later with kubectl apply -f backup.yaml
root@toolsbeta-test-k8s-control-4:~# kubectl delete deployment -n tekton-pipelines tekton-pipelines-controller
NOTE: Tekton does not yet have a Helm deployment associated with it.
Checking all components are alive
We don't have a unified dashboard yet, so for now you can check each component individually.
The dashboards are a good starting point; for the rest, keep reading:
Harbor
SSH to the Harbor instance (e.g. toolsbeta-harbor-1.toolsbeta.eqiad1.wikimedia.cloud):
dcaro@vulcanus$ wm-ssh toolsbeta-harbor-1.toolsbeta.eqiad1.wikimedia.cloud
...
dcaro@toolsbeta-harbor-1:~$ sudo -i
root@toolsbeta-harbor-1:~# cd /srv/ops/harbor/
root@toolsbeta-harbor-1:/srv/ops/harbor# docker-compose ps
Name Command State Ports
----------------------------------------------------------------------------------------------------------------
harbor-core /harbor/entrypoint.sh Up (healthy)
harbor-exporter /harbor/entrypoint.sh Up
harbor-jobservice /harbor/entrypoint.sh Up (healthy)
harbor-log /bin/sh -c /usr/local/bin/ ... Up (healthy) 127.0.0.1:1514->10514/tcp
harbor-portal nginx -g daemon off; Up (healthy)
nginx nginx -g daemon off; Up (healthy) 0.0.0.0:80->8080/tcp, 0.0.0.0:9090->9090/tcp
redis redis-server /etc/redis.conf Up (healthy)
registry /home/harbor/entrypoint.sh Up (healthy)
registryctl /home/harbor/start.sh Up (healthy)
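Besides docker-compose ps, you can also query Harbor's health API from anywhere to confirm that every internal component reports healthy (a sketch against the Toolsbeta instance; swap in the Tools URL as needed):
# Each component in the JSON response should report "healthy"
curl -s https://toolsbeta-harbor.wmcloud.org/api/v2.0/health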
Buildservice API
You can check if it's up with the usual k8s commands:
root@toolsbeta-test-k8s-control-4:~# kubectl get all -n builds-api
NAME READY STATUS RESTARTS AGE
pod/builds-api-5bffd6b58f-9zg4s 2/2 Running 0 29h
pod/builds-api-5bffd6b58f-jk6sf 2/2 Running 0 29h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/builds-api ClusterIP 10.97.55.43 <none> 8443/TCP 18d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/builds-api 2/2 2 2 18d
NAME DESIRED CURRENT READY AGE
replicaset.apps/builds-api-5bffd6b58f 2 2 2 29h
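To confirm the Service is actually backed by ready pods (rather than just existing), you can also check its endpoints (a minimal sketch):
# The ENDPOINTS column should list one address per ready pod
kubectl -n builds-api get endpoints builds-api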
Tekton
Same as before, different namespace:
root@toolsbeta-test-k8s-control-4:~# kubectl get all -n tekton-pipelines
NAME READY STATUS RESTARTS AGE
pod/tekton-pipelines-controller-5c78ddd49b-dj4hz 1/1 Running 0 57d
pod/tekton-pipelines-webhook-5d899cc8c-zwf7p 1/1 Running 0 57d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/tekton-pipelines-controller ClusterIP 10.96.176.235 <none> 9090/TCP,8008/TCP,8080/TCP 447d
service/tekton-pipelines-webhook ClusterIP 10.101.163.215 <none> 9090/TCP,8008/TCP,443/TCP,8080/TCP 447d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/tekton-pipelines-controller 1/1 1 1 87d
deployment.apps/tekton-pipelines-webhook 1/1 1 1 87d
NAME DESIRED CURRENT READY AGE
replicaset.apps/tekton-pipelines-controller-5c78ddd49b 1 1 1 87d
replicaset.apps/tekton-pipelines-webhook-5d899cc8c 1 1 1 87d
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/tekton-pipelines-webhook Deployment/tekton-pipelines-webhook 4%/100% 1 5 1 447d
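If everything shows Running but builds still misbehave, the controller and webhook logs are the next place to look (a sketch with plain kubectl):
# Tail recent logs from the Tekton controller and webhook
kubectl -n tekton-pipelines logs deployment/tekton-pipelines-controller --tail=50
kubectl -n tekton-pipelines logs deployment/tekton-pipelines-webhook --tail=50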
Updating the builder image
You can find the latest info in the builds-builder git repo.
Adding a new buildpack
We keep a fork for the buildpacks we inject under https://gitlab.wikimedia.org/groups/repos/cloud/toolforge/buildpacks
Currently we are using the Heroku builder and the Heroku buildpacks, which usually follow the old Heroku structure (as opposed to cloud-native), so we have to add a shim layer to make them cloud-native compatible.
You can see the latest examples in the repository, but a good start is to pull the buildpack from the CNB Heroku URL https://buildpack-registry.heroku.com/cnb/emk/rust, where the last two parts of the path are the author and the name of the buildpack.
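As a rough sketch of pulling one of those buildpacks locally (this assumes the registry endpoint serves a gzipped tarball, which may need adjusting, and uses emk/rust purely as an example):
# Hypothetical example: fetch the CNB-shimmed buildpack and unpack it
curl -sL https://buildpack-registry.heroku.com/cnb/emk/rust -o rust-buildpack.tgz
mkdir -p rust-buildpack && tar -xzf rust-buildpack.tgz -C rust-buildpack
ls rust-buildpack  # expect buildpack.toml, bin/detect, bin/build, ...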
The pulled buildpack will include a few scripts and files that are valid for cloud-native buildpack API 0.4, but we have to adapt them to at least API 0.6, as that is the minimum supported by the builder as of writing this.
For that, we have to:
- Add a project.toml file
- Change the api entry in the buildpack.toml file
- Change everywhere a layer toml file is created to use the newer structure (see the existing examples and the sketch below)
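As an illustration of the kind of change involved (a sketch based on the upstream Cloud Native Buildpacks spec, not on any specific fork): the api entry in buildpack.toml gets bumped, and the launch/build/cache flags written into a layer's toml file move under a [types] table in API 0.6:
# buildpack.toml: declare the newer buildpack API
api = "0.6"

# <layer>.toml: flags that used to be top-level (launch/build/cache) now live
# under a [types] table
[types]
launch = true
build = true
cache = true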
You can create a new branch with the changes.
Once the fork is ready, we can add the buildpack to the list of buildpacks to inject in the builds-builder (see examples there).
Note that this process might change soon in favour of a nicer/easier flow, as we are just starting to figure out how to manage these things.