Machine Learning/LiftWing/Deploy

Summary

This page is a guide for ML Team members to start experimenting with helm and ml-serve. The guide is meant to outline the procedure to deploy a KServe InferenceService.

Steps for a regular deployment:

  1. Uploading models (see How to upload a model to Swift)
  2. Adding models to the values.yaml (see Add a model to an existing helmfile config)
  3. Deployment to production (see How to deploy, Test your model after deployment)

If it's a new model that doesn't fit in any existing K8s namespace, an ML SRE needs to create a new one. In that case, contact an ML SRE to set up a new helmfile config. (see Add a new helmfile config)

Helm at the WMF

The starting point is the deployment-charts repository, which is split into two macro parts: charts and helmfile configs. Deployment pipeline is the official guide for the more common k8s services and is a strongly suggested read.

Charts

The charts are Helm charts, which we can consider (at a high level) to be like Debian packages for service definitions. They have a version (in Chart.yaml), and every time we update/add/delete anything in them we need to bump that version. The charts are then deployed following their versioning (more details later on). A Helm chart is a collection of yaml files, and templates can be used to control their content. A special file called values.yaml is meant to contain the default values for all the placeholders/variables/etc. used in the templates.
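For reference, the version to bump lives in the chart's Chart.yaml; a minimal sketch (the field values here are illustrative, not the actual chart contents):

# Illustrative Chart.yaml sketch, not the real kserve-inference chart
apiVersion: v2
name: kserve-inference
version: 0.2.1        # bump this every time anything in the chart changes
description: Deploys KServe InferenceService resources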

In our case, we have created multiple charts, but the ones that we care about for this tutorial are:

- kserve

- kserve-inference

The kserve chart contains the KServe upstream yaml file with all the Kubernetes resource definitions needed to deploy the service. For example, it takes care of creating the kserve namespace config and the kserve-controller-manager pod, which periodically checks for InferenceService resources and creates the related pods when needed. This chart is changed when KServe needs to be upgraded (a new upstream version is out) or when we want to tune some of its configs.

The kserve-inference chart is where we define InferenceService resources, which correspond to the pods that implement our ML services. The idea is to hide all the complexity of an InferenceService config from the user, reducing the boilerplate copy/paste needed. The template used in the chart allows the definition of a list of InferenceService resources, but its values.yaml file doesn't contain any value for it. This is because, as mentioned previously, the chart should contain only default settings (something that can be applied to any cluster, for example). We deploy Helm charts via Helmfile, see the next section for more info!
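Before moving on, to give an idea of how such a template consumes values, here is a generic Helm template fragment (illustrative only, not the real kserve-inference template):

# Illustrative template fragment: keys like docker.registry and
# inference.predictor.* come from the values.yaml hierarchy (see Helmfile below).
containers:
  - name: kserve-container
    image: "{{ .Values.docker.registry }}/{{ .Values.inference.predictor.image }}:{{ .Values.inference.predictor.version }}"
    imagePullPolicy: {{ .Values.docker.imagePullPolicy }}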

Helmfile

Helm is a nice deployment tool for Kubernetes, one of the de-facto standards. As more complex infrastructures were built on Kubernetes, the need soon arose to configure Helm deployments with hierarchical settings and to bundle groups of charts together, introducing the concept of a cluster/environment. This is why Helmfile was created! It is basically a very nice wrapper around Helm, implementing features that are not included in it. In the WMF use case, it allows the definition of multiple Kubernetes clusters (main-staging, main-eqiad, main-codfw, ml-serve-eqiad, ml-serve-codfw) and the management of various Helm charts with a hierarchical config.

The major difference from Helm charts is that Helmfile configs don't have a version, and they have a totally different syntax from regular charts (but they do allow templating in yaml files).
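To make the idea more concrete, here is a generic helmfile.yaml sketch (illustrative only, not the WMF config; the release name, namespace and chart repository alias are made up):

# Illustrative helmfile.yaml sketch: environments plus a release that uses the
# kserve-inference chart and a hierarchy of values files.
environments:
  ml-serve-eqiad: {}
  ml-serve-codfw: {}

releases:
  - name: example-isvc-release          # hypothetical release name
    namespace: example-namespace        # hypothetical namespace
    chart: wmf-stable/kserve-inference  # chart repository alias is an assumption
    values:
      - values.yaml
      - values-{{ .Environment.Name }}.yaml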

Add a model to an existing helmfile config

Let's start with a real life example, namely deploying a new revscoring-based model. In a world without Helm/Helmfile, we would craft something like the following:

apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  annotations:
     serving.kserve.io/s3-endpoint: thanos-swift.discovery.wmnet
     serving.kserve.io/s3-usehttps: "1"
     serving.kserve.io/s3-region: "us-east-1"
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: someaccount
  AWS_SECRET_ACCESS_KEY: somepassword
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: testelukey
secrets:
- name: test-secret
---
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: enwiki-goodfaith
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  predictor:
    serviceAccountName: testelukey
    containers:
      - name: kfserving-container
        image: docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-editquality:2021-07-28-204847-production
        env:
          # TODO: https://phabricator.wikimedia.org/T284091
          - name: STORAGE_URI
            value: "s3://wmf-ml-models/goodfaith/enwiki/202105140814/"
          - name: INFERENCE_NAME
            value: "enwiki-goodfaith"
          - name: WIKI_URL
            value: "https://en.wikipedia.org"

And then we would just kubectl apply -f it to the cluster. This is doable for small tests but it is clearly not scalable for multiple users. There are also a lot of boilerplate configs like secrets and service accounts that should be hidden from users, together with every bit that is common between different InferenceService resources. Here comes our kserve-inference chart :)
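For illustration, the manual route would be something like the following (the yaml file name is hypothetical):

# Hypothetical manual deployment of the raw manifest above
kubectl apply -f enwiki-goodfaith.yaml
kubectl get inferenceservice enwiki-goodfaith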

As explained above, the chart should only be changed when we have base/common/default configs that we want to apply to all the model services, otherwise Helmfile is the right place to start.

The ML services config is stored in the deployment-charts repository, under helmfile.d/ml-services. There may be one or more sub-directories to explore; generally we create one for each group of models that we want to deploy. For example, the revscoring-editquality-goodfaith directory contains the definition of all the InferenceService resources for the goodfaith models.

Once you have identified the correct ml-services directory, there are three files to consider:

- helmfile.yaml

- values.yaml

- values-ml-staging-codfw.yaml

The first one is the helmfile-specific config of the service: it defines which helm charts are used, their dependencies, how to release to specific clusters, etc. 99% of the time this file should be left untouched, unless you specifically need to modify something in it. The vast majority of the time values.yaml is the file that we want to modify, since it contains the configuration bits for the kserve-inference chart. Do you recall that the chart's values.yaml file contained only default configs? The helmfile's values.yaml file contains the service-specific bits, most notably the list of InferenceService resources to deploy. The values-ml-staging-codfw.yaml file is for the staging cluster and contains the list of InferenceService resources to deploy there. The helmfile config picks up the values.yaml file first and then the staging one, so nothing changes for staging unless you specifically override settings in the staging yaml.
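For example, a hypothetical values-ml-staging-codfw.yaml override for a revscoring config could deploy only one wiki in staging (the keys mirror the values.yaml examples shown below):

# Hypothetical staging override: only enwiki is deployed in ml-staging-codfw
revscoring_inference_services:
  - wiki: "en"
    version: "202105140814"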

Let's see the revscoring-editquality-goodfaith helmfile config and its values.yaml file content:

docker:
  registry: docker-registry.discovery.wmnet/wikimedia
  imagePullPolicy: IfNotPresent

inference:
  annotations:
    sidecar.istio.io/inject: "true"
  s3_storage_base_uri: "s3://wmf-ml-models"
  model: "goodfaith"
  predictor:
    image: "machinelearning-liftwing-inference-services-editquality"
    version: "2022-08-08-100925-publish"
    base_env:
      - name: WIKI_URL
        value: "https://api-ro.discovery.wmnet"
      - name: REQUESTS_CA_BUNDLE
        value: "/etc/ssl/certs/wmf-ca-certificates.crt"
      - name: EVENTGATE_URL
        value: "https://eventgate-main.discovery.wmnet:4492/v1/events"
      - name: EVENTGATE_STREAM
        value: "mediawiki.revision-score-test"

revscoring_inference_services:
  - wiki: "ar"
    version: "20220214192125"
  - wiki: "bs"
    version: "20220214192131"
  - wiki: "ca"
    version: "20220214192134"

Another example is the articletopic-outlink helmfile config and its values.yaml file content:

docker:
  registry: docker-registry.discovery.wmnet/wikimedia
  imagePullPolicy: IfNotPresent

inference:
  annotations:
    sidecar.istio.io/inject: "true"
  predictor:
    image: "machinelearning-liftwing-inference-services-outlink"
    version: "2022-07-27-061940-publish"
    base_env:
      - name: STORAGE_URI
        value: "s3://wmf-ml-models/articletopic/outlink/20220727080723/"
  transformer:
    image: "machinelearning-liftwing-inference-services-outlink-transformer"
    version: "2022-07-28-100205-publish"
    base_env:
      - name: CUSTOM_UA
        value: "WMF ML Team outlink topic model isvc"
      - name: WIKI_URL
        value: "https://api-ro.discovery.wmnet"
      - name: REQUESTS_CA_BUNDLE
        value: "/etc/ssl/certs/wmf-ca-certificates.crt"

inference_services:
  - name: "outlink-topic-model"

As we can see, all the bits related to secrets and service accounts are gone; they are configured behind the scenes and hidden from the user. We have three relevant sections to consider:

  • inference - this is the common config for all the InferenceService resources that we'll configure, and it is meant to avoid copying/pasting the same bits over and over in inference_services and revscoring_inference_services.
  • inference_services - every entry in this list corresponds to a separate InferenceService resource, composed of the default config (outlined above) plus the entry-specific one. To see all the configuration bits allowed, please check the kserve-inference templates!
  • revscoring_inference_services - same structure as inference_services, but this is a special ad-hoc config for revscoring-only models, created to support the migration from ORES to LiftWing. Please don't add any non-revscoring model to this list!

revscoring_inference_services

Use this configuration template only for revscoring models! Please check "inference_services" for any other kind of model.

  • changing a docker image

Let's imagine that we want to change the enwiki-goodfaith docker image to a new version, leaving the rest of the revscoring_inference_services entries using the default one. We can try something like this:

diff --git a/helmfile.d/ml-services/revscoring-editquality-goodfaith/values.yaml b/helmfile.d/ml-services/revscoring-editquality-goodfaith/values.yaml
index 55b0464f..93c1261f 100644
--- a/helmfile.d/ml-services/revscoring-editquality-goodfaith/values.yaml
+++ b/helmfile.d/ml-services/revscoring-editquality-goodfaith/values.yaml
@@ -64,6 +64,8 @@ revscoring_inference_services:
     version: "20220214192139"
   - wiki: "en"
     version: "202105140814"
+    predictor:
+      version: "some-new-version"
   - wiki: "eswikibooks"
     host: es.wikibooks.org
     version: "20220214192150"

This change, once code reviewed and deployed, will translate into a new Knative revision of the InferenceService, which will get all new traffic coming in.
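If you want, you can verify that the new revision showed up right after the deployment (see also "Test your model after deployment" below), for example:

elukey@deploy1002:~$ kube_env revscoring-editquality-goodfaith ml-serve-eqiad
elukey@deploy1002:~$ kubectl get revisions | grep enwiki-goodfaith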

TODO: Add best practices and tricks to use with Knative to split traffic between revisions etc..

  • adding a new model

If we want to add another model to the list, for example Icelandic iswiki-goodfaith, this change should be enough:

diff --git a/helmfile.d/ml-services/revscoring-editquality-goodfaith/values.yaml b/helmfile.d/ml-services/revscoring-editquality-goodfaith/values.yaml
index 55b0464f..9bda5be0 100644
--- a/helmfile.d/ml-services/revscoring-editquality-goodfaith/values.yaml
+++ b/helmfile.d/ml-services/revscoring-editquality-goodfaith/values.yaml
@@ -118,3 +118,5 @@ revscoring_inference_services:
     version: "20220214192321"
   - wiki: "zh"
     version: "20220214192324"
+  - wiki: "is"
+    version: "20220819101100"

  • overriding a default variable

The wiki and model variables will autogenerate the INFERENCE_NAME and WIKI_HOST variables in the InferenceService resource. In specific cases you may want to override the WIKI_HOST variable, for example:

diff --git a/helmfile.d/ml-services/revscoring-editquality-goodfaith/values.yaml b/helmfile.d/ml-services/revscoring-editquality-goodfaith/values.yaml
index 55b0464f..4fea8bb5 100644
--- a/helmfile.d/ml-services/revscoring-editquality-goodfaith/values.yaml
+++ b/helmfile.d/ml-services/revscoring-editquality-goodfaith/values.yaml
@@ -88,6 +88,9 @@ revscoring_inference_services:
     version: "20220214192223"
   - wiki: "it"
     version: "20220214171756"
+  - wiki: "itwikiquote"
+    host: it.wikiquote.org
+    version: "20220819101800"
   - wiki: "ja"
     version: "20220214192231"
   - wiki: "ko"

The "host" variable will replace the auto-generated WIKI_HOST value, that in this case wouldn't have been the correct one (itwikiquote.wikipedia.org).

Add a new helmfile config

This is the most complex use case, so if you haven't read the previous one, please do so :)

Every subdirectory under helmfile.d/ml-services in the deployment-charts repository represents a group of models that are deployed in the same Kubernetes namespace, since they have something in common. For example, we currently have 9 subdirectories:

  • revscoring-articlequality
  • revscoring-draftquality
  • revscoring-editquality-damaging
  • revscoring-editquality-goodfaith
  • revscoring-editquality-reverted
  • revscoring-articletopic
  • revscoring-drafttopic
  • articletopic-outlink
  • experimental (for models that are not yet fully productionized)

The idea is to find a trade-off between having too many separate helmfile configs and the need to group multiple models in a consistent way. So if you are looking to add a new model that doesn't fit in any existing subdirectory (which we could also call a model group or category), you'll need to create a new one. In this section we'll go through what was done to add the revscoring-draftquality helmfile config, tracked in T293858.

The first step is to ask an ML SRE (Luca or Tobias) to set up the basic usernames, namespace and helmfile private settings for the new helmfile config. As stated in the task, multiple things are needed:

  • Two Kubernetes users, revscoring-draftquality and revscoring-draftquality-deploy. They will be used to (respectively) view and deploy Kubernetes resources in the new namespace (see below).
  • A new Kubernetes namespace, revscoring-draftquality (with all the relevant security policies).
  • Private helmfile config to be able to access Swift/S3 (username and password).

The idea is to avoid sharing too many things between namespaces, so that a change to one will likely not affect the others. It is the same standard used for the other Kubernetes clusters. Part of the work is done in public repositories (puppet and deployment-charts) and the rest is done in the puppet private repository. The goal is to have the following files available on the deployment host:

elukey@deploy1002:~$ ls -l /etc/helmfile-defaults/private/ml-serve_services/revscoring-draftquality/ml-serve-eqiad.yaml 
-rw-r----- 1 mwdeploy wikidev 406 Oct 20 07:45 /etc/helmfile-defaults/private/ml-serve_services/revscoring-draftquality/ml-serve-eqiad.yaml

elukey@deploy1002:~$ ls -l /etc/kubernetes/revscoring-draftquality-deploy-ml-serve-*
-rw-r----- 1 root deploy-ml-service 403 Oct 20 07:47 /etc/kubernetes/revscoring-draftquality-deploy-ml-serve-codfw.config
-rw-r----- 1 root deploy-ml-service 403 Oct 20 07:47 /etc/kubernetes/revscoring-draftquality-deploy-ml-serve-eqiad.config

elukey@deploy1002:~$ ls -l /etc/kubernetes/revscoring-draftquality-ml-serve-*
-rw-r----- 1 mwdeploy wikidev 389 Oct 20 07:47 /etc/kubernetes/revscoring-draftquality-ml-serve-codfw.config
-rw-r----- 1 mwdeploy wikidev 389 Oct 20 07:47 /etc/kubernetes/revscoring-draftquality-ml-serve-eqiad.config

Their role will be explained later on; for the moment, note that the revscoring-draftquality-deploy files can only be read by members of the deploy-ml-service POSIX group (as explained above, a user needs to be added to that group by SRE to be able to deploy to the ml-serve clusters).

At this point, you are free to start working on the new helmfile config in the deployment-charts repository!

There is a special _example_ directory that one can use when creating a new helmfile config, since it contains various placeholders to find/replace with the new config values. In our example, a simple recursive copy of _example_ to revscoring-draftquality does the job, but you are free to follow a different approach. You'll find three files inside the new directory:

  • helmfile.yaml
  • values.yaml
  • values-ml-staging-codfw.yaml

The first contains the configuration to execute deployments using Helm 3, and it is usually something that we shouldn't change, except of course replacing all the SERVICE_NAME placeholders (if copying from _example_). The other two are where we list the definitions of all the InferenceService resources (see the section above).
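A possible way to bootstrap the new directory is a recursive copy followed by a find/replace of the placeholders (the target name my-new-model-group is hypothetical):

# Hypothetical example: copy _example_ and replace the SERVICE_NAME placeholders
cp -r helmfile.d/ml-services/_example_ helmfile.d/ml-services/my-new-model-group
grep -rl 'SERVICE_NAME' helmfile.d/ml-services/my-new-model-group \
  | xargs sed -i 's/SERVICE_NAME/my-new-model-group/g'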

Once you have the new directory in place, send a code review and wait for an SRE to +2/merge. After that you should be able to proceed with a regular deployment (see related section).

How to upload a model to Swift

In order to upload a model to Swift, you'll need to be in the deploy-ml-service POSIX group (all ML team members are in it). On every stat100x host there is a tool called model-upload:

elukey@stat1004:~$ model-upload

 error: first argument is not a file.

  Usage: model-upload <model-file> <model-type> <model-lang> <bucket>

elukey@stat1004:~$ model-upload /home/elukey/some-model.bin goodfaith itwiki wmf-ml-models

In the above example, the some-model.bin file should end up in s3://wmf-ml-models/goodfaith/itwiki/SOME-TIMESTAMP/some-model.bin

The SOME-TIMESTAMP value will be generated by model-upload upon execution (something like $(date "+%Y%m%d%H%M%S")).

How to deploy

Once you have code reviewed and merged a change for deployment-charts, you'll need to jump to the current deployment node (like deployment.eqiad.wmnet). If you don't have access to the host, it may be because you are not in the deploy-ml-service POSIX group. In that case, file an Access Request to the SRE team (TODO: add links; all the ML team members are already in it).

Once on the deployment node, cd to /srv/deployment-charts/helmfile.d/ml-services and choose the directory corresponding to the model that you want to deploy. The repository gets updated to its latest version by puppet, so after merging your change it may take some minutes before your change appears in the repo on the deployment node. Use git log to confirm that your change is available.
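For example (using the goodfaith directory):

elukey@deploy1002:~$ cd /srv/deployment-charts/helmfile.d/ml-services/revscoring-editquality-goodfaith
elukey@deploy1002:.../revscoring-editquality-goodfaith$ git log --oneline -5    # your merged change should be listed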

It is always good to double check that the model.bin file is on S3/Swift before proceeding:

elukey@stat1004:~$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls s3://wmf-ml-models/goodfaith/enwiki/202105260914/
2021-10-19 14:43     10351347  s3://wmf-ml-models/goodfaith/enwiki/202105260914/model.bin

If there is no model.bin file, please do not proceed further!

At this point, you can use the helmfile command in the following way:

Steps to create / update an isvc pod:

  1. Diff: first confirm what is going to be changed if/when you deploy (it only displays a diff, no deploy action is taken, so it is a safe command to run).
     • Staging: helmfile -e ml-staging-codfw diff
     • Production: helmfile -e ml-serve-eqiad diff, then helmfile -e ml-serve-codfw diff
  2. Deploy: to deploy your new config/code changes via helm.
     • Staging: helmfile -e ml-staging-codfw sync
     • Production: helmfile -e ml-serve-eqiad sync, then helmfile -e ml-serve-codfw sync

Please Note: Only SREs have rights to delete isvc pods.
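Putting it together, a typical staging-first rollout from the service directory chosen above could look like this:

elukey@deploy1002:.../revscoring-editquality-goodfaith$ helmfile -e ml-staging-codfw diff
elukey@deploy1002:.../revscoring-editquality-goodfaith$ helmfile -e ml-staging-codfw sync
# then, once staging looks good, repeat diff + sync for ml-serve-eqiad and ml-serve-codfw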

Test your model after deployment

First, you want to check that the inference service is up and running:

elukey@deploy1002:~$ kube_env revscoring-editquality-goodfaith ml-serve-eqiad
elukey@deploy1002:~$ kubectl get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
enwiki-goodfaith-predictor-default-74qsz-deployment-dfd48cfdzbv   3/3     Running   0          10d

Once an InferenceService is deployed/changed it should become available with the HTTP Host header MODELNAME.KUBERNETES_NAMESPACE.wikimedia.org. For example, the aforementioned enwiki-goodfaith model should get the Host header enwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org (note: the kubernetes namespace is equal to the name of the ml-services' subdirectory). If you want to query it via curl:

elukey@ml-serve-ctrl1001:~$ cat input.json 
{ "rev_id": 132421 }

elukey@ml-serve-ctrl1001:~$ time curl "https://inference.svc.eqiad.wmnet:30443/v1/models/enwiki-goodfaith:predict" -X POST -d @input.json -i -H "Host: enwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org" --http1.1
    HTTP/1.1 200 OK
    content-length: 112
    content-type: application/json; charset=UTF-8
    date: Tue, 19 Oct 2021 12:17:28 GMT
    server: istio-envoy
    x-envoy-upstream-service-time: 349
    
    {"predictions": {"prediction": true, "probability": {"false": 0.06715093098078351, "true": 0.9328490690192165}}}
    real	0m0.381s
    user	0m0.015s
    sys 0m0.011s

If you want to inspect some kubernetes-specific settings, for example the Knative revisions and their settings, you can do something like:

elukey@deploy1002:~$ kube_env revscoring-editquality-goodfaith ml-serve-eqiad

elukey@deploy1002:~$ kubectl get revisions
NAME                                       CONFIG NAME                          K8S SERVICE NAME                           GENERATION   READY   REASON
enwiki-goodfaith-predictor-default-7sbq5   enwiki-goodfaith-predictor-default   enwiki-goodfaith-predictor-default-7sbq5   1            True    
enwiki-goodfaith-predictor-default-84n6c   enwiki-goodfaith-predictor-default   enwiki-goodfaith-predictor-default-84n6c   5            True    
enwiki-goodfaith-predictor-default-g2ffj   enwiki-goodfaith-predictor-default   enwiki-goodfaith-predictor-default-g2ffj   3            True    
enwiki-goodfaith-predictor-default-jnb8s   enwiki-goodfaith-predictor-default   enwiki-goodfaith-predictor-default-jnb8s   2            True    
enwiki-goodfaith-predictor-default-t8dkx   enwiki-goodfaith-predictor-default   enwiki-goodfaith-predictor-default-t8dkx   4            True

Troubleshooting

In case of troubles, you can always check the logs of the pods. For example, let's assume you see the following after deploying:

elukey@deploy1002:~$ kube_env revscoring-editquality-damaging ml-serve-eqiad

elukey@deploy1002:~$ kubectl get pods 
enwiki-damaging-predictor-default-vf5ck-deployment-78585785lqs6   0/2     CrashLoopBackOff   6          10m
eswiki-damaging-predictor-default-69h54-deployment-7bf5cd99vdjc   2/2     Running            0          4d23h

If you just deployed enwiki-damaging, then something is not right. A quick sanity check could be to inspect the pod's container logs to see if anything looks weird. In this case:

elukey@ml-serve-ctrl1001:~$ kubectl logs enwiki-damaging-predictor-default-vf5ck-deployment-78585785lqs6 -n revscoring-editquality-damaging storage-initializer 
/usr/local/lib/python3.7/dist-packages/ray/autoscaler/_private/cli_logger.py:61: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
  "update your install command.", FutureWarning)
[I 211019 14:21:23 storage-initializer-entrypoint:13] Initializing, args: src_uri [s3://wmf-ml-models/damaging/enwiki/202105260914/] dest_path[ [/mnt/models]
[I 211019 14:21:23 storage:52] Copying contents of s3://wmf-ml-models/damaging/enwiki/202105260914/ to local
[I 211019 14:21:23 credentials:1102] Found credentials in environment variables.
[I 211019 14:21:23 storage:85] Successfully copied s3://wmf-ml-models/damaging/enwiki/202105260914/ to /mnt/models

We start from the storage-initializer since it is the first container that runs, and in this case it seems to be doing the right thing (namely pulling the model from S3 to local storage). So let's see the logs for the kserve-container:

elukey@ml-serve-ctrl1001:~$ kubectl logs enwiki-damaging-predictor-default-vf5ck-deployment-78585785lqs6 -n revscoring-editquality-damaging kserve-container
Traceback (most recent call last):
  File "model-server/model.py", line 41, in <module>
    model.load()
  File "model-server/model.py", line 17, in load
    with open("/mnt/models/model.bin") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/models/model.bin'

The logs indicate that the model.bin file was not found, but the storage-initializer states that it downloaded it correctly. In this case the issue is sneaky; compare the following:

[I 211019 14:21:23 storage:85] Successfully copied s3://wmf-ml-models/damaging/enwiki/202105260914/ to /mnt/models

And the relevant code review change:

+      - name: STORAGE_URI
+        value: "s3://wmf-ml-models/damaging/enwiki/202105260914/"

And the s3 bucket list:

elukey@stat1004:~$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls s3://wmf-ml-models/damaging/enwiki/202105260914/
[..nothing..]

The model.bin file was not uploaded to the correct S3 path, and the storage-initializer probably failed gracefully. In this case a good follow-up is to upload the model via s3cmd, or to create another S3 path and redo the change.
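For example, a possible manual re-upload with s3cmd (the local file name is hypothetical; using model-upload and re-checking the resulting path is usually the better option):

elukey@stat1004:~$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg put enwiki-damaging-model.bin s3://wmf-ml-models/damaging/enwiki/202105260914/model.bin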

Also, in case of trouble, don't forget to refer to the Debugging Guide in the official KServe documentation.