User:Elukey/MachineLearning/Deploy


Summary

This page is a guide for ML Team members to start experimenting with Helm and ml-serve. It outlines the procedure to deploy a KServe InferenceService.

Helm at the WMF

The starting point is the deployment-charts repository, which is split into two macro parts: charts and helmfile configs. Deployment pipeline is the official guide for the more common Kubernetes services and a strongly suggested read.

Charts

The charts are Helm charts, which we can consider (at a high level) like Debian packages for service definitions. They have a version (in Chart.yaml), and every time we update/add/delete anything in them we need to bump that version. The charts are then deployed following their versioning (more details later on). A Helm chart is a collection of YAML files, and templates can be used to control their content. A special file called values.yaml contains the default values for all the placeholders/variables used in the templates.
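For a rough idea of what this looks like in practice (an illustrative sketch, not the actual content of our charts):

# Chart.yaml (illustrative): bump "version" on every change to the chart
apiVersion: v2
name: kserve-inference
version: 0.1.1

# values.yaml (illustrative): defaults that the templates can reference
docker:
  registry: docker-registry.wikimedia.org
  imagePullPolicy: IfNotPresent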

In our case, we have created multiple charts, but the ones that we care about for this tutorial are:

- kserve

- kserve-inference

The kserve chart contains the KServe upstream YAML file with all the Kubernetes resource definitions needed to deploy the service. For example, it takes care of creating the kserve namespace config and the kserve-controller-manager pod, which periodically checks for InferenceService resources and creates the related pods when needed. This chart is changed when KServe needs to be upgraded (a new upstream version is out) or when we want to tune some of its configs.

The kserve-inference chart is where we define InferenceService resources, which correspond to the pods that implement our ML services. The idea is to hide from the user all the complexity of an InferenceService config, reducing the boilerplate to copy/paste. The template used in the chart allows the definition of a list of InferenceService resources, but its values.yaml file doesn't contain any value for it. This is because, as mentioned previously, the chart should only contain default settings (something that can be applied to any cluster, for example). We deploy Helm charts via Helmfile, see the next section for more info!
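To give a flavour of how the templating works (a minimal sketch of the idea, not the actual kserve-inference template), the chart can loop over a list of services provided by the deployer and render one InferenceService per entry:

# templates/inferenceservice.yaml (illustrative sketch)
{{- range .Values.inference_services }}
---
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: {{ .name }}
  annotations:
{{ toYaml $.Values.inference.annotations | indent 4 }}
spec:
  predictor:
    containers:
      - name: kfserving-container
        image: "{{ $.Values.docker.registry }}/{{ $.Values.inference.image }}:{{ $.Values.inference.version }}"
        env:
{{ toYaml $.Values.inference.base_env | indent 10 }}
{{ toYaml .custom_env | indent 10 }}
{{- end }}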

Helmfile

Helm is a nice deployment tool for Kubernetes, one of the de-facto standards. As more complex infrastructures were built on Kubernetes, the need quickly emerged to configure Helm deployments with hierarchical settings and to bundle groups of charts together, introducing the concept of cluster/environment. This is why helmfile was created! It is basically a very nice wrapper around Helm, implementing features that are not included in it. In the WMF use case, it allows the definition of multiple Kubernetes clusters (main-staging, main-eqiad, main-codfw, ml-serve-eqiad, ml-serve-codfw) and manages various Helm charts with a hierarchical config.

The major difference from Helm charts is that Helmfile configs don't have a version, and they have a totally different syntax from regular charts (although they also allow templating in YAML files).
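For a flavour of the Helmfile syntax, a minimal config could look like the following (a simplified sketch; the chart reference and layout here are assumptions, the actual WMF config is more involved):

# helmfile.yaml (illustrative sketch)
environments:
  ml-serve-eqiad:
    values:
      - values.yaml
  ml-serve-codfw:
    values:
      - values.yaml

releases:
  - name: revscoring-editquality
    namespace: revscoring-editquality
    chart: wmf-stable/kserve-inference
    values:
      - values.yaml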

Add a model to an existing helmfile config

Let's start with a real-life example, namely deploying a new revscoring-based model. In a world without Helm/Helmfile, we would craft something like the following:

apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  annotations:
     serving.kserve.io/s3-endpoint: thanos-swift.discovery.wmnet
     serving.kserve.io/s3-usehttps: "1"
     serving.kserve.io/s3-region: "us-east-1"
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: someaccount
  AWS_SECRET_ACCESS_KEY: somepassword
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: testelukey
secrets:
- name: test-secret
---
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: enwiki-goodfaith
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  predictor:
    serviceAccountName: testelukey
    containers:
      - name: kfserving-container
        image: docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-editquality:2021-07-28-204847-production
        env:
          # TODO: https://phabricator.wikimedia.org/T284091
          - name: STORAGE_URI
            value: "s3://wmf-ml-models/goodfaith/enwiki/202105140814/"
          - name: INFERENCE_NAME
            value: "enwiki-goodfaith"
          - name: WIKI_URL
            value: "https://en.wikipedia.org"

And then we would just kubectl apply -f it to the cluster. This is doable for small tests, but it is clearly not scalable for multiple users. There is also a lot of boilerplate config, like secrets and service accounts, that should be hidden from users, together with every bit that is common between different InferenceService resources. Here comes our kserve-inference chart :)

As explained above, the chart should only be changed when we have base/common/default configs that we want to apply to all the model services, otherwise Helmfile is the right place to start.

The ML services config is stored in the deployment-charts repository, under helmfile.d/ml-services. There may be one or more sub-directories to explore; generally we create one for each group of models that we want to deploy. For example, the revscoring-editquality directory contains the definition of all the InferenceService resources for the edit quality models.

Once you have identified the correct ml-services directory, there are two files to consider:

- helmfile.yaml

- values.yaml

The former is the helmfile-specific config of the service: it declares which Helm charts are used, their dependencies, how to release to specific clusters, etc. 99% of the time this file should be left untouched. The vast majority of the time, values.yaml is the one that we want to modify, since it contains the configuration bits for the kserve-inference chart. Do you recall that the chart's values.yaml file contained only default configs? The helmfile's values.yaml file contains the service-specific bits, most notably the list of InferenceService resources to deploy.
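To recap, the layout looks roughly like this (only the edit quality directory is shown):

helmfile.d/ml-services/
└── revscoring-editquality/
    ├── helmfile.yaml   # charts, dependencies, target clusters; rarely touched
    └── values.yaml     # InferenceService definitions; edited for most changes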

For the moment we have only the revscoring-editquality helmfile config; let's see the content of its values.yaml file:

docker:
  registry: docker-registry.discovery.wmnet/wikimedia
  imagePullPolicy: IfNotPresent

inference:
  image: "machinelearning-liftwing-inference-services-editquality"
  version: "2021-09-01-140944-production"
  annotations:
    sidecar.istio.io/inject: "false"
  base_env:
    - name: WIKI_URL
      value: "https://api-ro.discovery.wmnet"
    - name: REQUESTS_CA_BUNDLE
      value: "/usr/share/ca-certificates/wikimedia/Puppet_Internal_CA.crt"

inference_services:
  - name: "enwiki-goodfaith"
    custom_env:
      - name: INFERENCE_NAME
        value: "enwiki-goodfaith"
      - name: WIKI_HOST
        value: "en.wikipedia.org"
      - name: STORAGE_URI
        value: "s3://wmf-ml-models/goodfaith/enwiki/202105140814/"

As we can see, all the bits related to secrets and service accounts are gone: they are configured behind the scenes and hidden from the user. We have two relevant sections to consider:

  • inference - this is the common config of all the InferenceService resources that we'll define, and it is meant to avoid copying/pasting the same bits over and over in inference_services.
  • inference_services - every entry in this list corresponds to a separate InferenceService resource, composed of the default config (outlined above) plus the more specific one (see the sketch after this list). To see all the configuration bits allowed, please check the kserve-inference templates!
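To build an intuition of what the chart renders from these two sections, the enwiki-goodfaith entry above roughly becomes the following InferenceService (a simplified sketch; the real output also includes the service account, storage secrets and other bits handled behind the scenes):

apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: enwiki-goodfaith
  annotations:
    sidecar.istio.io/inject: "false"   # from inference.annotations
spec:
  predictor:
    containers:
      - name: kfserving-container
        # docker.registry + inference.image + inference.version
        image: docker-registry.discovery.wmnet/wikimedia/machinelearning-liftwing-inference-services-editquality:2021-09-01-140944-production
        env:
          # inference.base_env followed by the entry's custom_env
          - name: WIKI_URL
            value: "https://api-ro.discovery.wmnet"
          - name: REQUESTS_CA_BUNDLE
            value: "/usr/share/ca-certificates/wikimedia/Puppet_Internal_CA.crt"
          - name: INFERENCE_NAME
            value: "enwiki-goodfaith"
          - name: WIKI_HOST
            value: "en.wikipedia.org"
          - name: STORAGE_URI
            value: "s3://wmf-ml-models/goodfaith/enwiki/202105140814/"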

Let's imagine that we want to change the enwiki-goodfaith docker image, leaving the rest of the inference_services entries on the default one (in this case we have only one service definition, but pretend that there are way more :). We can try something like this:

diff --git a/helmfile.d/ml-services/revscoring-editquality/values.yaml b/helmfile.d/ml-services/revscoring-editquality/values.yaml
index 30c59a38..ec8ad0e6 100644
--- a/helmfile.d/ml-services/revscoring-editquality/values.yaml
+++ b/helmfile.d/ml-services/revscoring-editquality/values.yaml
@@ -15,6 +15,8 @@ inference:
 
 inference_services:
   - name: "enwiki-goodfaith"
+    image: "machine-learning-liftwing-new-docker-image"
+    image_version: "some-new-version"
     custom_env:
       - name: INFERENCE_NAME
         value: "enwiki-goodfaith"

This change, once code reviewed and deployed, will translate into a new Knative revision of the InferenceService, which will receive all the new incoming traffic.

TODO: Add best practices and tricks to use with Knative to split traffic between revisions etc..

If we want to add another model to the list, for example enwiki-damaging, this change should be enough:

diff --git a/helmfile.d/ml-services/revscoring-editquality/values.yaml b/helmfile.d/ml-services/revscoring-editquality/values.yaml
index 30c59a38..26aaedd8 100644
--- a/helmfile.d/ml-services/revscoring-editquality/values.yaml
+++ b/helmfile.d/ml-services/revscoring-editquality/values.yaml
@@ -21,4 +21,12 @@ inference_services:
       - name: WIKI_HOST
         value: "en.wikipedia.org"
       - name: STORAGE_URI
-        value: "s3://wmf-ml-models/goodfaith/enwiki/202105140814/"
\ No newline at end of file
+        value: "s3://wmf-ml-models/goodfaith/enwiki/202105140814/"
+  - name: "enwiki-damaging"
+    custom_env:
+      - name: INFERENCE_NAME
+        value: "enwiki-damaging"
+      - name: WIKI_HOST
+        value: "en.wikipedia.org"
+      - name: STORAGE_URI
+        value: "s3://wmf-ml-models/goodfaith/damaging/202105140915/"

The idea is to use each helmfile subdirectory of ml-services as a collection of models, all belonging to the same group/category.

How to add a new helmfile config

This is the most complex use case, so if you haven't read the previous one, please do so :)

Every subdirectory under helmfile.d/ml-services in the deployment-charts repository represents a group of models that are deployed in the same Kubernetes namespace, since they have something in common. For example, we currently have two subdirectories (there will be more in the future) for revscoring:

  • revscoring-editquality
  • revscoring-draftquality

The idea is to find a trade-off between having too many separate helmfile configs and the need to group multiple models in a consistent way. So if you are looking to add a new model that doesn't fit in any existing subdirectory (which we could also call a model group or category), you'll need to create a new one. In this section we'll go through what was done to add the revscoring-draftquality helmfile config, tracked in T293858.

The first step is to ask an ML SRE (Luca or Tobias) to set up the basic usernames, namespace and helmfile private settings for the new helmfile config. As stated in the task, multiple things are needed:

  • Two Kubernetes users, revscoring-draftquality and revscoring-draftquality-deploy. They will be used to (respectively) view and deploy Kubernetes resources in the new namespace (see below).
  • A new Kubernetes namespace, revscoring-draftquality (with all the relevant security policies).
  • Private helmfile config to be able to access Swift/S3 (username and password).

The idea is to avoid sharing too much between namespaces, so that a change to one will likely not affect the others. It is the same standard used for the other Kubernetes clusters. Part of the work is done in public repositories (puppet and deployment-charts) and the rest is done in the puppet private repository. The goal is to have the following files available on the deployment host:

elukey@deploy1002:~$ ls -l /etc/helmfile-defaults/private/ml-serve_services/revscoring-draftquality/ml-serve-eqiad.yaml 
-rw-r----- 1 mwdeploy wikidev 406 Oct 20 07:45 /etc/helmfile-defaults/private/ml-serve_services/revscoring-draftquality/ml-serve-eqiad.yaml

elukey@deploy1002:~$ ls -l /etc/kubernetes/revscoring-draftquality-deploy-ml-serve-*
-rw-r----- 1 root deploy-ml-service 403 Oct 20 07:47 /etc/kubernetes/revscoring-draftquality-deploy-ml-serve-codfw.config
-rw-r----- 1 root deploy-ml-service 403 Oct 20 07:47 /etc/kubernetes/revscoring-draftquality-deploy-ml-serve-eqiad.config

elukey@deploy1002:~$ ls -l /etc/kubernetes/revscoring-draftquality-ml-serve-*
-rw-r----- 1 mwdeploy wikidev 389 Oct 20 07:47 /etc/kubernetes/revscoring-draftquality-ml-serve-codfw.config
-rw-r----- 1 mwdeploy wikidev 389 Oct 20 07:47 /etc/kubernetes/revscoring-draftquality-ml-serve-eqiad.config

Their role will be explained later on; for the moment, note that the revscoring-draftquality-deploy config file can only be read by members of the deploy-ml-service POSIX group (as explained above, a user needs to be added by SRE to that group to be able to deploy to the ml-serve cluster).

At this point, you are free to start working on the new helmfile config in the deployment-charts repository!

There is a special _example_ directory that one can use when creating a new helmfile config, since it contains various placeholders to find/replace with the new config values. In our example, a simple recursive copy of _example_ to revscoring-draftquality does the job, but you are free to take other routes. You'll find two files inside the new directory:

  • helmfile.yaml
  • values.yaml

The former contains the configuration to execute deployments using Helm 3, and it is usually something that we shouldn't change, except of course for the replacement of all the SERVICE_NAME placeholders (if copying from _example_). The latter is where we list the definitions of all the Inference Services (see the section above about it).
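For example, one possible way to bootstrap the new directory from _example_ (a command sketch, assuming your working directory is helmfile.d/ml-services):

cp -r _example_ revscoring-draftquality
# replace the placeholders with the new service name
grep -rl SERVICE_NAME revscoring-draftquality | xargs sed -i 's/SERVICE_NAME/revscoring-draftquality/g'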

Once you have the new directory in place, send a code review and wait for an SRE to +2/merge. After that you should be able to proceed with a regular deployment (see related section).

How to deploy

Once you have code reviewed and merged a change for deployment-charts, you'll need to jump to the current deployment node (like deploy1002.eqiad.wmnet). If you don't have access to the host, it may be because you are not in the deploy-ml-service POSIX group. In that case, file an Access Request to the SRE team (TODO: add links; all the ML team members are already in it).

Once on the deployment node, cd to /srv/deployment-charts/helmfile.d/ml-services and choose the directory corresponding to the model that you want to deploy. The repository is updated to its latest version by Puppet, so after merging your change it may take some minutes before it appears in the repo on the deployment node. Use git log to confirm that your change is available.
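For example (using the revscoring-editquality directory from earlier in this guide):

cd /srv/deployment-charts/helmfile.d/ml-services/revscoring-editquality
git log --oneline -3   # check that your merged change is listed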

It is always good to double check that the model.bin file is on S3/Swift before proceeding:

elukey@ml-serve1001:~$ s3cmd ls s3://wmf-ml-models/goodfaith/enwiki/202105260914/
2021-10-19 14:43     10351347  s3://wmf-ml-models/goodfaith/enwiki/202105260914/model.bin

If there is no model.bin file, please do not proceed further!

At this point, you can use the helmfile command in the following way:

  • helmfile -e ml-serve-eqiad diff to see what is going to be changed if you deploy (it only displays a diff, no deploy action is taken, so it is a safe command to run)
  • helmfile -e ml-serve-eqiad sync to deploy your new config/code/etc. via Helm (see the example after this list)
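A typical session (using the revscoring-editquality directory as an example; note that there is also a ml-serve-codfw environment to keep in sync) could look like this:

cd /srv/deployment-charts/helmfile.d/ml-services/revscoring-editquality
helmfile -e ml-serve-eqiad diff    # review the pending changes, safe to run
helmfile -e ml-serve-eqiad sync    # deploy to the eqiad ml-serve cluster
helmfile -e ml-serve-codfw diff    # repeat for codfw
helmfile -e ml-serve-codfw sync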

Test your model after deployment

Once an InferenceService is deployed/changed, it should become available with the HTTP Host header MODELNAME.KUBERNETES_NAMESPACE.wikimedia.org. For example, the aforementioned enwiki-goodfaith model should get the Host header enwiki-goodfaith.revscoring-editquality.wikimedia.org (note: the Kubernetes namespace is equal to the name of the ml-services subdirectory). If you want to query it via curl:

elukey@ml-serve-ctrl1001:~$ cat input.json 
{ "rev_id": 132421 }

elukey@ml-serve-ctrl1001:~$ time curl "https://inference.svc.eqiad.wmnet:30443/v1/models/enwiki-goodfaith:predict" -X POST -d @input.json -i -H "Host: enwiki-goodfaith.revscoring-editquality.wikimedia.org" --http1.1
HTTP/1.1 200 OK
content-length: 112
content-type: application/json; charset=UTF-8
date: Tue, 19 Oct 2021 12:17:28 GMT
server: istio-envoy
x-envoy-upstream-service-time: 349

{"predictions": {"prediction": true, "probability": {"false": 0.06715093098078351, "true": 0.9328490690192165}}}
real	0m0.381s
user	0m0.015s
sys	0m0.011s

If you want to inspect some kubernetes-specific settings, for example the Knative revisions and their settings, you can connect to deploy1002 and do something like:

elukey@deploy1002:~$ kube_env revscoring-editquality ml-serve-eqiad

elukey@deploy1002:~$ kubectl get pods 
NAME                                                              READY   STATUS    RESTARTS   AGE
enwiki-goodfaith-predictor-default-84n6c-deployment-656584fbrx4   2/2     Running   0          4d21h

elukey@deploy1002:~$ kubectl get revisions
NAME                                       CONFIG NAME                          K8S SERVICE NAME                           GENERATION   READY   REASON
enwiki-goodfaith-predictor-default-7sbq5   enwiki-goodfaith-predictor-default   enwiki-goodfaith-predictor-default-7sbq5   1            True    
enwiki-goodfaith-predictor-default-84n6c   enwiki-goodfaith-predictor-default   enwiki-goodfaith-predictor-default-84n6c   5            True    
enwiki-goodfaith-predictor-default-g2ffj   enwiki-goodfaith-predictor-default   enwiki-goodfaith-predictor-default-g2ffj   3            True    
enwiki-goodfaith-predictor-default-jnb8s   enwiki-goodfaith-predictor-default   enwiki-goodfaith-predictor-default-jnb8s   2            True    
enwiki-goodfaith-predictor-default-t8dkx   enwiki-goodfaith-predictor-default   enwiki-goodfaith-predictor-default-t8dkx   4            True

In case of trouble, you can always check the logs of the pods. For example, let's assume you see the following after deploying:

elukey@deploy1002:~$ kube_env revscoring-editquality ml-serve-eqiad

elukey@deploy1002:~$ kubectl get pods 
NAMESPACE                NAME                                                               READY   STATUS             RESTARTS   AGE
revscoring-editquality   enwiki-damaging-predictor-default-ggjx2-deployment-b6977b6298cx   0/2     CrashLoopBackOff   6          10m
revscoring-editquality   enwiki-goodfaith-predictor-default-84n6c-deployment-656584fbrx4   2/2     Running            0          4d23h

If you just deployed enwiki-damaging, then something is not right. A quick sanity check could be to inspect the pod's container logs to see if anything looks weird. In this case:

elukey@ml-serve-ctrl1001:~$ kubectl logs enwiki-damaging-predictor-default-ggjx2-deployment-b6977b6298cx -n revscoring-editquality storage-initializer 
/usr/local/lib/python3.7/dist-packages/ray/autoscaler/_private/cli_logger.py:61: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
  "update your install command.", FutureWarning)
[I 211019 14:21:23 storage-initializer-entrypoint:13] Initializing, args: src_uri [s3://wmf-ml-models/damaging/enwiki/202105260914/] dest_path[ [/mnt/models]
[I 211019 14:21:23 storage:52] Copying contents of s3://wmf-ml-models/damaging/enwiki/202105260914/ to local
[I 211019 14:21:23 credentials:1102] Found credentials in environment variables.
[I 211019 14:21:23 storage:85] Successfully copied s3://wmf-ml-models/damaging/enwiki/202105260914/ to /mnt/models
elukey@ml-serve-ctrl1001:~$ kubectl describe pod enwiki-damaging-predictor-default-ggjx2-deployment-b6977b6298cx -n revscoring-editquality

We start from the storage-initializer since it is the first container that runs, and in this case it seems to be doing the right thing (namely pulling the model from S3 to local disk). So let's see the logs for the kserve-container:

elukey@ml-serve-ctrl1001:~$ kubectl logs enwiki-damaging-predictor-default-ggjx2-deployment-b6977b6298cx -n revscoring-editquality kserve-container 
Traceback (most recent call last):
  File "model-server/model.py", line 41, in <module>
    model.load()
  File "model-server/model.py", line 17, in load
    with open("/mnt/models/model.bin") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/models/model.bin'

The logs indicate that the model.bin file was not found, but the storage-initializer states that it downloaded it correctly. In this case the issue is sneaky; compare the following:

[I 211019 14:21:23 storage:85] Successfully copied s3://wmf-ml-models/damaging/enwiki/202105260914/ to /mnt/models

And the relevant code review change:

+      - name: STORAGE_URI
+        value: "s3://wmf-ml-models/damaging/enwiki/202105260914/"

And the s3 bucket list:

elukey@ml-serve1001:~$ s3cmd ls s3://wmf-ml-models/damaging/enwiki/202105260914/
[..nothing..]

The model.bin file was not uploaded to the correct S3 path, and the storage-initializer probably failed gracefully. In this case a good follow-up is to upload the model via s3cmd, or to create another S3 path and redo the change.
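For example, assuming the model file is available locally on a host where s3cmd is configured, the upload and verification could look like this (paths taken from the example above):

s3cmd put model.bin s3://wmf-ml-models/damaging/enwiki/202105260914/model.bin
s3cmd ls s3://wmf-ml-models/damaging/enwiki/202105260914/    # verify that model.bin is now listed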