User:AChou-WMF/LiftWing
Summary
This is a guide on deploying an ML model as an Inference Service (isvc) on Lift Wing. As an example, we will be creating an NSFW model inference service.
Prerequisites
A prerequisite for this guide is that we have already loaded the NSFW model as a KServe custom inference service in a local Docker container (done in T313526), so we have the basic model-serving code model.py and the dependencies file requirements.txt.
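For orientation, a custom KServe predictor of this kind is roughly shaped like the sketch below. This is not the actual code from T313526; the model format, preprocessing, and output fields are illustrative assumptions only.

# model.py - minimal sketch of a KServe custom predictor (illustrative only).
import kserve
import tensorflow as tf  # assuming the NSFW model is a Keras .h5 model


class NSFWModel(kserve.Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.model = None
        self.ready = False

    def load(self):
        # KServe's storage-initializer downloads STORAGE_URI to /mnt/models
        self.model = tf.keras.models.load_model("/mnt/models/model.h5")
        self.ready = True

    def predict(self, request, headers=None):
        # request is the JSON body POSTed to /v1/models/<name>:predict
        inputs = request["instances"]
        scores = self.model.predict(inputs)
        return {"predictions": scores.tolist()}


if __name__ == "__main__":
    model = NSFWModel("nsfw-model")
    model.load()
    kserve.ModelServer().start([model])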
Make sure you have access to the following hosts:
- ml-sandbox - ml-sandbox.machine-learning.eqiad1.wikimedia.cloud
- deployment server - like deploy1002.eqiad.wmnet
- stat100x machine - like stat1007.eqiad.wmnet
Repositories
We will submit code changes to the following repositories:
- liftwing/inference-services - the monorepo that the ML team uses to store the inference service code.
- integration/config - the Wikimedia configuration for Jenkins.
- deployment-charts - the Wikimedia Helm charts used for deploying software to production.
Clone the repositories from Gerrit with the commit-msg hook.
Production Image Development
Blubberfile
Blubber is an abstraction for container build configurations, used by Wikimedia CI to publish production-ready Docker images. We need to develop a Blubberfile that generates a Dockerfile to build an image that can be run in production.
Here is a Blubberfile for serving the NSFW model.
version: v4
base: docker-registry.wikimedia.org/buster:20220807
runs:
  insecurely: true
lives:
  in: /srv/nsfw-model
variants:
  build:
    python:
      version: python3
      requirements: [nsfw-model/model-server/requirements.txt]
    apt:
      packages:
        - python3-pip
    builder:
      command: ["rm -rf /var/cache/apk/*"]
  production:
    copies:
      - from: local
        source: nsfw-model/model-server
        destination: model-server
      - from: build
        source: /opt/lib/python/site-packages
        destination: /opt/lib/python/site-packages
    apt:
      packages:
        - python3
        - python3-distutils
    python:
      version: python3
      use-system-flag: false
    entrypoint: ["python3", "model-server/model.py"]
  test:
    apt:
      packages:
        - python3-pip
    copies:
      - from: local
        source: nsfw-model/model-server
        destination: model-server
    entrypoint: ["tox", "-c", "model-server/tox.ini"]
    python:
      version: python3
      use-system-flag: false
      requirements: [nsfw-model/model-server/requirements-test.txt]
Please check out these tutorials (tutorial 1, tutorial 2) to learn how to create your own Blubberfile!
Build an Image
To build the Docker image, use the following command:
blubber .pipeline/nsfw/blubber.yaml production | docker build -t aiko/nsfw-model:1 --file - .
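If you want to inspect the Dockerfile that Blubber generates before building (for example to double-check the copies and the entrypoint), you can write it to a file first; the file name used here is just an example:

blubber .pipeline/nsfw/blubber.yaml production > Dockerfile.nsfw
docker build -t aiko/nsfw-model:1 --file Dockerfile.nsfw .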
We push the image to Docker Hub so that it can be used in the ML-Sandbox later.
docker push aiko/nsfw-model:1
Testing your Image in ML-Sandbox
Upload a model to Minio
Minio is the model storage we use in the ML-Sandbox. Before uploading a model, open a separate terminal and expose Minio outside of minikube:
aikochou@ml-sandbox:~$ kubectl port-forward $(kubectl get pod -n kserve-test --selector="app=minio" --output jsonpath='{.items[0].metadata.name}') 9000:9000 -n kserve-test
To upload the model, use the following command:
aikochou@ml-sandbox:~$ mc cp model.h5 myminio/wmf-ml-models/nsfw-model/
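The mc command above assumes a MinIO client alias named myminio that points at the port-forwarded endpoint. If the alias is not configured yet, it can be set up roughly like this (the access and secret keys are placeholders for whatever the sandbox Minio uses):

aikochou@ml-sandbox:~$ mc alias set myminio http://localhost:9000 <ACCESS_KEY> <SECRET_KEY>
aikochou@ml-sandbox:~$ mc ls myminio/wmf-ml-models/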
Create an Inference Service
We need a nsfw-service.yaml to create an Inference Service:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: nsfw-model
  annotations:
    sidecar.istio.io/inject: "true"
spec:
  predictor:
    serviceAccountName: sa
    containers:
      - name: kserve-container
        image: aiko/nsfw-model:1
        env:
          - name: STORAGE_URI
            value: "s3://wmf-ml-models/nsfw-model/"
It sets the container image to "aiko/nsfw-model:1", the image we generated from the Blubberfile, and points the storage URI to the location where the model is stored. Apply the CRD:
aikochou@ml-sandbox:~$ kubectl apply -f nsfw-service.yaml
Check that the inference service is up and running:
aikochou@ml-sandbox:~$ kubectl get pod -n kserve-test
NAME READY STATUS RESTARTS AGE
minio-fbbf6dfb8-p65fr 1/1 Running 0 16d
nsfw-model-predictor-default-cl72b-deployment-9585657df-kk65x 2/2 Running 0 7d8h
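If the predictor pod is not Running (or the READY column stays below 2/2), a few commands that usually point to the problem are shown below; the pod name is just the one from the listing above:

aikochou@ml-sandbox:~$ kubectl get isvc nsfw-model -n kserve-test
aikochou@ml-sandbox:~$ kubectl describe isvc nsfw-model -n kserve-test
aikochou@ml-sandbox:~$ kubectl logs nsfw-model-predictor-default-cl72b-deployment-9585657df-kk65x -n kserve-test -c kserve-container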
Run a prediction
We use a test.sh script that sets the model name, ingress host and port, and service hostname, and then uses curl to query the inference service. A test sample input_nsfw.json needs to be in the same directory as well.
MODEL_NAME="nsfw-model"
INGRESS_HOST=$(minikube ip)
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
SERVICE_HOSTNAME=$(kubectl get isvc ${MODEL_NAME} -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d @./input_nsfw.json --http1.1
Run the test script:
aikochou@ml-sandbox:~$ sh test.sh
...
{"prob_nsfw": 0.9999992847442627, "prob_sfw": 7.475603638340544e-07}
During development, we may modify model.py or the Blubberfile for various reasons (e.g. adding a missing package). As a result, we will repeat the above steps: rebuild the image, push it, re-apply the CRD to recreate the inference service, and run a prediction. Building too many Docker images may result in insufficient space on the ML-Sandbox. When that happens, you can use the following commands to clean up images:
aikochou@ml-sandbox:~$ minikube ssh
Last login: Tue Aug 9 14:38:49 2022 from 192.168.49.1
docker@minikube:~$ docker image ls
docker@minikube:~$ docker image rm <image you want to delete>
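Alternatively, assuming nothing else on the sandbox still needs the cached images, Docker's prune command removes all unused images in one go:

docker@minikube:~$ docker image prune -a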
Delete the Inference Service after testing:
aikochou@ml-sandbox:~$ kubectl delete -f nsfw-service.yaml
Pipelines
Once you are happy with the image generated from the Blubberfile, it is time to configure the pipeline to build the image, run the tests, and publish the production-ready image.
In our inference-services repo, we need to add two pipelines in .pipeline/config.yaml:
nsfw:
  stages:
    - name: run-test
      build: test
      run: true
    - name: production
      build: production
nsfw-publish:
  blubberfile: nsfw/blubber.yaml
  stages:
    - name: publish
      build: production
      publish:
        image:
          name: '${setup.project}-nsfw'
          tags: [stable]
Switching to the integration/config repo, we need to define the jobs and set triggers in the Jenkins Job Builder spec for the new service. Search for "machinelearning/liftwing/inference-services" in the files and follow the existing pattern to add new entries; it is basically copying/pasting the existing inference-services configs for the new Inference Service image.
- jjb/project-pipelines.yaml

  - project:
      # machinelearning/liftwing/inference-services
      name: inference-services
      pipeline:
        ...
        - nsfw
        - nsfw-publish
      jobs:
        ...
        # trigger-inference-services-pipeline-nsfw
        # trigger-inference-services-pipeline-nsfw-publish
        ...
        # inference-services-pipeline-nsfw
        # inference-services-pipeline-nsfw-publish
- zuul/layout.yaml

  # machinelearning/liftwing/inference-services holds several projects each
  # having at least two pipelines. We thus need files based filtering and a
  # meta job to cover all the pipelines variants.
  ...
  - name: ^trigger-inference-services-pipeline-nsfw
    files:
      - '.pipeline/nsfw/blubber.yaml'
      - '.pipeline/config.yaml'
      - 'nsfw-model/model-server/.*'
  ...
  # When adding a new sub project, make sure to add a job filter above in the
  # job section to have the job only trigger for the directory holding the
  # project in the repository.
  - name: machinelearning/liftwing/inference-services
    test:
      ...
      - trigger-inference-services-pipeline-nsfw
    gate-and-submit:
      ...
      - trigger-inference-services-pipeline-nsfw
    postmerge:
      ...
      - trigger-inference-services-pipeline-nsfw-publish
When you are done editing, you can commit your code and create a patchset for each repo. Here are the changes we have made so far:
https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/822046
https://gerrit.wikimedia.org/r/c/integration/config/+/822052
Once your code gets reviewed and merged, you will see a PipelineBot comment on the patch with a pointer to the new image and the tags it made, like:
Wikimedia Pipeline
Image Build SUCCESS
IMAGE:
docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-topic:2022-08-11-085125-publish
TAGS:
2022-08-11-085125-publish, stable
Deployment
Upload a model to Swift
We store the model files used in production in Swift, an open-source S3-compatible object store that is widely used across the WMF.
To upload the model, jump to a stat100x host and use a tool called model_upload:
aikochou@stat1007:~$ model_upload model.h5 experimental nsfw wmf-ml-models
Check if the upload is successful:
aikochou@stat1007:~$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls -r s3://wmf-ml-models/experimental/nsfw/
2022-08-11 08:28 70393536 s3://wmf-ml-models/experimental/nsfw/20220811082819/model.h5
Helmfile
https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/822326
Since NSFW is a new model, ML SRE will set up a new helmfile/namespace config in the deployment-charts repo. Most of the time, values.yaml is the file you want to modify.
- values.yaml

  ...
  inference:
    annotations:
      sidecar.istio.io/inject: "true"
    predictor:
      image: "machinelearning-liftwing-inference-services-nsfw"
      version: "2022-08-11-085124-publish"
      base_env:
        - name: STORAGE_URI
          value: "s3://wmf-ml-models/experimental/nsfw/20220811082819/"
  inference_services:
    - name: "nsfw-model"
- values-ml-staging-codfw.yaml
elukey: The helmfile config picks up the values.yaml file first, then the staging one, so unless you specifically override things in the staging yaml, nothing will be picked up from it.
(If you check helmfile.yaml in the experimental dir of deployment-charts, the "values" entry at line 22 explains what I am saying.)
(Values are picked up from top to bottom.)
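In other words, the layering works roughly like the illustrative snippet below (not the actual helmfile.yaml): value files listed later override the earlier ones, so the staging file only needs the staging-specific deltas.

values:
  - "values.yaml"                   # shared defaults, read first
  - "values-ml-staging-codfw.yaml"  # staging overrides, read last and win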
Deploy
Machine Learning/LiftWing/Deploy#How to deploy
Test the model after deployment
Machine Learning/LiftWing/Deploy#Test your model after deployment