Machine Learning/LiftWing/ML-Sandbox

A development cluster running the WMF KServe stack on Cloud VPS.

Hosts

eqiad

  • ml-sandbox.machine-learning.eqiad1.wikimedia.cloud
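To work on the cluster, SSH to the host above. This assumes your local SSH configuration for Cloud VPS bastion access is already in place (see the Cloud VPS access documentation, which is not covered here); once it is, connecting is a plain ssh:

ssh ml-sandbox.machine-learning.eqiad1.wikimedia.cloud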

Deploy

Suppose we have developed model serving code and a Blubberfile and tested them locally with Docker, but we still want to deploy the service to a production-like Kubernetes environment. For this purpose we use ml-sandbox, a small cluster running minikube with the WMF KServe stack installed.

Let's assume we want to deploy an NSFW model (https://phabricator.wikimedia.org/T313526) on ml-sandbox, and that we have built an image locally using Blubber and pushed it to Docker Hub.
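For reference, building and pushing such an image might look like the following. This is a minimal sketch run on your local machine; it assumes the Blubberfile lives at .pipeline/blubber.yaml and defines a production variant, and the image name and tag are placeholders:

# Generate a Dockerfile from the Blubberfile and build the image locally.
blubber .pipeline/blubber.yaml production | docker build --tag {docker-username}/nsfw-model:latest --file - .
# Push the image to Docker Hub so the cluster can pull it.
docker push {docker-username}/nsfw-model:latest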

Upload a model to Minio

We have been using Minio for model storage on the ml-sandbox cluster.

In a separate terminal, SSH to ml-sandbox and run:

aikochou@ml-sandbox:~$ kubectl port-forward $(kubectl get pod -n kserve-test --selector="app=minio" --output jsonpath='{.items[0].metadata.name}') 9000:9000 -n kserve-test

This will expose minio outside of minikube so we can use the model_upload script and/or the minio client to store model files. In another terminal, try uploading a model using the minio client (mc):

aikochou@ml-sandbox:~$ mc cp model.h5 myminio/wmf-ml-models/nsfw-model/

Confirm that the object is available in minio:

aikochou@ml-sandbox:~$ mc ls myminio -r
[2022-08-09 20:19:45 UTC]  67MiB STANDARD wmf-ml-models/nsfw-model/model.h5
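The myminio alias used above points at the port-forwarded minio endpoint. If it has not been configured yet, it can be set up with mc alias set; this is a sketch in which the endpoint assumes the port-forward from the previous step and the credentials are placeholders for the sandbox's actual minio keys:

# One-time setup of the mc alias against the port-forwarded endpoint.
aikochou@ml-sandbox:~$ mc alias set myminio http://localhost:9000 <ACCESS_KEY> <SECRET_KEY>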

The kserve storage-initializer is configured to pull from our minio instance when loading a model for an Inference Service.
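For context, on a typical KServe + minio setup the storage-initializer gets the endpoint and credentials from a Secret attached to a ServiceAccount (the sa referenced by the InferenceService in the next section). The sketch below shows the general shape only; the names, endpoint, and credentials are placeholders, not the sandbox's actual values:

apiVersion: v1
kind: Secret
metadata:
  name: storage-secret            # placeholder name
  annotations:
    serving.kserve.io/s3-endpoint: minio-service.kserve-test:9000   # placeholder endpoint
    serving.kserve.io/s3-usehttps: "0"
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <ACCESS_KEY>
  AWS_SECRET_ACCESS_KEY: <SECRET_KEY>
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa
secrets:
  - name: storage-secret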

Create an InferenceService

We'll need a YAML file (e.g. nsfw-service.yaml) to create an Inference Service:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: nsfw-model
  annotations:
    sidecar.istio.io/inject: "true"
spec:
  predictor:
    serviceAccountName: sa
    containers:
      - name: kserve-container
        image: {docker-username}/{image-name-you-use}:{some-tag}
        env:
          - name: STORAGE_URI
            value: "s3://wmf-ml-models/nsfw-model/"

In the nsfw-service.yaml file, edit the container image and replace it with the image from your Docker Hub account. Apply the resource:

aikochou@ml-sandbox:~$ kubectl apply -f nsfw-service.yaml -n kserve-test

Check that the inference service is up and running:

aikochou@ml-sandbox:~$ kubectl get pod -n kserve-test
NAME                                                            READY   STATUS    RESTARTS   AGE
minio-fbbf6dfb8-p65fr                                           1/1     Running   0          16d
nsfw-model-predictor-default-cl72b-deployment-9585657df-kk65x   2/2     Running   0          7d8h
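You can also inspect the InferenceService object directly and check that its READY column reports True:

aikochou@ml-sandbox:~$ kubectl get isvc nsfw-model -n kserve-test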

Run a prediction

We use a test.sh script that sets the model name, ingress host and port, and service hostname, and uses curl to query the service we deployed in the previous step:

#!/bin/sh
# test.sh: query the InferenceService through the istio ingress gateway.
MODEL_NAME="nsfw-model"
INGRESS_HOST=$(minikube ip)
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
SERVICE_HOSTNAME=$(kubectl get isvc ${MODEL_NAME} -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)

# KServe routes on the Host header, so set it to the InferenceService URL.
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d @./input_nsfw.json --http1.1

You'll need a test sample input_nsfw.json in the same directory as well. Run the test script:

aikochou@ml-sandbox:~$ sh test.sh
...
{"prob_nsfw": 0.9999992847442627, "prob_sfw": 7.475603638340544e-07}

Great! It returns the output that we expected. If you want to delete the Inference Service after testing, run:

aikochou@ml-sandbox:~$ kubectl delete -f nsfw-service.yaml -n kserve-test

Clean up Images

If you load too many images, the ML-Sandbox may run out of disk space, which minikube status reports as InsufficientStorage:

aikochou@ml-sandbox:~$ minikube status
minikube
type: Control Plane
host: InsufficientStorage
kubelet: Running
apiserver: Running
kubeconfig: Configured

When this happens, use the following commands to clean up images:

aikochou@ml-sandbox:~$ minikube ssh
Last login: Tue Aug  9 14:38:49 2022 from 192.168.49.1
docker@minikube:~$ docker image ls
docker@minikube:~$ docker image rm <image you want to delete>
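To see how much space images are using, or to reclaim space in bulk, standard Docker commands work inside the minikube VM as well. Note that docker image prune -a removes every image not referenced by a container, so images may need to be re-pulled afterwards:

docker@minikube:~$ docker system df
docker@minikube:~$ docker image prune -a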

Configuration

The WMF KServe stack is running via minikube with images available in the WMF Docker Registry. There is a guide and install script that should help recreate the development cluster.