Machine Learning/LiftWing/KServe/DeployLocal

Note: This guide installs KServe 0.10 in a local minikube Kubernetes cluster, following the official documentation's instructions for installing KServe in RawDeployment mode, which does not require Knative (and therefore cannot scale down to zero).

Recommended Version Matrix

Kubernetes Version    Recommended Istio Version
1.22                  1.11, 1.12
1.23                  1.12, 1.13
1.24                  1.13, 1.14
1.25                  1.15, 1.16

1. Install minikube and start a cluster

Install minikube following the instructions from the official webpage.

Start a minikube cluster using Kubernetes v1.23 (the memory and CPU arguments can be adjusted to suit the user's requirements):

minikube --memory 8192 --cpus 2 --kubernetes-version=v1.23.16 start
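
A quick sanity check that the cluster is up and running the requested Kubernetes version:

kubectl get nodes
# The single minikube node should report STATUS Ready and VERSION v1.23.16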

2. Install istio

curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.13.0 TARGET_ARCH=x86_64 sh -
cd istio-1.13.0
export PATH=$PWD/bin:$PATH
istioctl install
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: istio
spec:
  controller: istio.io/ingress-controller
EOF
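
Before moving on, it is worth checking that the Istio control plane and ingress gateway came up, since the istio-ingressgateway service is the load balancer we later expose with minikube tunnel:

kubectl get pods -n istio-system
kubectl get svc istio-ingressgateway -n istio-system
# istiod and istio-ingressgateway should be Running; the gateway's EXTERNAL-IP
# stays <pending> until minikube tunnel is started in step 5.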

3. Install Cert Manager

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.11.0/cert-manager.yaml

4. Install KServe

Install kserve CRDs.

kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.10.0/kserve.yaml

Install the kserve runtimes for prepackaged model servers (e.g. sklearn, torch, tensorflow):

kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.10.0/kserve-runtimes.yaml

We change the defaultDeploymentMode in the inferenceservice-config ConfigMap to RawDeployment:

kubectl patch configmap/inferenceservice-config -n kserve --type=strategic -p '{"data": {"deploy": "{\"defaultDeploymentMode\": \"RawDeployment\"}"}}'

If we use an IngressClass other than istio, we need to change the ingressClassName in the inferenceservice-config to the corresponding name. In our case we use istio, so there is no need to change anything.
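
For reference, a sketch of what that change could look like; the nginx value below is only an example, and the ingress entry in the real ConfigMap contains more fields than shown, so edit only the ingressClassName key:

kubectl edit configmap/inferenceservice-config -n kserve
# In the "ingress" data entry, change
#   "ingressClassName": "istio"
# to the name of the IngressClass you actually use, e.g.
#   "ingressClassName": "nginx"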

5. Deploy the first InferenceService and access our app

The easiest way to set up networking for our local cluster is to run the following command in a separate terminal:

minikube tunnel

This allows us to communicate with our minikube cluster by accessing its load balancer through an external IP, which will be set to our localhost. To deploy an example InferenceService we use a pretrained sklearn model server:

kubectl apply -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      args: ["--enable_docs_url=True"]
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
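
Before sending traffic, check that the InferenceService has become ready (the URL column shows the hostname we extract into SERVICE_HOSTNAME below):

kubectl get inferenceservice sklearn-iris
# Wait until READY shows True; with RawDeployment this can take a minute
# while the predictor pod pulls the sklearn server image and the model.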

Extract the SERVICE_HOSTNAME, INGRESS_HOST and INGRESS_PORT in order to communicate with the cluster:

export SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')

Create a file with input samples:

cat <<EOF > "./iris-input.json"
{
  "instances": [
    [6.8,  2.8,  4.8,  1.4],
    [6.0,  3.4,  4.5,  1.6]
  ]
}
EOF

Get your predictions using the REST endpoint:

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict -d @./iris-input.json
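
If everything is wired up correctly, the response body should contain the predicted class for each of the two input samples, along the lines of:

{"predictions": [1, 1]}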

6. Install Minio (model storage)

Create a file called minio.yaml and install the minio test instance to your cluster:
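
The contents of minio.yaml are not reproduced here; below is a minimal sketch of what a throwaway test instance could look like. The minio-service name and the minio / minio123 credentials are assumptions made for this sketch, chosen so they line up with the mc and s3-secret steps further down; adjust them to your own setup.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
  labels:
    app: minio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - name: minio
          # Any reasonably recent MinIO image works for a local test instance
          image: minio/minio:latest
          args: ["server", "/data"]
          env:
            - name: MINIO_ROOT_USER
              value: "minio"
            - name: MINIO_ROOT_PASSWORD
              value: "minio123"
          ports:
            - containerPort: 9000
---
apiVersion: v1
kind: Service
metadata:
  name: minio-service
  labels:
    app: minio
spec:
  selector:
    app: minio
  ports:
    - port: 9000
      targetPort: 9000
      protocol: TCP

Once the file is in place, apply it: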

kubectl apply -f minio.yaml

Install the Minio client (mc):

curl -L https://dl.min.io/client/mc/release/linux-amd64/mc > mc
chmod +x mc
mc --help

In a different terminal window, port-forward our minio test app:

# Run port forwarding command in a different terminal
kubectl port-forward $(kubectl get pod --selector="app=minio" --output jsonpath='{.items[0].metadata.name}') 9000:9000

Add our test instance and create a bucket for model storage:

mc config host add myminio http://127.0.0.1:9000 minio minio123
mc mb myminio/wmf-ml-models

7. Create an s3-secret.yaml for Minio and attach it to a service account
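
The contents of s3-secret.yaml are not reproduced here either; the sketch below follows the usual KServe S3 credential pattern (a Secret with serving.kserve.io annotations plus a ServiceAccount that references it). The minio-service:9000 endpoint and the sa service account name are assumptions carried over from the sketches above.

apiVersion: v1
kind: Secret
metadata:
  name: s3-secret
  annotations:
    serving.kserve.io/s3-endpoint: "minio-service:9000"  # in-cluster Minio endpoint (assumed name)
    serving.kserve.io/s3-usehttps: "0"
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "minio"
  AWS_SECRET_ACCESS_KEY: "minio123"
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa
secrets:
  - name: s3-secret

Then apply the file: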

kubectl apply -f s3-secret.yaml

8. Deploy enwiki-goodfaith and run a prediction

Upload the enwiki-goodfaith model binary file to the bucket:

mc cp model.bin myminio/wmf-ml-models/enwiki-goodfaith/

Create an enwiki-goodfaith.yaml and apply it to deploy the InferenceService on KServe:
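
The actual manifest used by the ML team lives in the inference-services repository and is not reproduced here. The sketch below only shows the general shape of such a file under the assumptions of this guide: a custom predictor container (the image name is a placeholder), the sa service account from the s3-secret sketch above, and a STORAGE_URI environment variable pointing at the Minio bucket so that KServe's storage initializer downloads the model binary.

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: enwiki-goodfaith
spec:
  predictor:
    serviceAccountName: sa                      # carries the Minio credentials via s3-secret
    containers:
      - name: kserve-container
        image: <editquality-model-server-image>  # placeholder: use the LiftWing revscoring editquality image
        env:
          - name: STORAGE_URI
            value: "s3://wmf-ml-models/enwiki-goodfaith/"

Then apply the file: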

kubectl apply -f enwiki-goodfaith.yaml

As in Step 5, run minikube tunnel to set up networking for our local cluster, and export the SERVICE_HOSTNAME, INGRESS_HOST and INGRESS_PORT environment variables (this time using enwiki-goodfaith as the InferenceService name). Create a file with an input sample:

cat <<EOF > "./input.json"
{ "rev_id": 1145145653 }
EOF

Run a prediction:

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/enwiki-goodfaith:predict -d @./input.json

Expected Output:

*   Trying 127.0.0.1:80...
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> POST /v1/models/enwiki-goodfaith:predict HTTP/1.1
> Host: enwiki-goodfaith-default.example.com
> User-Agent: curl/7.86.0
> Accept: */*
> Content-Length: 24
> Content-Type: application/x-www-form-urlencoded
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< date: Fri, 17 Mar 2023 15:21:50 GMT
< server: istio-envoy
< content-length: 194
< content-type: application/json
< x-envoy-upstream-service-time: 7055
< 
* Connection #0 to host 127.0.0.1 left intact
{"enwiki":{"models":{"goodfaith":{"version":"0.5.1"}},"scores":{"1145145653":{"goodfaith":{"score":{"prediction":true,"probability":{"false":0.021491526258609506,"true":0.9785084737413905}}}}}}}%