Machine Learning/LiftWing/KServe/DeployLocal
Note: This guide installs KServe 0.10 in a local minikube Kubernetes cluster. It follows the official documentation for installing KServe in RawDeployment mode, which does not require Knative (and therefore cannot scale down to zero).
Recommended Version Matrix
| Kubernetes Version | Recommended Istio Version |
| --- | --- |
| 1.22 | 1.11, 1.12 |
| 1.23 | 1.12, 1.13 |
| 1.24 | 1.13, 1.14 |
| 1.25 | 1.15, 1.16 |
1. Install minikube and start a cluster
Install minikube following the instructions from the official webpage.
Start a minikube cluster using Kubernetes v1.23 (the memory and CPU arguments can be adjusted to suit the user's requirements):
minikube --memory 8192 --cpus 2 --kubernetes-version=v1.23.16 start
2. Install istio
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.13.0 TARGET_ARCH=x86_64 sh -
cd istio-1.13.0
export PATH=$PWD/bin:$PATH
istioctl install
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: istio
spec:
  controller: istio.io/ingress-controller
EOF
3. Install Cert Manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.11.0/cert-manager.yaml
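Before moving on, it can help to wait until the cert-manager pods are ready, since the KServe installation relies on its webhooks (this check is not part of the original instructions):
kubectl wait --for=condition=Ready pods --all -n cert-manager --timeout=300s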
4. Install KServe
Install kserve CRDs.
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.10.0/kserve.yaml
Install the kserve runtimes for prepackaged model servers (e.g. sklearn, torch, tensorflow):
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.10.0/kserve-runtimes.yaml
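To confirm the installation, you can check that the kserve controller pod is running and that the cluster serving runtimes were created (optional checks, not part of the official steps):
kubectl get pods -n kserve
kubectl get clusterservingruntimes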
We change the defaultDeploymentMode in the inferenceservice-config ConfigMap to RawDeployment:
kubectl patch configmap/inferenceservice-config -n kserve --type=strategic -p '{"data": {"deploy": "{\"defaultDeploymentMode\": \"RawDeployment\"}"}}'
In case we use an IngressClass other than istio, we also need to change the ingressClassName in the inferenceservice-config to the corresponding name; a sketch of such a change is shown below. In our case we use istio, so there is no need to change anything.
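For reference, such a change could be made in the same style as the deploy patch above, e.g. for a hypothetical ingress class called nginx (illustrative only; note that this overwrites the whole ingress entry of the ConfigMap, so any other ingress settings you rely on would need to be included too):
kubectl patch configmap/inferenceservice-config -n kserve --type=strategic -p '{"data": {"ingress": "{\"ingressClassName\": \"nginx\"}"}}'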
5. Deploy first InferenceService and access our app
The easiest way to set up networking for our local cluster is to run the following command in a separate terminal:
minikube tunnel
This allows us to communicate with our minikube cluster by accessing its load balancer through an external IP, which will be set to our localhost. To deploy an example InferenceService we use a pretrained sklearn model server.
kubectl apply -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      args: ["--enable_docs_url=True"]
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
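Before sending any requests, you may want to wait until the InferenceService reports READY=True (an optional check, not part of the original guide):
kubectl get inferenceservice sklearn-iris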
Extract the SERVICE_HOSTNAME, INGRESS_HOST and INGRESS_PORT in order to communicate with the cluster:
export SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
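As a quick sanity check (optional), you can print the extracted values:
echo $SERVICE_HOSTNAME $INGRESS_HOST $INGRESS_PORT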
Create a file with input samples:
cat <<EOF > "./iris-input.json"
{
"instances": [
[6.8, 2.8, 4.8, 1.4],
[6.0, 3.4, 4.5, 1.6]
]
}
EOF
Now get your predictions using the REST endpoint:
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict -d @./iris-input.json
6. Install Minio (model storage)
Create a file called minio.yaml and install the minio test instance to your cluster:
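The contents of minio.yaml are not reproduced on this page; a minimal single-node test setup, assuming the minio / minio123 credentials and the app=minio label used by the commands below (the minio-service name is illustrative), could look roughly like this:
# Test-only MinIO instance; credentials match the mc commands further down.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - name: minio
          image: minio/minio
          args: ["server", "/data"]
          env:
            - name: MINIO_ROOT_USER
              value: minio
            - name: MINIO_ROOT_PASSWORD
              value: minio123
          ports:
            - containerPort: 9000
---
apiVersion: v1
kind: Service
metadata:
  name: minio-service
spec:
  selector:
    app: minio
  ports:
    - port: 9000
      targetPort: 9000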
kubectl apply -f minio.yaml
Install the Minio client (mc):
curl -L https://dl.min.io/client/mc/release/linux-amd64/mc > mc
chmod +x mc
mc --help
In a different terminal window, port-forward our minio test app:
# Run port forwarding command in a different terminal
kubectl port-forward $(kubectl get pod --selector="app=minio" --output jsonpath='{.items[0].metadata.name}') 9000:9000
Add our test instance and create a bucket for model storage:
mc config host add myminio http://127.0.0.1:9000 minio minio123
mc mb myminio/wmf-ml-models
Create an s3-secret.yaml for minio and attach it to a service account:
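Again the file contents are not included here; a sketch of what it might contain, using the KServe S3 secret annotations and assuming the minio-service name from the sketch above (the sa service account name is illustrative):
apiVersion: v1
kind: Secret
metadata:
  name: s3-secret
  annotations:
    # in-cluster minio endpoint (assumed name and namespace)
    serving.kserve.io/s3-endpoint: minio-service.default:9000
    serving.kserve.io/s3-usehttps: "0"
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: minio
  AWS_SECRET_ACCESS_KEY: minio123
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa
secrets:
  - name: s3-secret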
kubectl apply -f s3-secret.yaml
7. Deploy enwiki-goodfaith and run a prediction
Upload the enwiki-goodfaith model binary file:
mc cp model.bin myminio/wmf-ml-models/enwiki-goodfaith/
Create an enwiki-goodfaith.yaml and apply the yaml to deploy the InferenceService on KServe:
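The exact enwiki-goodfaith.yaml used on LiftWing is not shown on this page; a rough sketch of an InferenceService with a custom predictor container, assuming the sa service account and wmf-ml-models bucket from the steps above (the image is a placeholder for whichever revscoring editquality image you have available, and the real service needs additional environment variables depending on that image):
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: enwiki-goodfaith
spec:
  predictor:
    serviceAccountName: sa
    containers:
      - name: kserve-container
        image: <your-editquality-image>   # placeholder: build or pull a revscoring editquality image
        env:
          # STORAGE_URI triggers KServe's storage initializer to download the model from minio
          - name: STORAGE_URI
            value: s3://wmf-ml-models/enwiki-goodfaith/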
kubectl apply -f enwiki-goodfaith.yaml
Same as step 5, run minikube tunnel to set up networking for our local cluster, and export the SERVICE_HOSTNAME, INGRESS_HOST and INGRESS_PORT environment variables (the adapted commands are shown below).
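The only difference from step 5 is that SERVICE_HOSTNAME now points at the enwiki-goodfaith InferenceService:
export SERVICE_HOSTNAME=$(kubectl get inferenceservice enwiki-goodfaith -o jsonpath='{.status.url}' | cut -d "/" -f 3)
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')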
Create a file with an input sample:
cat <<EOF > "./input.json"
{ "rev_id": 1145145653 }
EOF
Run a prediction:
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/enwiki-goodfaith:predict -d @./input.json
Expected Output:
* Trying 127.0.0.1:80...
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> POST /v1/models/enwiki-goodfaith:predict HTTP/1.1
> Host: enwiki-goodfaith-default.example.com
> User-Agent: curl/7.86.0
> Accept: */*
> Content-Length: 24
> Content-Type: application/x-www-form-urlencoded
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< date: Fri, 17 Mar 2023 15:21:50 GMT
< server: istio-envoy
< content-length: 194
< content-type: application/json
< x-envoy-upstream-service-time: 7055
<
* Connection #0 to host 127.0.0.1 left intact
{"enwiki":{"models":{"goodfaith":{"version":"0.5.1"}},"scores":{"1145145653":{"goodfaith":{"score":{"prediction":true,"probability":{"false":0.021491526258609506,"true":0.9785084737413905}}}}}}}%