Machine Learning/LiftWing/ML-Sandbox/Configuration
Installation + Configuration script for ML-Sandbox.
Summary
This is a guide for installing the KServe stack locally using WMF tools and images. The install steps diverge from the official KServe quick_install script so that everything runs on WMF infrastructure. All upstream changes to the YAML configs were first documented in the KServe chart's README in the deployment-charts repository. The config.yaml that we apply in production lives in deployment-charts/custom_deploy.d/istio/ml-serve.
Software pre-requisites
Before we set up our local cluster, we need to locally install: Minikube, kubectl, Helm, Istioctl, Minio, s3cmd.
Install on MacOS
# Install a pinned version of istioctl
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.15.7 TARGET_ARCH=arm64 sh -
# You can install remaining software using homebrew
brew install minikube
brew install kubectl
brew install helm
brew install s3cmd
brew install minio/stable/mc
Install on Linux
Many of these packages are available in the WMF APT repository.
# Install Minikube
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
# Install packages from our APT repository
sudo apt install helm
sudo apt install istioctl -y
To install the remaining software (kubectl, Minio, s3cmd), please follow each tool's documentation.
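Before going further, it is worth checking that everything actually landed on the PATH. A minimal sanity-check sketch (the tool list mirrors the prerequisites above; adjust it to your setup):

```shell
# Report any prerequisite that is not on PATH.
missing=""
for tool in minikube kubectl helm istioctl mc s3cmd; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "Missing tools:$missing"
else
  echo "All prerequisites found"
fi
```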
Start Minikube cluster
Start the cluster matching our production Kubernetes version:
minikube start --kubernetes-version=v1.23.14
Install Istio operator on the cluster
Istio namespace
Run the command below in your terminal to create Istio namespace:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: istio-system
  labels:
    istio-injection: disabled
EOF
Istio operator
To install the operator, first create the file `istio-minimal-operator.yaml` with the following manifest:
apiVersion: install.istio.io/v1beta1
kind: IstioOperator
spec:
  values:
    global:
      proxy:
        autoInject: disabled
      useMCP: false
      # The third-party-jwt is not enabled on all k8s.
      # See: https://istio.io/docs/ops/best-practices/security/#configure-third-party-service-account-tokens
      jwtPolicy: first-party-jwt
  meshConfig:
    accessLogFile: /dev/stdout
  addonComponents:
    pilot:
      enabled: true
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
      - name: cluster-local-gateway
        enabled: true
        label:
          istio: cluster-local-gateway
          app: cluster-local-gateway
        k8s:
          service:
            type: ClusterIP
            ports:
              - port: 15020
                targetPort: 15021
                name: status-port
              - port: 80
                name: http2
                targetPort: 8080
              - port: 443
                name: https
                targetPort: 8443
Next you can apply the manifest using istioctl:
istioctl install -f istio-minimal-operator.yaml -y
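Before moving on, it is worth confirming that the operator created the control plane and both gateways. A hedged sketch that degrades to a message when no cluster is reachable (it assumes kubectl is configured for the Minikube cluster):

```shell
# Confirm the Istio control plane and gateways exist; fall back to a
# message when no cluster is reachable so the check stays harmless.
istio_status="skipped (kubectl not found)"
if command -v kubectl >/dev/null 2>&1; then
  if kubectl get pods -n istio-system 2>/dev/null; then
    kubectl get svc -n istio-system istio-ingressgateway cluster-local-gateway
    istio_status="checked"
  else
    istio_status="cluster unreachable"
  fi
fi
echo "istio verification: $istio_status"
```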
Clone deployment charts
To deploy Knative and KServe, we'll use the charts in our deployment-charts repository.
git clone "https://gerrit.wikimedia.org/r/operations/deployment-charts"
Install Calico NetworkPolicy CRDs
We're using NetworkPolicy CRDs from Calico for Knative and KServe. First, make sure to install those CRDs:
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/crds.yaml
Deploy Knative
Available Images
To learn more about the newest available Knative images, you can check the docker registry:
- Webhook: https://docker-registry.wikimedia.org/knative-serving-webhook/tags/
- Queue: https://docker-registry.wikimedia.org/knative-serving-queue/tags/
- Controller: https://docker-registry.wikimedia.org/knative-serving-controller/tags/
- Autoscaler: https://docker-registry.wikimedia.org/knative-serving-autoscaler/tags/
- Activator: https://docker-registry.wikimedia.org/knative-serving-activator/tags/
- Net-istio webhook: https://docker-registry.wikimedia.org/knative-net-istio-webhook/tags/
- Net-istio controller: https://docker-registry.wikimedia.org/knative-net-istio-controller/tags/
Create knative-serving namespace
First, let’s create a namespace for knative-serving:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: knative-serving
  labels:
    serving.knative.dev/release: "v1.7.2"
EOF
Deploy Knative charts
First, let's deploy the CRDs:
helm install knative-serving-crds deployment-charts/charts/knative-serving-crds
Next, you can install the serving chart:
helm install knative-serving deployment-charts/charts/knative-serving
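After the two installs, the control-plane deployments should converge. A sketch that waits for them (it assumes kubectl points at the Minikube cluster, and skips the wait when the namespace is unreachable):

```shell
# Wait for all Knative Serving deployments to report Available;
# fall back to a note when the cluster/namespace is unreachable.
knative_ready="unknown"
if command -v kubectl >/dev/null 2>&1 && kubectl get ns knative-serving >/dev/null 2>&1; then
  kubectl wait --for=condition=Available deployment --all \
    -n knative-serving --timeout=300s && knative_ready="yes" || knative_ready="no"
else
  knative_ready="cluster unreachable"
fi
echo "knative-serving deployments available: $knative_ready"
```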
Next, we need to configure the registries for which tag resolving is skipped:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving
data:
  queueSidecarImage: docker-registry.wikimedia.org/knative-serving-queue:1.7.2-7
  registriesSkippingTagResolving: "kind.local,ko.local,dev.local,docker-registry.wikimedia.org,index.docker.io"
EOF
Deploy KServe
Images
- KServe agent: https://docker-registry.wikimedia.org/kserve-agent/tags/
- KServe controller: https://docker-registry.wikimedia.org/kserve-controller/tags/
- KServe storage-initializer: https://docker-registry.wikimedia.org/kserve-storage-initializer/tags/
Create kserve namespace
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  labels:
    control-plane: kserve-controller-manager
    controller-tools.k8s.io: "1.0"
    istio-injection: disabled
  name: kserve
EOF
Deploy KServe charts
First, let's install the base kserve chart:
helm install kserve deployment-charts/charts/kserve
Next, we can install the kserve-inference chart:
helm install kserve-inference deployment-charts/charts/kserve-inference
Install self-signed certificates
We have everything needed to run KServe, but we still need to handle the TLS certificate for the webhook. We will use the self-signed-ca script available in the kserve repo: https://raw.githubusercontent.com/kserve/kserve/v0.11.2/hack/self-signed-ca.sh
First, delete the existing secrets:
kubectl delete secret kserve-webhook-server-cert -n kserve
kubectl delete secret kserve-webhook-server-secret -n kserve
Now copy the self-signed script and execute it:
curl -L https://raw.githubusercontent.com/kserve/kserve/v0.11.2/hack/self-signed-ca.sh -o self-signed-ca.sh
chmod +x self-signed-ca.sh
./self-signed-ca.sh
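The script recreates the webhook certificate secret we just deleted. A defensive sketch that reports whether it exists again (it only checks state, assuming kubectl may or may not be configured):

```shell
# The script should have recreated the webhook cert secret; report its state.
cert_state="not checked (no cluster)"
if command -v kubectl >/dev/null 2>&1 && kubectl get ns kserve >/dev/null 2>&1; then
  if kubectl get secret kserve-webhook-server-cert -n kserve >/dev/null 2>&1; then
    cert_state="present"
  else
    cert_state="missing"
  fi
fi
echo "kserve-webhook-server-cert: $cert_state"
```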
Deploy kserve-test namespace
Now we can create the namespace where we will deploy our services:
kubectl create namespace kserve-test
Deploy Minio
This is an optional step for using Minio as model storage in your development cluster. In production we use Thanos Swift to store our model binaries; for local development, however, we can use something more ad hoc.
This will mostly follow the document here: https://github.com/kserve/website/blob/main/docs/modelserving/kafka/kafka.md
Create Minio Service
First we create a file called minio.yaml, with the following contents:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: minio
  name: minio
  namespace: kserve-test
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: minio
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - args:
            - server
            - /data
          env:
            - name: MINIO_ACCESS_KEY
              value: minio
            - name: MINIO_SECRET_KEY
              value: minio123
          image: minio/minio:RELEASE.2020-10-18T21-54-12Z
          imagePullPolicy: IfNotPresent
          name: minio
          ports:
            - containerPort: 9000
              protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: minio
  name: minio-service
spec:
  ports:
    - port: 9000
      protocol: TCP
      targetPort: 9000
  selector:
    app: minio
  type: ClusterIP
Now, you can install the minio test instance to your cluster:
kubectl apply -f minio.yaml -n kserve-test
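Pulling the Minio image can take a while, so it helps to wait for the pod before port-forwarding to it. A hedged sketch (it skips the wait when the kserve-test namespace is unreachable):

```shell
# Wait for the Minio pod before port-forwarding to it; skip the wait
# when the namespace is unreachable so the sketch stays harmless.
minio_wait="skipped"
if command -v kubectl >/dev/null 2>&1 && kubectl get ns kserve-test >/dev/null 2>&1; then
  kubectl wait --for=condition=Ready pod -l app=minio \
    -n kserve-test --timeout=120s && minio_wait="ready" || minio_wait="timed out"
fi
echo "minio pod: $minio_wait"
```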
Deploy Secrets to interact with Minio
Now we need to create an S3 secret for Minio and attach it to a service account. Create a file `s3-secret.yaml` with the following contents:
apiVersion: v1
kind: Secret
metadata:
  name: storage-secret
  annotations:
    serving.kserve.io/s3-endpoint: minio-service.kserve-test:9000 # replace with your s3 endpoint
    serving.kserve.io/s3-usehttps: "0" # by default 1, for testing with minio you need to set to 0
    serving.kserve.io/s3-verifyssl: "0"
    serving.kserve.io/s3-region: us-east-1
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: minio
  AWS_SECRET_ACCESS_KEY: minio123
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa
secrets:
  - name: storage-secret
We can apply it as follows:
kubectl apply -f s3-secret.yaml -n kserve-test
Create storage bucket in Minio
First, we need to port-forward our Minio test app in a different terminal window:
# Run port forwarding command in a different terminal
kubectl port-forward $(kubectl get pod -n kserve-test --selector="app=minio" --output jsonpath='{.items[0].metadata.name}') 9000:9000 -n kserve-test
Now let's add our test instance as an alias and create a bucket for model storage:
mc alias set myminio http://127.0.0.1:9000 minio minio123
mc mb myminio/wmf-ml-models
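To confirm the bucket was created, you can list the alias and look for it. A defensive sketch (it assumes the port-forward from the previous step is still running, and only reports rather than fails):

```shell
# Look for the new bucket in the alias listing; report rather than fail.
bucket_check="mc not installed"
if command -v mc >/dev/null 2>&1; then
  if mc ls myminio 2>/dev/null | grep -q wmf-ml-models; then
    bucket_check="found"
  else
    bucket_check="not visible (is the port-forward running?)"
  fi
fi
echo "wmf-ml-models bucket: $bucket_check"
```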
Upload model to Minio bucket
Upload manually
You should be able to upload a model binary file as follows:
mc cp model.bin myminio/wmf-ml-models/
Upload via model-upload script
You can use the model_upload.sh script to handle model uploads to Minio. First you need to create an s3cmd config file called ~/.s3cfg:
# Setup endpoint
host_base = 127.0.0.1:9000
host_bucket = 127.0.0.1:9000
bucket_location = us-east-1
use_https = False
# Setup access keys
access_key = minio
secret_key = minio123
# Enable S3 v4 signature APIs
signature_v2 = False
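The same file can be generated from the shell. A sketch that writes the config to a scratch path first, so you can inspect it before copying it to ~/.s3cfg (the endpoint and keys match the test Minio instance above):

```shell
# Write the s3cmd config for the local Minio instance to a scratch
# file; copy it over ~/.s3cfg once it looks right.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
# Setup endpoint
host_base = 127.0.0.1:9000
host_bucket = 127.0.0.1:9000
bucket_location = us-east-1
use_https = False
# Setup access keys
access_key = minio
secret_key = minio123
# Enable S3 v4 signature APIs
signature_v2 = False
EOF
echo "wrote $cfg"
```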
Now you can download the model_upload.sh script and use it on the ml-sandbox:
curl -LJ0 https://gitlab.wikimedia.org/accraze/ml-utils/-/raw/main/model_upload.sh > model_upload.sh
chmod +x model_upload.sh
./model_upload.sh model.bin articlequality enwiki wmf-ml-models ~/.s3cfg
Deploy InferenceService
Finally, when you create an InferenceService, you can point it at the new Minio bucket (s3://wmf-ml-models); just make sure to add serviceAccountName "sa" to the predictor spec that has a storage URI.
Example Inference Service spec:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: enwiki-goodfaith
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  predictor:
    serviceAccountName: sa
    containers:
      - name: kfserving-container
        image: docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-editquality:2021-07-28-204847-production
        env:
          # TODO: https://phabricator.wikimedia.org/T284091
          - name: STORAGE_URI
            value: "s3://wmf-ml-models/"
          - name: INFERENCE_NAME
            value: "enwiki-goodfaith"
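Once the InferenceService reports Ready, you can smoke-test it through the Istio ingress gateway. A hedged sketch: the Host header follows the default Knative domain (example.com) and the rev_id payload shape is an assumption based on the editquality service, so adapt both to your setup.

```shell
# In another terminal, port-forward the ingress gateway first:
#   kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
# Then send a request with the isvc's external Host header.
smoke="curl not installed"
if command -v curl >/dev/null 2>&1; then
  curl -s -X POST "http://127.0.0.1:8080/v1/models/enwiki-goodfaith:predict" \
    -H "Host: enwiki-goodfaith.kserve-test.example.com" \
    -H "Content-Type: application/json" \
    -d '{"rev_id": 12345}' \
    && smoke="request sent" || smoke="request failed (is the port-forward running?)"
fi
echo "smoke test: $smoke"
```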
Upgrade KServe
Before starting, please check https://github.com/kserve/kserve/releases and familiarize yourself with the changes in the new version.
Upgrading KServe is a process that involves two macro parts:
- Upgrade all isvc Docker images in the inference-services repository. This can be done as a first step, in a totally friendly and relaxed rolling upgrade of every kserve package installed by pip.
- Upgrade the K8s control plane, namely the Go controllers that extend the K8s API to support InferenceService custom resources (and related ones).
The first step is the easiest but very tedious, since upgrading kserve often means a little bit of Python dependency hell in our isvcs' requirements.txt. Pay particular attention to whether the Python version needs to be bumped, and if possible couple the upgrade with an Operating System bump (via Blubber config). The OS bump is not mandatory, but coupling the two seems nice and easy, unless there is some major reason or breaking change that suggests otherwise.
The second step has some sub-steps:
- Upgrade the KServe Docker images in the production-images repository.
- Build and release the new images to the Docker Registry. Please see Kubernetes/Images#Image building for more info (SRE only).
- Upgrade the kserve Helm chart in the deployment-charts repository. The new config can be retrieved in various places, but the easiest is to look into GitHub's released files for the new KServe version (like https://github.com/kserve/kserve/releases/tag/v0.12.0, bottom of the page). The new yaml is usually really huge, something like 20k lines, and a line-by-line comparison is something unbearable for any human being. The procedure that we have used so far is the following:
  - Use an editor that allows you to place two yaml files side by side, and also to collapse yaml bits if required.
  - Compare the current kserve.yaml file in deployment-charts with the new one downloaded from GitHub.
  - Find all the occurrences of schema in the new yaml file, and collapse them (we don't need to customize them, since they are mostly related to webhooks). Now the things to compare are way fewer and more manageable :)
  - Check all occurrences of custom values that we add in deployment-charts' kserve.yaml, for example the ones surrounded by {{ etc.. }}, and replicate them in the new kserve.yaml file.
  - Read the README file in deployment-charts' kserve chart, since it lists a series of customizations that we applied over time.
  - When you are done, copy the new kserve.yaml file over the deployment-charts one, bump the chart's version and check the diff in Gerrit. It will be pretty easy now to spot whether you missed anything or touched/modified bits inadvertently.
  - Merge the change once it gets reviewed, and prepare to deploy :)
- Deploy the new chart to ml-staging-codfw, and check various bits:
  - Deleting an isvc pod should work fine (and the new storage-initializer image should work).
  - You shouldn't see error logs (or any horror related to yaml parsing etc..) in KServe's control plane pods (kserve namespace).
- Finally deploy to prod!
At this point the task is completed! Note for the reader: in the future we may want to use the upstream kserve chart config, even if I am not 100% sure whether it simplifies the above or not (since we'll have to apply customizations anyway).
Delete cluster
Sometimes you might need to destroy the cluster and rebuild. Here is a helpful command:
minikube delete --purge --all
minikube start --kubernetes-version=v1.23.14 --cpus 4 --memory 8192 --driver=docker --force