User:Elukey/MachineLearning/kfserving

Kfserving canary releases

https://github.com/kubeflow/kfserving/tree/master/docs/samples/v1beta1/rollout

Deploy a custom InferenceService

We are currently testing InferenceServices on the ml-serve-eqiad cluster in production, and this is the procedure to create an endpoint without using helmfile (that will come later on).

On ml-serve-ctrl1001.eqiad.wmnet:

# Replace 'YOUR-USERNAME' with your username
kubectl create namespace YOUR-USERNAME-test

# Create a YAML file called 'inference.yaml' with the following content
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-privileged-psp
  namespace: YOUR-USERNAME-test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-privileged-psp
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:serviceaccounts:YOUR-USERNAME-test
    namespace: YOUR-USERNAME-test
---
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  annotations:
     serving.kubeflow.org/s3-endpoint: thanos-swift.discovery.wmnet
     serving.kubeflow.org/s3-usehttps: "1"
type: Opaque
stringData: # use `stringData` for raw credential string or `data` for base64 encoded string
  AWS_ACCESS_KEY_ID: mlserve:prod
  AWS_SECRET_ACCESS_KEY: REDACTED (fetch the password from your .s3cfg file on ml-serve1001:/home/yourusername/.s3cfg)
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: testYOUR-USERNAME
secrets:
- name: test-secret
---
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: enwiki-goodfaith
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  predictor:
    serviceAccountName: testYOUR-USERNAME
    containers:
      - name: kfserving-container
        image: docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-editquality:2021-07-28-204847-production
        env:
          # TODO: https://phabricator.wikimedia.org/T284091
          - name: STORAGE_URI
            value: "s3://wmf-ml-models/goodfaith/enwiki/202105140814/"
          - name: INFERENCE_NAME
            value: "enwiki-goodfaith"
          - name: WIKI_URL
            value: "https://en.wikipedia.org"

# Double-check the specs by reading the file before applying it!
kubectl apply -f inference.yaml -n YOUR-USERNAME-test
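
Once applied, you can check that the InferenceService comes up (the READY column and the URL should get populated once the predictor pod is running):

kubectl get inferenceservices -n YOUR-USERNAME-test
kubectl get pods -n YOUR-USERNAME-test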

Remember a few things:

  • Be very careful when handling the Swift password :)
  • the S3 path needs to be created beforehand, and the model file inside it needs to be called 'model.bin' (see the sketch after this list)
  • replace all occurrences of YOUR-USERNAME with your shell username.
  • Use only internal endpoints (so no calls to AWS S3, etc.)
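
For reference, uploading a model to the Thanos Swift S3 endpoint can be done with s3cmd (a sketch; the bucket and path are the ones used in the example above, and your .s3cfg is assumed to be configured for thanos-swift):

# The model file must be named 'model.bin' under the STORAGE_URI path
s3cmd put model.bin s3://wmf-ml-models/goodfaith/enwiki/202105140814/model.bin
s3cmd ls s3://wmf-ml-models/goodfaith/enwiki/202105140814/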

Docs: https://github.com/kubeflow/kfserving/blob/master/docs/samples/storage/s3/README.md

General Debugging

The nsenter tool is useful for running commands that are not shipped in a Docker image (like, say, netstat) inside a container's namespaces. For example, let's say that you want to get the list of ports bound by the istiod pod for debugging.

The first step is to find where the pod is running:

elukey@ml-serve-ctrl1001:~$ kubectl get pods -o wide -n istio-system
NAME                      READY   STATUS    RESTARTS   AGE    IP             NODE                       NOMINATED NODE   READINESS GATES
istiod-6b68cc877d-6rdzm   1/1     Running   0          128m   10.64.78.135   ml-serve1002.eqiad.wmnet   <none>           <none>

The next step is to ssh to the host and run docker ps to find details about the running container:

elukey@ml-serve1002:~$ sudo docker ps | grep istiod
6750c5cd1ec1        c46c352e0461                            "/usr/bin/pilot-disc…"   2 hours ago         Up 2 hours                              [..]
83c01843255d        docker-registry.discovery.wmnet/pause   "/pause"                 2 hours ago         Up 2 hours                              [..]

Then find the PID of the container:

elukey@ml-serve1002:~$ sudo docker inspect --format '{{ .State.Pid }}' 6750c5cd1ec1
6503

And finally use nsenter:

# Note: the parameters do matter! For example, in this case I used only the -n parameter to get the network namespace (needed to get netstat info)
elukey@ml-serve1002:~$ sudo nsenter -n -t 6503 netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.1:9876          0.0.0.0:*               LISTEN      6503/pilot-discover 
tcp6       0      0 :::8080                 :::*                    LISTEN      6503/pilot-discover 
tcp6       0      0 :::15010                :::*                    LISTEN      6503/pilot-discover 
tcp6       0      0 :::15012                :::*                    LISTEN      6503/pilot-discover 
tcp6       0      0 :::15014                :::*                    LISTEN      6503/pilot-discover 
tcp6       0      0 :::15017                :::*                    LISTEN      6503/pilot-discover

Istio

# Inspect ingress routes and listeners
# Same commands for the cluster-local-gateway
istioctl-1.9.5 -n istio-system proxy-config route deploy/istio-ingressgateway
istioctl-1.9.5 -n istio-system proxy-config listener deploy/istio-ingressgateway

# Change Istio Gateway's log level
istioctl-1.9.5 proxy-config log istio-ingressgateway-7ffffd874b-67zkp.istio-system --level info
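
A couple of other proxy-config subcommands that come in handy when debugging routing (same pattern as above, purely illustrative):

# Inspect the upstream clusters and endpoints known to the ingress gateway
istioctl-1.9.5 -n istio-system proxy-config cluster deploy/istio-ingressgateway
istioctl-1.9.5 -n istio-system proxy-config endpoint deploy/istio-ingressgateway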

Specific configuration for Knative 0.18: https://github.com/knative/docs/blob/release-0.18/docs/install/installing-istio.md

Knative serving

# Needed to prevent issues when validating TLS for Dockerhub
# Useful only in testing (minikube)
kubectl -n knative-serving edit configmap config-deployment
# Then add: registriesSkippingTagResolving: index.docker.io
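
After the edit, the relevant part of the ConfigMap should look roughly like this (a sketch, only the interesting key shown):

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving
data:
  # Skip image tag-to-digest resolution for Docker Hub images (testing only)
  registriesSkippingTagResolving: index.docker.io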

Minikube testing stack

istioctl-1.9.5 manifest apply -f deployment-charts/custom.d/istio/ml-serve/config.yaml
kubectl create namespace knative-serving
helm install deployment-charts/charts/knative-serving-crds --generate-name -n knative-serving
helm install deployment-charts/charts/knative-serving --generate-name -n knative-serving

# Edit Knative serving's config-deployment ConfigMap as explained above.

kubectl apply -f kfserving.yaml
./self-signed-ca.sh
kubectl apply -f service.yaml
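
Once everything is applied, a quick sanity check is to make sure that all the pods are up (namespace names depend on the upstream yaml files used):

kubectl get pods -n istio-system
kubectl get pods -n knative-serving
kubectl get pods -n kfserving-system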

Kfserving stack

This section describes my understanding of the Kfserving stack, in particular how we deploy it at Wikimedia. First of all, the components are:

  • istio 1.9.5
  • knative-serving 0.18 (the last one supporting kubernetes 1.16)
  • kfserving 0.5.1

The istio setup is created via istioctl, and it is very simple:

  • No service mesh TLS auth/encryption
  • ingress gateway with predefined NodePorts for 15021 (status-check port) and 443 (HTTPS)
  • cluster-local-gateway with predefined ports for 80/8080/etc..

The above config is highlighted in the Knative upstream docs. What we care about, for the moment, is having TLS between the API gateway and the kfserving LVS VIP (which should be inference-service.wikimedia.org), terminating TLS on the Istio ingress gateway pods. Past the ingress gateway the traffic is not encrypted.

One important bit is that this initial config only takes care of L4 settings, while we'll eventually need to instruct the Istio ingress gateway with L7 routing. The istioctl manifest follows the IstioOperatorSpec, which mentions only TCP ports. What takes care of the Istio L7 routing (Gateways, Routes, etc.) are the layers above, Knative serving and kfserving.
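
To make this more concrete, the IstioOperator manifest passed to istioctl looks roughly like the following (a sketch, not our actual config.yaml; the NodePort values and targetPorts are placeholders):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
        k8s:
          service:
            type: NodePort
            ports:
              - name: status-port
                port: 15021
                targetPort: 15021
                nodePort: 30521   # placeholder NodePort value
              - name: https
                port: 443
                targetPort: 8443
                nodePort: 30443   # placeholder NodePort value
      - name: cluster-local-gateway
        enabled: true
        label:
          istio: cluster-local-gateway
        k8s:
          service:
            type: ClusterIP
            ports:
              - name: http2
                port: 80
                targetPort: 8080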

The Knative-serving install specs say that three yaml files are needed:

  • one with all the CRDs
  • one with core settings
  • one for net-istio settings

The net-istio settings are the important bits for configuring the initial L7 routing (which will be completed by Kfserving later on). Knative serving seems to need the cluster-local-gateway (in addition to the ingress gateway) to route internal traffic. This is a comment that I found in the example configs:

    # A cluster local gateway to allow pods outside of the mesh to access
    # Services and Routes not exposing through an ingress.  If the users
    # do have a service mesh setup, this isn't required and can be removed.
    #
    # An example use case is when users want to use Istio without any
    # sidecar injection (like Knative's istio-ci-no-mesh.yaml).  Since every pod
    # is outside of the service mesh in that case, a cluster-local  service
    # will need to be exposed to a cluster-local gateway to be accessible.

The knative-cluster-local gateway is currently mapped to istio-cluster-local-gateway in 0.18, but it will become standalone in a future version (namely, in theory we won't need a cluster-local Istio definition anymore). Let's explore the net-istio.yaml config a little bit more:

# This is the shared Gateway for all Knative routes to use.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: knative-ingress-gateway
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "{{ .Chart.AppVersion }}"
    networking.knative.dev/ingress-provider: istio
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

The above adds an Istio Gateway config to the ingress gateway for port 80. We replace this one with a similar config for port 443, carrying a TLS certificate (see the helm charts), since this is the API gateway to Ingress hop that we mentioned above.
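
The port-443 replacement looks roughly like this (a sketch, not the actual chart template; the certificate paths are placeholders and the labels are omitted):

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: knative-ingress-gateway
  namespace: knative-serving
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        # Placeholder paths: the certificate is mounted on the ingress gateway pods
        serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
        privateKey: /etc/istio/ingressgateway-certs/tls.key
      hosts:
        - "*"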

# A cluster local gateway to allow pods outside of the mesh to access
# Services and Routes not exposing through an ingress.  If the users
# do have a service mesh setup, this isn't required.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cluster-local-gateway
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "{{ .Chart.AppVersion }}"
    networking.knative.dev/ingress-provider: istio
spec:
  selector:
    istio: cluster-local-gateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: knative-local-gateway
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "{{ .Chart.AppVersion }}"
    networking.knative.dev/ingress-provider: istio
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 8081
        name: http
        protocol: HTTP
      hosts:
        - "*"
---
apiVersion: v1
kind: Service
metadata:
  name: knative-local-gateway
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "{{ .Chart.AppVersion }}"
    networking.knative.dev/ingress-provider: istio
spec:
  type: ClusterIP
  selector:
    istio: ingressgateway
  ports:
    - name: http2
      port: 80
      targetPort: 8081

The above is a little more cryptic, but it should work like this:

  • A Gateway for port 80 is added to the cluster-local-gateway config (see the selector; this is different from the previous one, which targeted the ingress gateway)
  • A Gateway for port 8081 is added to the ingress gateway config (but it is not exposed via any NodePort service, since we didn't specify it in istioctl's manifest)
  • A regular Kubernetes Service is added to create a ClusterIP that maps its port 80 to the Istio Ingress Gateway's port 8081 (see the selector).

My understanding is that the cluster-local gateway is mapped to the ingress gateway with the Service acting as a "bridge": pods talk first to the cluster-local gateway, which then uses port 8081 on the ingress gateway.
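
This mapping can be double-checked by looking at the Services involved (output omitted):

# The knative-local-gateway ClusterIP selects the ingress gateway pods and targets port 8081
kubectl -n knative-serving get svc knative-local-gateway -o wide
kubectl -n istio-system get svc istio-ingressgateway -o wide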

Last but not least, kfserving. This layer is responsible for instructing the Ingress about target backends as soon as new InferenceServices are added.
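
In practice this shows up as Istio VirtualServices created next to each InferenceService; a quick way to inspect them (using the test namespace from the first section):

kubectl get virtualservices -n YOUR-USERNAME-test
# Or across all namespaces
kubectl get virtualservices -A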