Event Platform/EventGate/Administration

From Wikitech

EventGate is deployed using the WMF Helm & Kubernetes deployment pipeline. This page will describe how to build and deploy EventGate services as well as document how to administer and debug EventGate in beta and production.

Our deployments of EventGate are done using the eventgate-wikimedia repository. This is an npm module that implements a WMF-specific EventGate factory and specifies EventGate as a dependency. It launches service-runner via the EventGate module, with a config that sets eventgate_factory_module to eventgate-wikimedia.js.

Mediawiki Vagrant development

See: Event_Platform/EventGate#Development_in_Mediawiki_Vagrant

Beta / deployment-prep

Since we deploy Docker images to Kubernetes in production, we want to run these same images in beta. This is done by including the role::beta::docker_services class on a deployment-prep node via the Horizon Puppet Configuration interface. The service and image are configured by editing Hiera config in the same Horizon interface. deployment-eventgate-3 is a good example. The EventBus MediaWiki extension in beta is configured with $wgEventServices that point to these instances.


Primary documentation for Kubernetes Deployments is here: Deployments_on_kubernetes

Production deployments of EventGate use WMF's Service Deployment Pipeline. Deploying new code and configuration to this pipeline currently has several steps. You should first be familiar with the various technologies and phases of this pipeline. Here's some reading material for ya!

  • Deployment pipeline
  • Deployment Pipeline Design (AKA Streamlined Service Delivery Design)
  • Blubber - Dockerfile generator, ensures consistent Docker images.
  • Helm - Manages deployment releases to Kubernetes clusters. Helm Charts describe e.g. Docker images and versions, service config templating, automated monitoring, metrics and logging, service replica scaling, etc.
  • Kubernetes - Containerized cloud clusters made of 'pods'. Each pod can run multiple containers.

Deployment Pipeline Overview

Here's a general overview of how a code change, and then a Helm chart change, makes it to production. Code changes require Docker image rebuilds, and eventgate Helm chart changes require a new chart version and release upgrade.

Each EventGate service is deployed via the same eventgate Helm chart. Each service runs in its own Kubernetes namespace and has a distinct release name. The services are configured and deployed using helmfile custom values files and commands.

Current services (as of 2020-03)

  • eventgate-main - Produces lower volume 'production' events to Kafka main-* clusters.
  • eventgate-analytics - Produces high volume 'analytics' events to Kafka jumbo-eqiad cluster.
  • eventgate-analytics-external - Produces medium volume client side 'analytics' events to Kafka jumbo-eqiad cluster.
  • eventgate-logging-external - Produces client side error logs to the Kafka logging-* cluster for use in logstash.

In case you get confused, here are the Helm and Kubernetes terms for the eventgate-analytics service:

  • main app (service) name: eventgate-analytics
  • docker image name: eventgate-wikimedia (built from the eventgate-wikimedia gerrit repository)
  • Helm chart: eventgate
  • Helm release name: canary or production
  • Kubernetes cluster name: staging, eqiad or codfw
  • Kubernetes namespace: eventgate-analytics

In the eventgate-analytics service examples below, you will be deploying the eventgate-wikimedia Docker image using the eventgate Helm chart, with values applied via helmfile.

There are 3 repositories that may need changes.

  • EventGate - This is the generic pluggable library & service
  • eventgate-wikimedia - Wikimedia specific implementation code and deployment pipeline Blubber files.
  • deployment-charts - Helm charts and helmfile values, specifies configs for service deployment.

If you make a change to EventGate or eventgate-wikimedia, you must trigger a rebuild of the eventgate-wikimedia Docker image, then update the image version in the service's helmfile values and deploy. If you only need a config change, edit the values files and deploy. A chart change additionally requires a new chart version before you deploy.

EventGate / eventgate-wikimedia Code Change

If this is an EventGate change, first push the change to the EventGate repository, then change the eventgate dependency SHA version in eventgate-wikimedia package.json.

1. Change is merged to eventgate-wikimedia. This will trigger a service-pipeline build.

2. Jenkins trigger-service-pipeline-test-and-publish is triggered and launches the service-pipeline-test-and-publish job.

3. Once service-pipeline-test-and-publish finishes, the image will be available in our Docker registry https://docker-registry.wikimedia.org. You can list existing image tags with:

 curl https://docker-registry.wikimedia.org/v2/wikimedia/eventgate-wikimedia/tags/list
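The registry answers this request with a Docker Registry v2 tag list, a JSON object with a name field and a tags array. As a rough sketch, the helper below prints one tag per line using only tr and sed, so it works where jq is not installed. The function name registry_tags is made up for this example, and it assumes tags contain no commas or double quotes.

```shell
# Hypothetical helper: read a Docker Registry v2 tags/list JSON document on
# stdin and print one tag per line. This is a text-level sketch, not a JSON parser.
registry_tags() {
    tr -d '\n' \
      | sed -n 's/.*"tags"[[:space:]]*:[[:space:]]*\[\([^]]*\)].*/\1/p' \
      | tr ',' '\n' \
      | sed -e 's/^[[:space:]]*"//' -e 's/"[[:space:]]*$//' -e '/^$/d'
}
```

It could be used as: curl -s https://docker-registry.wikimedia.org/v2/wikimedia/eventgate-wikimedia/tags/list | registry_tags. Where jq is available, jq -r '.tags[]' does the same job more robustly.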

Once the image is available, we can upgrade the appropriate release(s) in Kubernetes clusters.

4. Edit the appropriate helm values.yaml file(s) in the deployment-charts repo, e.g. helmfile.d/services/eventgate-analytics/values.yaml, and update the image version. Merge this change. About 1 minute later, the updated values file will be pulled onto the deployment server.

5. Log in to deployment.eqiad.wmnet. Upgrade the eventgate-analytics service in Kubernetes and verify that it works. To do this, follow the instructions at Deployments_on_kubernetes#Code_deployment/configuration_changes.

eventgate service values config change

Service-specific configs are kept in values.yaml files inside of helmfile.d. To make a simple config value change, edit the appropriate service / cluster(s) values.yaml files, e.g. deployment-charts/helmfile.d/services/eventgate-analytics/values*.yaml. Commit and merge the change, wait up to 1 minute for the change to be synced to the deployment server, then follow the upgrade process described in step 5 of the code change instructions above.

eventgate chart change

To modify the Helm chart to e.g. change a template or default values, do the following:

1. Edit the eventgate chart in the deployment-charts repository.

2. Test locally in Minikube (more below).

3. Once satisfied, bump the chart version in Chart.yaml. (NOTE: The chart version is independent of the EventGate code version.)

4. Commit and submit the changes to gerrit for review. Once merged, the new chart release should show up at https://helm-charts.wikimedia.org/api/stable/charts/eventgate.

5. Follow the above instructions at Deployments_on_kubernetes#Code_deployment/configuration_changes to upgrade your service to the new deployment.

EventStreamConfig change

EventGate instances are configured to request stream configuration from the MediaWiki EventStreamConfig API, but the way they do so varies depending on configuration. For most 'production' instances, stream configuration is not often edited. To avoid runtime coupling of production EventGate instances, these production instances are configured to look up their pertinent stream configs only when the service starts. However, eventgate-wikimedia also supports 'dynamic' runtime stream config lookup: if an event is produced to a stream for which EventGate does not have stream configuration, it will attempt to look up that configuration from the remote EventStreamConfig API.

eventgate-analytics-external is meant for feature instrumentation, and has a higher rate of stream configuration changes. It is the only EventGate instance (as of 2020-08) that looks up event stream configuration at runtime.

To make a change to stream config, either to add a new stream or to change a setting:

1. Edit wgEventStreams in mediawiki-config InitialiseSettings.php. This might look like:

			'stream' => 'resource-purge',
			'schema_title' => 'resource_change',
			'destination_event_service' => 'eventgate-main',

The stream config entry must minimally specify the stream name setting, the schema_title setting (the title field of the event schemas that will be allowed in this stream), and the destination_event_service setting to the EventGate service name that is allowed to produce this event stream. Other stream config settings may be used by services other than EventGate (e.g. the EventLogging extension).
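Before submitting the change, it can help to check that an entry carries the three settings named above. The function below is a minimal sketch: it reads the PHP text of a single entry on stdin and prints any of the required keys it cannot find. The name missing_stream_settings is invented for this example, and it only matches text of the form 'key' =>, without parsing PHP or validating the values.

```shell
# Hypothetical helper: print each required stream config key that does not
# appear as 'key' => in the entry text read from stdin. Prints nothing if all
# three required settings are present.
missing_stream_settings() {
    local entry key
    entry=$(cat)
    for key in stream schema_title destination_event_service; do
        printf '%s\n' "$entry" | grep -q "'$key'[[:space:]]*=>" || printf '%s\n' "$key"
    done
}
```

Pasting the example entry above into this function should print nothing. An entry without destination_event_service would print that key name.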

2. Merge and sync this change.

What happens next depends on whether the EventGate instance uses static or dynamic stream config:

3a. If this stream config change is for an EventGate instance that uses dynamic stream config, no action is needed; the new stream config will be automatically looked up when it is used.

3b. If this was a change for an EventGate that uses static stream config, you'll have to restart the pods to get them to look up the change.
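As a quick reference for choosing between 3a and 3b, the sketch below maps each service listed earlier on this page to its stream config mode. It only restates what this page says as of 2020-08 (eventgate-analytics-external is the sole dynamic instance). The function name is made up, and the mapping should be checked against the current service configuration before relying on it.

```shell
# Hypothetical lookup: print the stream config mode of an EventGate service,
# based on this page's description as of 2020-08. Returns non-zero for
# services this page does not list.
stream_config_mode() {
    case "$1" in
        eventgate-analytics-external)
            echo dynamic ;;
        eventgate-main|eventgate-analytics|eventgate-logging-external)
            echo static ;;
        *)
            echo unknown
            return 1 ;;
    esac
}
```

A result of static means the pods need a restart before they pick up the stream config change.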

TODO: how to do this?

Troubleshooting in production

All helmfile and kubectl commands below assume your CWD is a helmfile.d service directory on the deployment server, e.g. /srv/deployment-charts/helmfile.d/services/staging/eventgate-analytics
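To avoid typing the path each time, a small function can build it from the cluster and service names. This is a sketch that assumes the services/<cluster>/<service> layout of the example path above. The function name and the DEPLOYMENT_CHARTS_DIR override variable are invented here and are not part of any WMF tooling.

```shell
# Hypothetical helper: print the helmfile.d directory for a service in a
# cluster. The layout follows the example path on this page and is an
# assumption; set DEPLOYMENT_CHARTS_DIR to use a different checkout.
eventgate_helmfile_dir() {
    local cluster=$1 service=$2
    local base=${DEPLOYMENT_CHARTS_DIR:-/srv/deployment-charts}
    case "$cluster" in
        staging|eqiad|codfw) ;;
        *) echo "unknown cluster: $cluster" >&2; return 1 ;;
    esac
    printf '%s/helmfile.d/services/%s/%s\n' "$base" "$cluster" "$service"
}
```

For example: cd "$(eventgate_helmfile_dir staging eventgate-analytics)" before running the commands below.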

Get detailed status of Helm release

See Migrating_from_scap-helm#Seeing_the_current_status

Upgrade a Helm release

See Migrating_from_scap-helm#Code_deployment/configuration_changes

Rollback to a previous Helm chart version

See Migrating_from_scap-helm#Rolling_back_changes

List k8s pods and their k8s host nodes

source .hfenv; kubectl get pods -o wide
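Several procedures further down need only the pod name and the node it runs on. As a sketch, the filter below reads the -o wide table on stdin, finds the first header column named NODE, and prints just the pod name and node for each row. The function name pod_nodes is made up, and it assumes every column before NODE is a single whitespace-free field, which may not hold for kubectl versions that print extra text in the RESTARTS column.

```shell
# Hypothetical filter: read `kubectl get pods -o wide` output on stdin and
# print "<pod_name> <node>" per pod. The NODE column is located from the
# header row rather than hard-coded.
pod_nodes() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if ($i == "NODE" && !col) col = i
            next
        }
        col { print $1, $col }
    '
}
```

Usage would be: source .hfenv; kubectl get pods -o wide | pod_nodes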

Delete a specific k8s pod

sudo -i; kube_env admin <CLUSTER>; kubectl -n <tiller_namespace> delete pod <pod_name>

(<tiller_namespace> is likely the service name, e.g. eventgate-main.)

Delete all k8s pods in a cluster

You shouldn't do this in production!

sudo -i; kube_env admin <CLUSTER>; kubectl delete pod -n <tiller_namespace> --all

(<tiller_namespace> is likely the service name, e.g. eventgate-main.)

Tail stdout/logs on all pods in a service

 for pod in $(source .hfenv; kubectl get pods -o wide | grep eventgate | awk '{print $1}'); do source .hfenv; kubectl logs -f --since 1h -c $TILLER_NAMESPACE $pod & done | jq .

Tail stdout/logs on a specific k8s pod container

In staging (automatically using the single active pod id):

source .hfenv; kubectl logs -c $TILLER_NAMESPACE -f --since 60m $(source .hfenv; kubectl get pods -l app=$TILLER_NAMESPACE  -o wide | tail -n 1 | awk '{print $1}') | jq .

For a specific pod:

source .hfenv; kubectl logs -c $TILLER_NAMESPACE -f --since 60m <pod_name> | jq .

Get a shell on a specific k8s pod container

In staging (automatically using the single active pod id):

source .hfenv; sudo KUBECONFIG=/etc/kubernetes/admin-staging.config kubectl exec -ti -n $TILLER_NAMESPACE -c $TILLER_NAMESPACE $(source .hfenv; kubectl get pods -l app=$TILLER_NAMESPACE  -o wide | tail -n 1 | awk '{print $1}') bash

For a specific pod:

CLUSTER=eqiad # or codfw
source .hfenv; sudo KUBECONFIG=/etc/kubernetes/admin-$CLUSTER.config kubectl exec -ti -n $TILLER_NAMESPACE -c $TILLER_NAMESPACE <pod_name> bash

strace on a process in a specific pod container

First find the host node your pod is running on. See above for kubectl get pods. ssh into that node.

# Get the docker container id in your pod.  This will be $1 in the output.
sudo docker ps | grep <pod_name> | grep nodejs
# now get the pid
sudo docker top <container_id> | grep '/usr/bin/node'
# strace it:
sudo strace -p <node_pid>

Or, all in one command (after finding your pod_name and logging into the k8s node):

sudo strace -p $(sudo docker top $(sudo docker ps | grep $pod_name | grep nodejs | head -n 1 | awk '{print $1}')  | grep /usr/bin/node | head -n 1 | awk '{print $2}')

Get a root shell on a specific k8s pod container

Again, find the node where your pod is running and log into that node. Then:

sudo docker exec -ti -u root $(sudo docker ps |grep <pod_name> | grep nodejs | tail -n 1 | awk '{print $1}') /bin/bash

Helm Chart Development

User:Alexandros_Kosiaris/Benchmarking_kubernetes_apps has some instructions on setting up Minikube and Helm for chart development and then benchmarking. This section provides some EventGate specific instructions.

EventGate Helm development environment setup

1. Install Minikube. Follow instructions at https://kubernetes.io/docs/tasks/tools/install-minikube/. Minikube is a virtualized, single-host local Kubernetes cluster for development.

If Minikube is not started, you can start it with:

minikube start

You'll also need to turn on promiscuous mode so that the Kafka pod will work properly:

minikube ssh
sudo ip link set docker0 promisc on

(See: https://stackoverflow.com/questions/45748536/kafka-inaccessible-once-inside-kubernetes-minikube/52792288#52792288)

2. Install kubectl. Follow instructions on https://kubernetes.io/docs/tasks/tools/install-kubectl/

3. Install Helm. Follow instructions at https://docs.helm.sh/using_helm/#installing-helm. You will need to download the appropriate version for your OS and place it in the $PATH (or %PATH% if you are on Windows)

4. Install Blubber. Follow instructions at https://wikitech.wikimedia.org/wiki/Blubber/Download.

5. Use Minikube as your Docker host:

eval $(minikube docker-env)

6. Clone the eventgate-wikimedia repository:

git clone https://gerrit.wikimedia.org/r/eventgate-wikimedia
cd eventgate-wikimedia

7. Build a local eventgate-wikimedia development Docker image using Blubber:

 blubber .pipeline/blubber.yaml development > Dockerfile && docker build -t eventgate-dev .

There are several variants in the blubber.yaml file. Here development is selected, and the Docker image is tagged with eventgate-dev.

8. If you don't already have it, clone the operations/deployment-charts repository.

 git clone https://gerrit.wikimedia.org/r/operations/deployment-charts

9. Install the Kafka development Helm chart into Minikube:

cd deployment-charts/charts
helm install ./kafka-dev

This will install a Zookeeper and Kafka pod and keep it running.

10. Install a development chart release into Minikube:

helm install -n eventgate-dev --set main_app.image=eventgate-dev ./eventgate

11. Test that it works:

# Consume from the Kafka test event topic
kafkacat -C -b $(minikube ip):30092 -t datacenter1.test.event
# In another shell, define a handy service alias:
alias service="echo $(minikube ip):$(kubectl get svc --namespace default eventgate-development -o jsonpath='{.spec.ports[0].nodePort}')"
# POST to the eventgate-development service in Minikube
curl -v -H 'Content-Type: application/json' -d '{"$schema": "/test/event/0.0.2", "meta": {"stream": "test.event", "id": "12345678-1234-5678-1234-567812345678", "dt": "2019-01-01T00:00:00Z", "domain": "wikimedia.org"}, "test": "specific test value"}'  $(service)/v1/events

You should see some output from curl like:

< HTTP/1.1 201 All 1 out of 1 events were accepted.

12. Now that the development release is running, you can make local changes to it and re-deploy those changes in Minikube:

helm delete --purge eventgate-dev && helm install -n eventgate-dev --set main_app.image=eventgate-dev ./eventgate
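After each redeploy you will probably want to repeat the test POST. As a convenience sketch, the function below rebuilds the same test event JSON used in the curl example on this page, with the stream, id and dt fields overridable. The function name make_test_event is invented for this example. The default dt is the current UTC time, which is a choice made here rather than a requirement stated on this page.

```shell
# Hypothetical helper: print a /test/event/0.0.2 event as JSON.
# Arguments (all optional): stream name, event id, ISO-8601 timestamp.
make_test_event() {
    local stream=${1:-test.event}
    local id=${2:-12345678-1234-5678-1234-567812345678}
    local dt=${3:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
    printf '{"$schema": "/test/event/0.0.2", "meta": {"stream": "%s", "id": "%s", "dt": "%s", "domain": "wikimedia.org"}, "test": "specific test value"}\n' \
        "$stream" "$id" "$dt"
}
```

With the service alias from the test step defined, this could be used as: curl -v -H 'Content-Type: application/json' -d "$(make_test_event)" $(service)/v1/events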