Wikifunctions/Runbook

From Wikitech

This is a list of runbooks for the Abstract Wikipedia Team, particularly for the Wikifunctions services. Each one is a step-by-step list of what to do for a routine task or when something goes wrong.

How to disable execution in an emergency

Prevent logged-out users from running functions

  1. In mediawiki-config, change core-Permissions.php to set 'wikilambda-execute' => false in the '*' (logged-out users) block of groupOverrides => '+wikifunctions', push the change to Gerrit, and get it deployed like any other MW config change.

Prevent all users from running functions

  1. In mediawiki-config, change core-Permissions.php to set 'wikilambda-execute' => false in both the '*' (logged-out users) and the 'user' (logged-in users) blocks of groupOverrides => '+wikifunctions', push the change to Gerrit, and get it deployed like any other MW config change.
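
Both blocks need changing because MediaWiki grants a right if any of a user's groups grants it, and every logged-in user belongs to both '*' and 'user'. The following is a minimal Python sketch of that union rule. The dictionaries are simplified illustrations, not the real layout of core-Permissions.php, and the "normal" starting state is an assumption.

```python
# Illustrative model of MediaWiki group permissions: a right is held if ANY
# of the user's groups grants it. The dict layout is a simplification and is
# NOT the real structure of core-Permissions.php.

def can_execute(group_permissions, logged_in):
    """Return True if the user would hold 'wikilambda-execute'."""
    # Everyone is implicitly in '*'; logged-in users are also in 'user'.
    groups = ["*", "user"] if logged_in else ["*"]
    return any(
        group_permissions.get(group, {}).get("wikilambda-execute", False)
        for group in groups
    )

# Assumed starting state: both groups grant the right.
normal = {"*": {"wikilambda-execute": True}, "user": {"wikilambda-execute": True}}
# First runbook entry: only the '*' block is set to false.
anon_disabled = {"*": {"wikilambda-execute": False}, "user": {"wikilambda-execute": True}}
# Second runbook entry: both blocks are set to false.
all_disabled = {"*": {"wikilambda-execute": False}, "user": {"wikilambda-execute": False}}
# A mistake to avoid: only 'user' is false, so '*' still grants the right.
only_user_disabled = {"*": {"wikilambda-execute": True}, "user": {"wikilambda-execute": False}}
```

In this model, can_execute(only_user_disabled, logged_in=True) is still True, which is why disabling execution for everyone means changing both blocks.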

Back-end services

Main article: Kubernetes/Deployments#Deploying_with_helmfile

What services?

  • function-orchestrator, a service to co-ordinate function requests.
  • function-evaluator, a service to execute user-written code.

Deploy a config update to the orchestrator

  1. Make a change to the Wikifunctions services helm values override in the deployment-charts repo in Gerrit, commit it, and land it with a colleague or by yourself.
  2. Shell into the production deployment server (ssh deployment.eqiad.wmnet) and go to our service directory (cd /srv/deployment-charts/helmfile.d/services/wikifunctions)
  3. Check that the new change to the deployment-charts git repo has made it automatically to the server (git log)
  4. Run helmfile -e staging -i apply --context 5 to validate that the helm chart applies and the diff looks correct. If it shows no diff, wait a little for the chart museum to catch up, then try again.
  5. Make a simple request via curl to check that the orchestrator performs as expected, e.g.:
    curl https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":"Z801","Z801K1":"foo"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
    … should output a JSON blob starting with {"Z1K1":"Z22","Z22K1":"foo",… (call just to the orchestrator)
    curl https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":{"Z1K1":"Z8","Z8K1":["Z17",{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K1"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}},{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K2"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}}],"Z8K2":"Z1","Z8K3":["Z20"],"Z8K4":["Z14",{"Z1K1":"Z14","Z14K1":"Z400","Z14K3":{"Z1K1":"Z16","Z16K1":{"Z1K1":"Z61","Z61K1":"python"},"Z16K2": "def Z400(Z400K1,Z400K2):\n\treturn str(int(Z400K1) + int(Z400K2))"}}],"Z8K5":"Z400"},"Z400K1":"5","Z400K2":"8"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
    … should output a JSON blob starting with {"Z1K1":"Z22","Z22K1":"13",… (call to the Python evaluator via the orchestrator)
    curl https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":{"Z1K1":"Z8","Z8K1":["Z17",{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K1"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}},{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K2"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}}],"Z8K2":"Z1","Z8K3":["Z20"],"Z8K4":["Z14",{"Z1K1":"Z14","Z14K1":"Z400","Z14K3":{"Z1K1":"Z16","Z16K1":"Z600","Z16K2":"function Z400( Z400K1,Z400K2 ) { return (parseInt(Z400K1) + parseInt(Z400K2)).toString(); }"}}],"Z8K5":"Z400"},"Z400K1":"15","Z400K2":"18"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
    … should output a JSON blob starting with {"Z1K1":"Z22","Z22K1":"33",… (call to the JavaScript evaluator via the orchestrator)
    curl https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":"Z10000","Z10000K1":"foo","Z10000K2":"bar"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
    … should output a JSON blob starting with {"Z1K1":"Z22","Z22K1":"foobar",… (call to the evaluator via the orchestrator, making a call to the wiki)
  6. Run helmfile -e codfw -i apply --context 5 to deploy the update to the Texas datacentre (the change is now live for some users)
  7. Run helmfile -e eqiad -i apply --context 5 to deploy the update to the Virginia datacentre (the change is now live for all users)
  8. Monitor production for a bit, and revert if needed
    • Wikifunctions services grafana dashboard
    • For the above curl commands, you can replace wikifunctions.k8s-staging.discovery.wmnet with wikifunctions.discovery.wmnet:
      curl https://wikifunctions.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":"Z801","Z801K1":"foo"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
      curl https://wikifunctions.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":{"Z1K1":"Z8","Z8K1":["Z17",{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K1"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}},{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K2"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}}],"Z8K2":"Z1","Z8K3":["Z20"],"Z8K4":["Z14",{"Z1K1":"Z14","Z14K1":"Z400","Z14K3":{"Z1K1":"Z16","Z16K1":{"Z1K1":"Z61","Z61K1":"python"},"Z16K2": "def Z400(Z400K1,Z400K2):\n\treturn str(int(Z400K1) + int(Z400K2))"}}],"Z8K5":"Z400"},"Z400K1":"5","Z400K2":"8"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
      curl https://wikifunctions.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":{"Z1K1":"Z8","Z8K1":["Z17",{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K1"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}},{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K2"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}}],"Z8K2":"Z1","Z8K3":["Z20"],"Z8K4":["Z14",{"Z1K1":"Z14","Z14K1":"Z400","Z14K3":{"Z1K1":"Z16","Z16K1":"Z600","Z16K2":"function Z400( Z400K1,Z400K2 ) { return (parseInt(Z400K1) + parseInt(Z400K2)).toString(); }"}}],"Z8K5":"Z400"},"Z400K1":"15","Z400K2":"18"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
      curl https://wikifunctions.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":"Z10000","Z10000K1":"foo","Z10000K2":"bar"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
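
The curl checks above could also be scripted. The following is a rough Python sketch, not an existing team tool: it builds the same POST request for the two simplest calls and checks that the response is a Z22 whose Z22K1 holds the expected string. The function names are hypothetical, and it assumes the Python installation on the host trusts the same certificates that curl does.

```python
import json
import urllib.request

STAGING_URL = "https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/"

# (function call, expected Z22K1 value), copied from the curl examples above.
CHECKS = [
    ({"Z1K1": "Z7", "Z7K1": "Z801", "Z801K1": "foo"}, "foo"),
    ({"Z1K1": "Z7", "Z7K1": "Z10000", "Z10000K1": "foo", "Z10000K2": "bar"}, "foobar"),
]

def build_request(url, zobject):
    """Build the same POST request that the curl commands send."""
    body = json.dumps({"zobject": zobject, "doValidate": False}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-type": "application/json"}, method="POST"
    )

def result_matches(response_text, expected):
    """True if the response is a Z22 whose Z22K1 equals the expected string."""
    try:
        envelope = json.loads(response_text)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(envelope, dict)
        and envelope.get("Z1K1") == "Z22"
        and envelope.get("Z22K1") == expected
    )

def run_checks(url=STAGING_URL):
    """Network call: run only from a host that can reach the endpoint."""
    failures = []
    for zobject, expected in CHECKS:
        with urllib.request.urlopen(build_request(url, zobject), timeout=30) as response:
            text = response.read().decode("utf-8")
        if not result_matches(text, expected):
            failures.append(expected)
    return failures
```

Calling run_checks() returns the list of expected values that did not come back; an empty list means both checks passed. Passing the production URL instead of the default repeats the checks there.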

Deploy a new version of the orchestrator

  1. Make a change to the orchestrator repo in GitLab, make a Merge Request, and wait for it to be landed by a colleague (example)
  2. Make a config change to the orchestrator's helm chart as above, changing in values-main-orchestrator.yaml the version value of the docker image to the newly-created docker-registry tag from step 1. You may wish to explain in the commit what changes are being deployed, for ease of tracking later. (example)

Disable an evaluator from being called

  1. Make a config change to the orchestrator's helm values as above, changing in values-main-orchestrator.yaml the ORCHESTRATOR_CONFIG value to remove the evaluator from the map of known evaluators.
    • If the evaluator you are removing is the only one assigned to that language, you are disabling evaluation in that language.
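
To see in advance which languages a removal would switch off, the evaluator map can be modelled as data. This is a minimal Python sketch; the JSON shape (an "evaluatorConfigs" list whose entries carry "programmingLanguages" and "evaluatorUri") and the URIs are assumptions for illustration, so check them against the real ORCHESTRATOR_CONFIG value before relying on it.

```python
# Sketch only: the config shape and the URIs below are assumed, not copied
# from the real ORCHESTRATOR_CONFIG value.

def languages_served(config):
    """Set of every language that has at least one evaluator."""
    return {
        language
        for evaluator in config["evaluatorConfigs"]
        for language in evaluator["programmingLanguages"]
    }

def remove_evaluator(config, evaluator_uri):
    """Return (new_config, languages that no longer have any evaluator)."""
    kept = [
        evaluator
        for evaluator in config["evaluatorConfigs"]
        if evaluator["evaluatorUri"] != evaluator_uri
    ]
    new_config = {**config, "evaluatorConfigs": kept}
    disabled = sorted(languages_served(config) - languages_served(new_config))
    return new_config, disabled

example_config = {
    "evaluatorConfigs": [
        {"programmingLanguages": ["javascript"], "evaluatorUri": "http://js-evaluator.example/"},
        {"programmingLanguages": ["python"], "evaluatorUri": "http://py-evaluator-a.example/"},
        {"programmingLanguages": ["python"], "evaluatorUri": "http://py-evaluator-b.example/"},
    ]
}
```

With this example, removing one of the two Python evaluators disables nothing, while removing the only JavaScript evaluator reports ["javascript"] as disabled.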

Add an evaluator to be called

  1. To add a new evaluator instance, edit helmfile.yaml to add a new entry in the releases section, and add a new values-foo-evaluator.yaml file like the others but pointed at the appropriate image and version. Deploy this, and ensure the new release deploys successfully.
  2. Make a config change to the orchestrator's helm chart as above, changing in values-main-orchestrator.yaml the ORCHESTRATOR_CONFIG value to add the evaluator to the map of known evaluators for the appropriate languages.
    • If the evaluator you are adding is the only one assigned to that language, remember that you are enabling evaluation in that language.
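
The same kind of check works when adding an entry: compare the languages served before and after. This is a short Python sketch under the same caveat, namely that the "evaluatorConfigs", "programmingLanguages" and "evaluatorUri" keys and the URIs are assumed for illustration and may not match the real ORCHESTRATOR_CONFIG value.

```python
# Sketch only: key names and URIs are assumptions, not the real config.

def add_evaluator(config, evaluator_uri, languages):
    """Return (new_config, languages that had no evaluator before)."""
    already_served = {
        language
        for evaluator in config["evaluatorConfigs"]
        for language in evaluator["programmingLanguages"]
    }
    entry = {"programmingLanguages": list(languages), "evaluatorUri": evaluator_uri}
    new_config = {
        **config,
        "evaluatorConfigs": config["evaluatorConfigs"] + [entry],
    }
    newly_enabled = sorted(set(languages) - already_served)
    return new_config, newly_enabled

current_config = {
    "evaluatorConfigs": [
        {"programmingLanguages": ["python"], "evaluatorUri": "http://py-evaluator.example/"},
    ]
}
```

Here, adding a second Python evaluator enables nothing new, while adding a first evaluator for another language lists that language as newly enabled.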

Deploy a config update to an evaluator

Note: There is intentionally very little configurability of the evaluators themselves.

  1. Identify which evaluator release you're updating (JavaScript, Python, etc.) or if it's for all evaluators.
  2. Find the values-*-evaluator.yaml file for the release you wish to adjust, alter it to add, update, or remove the relevant config values, and deploy as above (example)

Deploy a new version of an evaluator

  1. Make a change to the evaluator repo in GitLab, make a Merge Request, and wait for it to be landed by a colleague (example)
  2. Make a config change to the Wikifunctions service helm values as above, changing in one or all of the values-*-evaluator.yaml files the version value of the docker image to the newly-created docker-registry tag from step 1. You may wish to explain in the commit what changes are being deployed, for ease of tracking later. (example)
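
When several values files need the same new tag, the edit can be scripted. The following Python sketch rewrites the single "version:" line in the text of one values file. It assumes each file contains exactly one such line and refuses to guess otherwise; the sample YAML layout and the tag values are hypothetical.

```python
import re

# Matches one "version: <tag>" line, keeping its indentation in group 1.
VERSION_LINE = re.compile(r"^([ \t]*version:[ \t]*)(\S+)[ \t]*$", re.MULTILINE)

def set_image_version(values_text, new_tag):
    """Return values_text with its only 'version:' line pointing at new_tag."""
    found = VERSION_LINE.findall(values_text)
    if len(found) != 1:
        raise ValueError(f"expected exactly one version line, found {len(found)}")
    # A function replacement avoids backslash escapes in new_tag being interpreted.
    return VERSION_LINE.sub(lambda match: match.group(1) + new_tag, values_text)

# Hypothetical file contents, for illustration only.
sample_values = "main_app:\n  version: old-tag\n  port: 6927\n"
```

To apply it to every evaluator, one option is to loop over pathlib.Path(".").glob("values-*-evaluator.yaml"), read each file, call set_image_version, write the result back, and review the change with git diff before committing.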

Wikifunctions.org wiki

Add a new pre-defined Object in production

  1. Shell into a maintenance server
  2. If you only need to add one Object, run: mwscript extensions/WikiLambda/maintenance/loadPreDefinedObject.php --wiki=wikifunctionswiki --zid <NUMERICAL ZID>,
    e.g. mwscript extensions/WikiLambda/maintenance/loadPreDefinedObject.php --wiki=wikifunctionswiki --zid 1234
  3. If you have a few to add, run: mwscript extensions/WikiLambda/maintenance/loadPreDefinedObject.php --wiki=wikifunctionswiki --from <NUMERICAL ZID> --to <NUMERICAL ZID>,
    e.g. mwscript extensions/WikiLambda/maintenance/loadPreDefinedObject.php --wiki=wikifunctionswiki --from 1234 --to 1237
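
For a longer list of ZIDs, the choice between --zid and --from/--to can be worked out mechanically. This Python sketch groups numeric ZIDs into consecutive runs and prints one command per run. It assumes --to is inclusive, which the example above suggests but does not state; the helper name is hypothetical.

```python
BASE = (
    "mwscript extensions/WikiLambda/maintenance/loadPreDefinedObject.php"
    " --wiki=wikifunctionswiki"
)

def load_commands(zids):
    """Return one mwscript command per run of consecutive numeric ZIDs."""
    def command(start, end):
        if start == end:
            return f"{BASE} --zid {start}"
        # Assumption: the range given by --from/--to includes both ends.
        return f"{BASE} --from {start} --to {end}"

    commands = []
    start = previous = None
    for zid in sorted(set(zids)):
        if start is None:
            start = previous = zid
        elif zid == previous + 1:
            previous = zid
        else:
            commands.append(command(start, previous))
            start = previous = zid
    if start is not None:
        commands.append(command(start, previous))
    return commands
```

For example, load_commands([1234, 1235, 1236, 1237, 1300]) yields one --from 1234 --to 1237 command and one --zid 1300 command.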


How to monitor usage

How to debug an issue

How to inspect Wikifunctions with Kubernetes tools

  1. Log in to a deployment server and run kube_env wikifunctions $CLUSTER where $CLUSTER is probably staging unless you are very brave.
  2. Now you can run kubectl commands like kubectl get pods.
  3. For example, to get logs for the function-orchestrator, you can run kubectl logs `kubectl get pods | grep orchestrator | awk '{print $1}'` function-orchestrator-main-orchestrator.
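
If more than one orchestrator pod is running, the backtick substitution above passes several pod names to a single kubectl logs call. One way around that is to build one command per pod. The following Python sketch does the same filtering as grep and awk on the text printed by kubectl get pods; the pod names in the sample are made up.

```python
CONTAINER = "function-orchestrator-main-orchestrator"

def orchestrator_log_commands(get_pods_output, container=CONTAINER):
    """Return one 'kubectl logs' command per pod whose line mentions 'orchestrator'."""
    commands = []
    for line in get_pods_output.splitlines():
        fields = line.split()
        # Same filter as `grep orchestrator | awk '{print $1}'`.
        if fields and "orchestrator" in line:
            commands.append(f"kubectl logs {fields[0]} {container}")
    return commands

# Made-up sample of `kubectl get pods` output, for illustration only.
SAMPLE_OUTPUT = """NAME                                             READY   STATUS    RESTARTS   AGE
function-orchestrator-main-orchestrator-abc123   2/2     Running   0          3d
function-evaluator-python-evaluator-def456       2/2     Running   0          3d
"""
```

On a deployment server, the input could come from subprocess.run(["kubectl", "get", "pods"], capture_output=True, text=True).stdout after kube_env has been run in the same shell.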

How to poke the orchestrator/evaluator

  1. Log in to a deployment server.
  2. The endpoint for the orchestrator can be found at https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/.
  3. Example commands:
    1. to provoke the orchestrator directly: curl https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":"Z801","Z801K1":"foo"},"doValidate":false}' --header "Content-type: application/json"
    2. to call the evaluator via the orchestrator: curl https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{ "Z1K1": "Z7", "Z7K1": { "Z1K1": "Z8", "Z8K1": [ "Z17", { "Z1K1": "Z17", "Z17K1": "Z6", "Z17K2": { "Z1K1": "Z6", "Z6K1": "Z400K1" }, "Z17K3": { "Z1K1": "Z12", "Z12K1": [ "Z11" ] } }, { "Z1K1": "Z17", "Z17K1": "Z6", "Z17K2": { "Z1K1": "Z6", "Z6K1": "Z400K2" }, "Z17K3": { "Z1K1": "Z12", "Z12K1": [ "Z11" ] } } ], "Z8K2": "Z1", "Z8K3": [ "Z20" ], "Z8K4": [ "Z14", { "Z1K1": "Z14", "Z14K1": "Z400", "Z14K3": { "Z1K1": "Z16", "Z16K1": { "Z1K1": "Z61", "Z61K1": "javascript" }, "Z16K2": "function Z400( Z400K1, Z400K2 ) { return (parseInt(Z400K1) + parseInt(Z400K2)).toString(); }" } } ], "Z8K5": "Z400" }, "Z400K1": "5", "Z400K2": "8" } ,"doValidate":false}' --header "Content-type: application/json"
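
The long inline payload is easier to vary when built programmatically. This Python sketch reproduces the structure of the second example: a Z7 function call whose Z7K1 is an inline Z8 function with two string arguments and a single Z14 implementation holding the code. The helper names are hypothetical; the keys and values mirror the curl command above.

```python
import json

def string_argument(key):
    """One Z17 argument declaration of type Z6 (string) with an empty label list."""
    return {
        "Z1K1": "Z17",
        "Z17K1": "Z6",
        "Z17K2": {"Z1K1": "Z6", "Z6K1": key},
        "Z17K3": {"Z1K1": "Z12", "Z12K1": ["Z11"]},
    }

def inline_call(language, code, first, second):
    """Z7 call to an inline two-argument function Z400 implemented by `code`."""
    implementation = {
        "Z1K1": "Z14",
        "Z14K1": "Z400",
        "Z14K3": {
            "Z1K1": "Z16",
            "Z16K1": {"Z1K1": "Z61", "Z61K1": language},
            "Z16K2": code,
        },
    }
    return {
        "Z1K1": "Z7",
        "Z7K1": {
            "Z1K1": "Z8",
            "Z8K1": ["Z17", string_argument("Z400K1"), string_argument("Z400K2")],
            "Z8K2": "Z1",
            "Z8K3": ["Z20"],
            "Z8K4": ["Z14", implementation],
            "Z8K5": "Z400",
        },
        "Z400K1": first,
        "Z400K2": second,
    }

def request_body(zobject):
    """JSON text suitable for curl's --data argument."""
    return json.dumps({"zobject": zobject, "doValidate": False})

JS_SUM = "function Z400( Z400K1, Z400K2 ) { return (parseInt(Z400K1) + parseInt(Z400K2)).toString(); }"
```

Printing request_body(inline_call("javascript", JS_SUM, "5", "8")) gives a body equivalent to the one in the second example, ready to paste after --data.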

Things that might go wrong

  1. Environment variables
    • Environment variables set in the microservice images are ignored by Kubernetes. If you add/delete/modify an environment variable in a container image, you must also update the corresponding configuration when deploying that version of the image.
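
A quick comparison can catch this kind of drift before deploying. This Python sketch compares the variables declared in an image with those in the deployment configuration, each given as a plain dict. How those dicts are obtained is left open, and the variable names in the example are invented.

```python
def env_drift(image_env, config_env):
    """Report variables that differ between an image and the deployed config."""
    image_names = set(image_env)
    config_names = set(config_env)
    return {
        # Present in the image but absent from the deployment config.
        "missing_from_config": sorted(image_names - config_names),
        # Still in the deployment config but no longer in the image.
        "only_in_config": sorted(config_names - image_names),
        # Present in both with different values; may be intentional, so review.
        "different_value": sorted(
            name for name in image_names & config_names
            if image_env[name] != config_env[name]
        ),
    }

# Invented variable names, for illustration only.
example_image_env = {"EXAMPLE_TIMEOUT_MS": "15000", "EXAMPLE_NEW_FLAG": "1"}
example_config_env = {"EXAMPLE_TIMEOUT_MS": "10000", "EXAMPLE_OLD_FLAG": "1"}
```

For the example dicts, the report lists EXAMPLE_NEW_FLAG as missing from the config, EXAMPLE_OLD_FLAG as only in the config, and EXAMPLE_TIMEOUT_MS as having a different value.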

Beta Cluster

  • Our Beta Cluster imitation of production runs on deployment-docker-wikifunctions01, using the role::beta::docker_services hack to run the services directly in Docker (no Kubernetes), so it's not entirely prod-like
  • If you're an admin member of the deployment-prep project, you should be able to do almost everything needed in Horizon
  • To debug, shell in via ssh deployment-docker-wikifunctions01.deployment-prep.eqiad1.wikimedia.cloud
    • To trigger an immediate puppet update rather than waiting for the cron, run sudo -i puppet agent -tv
    • To see what services are running, run sudo docker ps
    • To inspect logs from one of the services, run e.g. sudo docker logs function-evaluator-py.service
    • To run a test from the CLI, you can use the above sample commands with the URL swapped for https://wikifunctions-orchestrator-beta.wmflabs.org/1/v1/evaluate, e.g.:
      curl https://wikifunctions-orchestrator-beta.wmflabs.org/1/v1/evaluate -X POST --data '{"zobject":{ "Z1K1": "Z7", "Z7K1": { "Z1K1": "Z8", "Z8K1": [ "Z17", { "Z1K1": "Z17", "Z17K1": "Z6", "Z17K2": { "Z1K1": "Z6", "Z6K1": "Z400K1" }, "Z17K3": { "Z1K1": "Z12", "Z12K1": [ "Z11" ] } }, { "Z1K1": "Z17", "Z17K1": "Z6", "Z17K2": { "Z1K1": "Z6", "Z6K1": "Z400K2" }, "Z17K3": { "Z1K1": "Z12", "Z12K1": [ "Z11" ] } } ], "Z8K2": "Z1", "Z8K3": [ "Z20" ], "Z8K4": [ "Z14", { "Z1K1": "Z14", "Z14K1": "Z400", "Z14K3": { "Z1K1": "Z16", "Z16K1": "Z600", "Z16K2": "function Z400( Z400K1, Z400K2 ) { return (parseInt(Z400K1) + parseInt(Z400K2)).toString(); }" } } ], "Z8K5": "Z400" }, "Z400K1": "15", "Z400K2": "18" },"doValidate":false}' --header "Content-type: application/json" -w "\n"