Wikifunctions/Runbook
This is a list of runbooks for the Abstract Wikipedia Team, particularly the Wikifunctions services, covering step-by-step lists of what to do when things need doing, especially when things go wrong.
How to disable execution in an emergency
Prevent logged-out users from running functions
- In mediawiki-config, change core-Permissions.php to set 'wikilambda-execute' => false in the '*' (logged-out users) block of groupOverrides => '+wikifunctions', push to Gerrit, and get it deployed like any MW config change.
Prevent all users from running functions
- In mediawiki-config, change core-Permissions.php to set 'wikilambda-execute' => false in the '*' (logged-out users) and also the 'user' (logged-in users) blocks of groupOverrides => '+wikifunctions', push to Gerrit, and get it deployed like any MW config change.
Prevent service from accessing Wikidata
- In deployment-charts, change values-main-orchestrator.yaml to set "useWikidata": false, push to Gerrit, and get it deployed like any helm chart change.
Back-end services
Main article: Kubernetes/Deployments#Deploying_with_helmfile
What services?
- function-orchestrator, a service to co-ordinate function requests.
- function-evaluator, a service to execute user-written code.
Deploy a config update to the orchestrator
- Make a change to the Wikifunctions services helm values over-ride in the deployment-charts repo in Gerrit, make a commit, and land it with a colleague or by yourself
- Shell into the production deployment server (ssh deployment.eqiad.wmnet) and go to our service directory (cd /srv/deployment-charts/helmfile.d/services/wikifunctions)
- Check that the new change to the deployment-charts git repo has made it automatically to the server (git log)
  - Be sure you can see the correct latest commit(s) via git status
  - Sometimes you may need to communicate with external teams or team members to check their status on updates. Most, if not all, communication surrounding deploys takes place on IRC, though there are talks of moving this to a different platform.
- [Cautionary step] In general you might want to deploy function-orchestrator changes before function-evaluator changes. Doing both in parallel adds more risk and should generally be avoided. Repeat from this step once you've successfully deployed the first change, if you have others.
- Run this command to validate that the helm chart applies and the diff looks correct. If it shows no diff, wait a little for the chart museum to catch up, then try again:
helmfile -e staging -i apply --context 5
- Make a simple request via curl to check that the orchestrator performs as expected, e.g.:
curl https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":"Z801","Z801K1":"foo"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
  - … should output a JSON blob starting with {"Z1K1":"Z22","Z22K1":"foo",… (call just to the orchestrator)
curl https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":{"Z1K1":"Z8","Z8K1":["Z17",{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K1"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}},{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K2"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}}],"Z8K2":"Z1","Z8K3":["Z20"],"Z8K4":["Z14",{"Z1K1":"Z14","Z14K1":"Z400","Z14K3":{"Z1K1":"Z16","Z16K1":{"Z1K1":"Z61","Z61K1":"python"},"Z16K2": "def Z400(Z400K1,Z400K2):\n\treturn str(int(Z400K1) + int(Z400K2))"}}],"Z8K5":"Z400"},"Z400K1":"5","Z400K2":"8"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
  - … should output a JSON blob starting with {"Z1K1":"Z22","Z22K1":"13",… (call to the Python evaluator via the orchestrator)
curl https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":{"Z1K1":"Z8","Z8K1":["Z17",{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K1"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}},{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K2"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}}],"Z8K2":"Z1","Z8K3":["Z20"],"Z8K4":["Z14",{"Z1K1":"Z14","Z14K1":"Z400","Z14K3":{"Z1K1":"Z16","Z16K1":"Z600","Z16K2":"function Z400( Z400K1,Z400K2 ) { return (parseInt(Z400K1) + parseInt(Z400K2)).toString(); }"}}],"Z8K5":"Z400"},"Z400K1":"15","Z400K2":"18"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
  - … should output a JSON blob starting with {"Z1K1":"Z22","Z22K1":"33",… (call to the JavaScript evaluator via the orchestrator)
curl https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":"Z10000","Z10000K1":"foo","Z10000K2":"bar"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
  - … should output a JSON blob starting with {"Z1K1":"Z22","Z22K1":"foobar",… (call to the evaluator via the orchestrator, making a call to the wiki)
curl https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":"Z6825","Z6825K1":{"Z1K1":"Z6095","Z6095K1":"L1"}},"doValidate":false}' --header "Content-type: application/json" -w "\n"
  - … should output a JSON blob starting with {"Z1K1":"Z22","Z22K1":{"Z1K1":"Z6005",… (call to dereference a Wikidata Lexeme)
- Check that you can see the logs triggered from the above requests in LogStash.
- You can tell where the log is coming from by observing the ‘host’ label (rather than ‘kubernetes.host’, because Staging uses the same host, ‘eqiad’, as Production).
- Run this to deploy the update to the Texas datacentre (the change is now live for some users):
helmfile -e codfw -i apply --context 5
- Run this to deploy the update to the Virginia datacentre (the change is now live for all users)
helmfile -e eqiad -i apply --context 5
- (There are two data centers used: ‘codfw’ and ‘eqiad’, and the Foundation rotates between them every six months.)
- Monitor production for a bit, and revert if needed
- Wikifunctions services grafana dashboard
- For the above curl commands, you can replace wikifunctions.k8s-staging.discovery.wmnet with wikifunctions.discovery.wmnet:
curl https://wikifunctions.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":"Z801","Z801K1":"foo"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
curl https://wikifunctions.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":{"Z1K1":"Z8","Z8K1":["Z17",{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K1"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}},{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K2"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}}],"Z8K2":"Z1","Z8K3":["Z20"],"Z8K4":["Z14",{"Z1K1":"Z14","Z14K1":"Z400","Z14K3":{"Z1K1":"Z16","Z16K1":{"Z1K1":"Z61","Z61K1":"python"},"Z16K2": "def Z400(Z400K1,Z400K2):\n\treturn str(int(Z400K1) + int(Z400K2))"}}],"Z8K5":"Z400"},"Z400K1":"5","Z400K2":"8"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
curl https://wikifunctions.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":{"Z1K1":"Z8","Z8K1":["Z17",{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K1"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}},{"Z1K1":"Z17","Z17K1":"Z6","Z17K2":{"Z1K1":"Z6","Z6K1":"Z400K2"},"Z17K3":{"Z1K1":"Z12","Z12K1":["Z11"]}}],"Z8K2":"Z1","Z8K3":["Z20"],"Z8K4":["Z14",{"Z1K1":"Z14","Z14K1":"Z400","Z14K3":{"Z1K1":"Z16","Z16K1":"Z600","Z16K2":"function Z400( Z400K1,Z400K2 ) { return (parseInt(Z400K1) + parseInt(Z400K2)).toString(); }"}}],"Z8K5":"Z400"},"Z400K1":"15","Z400K2":"18"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
curl https://wikifunctions.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":"Z10000","Z10000K1":"foo","Z10000K2":"bar"},"doValidate":false}' --header "Content-type: application/json" -w "\n"
curl https://wikifunctions.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":"Z6825","Z6825K1":{"Z1K1":"Z6095","Z6095K1":"L1"}},"doValidate":false}' --header "Content-type: application/json" -w "\n"
- Check again that you can see the logs triggered from the above Production requests in LogStash.
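The curl checks in this procedure can be wrapped in a small helper that compares the start of each reply with the expected prefix. This is a minimal sketch, assuming a POSIX shell with curl on the deployment server; the function names response_has_prefix and smoke_test are hypothetical and not part of any existing Wikifunctions tooling.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of existing tooling.

# Succeeds if the response text ($1) starts with the expected prefix ($2).
response_has_prefix() {
    case "$1" in
        "$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# POSTs a JSON payload ($2) to the evaluate endpoint on host ($1) and
# compares the start of the reply with the expected prefix ($3).
smoke_test() {
    response=$(curl -s "https://$1:30443/1/v1/evaluate/" -X POST \
        --data "$2" --header "Content-type: application/json") || return 1
    if response_has_prefix "$response" "$3"; then
        echo "OK: $1"
    else
        echo "UNEXPECTED reply from $1: $response" >&2
        return 1
    fi
}

# Example call (not run here), using the echo request from the steps above:
# smoke_test wikifunctions.k8s-staging.discovery.wmnet \
#   '{"zobject":{"Z1K1":"Z7","Z7K1":"Z801","Z801K1":"foo"},"doValidate":false}' \
#   '{"Z1K1":"Z22","Z22K1":"foo",'
```

Keeping the prefix comparison in its own function means it can be exercised without network access, and the same smoke_test call works for staging and production by changing only the host argument.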
Deploy a new version of the orchestrator
- Make a change to the orchestrator repo in GitLab, make a Merge Request, and wait for it to be landed by a colleague (example)
- In your local clone of the deployment-charts repo from Gerrit, start a commit based on master, and find the latest commits for our code.
git checkout -B wikifunctions-deploy origin/master && git log helmfile.d/services/wikifunctions/
- Take the hash of the latest update to the function-orchestrator (in this case, 4f41ae5) and in your local clone of the function-orchestrator get the list of commits:
git fetch && git log --topo-order --no-merges --reverse --oneline 4f41ae5..origin/main
- Also use this to get a list of tasks that will be affected by this deploy:
git fetch && git log --topo-order --no-merges --reverse 4f41ae5..origin/main | grep Bug: | sort | uniq
- On your Merge Request's page on GitLab, find out what the published container was tagged as: find the post-merge pipeline run, and within that the "publish-images" stage; click through to the job output, and then scroll down to the bottom where there should be a line like
pushing manifest for docker-registry.discovery.wmnet/repos/abstract-wiki/wikifunctions/function-orchestrator:2024-11-13-145636@sha256:…
Here, 2024-11-13-145636 is the image tag.
- Edit helmfile.d/services/wikifunctions/values-main-orchestrator.yaml to change the version: value to the new image tag.
- Add your change to your git stack:
git add -p helmfile.d/services/wikifunctions/values-main-orchestrator.yaml
- Commit your git stack with git commit, using the above details, e.g.:
wikifunctions: Upgrade orchestrator from 2024-10-15-192817 to 2024-11-13-145636

2f9ab91 db: Show the full URL, not just the initial value, in logs
5047011 Include nested senses inside fetched lexemes
0c5c623 pass header details for tracing from evaluator
add6191 Add referencePreCache, ensuring that each ZID will be resolved exactly once per orchestrator invocation.
4719da9 Update function-schemata sub-module to HEAD (f2c043c)
60b4c4d Increase orchestrator rate limit to 300

Bug: T356144
Bug: T367120
Bug: T375922
Bug: T375944
Bug: T376060
Bug: T376826
Bug: T377380
Bug: T377547
Bug: T377797
Bug: T377851
Bug: T378678
Bug: T379098
- Push your commit for review
git review
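The manual steps of reading the image tag out of the job log and assembling the commit message can be scripted. This is a minimal sketch, assuming a POSIX shell with sed and sort; the function names extract_image_tag, upgrade_subject and bug_trailers are hypothetical, and the patterns assume the log line and "Bug:" trailers look like the examples in this section.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of existing tooling.

# Reads a "pushing manifest for ..." line on stdin and prints the image tag,
# i.e. the text between the last ':' before '@sha256:' and the '@'.
extract_image_tag() {
    sed -n 's/.*:\([^:@]*\)@sha256:.*/\1/p'
}

# Prints the commit subject for upgrading service $1 from tag $2 to tag $3.
upgrade_subject() {
    printf 'wikifunctions: Upgrade %s from %s to %s\n' "$1" "$2" "$3"
}

# Reads `git log` output on stdin and prints the unique "Bug: T..." trailers,
# stripping the indentation that git log adds to commit message bodies.
bug_trailers() {
    sed -n 's/^ *\(Bug: T[0-9][0-9]*\).*$/\1/p' | sort -u
}

# Example use (not run here), in a clone of the function-orchestrator repo:
# git log --topo-order --no-merges --reverse 4f41ae5..origin/main | bug_trailers
```

The output of upgrade_subject, the --oneline commit list and the output of bug_trailers can then be pasted into the editor that git commit opens.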
Disable an evaluator from being called
- Make a config change to the orchestrator's helm values as above, changing in values-main-orchestrator.yaml the ORCHESTRATOR_CONFIG value to remove the evaluator from the map of known evaluators.
  - If the evaluator you are removing is the only one assigned to that language, you are disabling evaluation in that language.
Add an evaluator to be called
- To add a new evaluator instance, edit helmfile.yaml to add a new entry in the releases section, and add a new values-foo-evaluator.yaml file like the others but pointed at the appropriate image and version. Deploy this, and ensure the new release deploys successfully.
- Make a config change to the orchestrator's helm chart as above, changing in values-main-orchestrator.yaml the ORCHESTRATOR_CONFIG value to add the evaluator to the map of known evaluators for the appropriate languages.
  - If the evaluator you are adding is the only one assigned to that language, remember that you are enabling evaluation in that language.
Deploy a config update to an evaluator
Note: There is intentionally very little configurability of the evaluators themselves.
- Identify which evaluator release you're updating (JavaScript, Python, etc.) or if it's for all evaluators.
- Find the values-*-evaluator.yaml file for the release you wish to adjust, alter it to add/update/remove the relevant config values, and deploy as above (example)
Deploy a new version of an evaluator
- Make a change to the evaluator repo in GitLab, make a Merge Request, and wait for it to be landed by a colleague (example)
- Make a config change to the Wikifunctions service helm values as above, changing in one or all of the values-*-evaluator.yaml files the version value of the docker image to the newly-created docker-registry tag from step 1. You may wish to explain in the commit what changes are being deployed, for ease of tracking later. (example)
Wikifunctions.org wiki
Add or edit pre-defined Objects in production
When to do this?
- If any new pre-defined Objects have been added in the latest function-schemata updates.
- If existing pre-defined Objects have been edited in the latest function-schemata updates.
Where to find the objects to update?
- A list of the Objects created and edited in function-schemata since the last update should be kept in the AW Chores page.
- You can also gather this list by analyzing each schemata change in the latest sub-module update applied in production (see the list of latest merged schemata updates).
How to run the script?
- Shell into a deployment server
- To add new Objects:
  - If you only need to add one Object, run the script with the --zid <ZID> flag, e.g.:
mwscript-k8s -f -- extensions/WikiLambda/maintenance/loadPreDefinedObject.php --wiki=wikifunctionswiki --zid Z1234
  - If you need to add a few Objects within a range, run the script with the --from <ZID> --to <ZID> flags, e.g.:
mwscript-k8s -f -- extensions/WikiLambda/maintenance/loadPreDefinedObject.php --wiki=wikifunctionswiki --from Z1234 --to Z1237
- To edit existing Objects:
  - Use the --zid or --to and --from flags as explained above.
  - Add the --merge flag to merge the function-schemata latest version with the currently stored Object in production.
    - E.g.: to apply latest changes to the built-ins from Z6000 to Z6006, run:
mwscript-k8s -f -- extensions/WikiLambda/maintenance/loadPreDefinedObject.php --wiki=wikifunctionswiki --merge --from Z6000 --to Z6006
    - E.g.: to apply latest changes to the built-in Z1234, run:
mwscript-k8s -f -- extensions/WikiLambda/maintenance/loadPreDefinedObject.php --wiki=wikifunctionswiki --merge --zid Z1234
  - The --merge flag might find conflicts; the script will show information of the conflict and request action:
    - If the conflict flags an intended change in function-schemata, enter y (yes).
    - If the conflict is unrelated or you have any doubt, enter n (no) to keep the current version and discuss the conflict with the team.
For more detailed documentation on loadPreDefinedObject.php, see the WikiLambda README.md file.
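The flag combinations described in this section can be summarised as a small command builder. This is a minimal sketch, assuming a POSIX shell; the function name predefined_object_cmd and its "add"/"merge" mode argument are hypothetical, and the function only prints the command so that it can be reviewed before being run by hand.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; it prints a command, it does not run it.

# Usage: predefined_object_cmd add|merge FIRST_ZID [LAST_ZID]
# With one ZID it uses --zid; with two it uses --from/--to.
predefined_object_cmd() {
    mode=$1
    first=$2
    last=${3:-}
    cmd="mwscript-k8s -f -- extensions/WikiLambda/maintenance/loadPreDefinedObject.php --wiki=wikifunctionswiki"
    case "$mode" in
        add) ;;
        merge) cmd="$cmd --merge" ;;
        *) echo "mode must be add or merge" >&2; return 1 ;;
    esac
    if [ -n "$last" ]; then
        cmd="$cmd --from $first --to $last"
    else
        cmd="$cmd --zid $first"
    fi
    echo "$cmd"
}

# Example calls (output is the command line to copy and run):
# predefined_object_cmd add Z1234
# predefined_object_cmd merge Z6000 Z6006
```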
Re-run the secondary data updates for a kind of Object in production
To refresh the secondary data tables (e.g. labels) for a Type, such as when we fix a bug that means they might have been corrupted or partially-missing:
- Shell into a deployment server
- Do a dry-run to check:
mwscript-k8s -f -- extensions/WikiLambda/maintenance/updateSecondaryTables.php --wiki=wikifunctionswiki --zType <TYPE ZID> --report --verbose --dryRun
e.g.:
mwscript-k8s -f -- extensions/WikiLambda/maintenance/updateSecondaryTables.php --wiki=wikifunctionswiki --zType Z60 --report --verbose --dryRun
- … then do the real run:
mwscript-k8s -f -- extensions/WikiLambda/maintenance/updateSecondaryTables.php --wiki=wikifunctionswiki --zType <TYPE ZID> --report --verbose
e.g.:
mwscript-k8s -f -- extensions/WikiLambda/maintenance/updateSecondaryTables.php --wiki=wikifunctionswiki --zType Z60 --report --verbose
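Because the dry run and the real run differ only in the --dryRun flag, both commands can be generated from one place. This is a minimal sketch, assuming a POSIX shell and assuming that --zType expects a full ZID such as Z60, as the <TYPE ZID> placeholder suggests; the function names is_zid and secondary_tables_cmd are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they print a command, they do not run it.

# Succeeds only if $1 looks like a ZID: "Z" followed by digits only.
is_zid() {
    case "$1" in
        Z*[!0-9]* | Z | "") return 1 ;;
        Z[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the updateSecondaryTables.php command for Type $1.
# Pass "dry" as $2 to append --dryRun.
secondary_tables_cmd() {
    is_zid "$1" || { echo "not a ZID: $1" >&2; return 1; }
    cmd="mwscript-k8s -f -- extensions/WikiLambda/maintenance/updateSecondaryTables.php --wiki=wikifunctionswiki --zType $1 --report --verbose"
    if [ "${2:-}" = "dry" ]; then
        cmd="$cmd --dryRun"
    fi
    echo "$cmd"
}

# Example calls:
# secondary_tables_cmd Z60 dry   # command for the dry run
# secondary_tables_cmd Z60       # command for the real run
```

The is_zid guard is there to catch a value typed without its Z prefix before the command is ever copied to the deployment server.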
How to monitor usage
This section is currently a draft.
Material may not yet be complete, information may presently be omitted, and certain parts of the content may be subject to radical, rapid alteration. More information pertaining to this may be available on the talk page.
- mw-wikifunctions service dashboard
- Wikifunctions service general dashboard (WIP)
- Envoy telemetry usage dashboard of the function-orchestrator
How to debug an issue
This section is currently a draft.
Material may not yet be complete, information may presently be omitted, and certain parts of the content may be subject to radical, rapid alteration. More information pertaining to this may be available on the talk page.
How to inspect Wikifunctions with Kubernetes tools
- Log in to a deployment server and run kube_env wikifunctions $CLUSTER where $CLUSTER is probably staging unless you are very brave.
- Now you can run kubectl commands like kubectl get pods.
- For example, to get logs for the function-orchestrator, you can run:
kubectl logs `kubectl get pods | grep orchestrator | awk '{print $1}'` function-orchestrator-main-orchestrator
- For another example, to read the Prometheus metrics for a pod, you can get the IP via kubectl get pods -o wide and then curl <IP>:9100, to see that the expected metrics are being set.
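The one-line log command in this section passes every matching pod name to a single kubectl logs call, which is likely to fail when more than one orchestrator pod is running. This is a minimal sketch of a loop-based alternative, assuming a POSIX shell with awk and that kube_env has already been run; the function names orchestrator_pods and orchestrator_logs are hypothetical, and the container name is taken from the example in this section.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of existing tooling.

# Reads `kubectl get pods` output on stdin and prints the names (first
# column) of pods whose name contains "orchestrator".
orchestrator_pods() {
    awk '$1 ~ /orchestrator/ { print $1 }'
}

# Prints the logs of each orchestrator pod in turn, one kubectl call per pod.
orchestrator_logs() {
    for pod in $(kubectl get pods | orchestrator_pods); do
        echo "=== $pod ==="
        kubectl logs "$pod" -c function-orchestrator-main-orchestrator
    done
}

# Example call (not run here), after kube_env wikifunctions staging:
# orchestrator_logs
```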
How to poke the orchestrator/evaluator
- Log in to a deployment server.
- The endpoint for the orchestrator can be found at https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/
- Example commands:
- to provoke the orchestrator directly:
curl https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{"Z1K1":"Z7","Z7K1":"Z801","Z801K1":"foo"},"doValidate":false}' --header "Content-type: application/json"
- to call the evaluator via the orchestrator:
curl https://wikifunctions.k8s-staging.discovery.wmnet:30443/1/v1/evaluate/ -X POST --data '{"zobject":{ "Z1K1": "Z7", "Z7K1": { "Z1K1": "Z8", "Z8K1": [ "Z17", { "Z1K1": "Z17", "Z17K1": "Z6", "Z17K2": { "Z1K1": "Z6", "Z6K1": "Z400K1" }, "Z17K3": { "Z1K1": "Z12", "Z12K1": [ "Z11" ] } }, { "Z1K1": "Z17", "Z17K1": "Z6", "Z17K2": { "Z1K1": "Z6", "Z6K1": "Z400K2" }, "Z17K3": { "Z1K1": "Z12", "Z12K1": [ "Z11" ] } } ], "Z8K2": "Z1", "Z8K3": [ "Z20" ], "Z8K4": [ "Z14", { "Z1K1": "Z14", "Z14K1": "Z400", "Z14K3": { "Z1K1": "Z16", "Z16K1": { "Z1K1": "Z61", "Z61K1": "javascript" }, "Z16K2": "function Z400( Z400K1, Z400K2 ) { return (parseInt(Z400K1) + parseInt(Z400K2)).toString(); }" } } ], "Z8K5": "Z400" }, "Z400K1": "5", "Z400K2": "8" } ,"doValidate":false}' --header "Content-type: application/json"
Things that might go wrong
- environment variables
- Environment variables set in the microservice images are ignored by Kubernetes. If you add/delete/modify an environment variable in a container image, you must also update the corresponding configuration when deploying that version of the image.
Beta Cluster
- Our Beta Cluster production imitation runs on deployment-docker-wikifunctions01 using the role::beta::docker_services hack to run the services directly in docker (no kubernetes), so it's not entirely prod-like
- If you're an admin member of the deployment-prep project, you should be able to do almost everything needed in Horizon
- To debug, shell in via
ssh deployment-docker-wikifunctions01.deployment-prep.eqiad1.wikimedia.cloud
- To trigger an immediate puppet update rather than waiting for the cron, run
sudo -i puppet agent -tv
- To see what services are running, run
sudo docker ps
- To inspect logs from one of the services, run e.g.
sudo docker logs function-evaluator-py.service
- To run a test from the CLI, you can use the above sample commands but changing out the URL for https://wikifunctions-orchestrator-beta.wmflabs.org/1/v1/evaluate, e.g.:
curl https://wikifunctions-orchestrator-beta.wmflabs.org/1/v1/evaluate -X POST --data '{"zobject":{ "Z1K1": "Z7", "Z7K1": { "Z1K1": "Z8", "Z8K1": [ "Z17", { "Z1K1": "Z17", "Z17K1": "Z6", "Z17K2": { "Z1K1": "Z6", "Z6K1": "Z400K1" }, "Z17K3": { "Z1K1": "Z12", "Z12K1": [ "Z11" ] } }, { "Z1K1": "Z17", "Z17K1": "Z6", "Z17K2": { "Z1K1": "Z6", "Z6K1": "Z400K2" }, "Z17K3": { "Z1K1": "Z12", "Z12K1": [ "Z11" ] } } ], "Z8K2": "Z1", "Z8K3": [ "Z20" ], "Z8K4": [ "Z14", { "Z1K1": "Z14", "Z14K1": "Z400", "Z14K3": { "Z1K1": "Z16", "Z16K1": "Z600", "Z16K2": "function Z400( Z400K1, Z400K2 ) { return (parseInt(Z400K1) + parseInt(Z400K2)).toString(); }" } } ], "Z8K5": "Z400" }, "Z400K1": "15", "Z400K2": "18" },"doValidate":false}' --header "Content-type: application/json" -w "\n"