Zotero

Zotero is a Node.js service which is run alongside the Citoid service.

Updating

Deployment

These are directions for deploying zotero, a Node.js service. More detailed but more general directions for nodejs services are at Migrating_from_scap-helm#Code_deployment/configuration_changes.

Locate build candidate

From gitlab, locate the candidate build. https://gitlab.wikimedia.org/repos/mediawiki/services/zotero/-/jobs

You can also check the docker registry, but this is slow to update so may not contain the latest build: https://docker-registry.wikimedia.org/repos/mediawiki/services/zotero/tags/

The name of the build will look something like '2024-11-08-122812-production'.

Add change via gerrit

Clone deployment-charts repo.
vi values.yaml

main_app:
  image: repos/mediawiki/services/zotero
  limits:
    cpu: 10
    memory: 4Gi
  liveness_probe:
    tcpSocket:
      port: 1969
  port: 1969
  requests:
    cpu: 200m
    memory: 200Mi
  version: 2024-11-08-122812-change-me-production

Make a CR to change the version value in values.yaml, and after a successful review, merge it. Staging.yaml and codfw.yaml will inherit the values from values.yaml.
After merge, log into a deployment server, there is a cronjob (1 minute) that will update the /srv/deployment-charts directory with the contents from git.

Log into deployment server

Ssh into the deploy machine.

ssh deployment.eqiad.wmnet

Navigate to /srv/deployment-charts/helmfile.d/services/

Zotero runs in both of the core data centers: Eqiad and Codfw. There is also a staging server. You can test out changes on the staging server first.

Staging server

cd /srv/deployment-charts/helmfile.d/services/zotero
helmfile -e staging -i apply

Helfile apply (may take awhile)

This checks status again so you can see if all instances have been restarted yet.

Verify Zotero is running on staging with a curl request:

curl -k -d 'https://en.wikipedia.org/wiki/Darth_Vader' -H 'Content-Type: text/plain' https://staging.svc.eqiad.wmnet:4969/web

curl -d '9791029801297' -H 'Content-Type: text/plain' https://staging.svc.eqiad.wmnet:4969/search

Production server

helmfile -e codfw -i apply

repeat for eqiad

helmfile -e eqiad -i apply

eqiad queries

curl -k -d '9791029801297' -H 'Content-Type: text/plain' https://zotero.svc.eqiad.wmnet:4969/search

curl -k -d 'https://en.wikipedia.org/wiki/Darth_Vader' -H 'Content-Type: text/plain' https://zotero.svc.eqiad.wmnet:4969/web


curl -k -d 'https://www.example.com' -H 'Content-Type: text/plain' https://zotero.svc.eqiad.wmnet:4969/web

Verify

Verify Zotero is running with a curl request:

curl -k -d 'https://en.wikipedia.org/wiki/Darth_Vader' -H 'Content-Type: text/plain' https://zotero.svc.codfw.wmnet:4969/web

curl -k -d 'https://en.wikipedia.org/wiki/Darth_Vader' -H 'Content-Type: text/plain' https://zotero.svc.eqiad.wmnet:4969/web

curl -d 'https://en.wikipedia.org/wiki/Darth_Vader' -H 'Content-Type: text/plain' https://zotero.discovery.wmnet:4969/web

curl -d '9791029801297' -H 'Content-Type: text/plain' https://zotero.discovery.wmnet:4969/search

Logs

Logs have been disabled in 827891af due to them being actively harmful (aside from useless) to our environment. If you need to chase down a request that caused an issue, Citoid logs might be helpful as, aside from monitoring, Citoid is the only service talking to zotero. Citoid logs are in Logstash

Monitoring

There is a probe which uses the export endpoint to check if Zotero is alive:

https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/common/service.yaml#3249

These alerts can be seen on logstash for AlertManager filtering for labels:instance: zotero:4969

https://logstash.wikimedia.org/app/dashboards#/view/8b1907c0-2062-11ec-85b7-9d1831ce7631?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-3w,to:now))&_a=(description:'Investigate%20alert%20trends%20from%20Icinga%20and%20Alertmanager.',filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'0fade920-6712-11eb-8327-370b46f9e7a5',key:labels.instance,negate:!f,params:(query:'zotero:4969'),type:phrase),query:(match_phrase:(labels.instance:'zotero:4969')))),fullScreenMode:!f,options:(hidePanelTitles:!f,useMargins:!t),query:(language:kuery,query:),timeRestore:!t,title:'Alerts%20overview',viewMode:view)

The citoid swagger probe also includes a test of Zotero:

https://grafana.wikimedia.org/d/_77ik484k/openapi-swagger-endpoint-state?orgId=1

Envoy telemetry also includes Zotero:

https://grafana-rw.wikimedia.org/d/b1jttnFMz/envoy-telemetry-k8s?orgId=1&var-datasource=thanos&var-site=eqiad&var-prometheus=k8s&var-kubernetes_namespace=citoid&var-kubernetes_namespace=zotero&var-app=All&var-destination=zotero&from=now-2d&to=now&forceLogin=true

https://grafana.wikimedia.org/d/b1jttnFMz/envoy-telemetry-k8s?orgId=1&var-datasource=thanos&var-site=eqiad&var-prometheus=k8s&var-kubernetes_namespace=zotero&var-app=zotero&var-destination=local_service&from=now-30m&to=now

Rolling back changes

If you need to roll back a change because something went wrong:

Revert the git commit to the deployment-charts repo
Merge the revert (with review if needed)
Wait one minute for the cron job to pull the change to the deployment server
execute ENV=<staging,eqiad,codfw> kube_env zotero $ENV; helmfile -e $ENV diff to see what you'll be changing
execute helmfile -e $ENV apply

Rolling restart

A rolling restart of Zotero is sometimes necessary to resolve production issues. To run a rolling restart, run this on the deployment host, where $ENV is one of staging, eqiad, or codfw:

helmfile -e ${ENV?} -f /srv/deployment-charts/helmfile.d/services/zotero/helmfile.yaml --state-values-set roll_restart=1 sync

Rolling back in an emergency

This is discouraged but is noted here for completeness. Only use it in really truly big emergencies. If you are wondering whether a situation qualifies as truly big emergency, it almost certainly is not. Reach out to more senior SREs and the service owner first

If you can't wait the one minute, or the cron job to update from git fails etc. then it is possible to manually roll back using helm.

Find the revision to roll back to

kube_env zotero <staging,eqiad,codfw>; helm history <production> --tiller-namespace YOUR_SERVICE_NAMESPACE
Find the revision to roll back to

e.g. perhaps the penultimate one

REVISION        UPDATED                         STATUS          CHART           DESCRIPTION     
1               Tue Jun 18 08:39:20 2019        SUPERSEDED      termbox-0.0.2   Install complete
2               Wed Jun 19 08:20:42 2019        SUPERSEDED      termbox-0.0.3   Upgrade complete
3               Wed Jun 19 10:33:34 2019        SUPERSEDED      termbox-0.0.3   Upgrade complete
4               Tue Jul  9 14:21:39 2019        SUPERSEDED      termbox-0.0.3   Upgrade complete

Rollback with: kube_env zotero <staging,eqiad,codfw>; helm rollback <production> 3 --tiller-namespace YOUR_SERVICE_NAMESPACE