Mathoid

From Wikitech

Mathoid is a stateless Node.js service running on the Kubernetes cluster. It receives POST requests from MediaWiki containing mathematical formulas in LaTeX and returns JSON with MathML and SVG renderings of each formula. An early version used PhantomJS for rendering, but this was replaced before deployment with MathJax running directly in Node.js. As a result, the service does not shell out or store anything in the file system.

Deployment

FIXME: The deployment procedure is not yet documented here. It involves Helm charts, changes to the mathoid repo, rebuilding the relevant Docker image via Blubber, and CI. How updated images go live still needs to be described.

Monitoring

  • Grafana
  • Icinga: check mathoid.svc.{eqiad,codfw}.net for LVS mappings, or Kubernetes for issues with the services themselves

Troubleshooting

The service listens on port 10042. Example request:

curl -XPOST -d'q=e=mc^2' http://mathoid.discovery.wmnet:10042/

This should return JSON with 'mml' and 'svg' members. The same query can also be run directly against any Kubernetes host, e.g. kubernetes10XX.eqiad.wmnet.
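
As a quick sanity check, the response can be piped through a small shell function that confirms both members are present. This is only a sketch: check_mathoid_response is a hypothetical helper, not part of any existing tooling, and it does a plain string match rather than parsing the JSON.

```shell
# Hypothetical helper: succeed only if the JSON on stdin mentions both
# the "mml" and "svg" members. Crude string match, not a JSON parser.
check_mathoid_response() {
    body="$(cat)"
    case "$body" in
        *'"mml"'*) ;;
        *) echo "missing mml" >&2; return 1 ;;
    esac
    case "$body" in
        *'"svg"'*) ;;
        *) echo "missing svg" >&2; return 1 ;;
    esac
    echo "ok"
}

# Usage against the service, reusing the example request:
# curl -s -XPOST -d'q=e=mc^2' http://mathoid.discovery.wmnet:10042/ | check_mathoid_response
```

A non-zero exit status means one of the two members was not found in the response body.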

Logs

Logs go to Logstash, but you can also fetch them from each pod. First, get a pod name:

deploy1001:~$ KUBECONFIG=/etc/kubernetes/mathoid-eqiad.config kubectl get pods
NAME                                  READY     STATUS    RESTARTS   AGE
mathoid-production-7b9999dd5f-4ml9j   2/2       Running   0          9d
mathoid-production-7b9999dd5f-5mpmd   2/2       Running   0          9d
mathoid-production-7b9999dd5f-65phz   2/2       Running   1          9d
mathoid-production-7b9999dd5f-mvvcm   2/2       Running   1          9d
...

Let's choose `mathoid-production-7b9999dd5f-mvvcm`. Pod names are generated and change with every new deployment. Then run kubectl logs:

deploy1001:~$ KUBECONFIG=/etc/kubernetes/mathoid-eqiad.config kubectl logs mathoid-production-7b9999dd5f-mvvcm mathoid-production
{"name":"mathoid","hostname":"mathoid-production-7b9999dd5f-mvvcm","pid":1,"level":40,"levelPath":"warn/service-runner","msg":"Startup finished","time":"2019-02-26T15:48:07.170Z","v":0}
TeX parse error: Undefined control sequence \emph
TeX parse error: Undefined control sequence \emph
TeX parse error: Double subscripts: use braces to clarify
TeX parse error: Double subscripts: use braces to clarify

...

Note the mathoid-production argument. That is the name of the first of the two containers running in this pod, and it is the actual application. The second one is the statsd Prometheus exporter. Its logs are just as easy to obtain:

deploy1001:~$ KUBECONFIG=/etc/kubernetes/mathoid-eqiad.config kubectl logs mathoid-production-7b9999dd5f-mvvcm production-metrics-exporter
time="2019-02-26T15:48:03Z" level=info msg="Starting StatsD -> Prometheus Exporter (version=0.8.0+ds1, branch=master, revision=0.8.0+ds1-4)" source="main.go:158" 

Physically, these logs are stored under /var/log/pods, so you could find your way through that too, but Logstash and kubectl logs are likely to be more efficient.
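
If you do not remember the container names to pass to kubectl logs, they can be read from the pod spec. The following is a minimal sketch: list_pod_containers and the KUBECTL override are hypothetical names introduced here for illustration, and it assumes the user in your KUBECONFIG is allowed to get the pod.

```shell
# Hypothetical helper: print the container names of a pod, one per line.
# KUBECTL is an illustrative override so the function can be dry-run
# against a stub instead of a real cluster.
list_pod_containers() {
    pod="$1"
    kubectl_cmd="${KUBECTL:-kubectl}"
    # jsonpath "range" iterates over the containers in the pod spec
    "$kubectl_cmd" get pod "$pod" \
        -o jsonpath='{range .spec.containers[*]}{.name}{"\n"}{end}'
}
```

With KUBECONFIG exported as in the examples above, `list_pod_containers mathoid-production-7b9999dd5f-mvvcm` would be expected to list mathoid-production and production-metrics-exporter.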

Fail-over

The pods are managed by Kubernetes and will fail over automatically within a time frame that is configurable at the cluster level. The default is 5 minutes. The failure of an individual node does not affect the availability of the service.

Restarting a node

The entire worker node can simply be rebooted; see Kubernetes.

If you want to restart a specific pod, just delete it. The infrastructure will restart it automatically. For example:

deploy1001:~$ KUBECONFIG=/etc/kubernetes/mathoid-eqiad.config kubectl delete pods mathoid-production-7b9999dd5f-cr2dr
Error from server (Forbidden): pods "mathoid-production-7b9999dd5f-cr2dr" is forbidden: User "mathoid" cannot delete pods in the namespace "mathoid"

Note the Forbidden error: the regular deployer user is not allowed to delete pods (and for good reason). SREs, however, can:

deploy1001:~$ sudo KUBECONFIG=/etc/kubernetes/admin-eqiad.config kubectl -n mathoid delete pods mathoid-production-7b9999dd5f-cr2dr
pod "mathoid-production-7b9999dd5f-cr2dr" deleted

Restarting all pods for the service

Since pods are regenerated after deletion, just delete them all. As an SRE, you can delete all the pods for the service with:

kubectl -n mathoid delete pods --all

(It may be preferable to delete the pods one at a time with a delay between each, rather than accidentally taking down the entire service.)
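
One possible way to stagger the deletions is a loop that removes one pod, waits, and then moves on to the next. This is a sketch under stated assumptions, not an established procedure: rolling_delete_pods and the KUBECTL override are hypothetical, the 30-second default is arbitrary, and a fixed sleep does not verify that the replacement pod is actually ready before continuing.

```shell
# Hypothetical helper: delete every pod in a namespace one at a time,
# sleeping between deletions so the replacements have time to start.
# Usage: rolling_delete_pods NAMESPACE [DELAY_SECONDS]
rolling_delete_pods() {
    namespace="$1"
    delay="${2:-30}"                   # arbitrary default, tune as needed
    kubectl_cmd="${KUBECTL:-kubectl}"  # illustrative override for dry runs
    # "get pods -o name" prints one "pod/<name>" entry per line
    for pod in $("$kubectl_cmd" -n "$namespace" get pods -o name); do
        "$kubectl_cmd" -n "$namespace" delete "$pod"
        sleep "$delay"
    done
}
```

Run with admin credentials as in the single-pod example, for instance `rolling_delete_pods mathoid 60`. The pod list is taken once at the start, so pods created during the run are not deleted again.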

See also