Debugging in production

Debugging a web request

Externally

Use X-Wikimedia-Debug to make a request bypass Varnish cache and route to a specific debug server.

Directly

You can make a self-request directly to a web server by ssh-ing to a production server and using curl, like so:

mwdebug1002$ curl -i --connect-to ::$HOSTNAME 'https://test.wikipedia.org/w/load.php'
HTTP/1.1 200 OK
Server: mwdebug1002.eqiad.wmnet
…
/* This file is the Web entry point for MediaWiki's ResourceLoader:  */

Or over HTTP:

mwdebug1002$ curl -i --connect-to ::$HOSTNAME 'http://test.wikipedia.org/wiki/Main_Page'
HTTP/1.1 302 Found
Server: mwdebug1002.eqiad.wmnet
Location: https://test.wikipedia.org/wiki/Main_Page

mwdebug1001$ curl -i --connect-to ::$HOSTNAME 'http://www.wikimedia.org/'
HTTP/1.1 200 OK
Server: mwdebug1001.eqiad.wmnet
…
<!DOCTYPE html>
<html lang="mul" dir="ltr">
<head>
<meta charset="utf-8">
<title>Wikimedia</title>
<meta name="description" content="Wikimedia is a global movement whose mission is to bring free educational content to the world.">
…

And over HTTP, as if the request had arrived as an external HTTPS request (this is currently the only way to debug in Beta Cluster, since internal HTTPS is not available there):

deployment-mediawiki11$ curl -i --connect-to ::$HOSTNAME -H 'X-Forwarded-Proto: https' 'http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page'
HTTP/1.1 200 OK
Server: deployment-mediawiki11.deployment-prep.eqiad1.wikimedia.cloud
…
<!DOCTYPE html>
…
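
The curl invocations above all share the same shape, so they can be wrapped in a small helper. The following is a minimal sketch, not an existing tool: the function name self_request and the DRY_RUN variable are hypothetical, and it assumes a POSIX shell on the server you are debugging. With DRY_RUN=1 it only prints the command it would run.

```shell
# Hypothetical helper (not part of any deployed tooling): send a request for
# URL to a specific server using the --connect-to form shown above.
# Usage: self_request URL [TARGET_HOST]
# TARGET_HOST defaults to $HOSTNAME, falling back to `hostname -f`.
# The function body runs in a subshell so its variables do not leak.
self_request() (
    url=$1
    host=${2:-${HOSTNAME:-$(hostname -f)}}
    # Empty source host and port ("::") means: redirect every connection.
    set -- curl -i --connect-to "::$host" "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Dry run: print the command instead of executing it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
)
```

For example, DRY_RUN=1 self_request 'https://test.wikipedia.org/w/load.php' mwdebug1002.eqiad.wmnet prints the command without sending anything, which is a cheap way to confirm the target before running it for real.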


Note about Host header: Prior to 2015, the more traditional approach of using curl 'http://localhost/wiki/Main_Page' -H 'Host: test.wikipedia.org' was supported, but per T190111 this is no longer possible because connections via "localhost" are handled by a higher priority VirtualHost in Apache that serves responses for the health status checks (not related to MediaWiki).

Note about FQDN address: Prior to 2019, it was common to work around the above "localhost" issue by using the internal FQDN (mw0000.eqiad.wmnet) or its internal IP address instead. This is easiest via $HOSTNAME or $(hostname -f), e.g. curl -i -H 'Host: test.wikipedia.org' "http://$HOSTNAME/w/load.php". While this still works today for HTTP requests, it does not work reliably for HTTPS requests, since the web server in question has no certificate for the internal hostname. This can be bypassed with curl --insecure (or curl -k for short).

Note about --resolve option: Prior to 2020, other documentation pages recommended --resolve as the main strategy, e.g. curl -i --resolve "test.wikipedia.org:443:$(hostname -i)" 'https://test.wikipedia.org/w/load.php'. This still works perfectly today and is functionally equivalent to the current recommendation with --connect-to. The --resolve option is no longer recommended because it is too easy to misuse and not realize that it was silently ignored. For example, if you specify "resolve" with a different hostname than your URL (with redirects, there can be many host names involved), curl will silently connect to the main production edge for your first and only request, which is easy to miss if you don't enable verbose -v mode and check what server it actually connected to. This can be mitigated by using a wildcard hostname like --resolve "*:443:$(hostname -i)" but that still requires getting the port right, which means over HTTP, it would silently get ignored again, plus it requires the IP address and thus the extra hostname command. The --connect-to option has the benefit of allowing both host and port to be omitted, and supports a hostname as destination (instead of IP address), thus allowing the simpler and more memorable "::$HOSTNAME" form.
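
Because a misdirected request is easy to miss, it can help to check the Server response header (as seen in the example output earlier on this page) rather than reading it by eye. The sketch below is illustrative only: the function names served_by and check_served_by are hypothetical, and it assumes the response was printed with curl -i so that headers come first.

```shell
# Hypothetical helpers: read an HTTP response (as printed by `curl -i`) on
# stdin and report which server answered, based on the Server header.

# Print the value of the first Server header (header name matched
# case-insensitively; carriage returns are stripped first).
served_by() {
    tr -d '\r' | awk '!seen && tolower($1) == "server:" { print $2; seen = 1 }'
}

# Compare the Server header against the expected host name.
# Prints "ok: ..." and returns 0 on a match; otherwise reports to stderr
# and returns 1.
check_served_by() {
    expected=$1
    actual=$(served_by)
    if [ "$actual" = "$expected" ]; then
        echo "ok: served by $actual"
    else
        echo "MISMATCH: expected $expected, got ${actual:-<none>}" >&2
        return 1
    fi
}
```

A possible use is curl -si --connect-to "::$HOSTNAME" 'https://test.wikipedia.org/w/load.php' | check_served_by "$HOSTNAME". This assumes $HOSTNAME matches the value the server reports in its Server header, as it does in the examples above. If the request silently went to the production edge instead, the header will name something else and the check fails.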

Pushing code to a debug server

Developers can debug code changes on one of the mwdebug hosts, without deploying to the production cluster. See also Pre-deployment testing in production.

Debug via Gerrit and Scap

If you don't have shell access, or if you prefer to have the safety of CI checks first, draft your debug patch locally and send it to Gerrit. Then you, or a deployer helping you, can cherry-pick that patch on the deployment server and run scap pull on the mwdebug host. This automatically takes care of restarting php-fpm and thus clearing the PHP opcache.

When you are done testing, clean up the deployment server and run scap pull on the mwdebug host once more to reset it back to the clean state.

Debug directly on the server

When editing files on a mwdebug server directly, remember to restart php-fpm. Without this, changes to files on disk might not take effect, because production servers use php-opcache to keep compiled code in RAM.

$ sudo -i /usr/local/sbin/restart-php7.4-fpm

If these instructions are out of date, you might find current ones at Application servers/Runbook#PHP7 rendering and Service restarts#php-fpm restart.

When you are done testing, run scap pull to reset the server back to a clean state.

Conditional code

Any changes you make by debugging in this way will be overwritten by the next MediaWiki deployment (e.g. train or backport window).

If changes are meant to last more than an hour, please commit them to Git instead and deploy them the normal way. Use conditionals such as if ( $wgDBname === 'testwiki' ) to limit the risk of unforeseen side effects from your debug code as much as possible. You can also limit your debugging to a single server through conditionals like if ( php_uname( 'n' ) === 'mw1234' ).

Testing it

Use X-Wikimedia-Debug in a browser to route one of your regular web requests to the debug server you have staged code on.

Debugging databases

From a maintenance host, use the sql command, or use mwscript mysql.php directly.

Run sql enwiki -h for help to see how you can connect to non-core databases, like ExternalStorage, or specific database hosts (e.g. db0123).

In particular, take note that in MediaWiki some of our DB clusters have a different name. For example "x1" and "x2" are known as "extension1" and "extension2", for the purposes of the sql --cluster parameter and internal values of $wgLBFactoryConf that this corresponds with.

Examples:

# Connect to s3.test2wiki database on a live replica in production.
$ sql test2wiki

# Connect to s7.centralauth 
$ sql centralauth
$ mwscript mysql.php --wiki metawiki --wikidb centralauth

# Connect to x1.wikishared
$ sql wikishared
$ mwscript mysql.php --wiki metawiki --cluster extension1 --wikidb wikishared

# Connect to x2.mainstash
$ sql mainstash --cluster extension2
$ mwscript mysql.php --wiki metawiki --cluster extension2 --wikidb mainstash

# Connect to an arbitrary host that is not pooled or never in $wgLBFactoryConf,
# such as parser cache hosts.
$ sql parsercache --raw-host 10.64.0.57
$ mwscript mysql.php --wiki metawiki --raw-host 10.64.0.57 --wikidb parsercache

# List pooled hosts (s1)
$ sql enwiki --list-hosts
$ mwscript mysql.php --wiki enwiki --list-hosts
db0001
db0002
db0003

# List pooled hosts (centralauth)
$ sql centralauth --list-hosts
$ mwscript mysql.php --wiki enwiki --wikidb centralauth --list-hosts
…

# List pooled hosts (external cluster)
$ mwscript mysql.php --wiki metawiki --cluster extension1 --list-hosts
…
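
The cluster naming difference noted above can be captured in a small lookup so that the short names can be used at the prompt. This is a minimal sketch under stated assumptions: the function name sql_cluster_name is hypothetical, only the x1 and x2 mappings come from this page, and passing any other name through unchanged is an assumption.

```shell
# Hypothetical helper: translate a short cluster name into the value expected
# by the --cluster parameter. Only x1 and x2 are documented on this page;
# any other name is passed through unchanged (an assumption).
sql_cluster_name() {
    case $1 in
        x1) echo extension1 ;;
        x2) echo extension2 ;;
        *)  echo "$1" ;;
    esac
}
```

For example, mwscript mysql.php --wiki metawiki --cluster "$(sql_cluster_name x1)" --list-hosts expands to the same command as the external cluster example above.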

Debugging a maintenance script

ssh to a mwdebug host, then:

source /usr/local/lib/mw-deployment-vars.sh
sudo -u "$MEDIAWIKI_WEB_USER" php -m debug "$MEDIAWIKI_DEPLOYMENT_DIR/multiversion/MWScript.php" someScript.php --wiki=testwiki --scriptSpecificParameters "goHere"

Debugging logs

To inspect how log messages are sent over the network from MediaWiki PHP to Logstash, read Application servers/Runbook#Logging.

To enable verbose logging in production, it is recommended to work on an mwdebug host and then use WikimediaDebug (either via your browser, or via curl from a shell on the mwdebug host).

If you are investigating problems with a specific appserver or otherwise can't rely on Logstash, follow WikimediaDebug#Debug logging (CLI) to temporarily read logs on the same server.

Debugging in shell

To open a command-line shell to PHP, log in to an mwdebug server or the Maintenance server and run:

$ mwscript shell.php dbname_here

Where dbname_here is e.g. aawiki. You can call arbitrary MW code from here; see mw:Manual:Shell.php.

Ad-hoc log messages

The recommended approach to ad-hoc logging in production is wfDebugLog( 'AdHocDebug', 'Hi...' );. This will reliably send the message to Logstash from web-facing contexts, jobrunners, and CLI maintenance scripts, and does so without running the risk of unintentionally disclosing sensitive data attached to objects in memory.

If the data will be logged from a mwdebug host via CLI or via WikimediaDebug, then the message will show up at Logstash dashboard: mwdebug.

If the data is expected to come from a different host (e.g. only reproducible there, or waiting for the condition to be hit organically), then the message will show up at Logstash dashboard: mediawiki, where you can query for channel:AdHocDebug, or page through the channel list and zoom in on the appropriate channel.

Ad-hoc command line logging

To reproduce an issue programmatically, it is recommended to follow #Debugging in shell instead, which avoids modifying source code on disk or running modified programs.

If an issue is difficult to reproduce and you need to modify a maintenance script to log some information quickly, you can use the wfDebugLog() approach above. Alternatively, to keep the information local and not write to Logstash, you can also choose one of the following:

  • error_log('Hi ...', 4);
  • syslog(LOG_DEBUG, 'Hi...');

error_log type 4 corresponds to STDERR in CLI. For web requests via Apache, STDERR is not defined and these messages go to syslog instead. For such web requests, they will end up in Logstash as type:apache2 message:"Got error 'PHP message: Hi...". For mwdebug hosts, they end up on the mwdebug Logstash dashboard, but take note that they will not match type:mediawiki queries and do not show up on the general "mediawiki" or "mediawiki-errors" dashboards in Logstash. For other hosts, you may find them on the apache2log Logstash dashboard.

Syslog messages end up on disk, readable via sudo tail /var/log/syslog, and are also readable without sudo on the syslog Logstash dashboard, where you can query with e.g. host:mw0000 or message:Hi to find specific entries.