wikitech-static is an offsite, read-only backup of wikitech.wikimedia.org (Wikitech). It is maintained in order to provide emergency documentation in case of a site-wide outage affecting Wikimedia's production network. It also acts as a temporary meta-monitoring host to monitor Icinga.

If the primary datacenter (and, hence, Wikitech) is down and you need to search for a troubleshooting guide or look at a wiring diagram, wikitech-static.wikimedia.org will be there for you.

What content is on wikitech-static?

Wikitech-static receives a full dump of Wikitech once a day at around noon UTC. It also attempts to sync image files, but there is no guarantee that images pulled in from Commons arrive at full resolution. When in doubt, verify directly on wikitech-static that critical diagrams have enough detail. The dump includes the Server Admin Log (SAL), so the log is generally up to date but (naturally) falls behind over the course of the day until the next sync.

Where is it?

Wikitech-static is hosted on a Rackspace Cloud instance. It is physically located in Rackspace's Chicago datacenter ('ORD') in Illinois, USA, so that it is not in the same city as any of our primary datacenters.

What is wikitech-static running?

It's a basic Debian Stretch box. The host can be accessed via SSH with a root login; the password is in pwstore, in the 'wikitech-static' file. Alternatively, it can be accessed through keyholder's proxy (SSH_AUTH_SOCK="/run/keyholder/proxy.sock") from the Icinga hosts (icinga[12]001 as of Nov. 2019).
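
The keyholder route can be wrapped in a small shell function, as in the sketch below. The function name wikitech_static_ssh is hypothetical; the socket path and the root login come from the paragraph above. It assumes you are on one of the Icinga hosts and are allowed to use the keyholder proxy there.

```shell
# Hypothetical wrapper: SSH to wikitech-static through keyholder's proxy socket.
# Any extra arguments are passed on to ssh (for example a remote command).
wikitech_static_ssh() {
    SSH_AUTH_SOCK="/run/keyholder/proxy.sock" ssh root@wikitech-static.wikimedia.org "$@"
}

# Usage (run from an Icinga host):
#   wikitech_static_ssh            # interactive shell
#   wikitech_static_ssh uptime     # run a single command
```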

MediaWiki

It runs Apache and a stripped-down MediaWiki install from Git. It is unaffected by Scap, so it should survive any disastrous production deployment. MediaWiki updates are performed manually.

Meta-monitoring

Wikitech-static also monitors our monitoring system to ensure that it works properly. The current setup for monitoring Icinga consists of:

  • A checkout of the https://gerrit.wikimedia.org/r/admin/projects/operations/software/external-monitoring Gerrit repository in /srv/external-monitoring/.
  • A symlink from /usr/local/bin/check_icinga to the icinga/check_icinga.py script.
  • Configuration for the script in /etc/check_icinga/config.yaml and /etc/check_icinga/contacts.yaml. The contacts file is synced automatically once a day from each of the two production Icinga hosts, at staggered times, so it is updated at most twice a day and only if something changed. The synced file is syntax-checked before it replaces the current one.
  • To validate the configuration files manually, run check_icinga_validate_config.
  • Two crontab entries that run the script against both Icinga hosts, both logging to syslog, plus an additional crontab entry that notifies a restricted group of people if the configuration is broken.
    */2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga2001.wikimedia.org
    */2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga1001.wikimedia.org
    11 */2 * * * /usr/bin/systemd-cat -t "check_icinga_validate_config" /root/check_icinga_validate_config_crontab.sh
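
The "validate before switching" behaviour described for the synced contacts file could look roughly like the sketch below. The real sync script is not shown on this page, so the function name switch_if_valid and the way the validator is passed in are assumptions made for illustration only.

```shell
# Hypothetical helper: replace TARGET with CANDIDATE only if the validator
# command accepts CANDIDATE. The validator receives the candidate path as its
# last argument and must exit 0 when the file is valid.
switch_if_valid() {
    [ "$#" -ge 3 ] || { echo "usage: switch_if_valid CANDIDATE TARGET VALIDATOR..." >&2; return 2; }
    siv_candidate="$1"
    siv_target="$2"
    shift 2
    if "$@" "$siv_candidate"; then
        # mv is a rename when both paths are on the same filesystem,
        # so readers never see a half-written file.
        mv -- "$siv_candidate" "$siv_target"
    else
        echo "validation failed, keeping existing $siv_target" >&2
        rm -f -- "$siv_candidate"
        return 1
    fi
}

# Usage (your_validator is a placeholder for a real syntax checker):
#   switch_if_valid /tmp/contacts.yaml.new /etc/check_icinga/contacts.yaml your_validator
```

Keeping the old file when validation fails means a broken sync never leaves the check without a usable contacts list.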
    

Example of page notifications

Only alerts for the active Icinga host generate a page in addition to the email notification.

PROBLEM: Icinga on icinga1001.wikimedia.org is CRITICAL (check email for details)
RECOVERY: Icinga on icinga1001.wikimedia.org is OK (check email for details)

Example of email notification

Subject: PROBLEM: Icinga on icinga2001.wikimedia.org is CRITICAL
From: check_icinga@wikitech-static.wikimedia.org
To: [...SNIP...]

check_icinga@wikitech-static.wikimedia.org found Icinga CRITICAL on icinga2001.wikimedia.org

Issues of attempt 1 of 3:
Event Handlers Enabled? Yes (expected No)
Notifications Enabled? YES (expected NO)
Issues of attempt 2 of 3:
Event Handlers Enabled? Yes (expected No)
Notifications Enabled? YES (expected NO)
Issues of attempt 3 of 3:
Event Handlers Enabled? Yes (expected No)
Notifications Enabled? YES (expected NO)

Subject: RECOVERY: Icinga on icinga2001.wikimedia.org is OK
From: check_icinga@wikitech-static.wikimedia.org
To: [...SNIP...]

check_icinga@wikitech-static.wikimedia.org found Icinga OK on icinga2001.wikimedia.org

How do we maintain it?

Automatic content synchronization

A puppetized cron job on wikitech runs /usr/local/sbin/mw-files.sh and /usr/local/sbin/mw-xml.sh, which back up the wiki to /a/backup/ and /a/backup/public.

A non-puppetized cron job on wikitech-static runs /wikitech-static/wikitechsync/import-wikitech.sh, which copies the files from wikitech and installs them.

Manual updates

When updating the MediaWiki deployment on the host, first upgrade the Debian packages and then update the Git repositories in /srv/mediawiki/w.

$ ssh root@wikitech-static.wikimedia.org
$ apt-get update && apt-get upgrade
$ cd /srv/mediawiki/w
$ git fetch origin
$ git branch -a
  # Look for the latest REL branch
$ git checkout -b <local branch> <upstream branch>
$ git submodule update --init --recursive
$ ./composer.phar update --no-dev
$ php maintenance/update.php
$ service apache2 graceful
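
The "look for the latest REL branch" step can be done by eye, or with a small filter like the sketch below. The function name latest_rel_branch is hypothetical. It assumes GNU sort (for the -V version sort), the remote name origin used in the commands above, and MediaWiki's usual release branch naming of the form REL1_NN.

```shell
# Hypothetical helper: read branch names on stdin (as printed by
# "git branch -r") and print the highest-numbered origin/REL* branch.
latest_rel_branch() {
    sed 's/^[[:space:]]*//' |                    # strip git's leading indentation
        grep -E '^origin/REL[0-9]+_[0-9]+$' |    # keep release branches only
        sort -V |                                # version sort: REL1_9 before REL1_35
        tail -n 1
}

# Usage, from inside /srv/mediawiki/w after "git fetch origin":
#   git branch -r | latest_rel_branch
```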

VM control

To access the Rackspace admin panel that controls the wikitech-static host, visit https://www.rackspace.com/login and select Cloud Control Panel. The username and password are available in the Ops password repo, in the file named 'rackspace'.

What alerts should we watch out for?

Wikitech-static will alert if the site goes offline or if its MediaWiki version falls behind the official stable release as reported by mediawiki.org. Labweb1001 and labweb1002 will alert if the Special:RecentChanges page on wikitech-static falls more than a couple of days behind the Special:RecentChanges page on wikitech.
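
The RecentChanges freshness alert boils down to comparing the newest change on each wiki. The actual check is not shown on this page; the sketch below only illustrates the comparison. The function name mirror_is_stale and the two-day threshold (one reading of "a couple of days") are assumptions.

```shell
# Hypothetical threshold: "a couple of days" read as 2 days, in seconds.
MAX_LAG_SECONDS=$((2 * 24 * 60 * 60))

# Exits 0 (stale) when the mirror's newest change is more than
# MAX_LAG_SECONDS older than the source's newest change; exits 1 otherwise.
# Both arguments are Unix timestamps in seconds.
mirror_is_stale() {
    mis_source_epoch="$1"   # newest change on wikitech
    mis_mirror_epoch="$2"   # newest change on wikitech-static
    [ $((mis_source_epoch - mis_mirror_epoch)) -gt "$MAX_LAG_SECONDS" ]
}

# Usage:
#   if mirror_is_stale "$source_ts" "$mirror_ts"; then echo "wikitech-static is stale"; fi
```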
