wikitech-static

wikitech-static is an offsite, read-only backup of wikitech.wikimedia.org (Wikitech). It is maintained in order to provide emergency documentation in case of a site-wide outage affecting Wikimedia's production network. It also acts as a temporary meta-monitoring host to monitor Icinga.

If the primary data center (and, hence, Wikitech) is down and you need to search for a troubleshooting guide or look at a wiring diagram, wikitech-static.wikimedia.org will be there for you.

What content is on wikitech-static?

Wikitech-static receives a full dump of Wikitech once a day, at around noon UTC. It also attempts to sync image files, but there is no guarantee that images pulled in from Commons arrive at full resolution. When in doubt, verify directly that critical diagrams have enough detail on wikitech-static. The dump includes the SAL (Server Admin Log), so the log is generally up to date but naturally lags a bit during the day.

Where is it?

Wikitech-static is hosted on a Rackspace Cloud instance. It is physically located in Rackspace's Chicago datacenter, 'ORD', in Illinois, USA, so that it is not in the same city as any of our primary datacenters.

What is wikitech-static running?

It's a basic Debian Buster box. The host can be accessed via ssh with a root login; the password is in pwstore, in the 'wikitech-static' file. Alternatively, it can be accessed using keyholder's proxy (SSH_AUTH_SOCK="/run/keyholder/proxy.sock") from the alerting hosts (as of June 2023, alert[12]001.wikimedia.org).

MediaWiki

It is running Apache and a stripped-down MediaWiki install from Git. It is unaffected by Scap so it should live through any disastrous production deployments. MediaWiki updates are performed manually.

Status page redirect

Our public status page at www.wikimediastatus.net is hosted by Atlassian. They can't also host us at wikimediastatus.net, for technical reasons at the intersection of Atlassian and Markmonitor, our domain management provider (see task T293504).

Therefore wikimediastatus.net resolves to wikitech-static, where Apache is configured to serve an HTTP 301 redirect to www.wikimediastatus.net.
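
One way to confirm that the redirect behaves as described is to request the bare domain without following redirects and inspect the response. The sketch below is a hypothetical helper, not part of the wikitech-static configuration: the function names are illustrative, and because this page does not say whether the redirect is served over plain HTTP, HTTPS, or both, the scheme is left as a parameter.

```python
import http.client
from urllib.parse import urlsplit

EXPECTED_HOST = "www.wikimediastatus.net"


def is_expected_redirect(status, location):
    """Return True if the response is a 301 whose Location points at the www host."""
    if status != 301 or not location:
        return False
    return urlsplit(location).hostname == EXPECTED_HOST


def check_redirect(host="wikimediastatus.net", use_tls=False, timeout=10):
    """Send a HEAD request without following redirects (needs network access)."""
    connection_class = (
        http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    )
    conn = connection_class(host, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        response = conn.getresponse()
        return is_expected_redirect(response.status, response.getheader("Location"))
    finally:
        conn.close()
```

Calling check_redirect() from any machine with outbound access should return True while the Apache redirect is in place; http.client never follows redirects on its own, which is why it is used here.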

Meta-monitoring

Wikitech-static also monitors our monitoring system to ensure that it works properly. Monitoring is performed through the virtual host `icinga-extmon.wikimedia.org`, which is configured to serve only the extinfo.cgi script, with an ACL that includes only the wikitech-static IPs.

The current setup for monitoring Icinga consists of:

  • Checkout of the https://gerrit.wikimedia.org/r/admin/repos/operations/software/external-monitoring gerrit repository into /srv/external-monitoring/.
  • Symlink of the icinga/check_icinga.py script into /usr/local/bin/check_icinga.
  • Configuration for the script in /etc/check_icinga/config.yaml and /etc/check_icinga/contacts.yaml. The contacts configuration file is automatically synced once a day from both production Icinga hosts in a splayed way, so the file is updated at most twice a day, and only if there is a change. The new file is syntax-checked before it is switched in.
  • To validate the configuration files manually, run check_icinga_validate_config.
  • Two crontab entries to run the script against both Icinga hosts, both logging to syslog, plus an additional crontab entry to notify a restricted group of people if the configuration is broken.
    */2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga2001.wikimedia.org
    */2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga1001.wikimedia.org
    11 */2 * * * /usr/bin/systemd-cat -t "check_icinga_validate_config" /root/check_icinga_validate_config_crontab.sh
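
The "validate before switching" step for the contacts file can be pictured as a write, validate, rename sequence. This is a minimal sketch of that general pattern, not the actual sync script: the function name install_if_valid and the validator callback are hypothetical, and the real syntax check is whatever check_icinga_validate_config performs.

```python
import os
import tempfile


def install_if_valid(new_content, target_path, validate):
    """Write new_content next to target_path, validate it, then swap it in.

    validate is a callable that takes a file path and raises on invalid input.
    Returns True if the target was replaced, False if it was already current.
    """
    try:
        with open(target_path, encoding="utf-8") as current:
            if current.read() == new_content:
                return False  # no change, nothing to do
    except FileNotFoundError:
        pass  # first install

    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".new")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_content)
        validate(tmp_path)  # raises if the candidate file is broken
        os.replace(tmp_path, target_path)  # atomic rename on POSIX filesystems
        return True
    except Exception:
        # Leave the existing file untouched and remove the rejected candidate.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Writing the candidate into the same directory as the target keeps the final rename on one filesystem, so a broken download never replaces a working configuration.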
    

Example of page notifications

Only alerts for the active Icinga host generate a page in addition to the email notification.

PROBLEM: Icinga on icinga1001.wikimedia.org is CRITICAL (check email for details)
RECOVERY: Icinga on icinga1001.wikimedia.org is OK (check email for details)
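
The paging rule above can be summarised in a few lines. The sketch below is illustrative only: the function names are hypothetical, and how check_icinga.py actually determines which Icinga host is active is not covered on this page, so the active host is simply passed in.

```python
def notification_channels(checked_host, active_host):
    """Email is always sent; a page is added only for the active Icinga host."""
    channels = ["email"]
    if checked_host == active_host:
        channels.append("page")
    return channels


def page_text(state, host):
    """Build the short page text in the format of the examples above."""
    kind = "RECOVERY" if state == "OK" else "PROBLEM"
    return f"{kind}: Icinga on {host} is {state} (check email for details)"
```

For example, with icinga1001.wikimedia.org as the active host, a failure on icinga2001.wikimedia.org would yield only ["email"].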

Example of email notification

Subject: PROBLEM: Icinga on icinga2001.wikimedia.org is CRITICAL
From: check_icinga@wikitech-static.wikimedia.org
To: [...SNIP...]

check_icinga@wikitech-static.wikimedia.org found Icinga CRITICAL on icinga2001.wikimedia.org

Issues of attempt 1 of 3:
Event Handlers Enabled? Yes (expected No)
Notifications Enabled? YES (expected NO)
Issues of attempt 2 of 3:
Event Handlers Enabled? Yes (expected No)
Notifications Enabled? YES (expected NO)
Issues of attempt 3 of 3:
Event Handlers Enabled? Yes (expected No)
Notifications Enabled? YES (expected NO)

Subject: RECOVERY: Icinga on icinga2001.wikimedia.org is OK
From: check_icinga@wikitech-static.wikimedia.org
To: [...SNIP...]

check_icinga@wikitech-static.wikimedia.org found Icinga OK on icinga2001.wikimedia.org
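
The "expected" values in the PROBLEM email suggest that each attempt compares the settings Icinga reports against the values expected for that host. The following is a rough sketch of that comparison under that assumption; it is not taken from check_icinga.py, the function names are hypothetical, and the setting names and values are copied from the example email rather than from the real configuration.

```python
def find_issues(found, expected):
    """Return one line per setting whose reported value differs from the expected one."""
    return [
        f"{name}? {found.get(name)} (expected {want})"
        for name, want in expected.items()
        if found.get(name) != want
    ]


def format_issues(attempts, expected):
    """Render the per-attempt issue list in the layout of the example email."""
    lines = []
    total = len(attempts)
    for number, found in enumerate(attempts, start=1):
        issues = find_issues(found, expected)
        if issues:
            lines.append(f"Issues of attempt {number} of {total}:")
            lines.extend(issues)
    return "\n".join(lines)
```

Passing three identical failing attempts with the expected values from the example reproduces the nine issue lines of the PROBLEM email body.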

How do we maintain it?

Configuration files

The config files on wikitech-static are stored in Git at operations/wikitech-static. Those files are applied more-or-less by hand on the actual server.

Automatic content synchronization

A puppetized cron on wikitech runs /usr/local/sbin/mw-files.sh and /usr/local/sbin/mw-xml.sh which back up the wiki to /a/backup/ and /a/backup/public.

A non-puppetized cron on wikitech-static runs /wikitech-static/wikitechsync/import-wikitech.sh which copies the files from wikitech and installs them.

Manual updates

When updating the MediaWiki deploy on the host, update debs and then upgrade the git repos in /srv/mediawiki/w.

$ ssh root@wikitech-static.wikimedia.org
$ apt-get update && apt-get upgrade
$ cd /srv/mediawiki/w
$ git fetch origin
$ git branch -a
  # Look for the latest REL branch
$ git checkout -b <release branch> origin/<release branch>
$ (cd ./vendor/composer && git reset --hard)
$ git submodule update --init --recursive
$ ./composer.phar update --no-dev
$ git status
  # for each of the untracked extensions:
  $ (cd extensions/<extension> && git fetch origin && git checkout -b <release branch> origin/<release branch>)
$ php maintenance/run.php update.php
$ systemctl reload apache2
  # TODO: what is this for?
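
The "look for the latest REL branch" step is easy to get wrong by eye, because plain text ordering puts REL1_9 after REL1_39. The helper below is a small sketch, not an existing tool on the host: it parses the output of git branch -a and picks the highest branch by numeric version, assuming the RELx_y naming that MediaWiki release branches use. The function name is hypothetical.

```python
import re

# Matches names such as "REL1_39" or "remotes/origin/REL1_39".
_REL = re.compile(r"(?:^|/)REL(\d+)_(\d+)$")


def latest_release_branch(branch_output):
    """Return the highest RELx_y branch name in `git branch -a` output, or None."""
    best = None
    for line in branch_output.splitlines():
        # Drop indentation and the "* " marker on the current branch.
        name = line.strip().lstrip("* ").strip()
        match = _REL.search(name)
        if not match:
            continue
        version = (int(match.group(1)), int(match.group(2)))
        short_name = name.rsplit("/", 1)[-1]
        if best is None or version > best[0]:
            best = (version, short_name)
    return best[1] if best else None
```

The output of subprocess.run(["git", "branch", "-a"], capture_output=True, text=True).stdout can be passed in directly. Note that the newest branch may not yet correspond to a stable release, so the result is a candidate to check rather than a decision.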

VM control

To access the Rackspace admin panel that controls the wikitech-static host, visit https://mycloud.rackspace.com. The username and password are available in pwstore in the file named rackspace. From the Servers drop-down, choose Cloud Servers, and open the dashboard for wikitech-static-ord.

What alerts should we watch out for?

Wikitech-static will alert if the site goes offline, or if the MediaWiki version falls behind the official stable release as reported by mediawiki.org. Labweb1001 and 1002 will alert if the Special:RecentChanges page on wikitech-static falls more than a couple of days behind the Special:RecentChanges page on wikitech.
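
The staleness alert amounts to comparing the newest change visible on each wiki. A minimal sketch of that comparison follows; it is not the production check, the function name is hypothetical, and "a couple of days" is assumed here to mean a two-day threshold.

```python
from datetime import datetime, timedelta, timezone


def is_stale(static_latest, wikitech_latest, max_lag=timedelta(days=2)):
    """True if the copy's newest change trails the source's by more than max_lag."""
    return wikitech_latest - static_latest > max_lag
```

Both arguments are expected to be timezone-aware datetimes for the most recent entry on each Special:RecentChanges page, for example datetime(2023, 6, 10, 12, 0, tzinfo=timezone.utc).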
