wikitech-static

wikitech-static is an offsite, read-only backup of wikitech.wikimedia.org (Wikitech). It is maintained in order to provide emergency documentation in case of a site-wide outage affecting Wikimedia's production network. It also acts as a temporary meta-monitoring host to monitor Icinga.

If the primary datacenter (and, hence, Wikitech) is down and you need to search for a troubleshooting guide or look at a wiring diagram, wikitech-static.wikimedia.org will be there for you.

What content is on wikitech-static?

Wikitech-static gets a full dump of Wikitech once every day at around noon UTC. It also attempts to sync image files, but there is no guarantee that images pulled in from Commons arrive as full-resolution versions. When in doubt, verify directly that critical diagrams have enough detail on wikitech-static. The dump includes the SAL, so the log will generally be up to date but (naturally) drifts a bit during the day.

Where is it?

Wikitech-static is hosted on a Rackspace Cloud instance. It is physically located in Rackspace's Chicago datacenter ('ORD') in Illinois, USA, so that it is not in the same city as any of our primary datacenters.

What is wikitech-static running?

It's a basic Debian Stretch box.

MediaWiki

It is running Apache and a stripped-down MediaWiki install from Git. It is unaffected by Scap, so it should survive any disastrous production deployment. MediaWiki updates are performed manually.

Meta-monitoring

The host also monitors our monitoring system to ensure that it works properly. The current setup for monitoring Icinga consists of:

  • A checkout of the https://gerrit.wikimedia.org/r/admin/projects/operations/software/external-monitoring Gerrit repository into /srv/external-monitoring/.
  • A symlink of the icinga/check_icinga.py script into /usr/local/bin/check_icinga.
  • Configuration for the script in /etc/check_icinga/config.yaml and /etc/check_icinga/contacts.yaml. When modifying the contacts file, follow this procedure:
    $ cp -a /etc/check_icinga/contacts.yaml /root/check_icinga_contacts.yaml
    $ vi /root/check_icinga_contacts.yaml  # make all modifications
    $ check_icinga_validate_config --contacts /root/check_icinga_contacts.yaml
    # If the file is valid
    $ mv /root/check_icinga_contacts.yaml /etc/check_icinga/contacts.yaml
    $ check_icinga_validate_config
    # Ensure validity of the final file
    
    This ensures that the installed file is always valid and never prevents the script from alerting, even if an individual contact entry contains a typo.
  • Two crontab entries to run the script against both Icinga hosts, both logging to syslog:
    */2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga2001.wikimedia.org
    */2 * * * * /usr/bin/systemd-cat -t "check_icinga" /usr/local/bin/check_icinga icinga1001.wikimedia.org
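
The contacts-file procedure in the list above (edit a copy, validate it, then move it into place) could be wrapped in a small helper. This is only a sketch: the function name install_contacts and its optional second and third arguments are hypothetical and exist only to make the steps explicit; the only commands taken from the procedure are check_icinga_validate_config and its --contacts option.

```shell
#!/bin/bash
# Hypothetical helper, not part of the external-monitoring repository.
# Usage: install_contacts CANDIDATE [TARGET] [VALIDATOR]
install_contacts() {
    candidate="$1"
    target="${2:-/etc/check_icinga/contacts.yaml}"
    validator="${3:-check_icinga_validate_config}"

    # Step 1: validate the edited copy; leave the live file alone on failure.
    if ! "$validator" --contacts "$candidate"; then
        echo "invalid contacts file: $candidate ($target left untouched)" >&2
        return 1
    fi

    # Step 2: move the validated copy into place.
    mv "$candidate" "$target" || return 1

    # Step 3: validate the final, installed configuration.
    "$validator"
}
```

After editing /root/check_icinga_contacts.yaml as described, calling install_contacts /root/check_icinga_contacts.yaml would replace the live file only when the copy validates.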
    

Example of page notifications

Only alerts for the active Icinga host generate a page in addition to the email notification.

PROBLEM: Icinga on icinga1001.wikimedia.org is CRITICAL (check email for details)
RECOVERY: Icinga on icinga1001.wikimedia.org is OK (check email for details)

Example of email notification

Subject: PROBLEM: Icinga on icinga2001.wikimedia.org is CRITICAL
From: check_icinga@wikitech-static.wikimedia.org
To: [...SNIP...]

check_icinga@wikitech-static.wikimedia.org found Icinga CRITICAL on icinga2001.wikimedia.org

Issues of attempt 1 of 3:
Event Handlers Enabled? Yes (expected No)
Notifications Enabled? YES (expected NO)
Issues of attempt 2 of 3:
Event Handlers Enabled? Yes (expected No)
Notifications Enabled? YES (expected NO)
Issues of attempt 3 of 3:
Event Handlers Enabled? Yes (expected No)
Notifications Enabled? YES (expected NO)

Subject: RECOVERY: Icinga on icinga2001.wikimedia.org is OK
From: check_icinga@wikitech-static.wikimedia.org
To: [...SNIP...]

check_icinga@wikitech-static.wikimedia.org found Icinga OK on icinga2001.wikimedia.org

How do we maintain it?

Automatic content synchronization

A puppetized cron on wikitech runs /usr/local/sbin/mw-files.sh and /usr/local/sbin/mw-xml.sh which back up the wiki to /a/backup/ and /a/backup/public.

A non-puppetized cron on wikitech-static runs /wikitech-static/wikitechsync/import-wikitech.sh which copies the files from wikitech and installs them.

Manual updates

The host can be accessed via ssh with a root login. The password is in the pwstore in the 'wikitech-static' file.

When updating the MediaWiki deployment on the host, update the Debian packages first and then update the Git repositories in /srv/mediawiki/w.

$ ssh root@wikitech-static.wikimedia.org
$ apt-get update && apt-get upgrade
$ cd /srv/mediawiki/w
$ git fetch origin
$ git branch -a
  # Look for the latest REL branch
$ git checkout -b <local branch> <upstream branch>
$ git submodule update --init --recursive
$ ./composer.phar update --no-dev
$ php maintenance/update.php
$ service apache2 graceful
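
The "look for the latest REL branch" step can also be done mechanically. The sketch below assumes that release branches are named like origin/REL1_39 and that GNU sort (with -V for version ordering) is available on the host; the function name latest_rel_branch is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read branch names (as printed by `git branch -r`)
# on stdin and print the REL branch with the highest version number.
latest_rel_branch() {
    sed 's/^[* ]*//' \
        | grep -E '^origin/REL[0-9]+_[0-9]+$' \
        | sort -V \
        | tail -n 1
}

# Intended use inside /srv/mediawiki/w, after `git fetch origin`:
#   git branch -r | latest_rel_branch
```

The printed name can then be used as the <upstream branch> argument of the git checkout -b command above.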

VM control

To access the Rackspace admin panel that controls the wikitech-static host, visit https://www.rackspace.com/login and select Cloud Control Panel. The login credentials are available in the Ops password repo in the file named 'rackspace'.

What alerts should we watch out for?

Wikitech-static will alert if the site goes offline or if the MediaWiki version falls behind the official stable release as reported by mediawiki.org. Labweb1001 and Labweb1002 will alert if the Special:RecentChanges page on wikitech-static falls more than a couple of days behind the Special:RecentChanges page on wikitech.
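
To reason about the RecentChanges alert by hand, the comparison behind it can be sketched as below. This is not the actual check: the function name is_stale, the default threshold of 2 days (one reading of "a couple of days"), and the use of GNU date are all assumptions. The two timestamps would be the newest RecentChanges entry on each wiki, for example as returned by the MediaWiki API query shown in the comment.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if the static copy lags the primary
# wiki by more than MAX_DAYS. Timestamps are ISO 8601 strings in UTC.
# Usage: is_stale PRIMARY_TS STATIC_TS [MAX_DAYS]
is_stale() {
    max_days="${3:-2}"   # assumed threshold, not taken from the real check
    primary_epoch=$(date -u -d "$1" +%s) || return 2
    static_epoch=$(date -u -d "$2" +%s) || return 2
    [ $(( primary_epoch - static_epoch )) -gt $(( max_days * 86400 )) ]
}

# One way to obtain the newest timestamp from a wiki (standard MediaWiki API):
#   https://wikitech.wikimedia.org/w/api.php?action=query&list=recentchanges&rclimit=1&rcprop=timestamp&format=json
```

For example, is_stale 2024-01-05T12:00:00Z 2024-01-01T12:00:00Z succeeds because the gap is four days, while a one-day gap does not.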
