Incident documentation meeting/QR201407/group2/notes


See also:

20140328-DB-Queries

Bryan, Reedy

  • PrivateSettings.php should be in a repo so we can be sure what's changed.
    • (Not Gerrit obvs, maybe living same place as private ops repo or similar?) (reedy)
      • Local on tin in a sub directory I guess. Properly backed up (hashar)
    • AdminSettings.php has the same problem bd808 thinks (bryan)
  • Changes made should go out immediately as they do for all configuration files.
  • DB user and password settings should perhaps go into PrivateSettings (and not be removed from AdminSettings until everyone relying on that file has converted their jobs).
    • Migration
      • Move AdminSettings.php contents into PrivateSettings.php
      • Symlink AdminSettings.php to PrivateSettings.php (saves keeping 2 copies of passwords around, but keeps target file in place till known to be perfectly unused)
    • Only (obvious) usage would seem to be snapshots - "Fixed" in https://gerrit.wikimedia.org/r/#/c/145017/ (stuff needs moving to PrivateSettings.php first)

reedy@ubuntu64-web-esxi:~/git/operations/puppet$ grep -R AdminSettings *
modules/snapshot/templates/wq.conf.media.erb:adminsettings=wmf-config/AdminSettings.php
modules/snapshot/templates/wikidump.conf.erb:adminsettings=wmf-config/AdminSettings.php
modules/snapshot/templates/addschanges.conf.erb:adminsettings=wmf-config/AdminSettings.php
modules/snapshot/templates/wq.conf.erb:adminsettings=wmf-config/AdminSettings.php

  • Better coordination (dunno how to implement this one).
    • Header comment in PrivateSettings to tell people to commit (when it's in a repo) and SYNC after changing
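The symlink step of the migration above could look roughly like the following. This is only a sketch: it assumes the AdminSettings.php contents have already been merged into PrivateSettings.php by hand, and the function name and the temporary file name are made up for illustration.

```python
import os
from pathlib import Path


def symlink_admin_to_private(config_dir: Path) -> Path:
    """Replace AdminSettings.php with a relative symlink to PrivateSettings.php.

    Assumes the old AdminSettings.php contents were already merged into
    PrivateSettings.php, so the old file can be dropped.
    """
    admin = config_dir / "AdminSettings.php"
    private = config_dir / "PrivateSettings.php"
    if not private.is_file():
        raise FileNotFoundError(private)
    if admin.is_symlink():
        return admin  # already migrated, nothing to do

    # Build the link under a temporary (hypothetical) name, then rename it
    # over the old file so the path never disappears for readers.
    tmp = config_dir / "AdminSettings.php.tmp"
    tmp.unlink(missing_ok=True)
    os.symlink("PrivateSettings.php", tmp)
    os.replace(tmp, admin)
    return admin
```

Anything still reading wmf-config/AdminSettings.php (the snapshot templates in the grep output) would keep working through the link, with only one copy of the passwords on disk.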

20140403-Deploy

Bryan, Reedy

(bd808) basically had to run scap twice. scap.py updated

  • go back to the testwikipedia removal discussion (GREG)
    • bug 43722

20140503-Thumbnails

Antoine, Faidon

  • $wgSquidServersNoPurge needed to include the varnish hosts for images
  • due to a change by Brandon ($wgSquidServersNoPurge now uses ranges instead of individual IPs), the only times this should need to be updated will be with new rows/DCs/etc, hopefully
  • we still need some sort of auto-graph-watching that notices anomalies/slope changes/etc
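To illustrate why ranges need fewer updates than individual IPs, here is a minimal sketch of a range-membership check. It is not how MediaWiki evaluates $wgSquidServersNoPurge; the function name is hypothetical and the ranges are placeholder documentation addresses, not the real varnish networks.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT production values.
NO_PURGE_RANGES = ["192.0.2.0/24", "198.51.100.0/25"]


def in_ranges(ip, ranges):
    """Return True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(r) for r in ranges)
```

With a list of ranges, a new host inside an existing row is covered automatically; with a list of individual IPs, every new host needs a config change.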

20140517-bits

Timo

  • Tracking bug for the outage: bug 65424
  • Monitor for anomalies/spikes in read failures of memcached
    • bugzilla:67817
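One simple way to notice spikes in a metric such as memcached read failures is to compare each sample against the mean and standard deviation of a trailing window. The sketch below assumes evenly spaced samples; the function name, window size, threshold and the floor on the deviation are all illustrative choices, not anything decided in the meeting.

```python
from statistics import mean, pstdev


def find_spikes(samples, window=5, threshold=3.0, min_sigma=1.0):
    """Return indexes of samples far above the trailing-window average.

    A sample is flagged when it exceeds mean + threshold * sigma of the
    previous `window` samples. min_sigma floors the deviation so that a
    perfectly flat history does not flag tiny fluctuations.
    """
    spikes = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        limit = mean(history) + threshold * max(pstdev(history), min_sigma)
        if samples[i] > limit:
            spikes.append(i)
    return spikes
```

A real monitor would also need to handle gaps in the data and sustained slope changes, which this sketch ignores.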

20140529-appservers

Ori, Reedy

  • we really need to get rid of wikimedia-task-appserver debian package in favor of puppet
  • Reedy will update status, if this even makes sense any more ;)

20140608-Kafka

Andrew O.

Most things are done. Two monitoring related RTs to resolve:

20140612-Math

Greg

  • add schema change test on beta cluster bug
  • still need an arch review?

20140618-Wikitech

Andrew B. / Marc-André

  • Need an RT filed to add Icinga monitoring of puppet status on the wikitech host
    • RT 7842
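As a rough idea of what such a check might do, the sketch below reports how stale a puppet state file is, using the standard Nagios/Icinga plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name, the thresholds and the choice of file are assumptions; the notes do not say how the check in RT 7842 should work.

```python
import os
import time

# Standard Nagios/Icinga plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_freshness(state_file, warn_secs=3600, crit_secs=4 * 3600, now=None):
    """Return (code, message) based on how long ago state_file was modified.

    state_file is whatever file the puppet agent rewrites on each run;
    the caller supplies the path (assumption: such a file exists).
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(state_file)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN: cannot stat {state_file}: {exc}"
    if age >= crit_secs:
        return CRITICAL, f"CRITICAL: last puppet run {int(age)}s ago"
    if age >= warn_secs:
        return WARNING, f"WARNING: last puppet run {int(age)}s ago"
    return OK, f"OK: last puppet run {int(age)}s ago"
```

A wrapper script would print the message and exit with the returned code so Icinga can pick up the status.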