Incident documentation meeting/QR201407/group2/notes
See also:
- https://wikitech.wikimedia.org/wiki/Incident_documentation/QR201407
- https://wikitech.wikimedia.org/wiki/Incident_documentation/QR201407/group2
20140328-DB-Queries
Bryan, Reedy
- PrivateSettings.php should be in a repo so we can be sure what's changed.
- (Not Gerrit obvs, maybe living same place as private ops repo or similar?) (reedy)
- Local on tin in a sub directory I guess. Properly backed up (hashar)
- AdminSettings.php has the same problem bd808 thinks (bryan)
- (Not Gerrit obvs, maybe living same place as private ops repo or similar?) (reedy)
- Changes made should go out immediately as they do for all configuration files.
- DB user and password settings should perhaps go into PrivateSettings (and not be removed from AdminSettings until everyone relying on that file has converted their jobs).
- Migration
- Move AdminSettings.php contents into PrivateSettings.php
- Symlink AdminSettings.php to PrivateSettings.php (saves keeping 2 copies of passwords around, but keeps target file in place till known to be perfectly unused)
- Only (obvious) usage would seem to be snapshots - "Fixed" in https://gerrit.wikimedia.org/r/#/c/145017/ (stuff needs moving to PrivateSettings.php first)
- Migration
reedy@ubuntu64-web-esxi:~/git/operations/puppet$ grep -R AdminSettings *
modules/snapshot/templates/wq.conf.media.erb:adminsettings=wmf-config/AdminSettings.php
modules/snapshot/templates/wikidump.conf.erb:adminsettings=wmf-config/AdminSettings.php
modules/snapshot/templates/addschanges.conf.erb:adminsettings=wmf-config/AdminSettings.php
modules/snapshot/templates/wq.conf.erb:adminsettings=wmf-config/AdminSettings.php
- Better coordination (dunno how to implement this one).
- Header comment in PrivateSettings to tell people to commit (when it's in a repo) and SYNC after changing
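The symlink step of the migration above could be sketched roughly as follows. This is only an illustration, not the procedure that was actually run: the function name and the `config_dir` argument are hypothetical, and it assumes the AdminSettings.php contents have already been moved into PrivateSettings.php.

```python
import os


def symlink_admin_to_private(config_dir):
    """Replace AdminSettings.php with a symlink to PrivateSettings.php.

    Assumption: the settings from AdminSettings.php were already merged
    into PrivateSettings.php, so the old file can be swapped out.
    """
    private = os.path.join(config_dir, "PrivateSettings.php")
    admin = os.path.join(config_dir, "AdminSettings.php")
    if not os.path.isfile(private):
        raise FileNotFoundError(private)

    # Build the link under a temporary name, then rename it over the old
    # file so readers never see a missing AdminSettings.php.
    tmp_link = admin + ".new"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Relative target: the pair keeps working if the directory is synced elsewhere.
    os.symlink("PrivateSettings.php", tmp_link)
    os.replace(tmp_link, admin)
```

Swapping via rename avoids a window where jobs still reading AdminSettings.php (such as the snapshot configs found by the grep above) would find no file at all.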
20140403-Deploy
Bryan, Reedy
- (bd808) basically had to run scap twice. scap.py updated
- go back to the testwikipedia removal discussion (GREG)
- bug 43722
20140503-Thumbnails
Antoine, Faidon
- $wgSquidServersNoPurge needed to include the varnish hosts for images
- due to a change by Brandon ($wgSquidServersNoPurge now uses ranges instead of individual IPs), this should hopefully only need updating when new rows/DCs/etc. are added
- we still need some sort of auto-graph-watching that notices anomalies/slope changes/etc
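To illustrate why a range-based $wgSquidServersNoPurge needs fewer updates than a list of individual IPs, here is a small sketch of range membership. The ranges are made-up documentation addresses (RFC 5737), not the production values, and `in_ranges` is a hypothetical helper, not MediaWiki code.

```python
import ipaddress

# Hypothetical ranges for illustration only (RFC 5737 documentation blocks).
NO_PURGE_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def in_ranges(ip, ranges=NO_PURGE_RANGES):
    """Return True if ip falls inside any of the configured CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(r) for r in ranges)
```

A new host brought up inside an existing range (say 192.0.2.200) matches without any config change; only a host in a new range, such as a new row or DC, would require an update.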
20140517-bits
Timo
- Tracking bug for the outage: bug 65424
- Monitor for anomalies/spikes in read failures of memcached
- bugzilla:67817
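A minimal sketch of what spike detection on a failure counter could look like, assuming the memcached read failures are already available as a list of per-interval counts. The window size, threshold, and `min_sigma` floor are arbitrary illustrative values, and this is a plain trailing mean/standard-deviation check, not whatever monitoring was eventually set up.

```python
from statistics import mean, pstdev


def find_spikes(samples, window=5, threshold=3.0, min_sigma=1.0):
    """Return indexes whose value exceeds mean + threshold * stddev
    of the preceding `window` samples."""
    spikes = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        # Floor the deviation so a very flat baseline does not flag tiny jitter.
        sigma = max(pstdev(history), min_sigma)
        if samples[i] > mu + threshold * sigma:
            spikes.append(i)
    return spikes
```

For example, `find_spikes([2, 3, 2, 3, 2, 3, 2, 40, 3, 2])` returns `[7]`, the position of the jump to 40.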
20140529-appservers
Ori, Reedy
- we really need to get rid of the wikimedia-task-appserver Debian package in favor of Puppet
- Reedy will update status, if this even makes sense any more ;)
20140608-Kafka
Andrew O.
- Most things are done. Two monitoring-related RTs to resolve:
- https://rt.wikimedia.org/Ticket/Display.html?id=7828
- https://rt.wikimedia.org/Ticket/Display.html?id=7829
20140612-Math
Greg
- add schema change test on beta cluster bug
- still need an arch review?
20140618-Wikitech
Andrew B. / Marc-André
- Need an RT filed to add Icinga monitoring of Puppet status on the wikitech host
- RT 7842
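One way a Puppet-status check for Icinga could be shaped is a freshness test on the last Puppet run, returning the standard Nagios/Icinga plugin status codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is only a sketch: the function name and thresholds are hypothetical, and how the last-run timestamp is obtained on the host is deliberately left out.

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def puppet_freshness(last_run_epoch, now_epoch,
                     warn_after=3600, crit_after=4 * 3600):
    """Map the age of the last Puppet run to a plugin status and message.

    The thresholds (in seconds) are illustrative assumptions.
    """
    if last_run_epoch is None:
        return UNKNOWN, "UNKNOWN: no record of a Puppet run"
    age = now_epoch - last_run_epoch
    if age >= crit_after:
        return CRITICAL, "CRITICAL: last Puppet run %d seconds ago" % age
    if age >= warn_after:
        return WARNING, "WARNING: last Puppet run %d seconds ago" % age
    return OK, "OK: last Puppet run %d seconds ago" % age
```

A wrapper script would print the message and exit with the returned code, which is the contract Icinga expects from a check plugin.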