Server Admin Log

From Wikitech

2020-10-22

  • 03:37 eileen: civicrm revision changed from 4dce7bf535 to bb7c08bf6d, config revision is 9a522d03dd
  • 03:13 eileen: civicrm revision changed from 3c3dcf80ae to 4dce7bf535, config revision is 9a522d03dd
  • 01:12 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@870829c]: 0.3.52 (duration: 09m 07s)
  • 01:04 ryankemper: Tests passing on canary `wdqs1003`, proceeding with wdqs deploy for rest of fleet
  • 01:03 ryankemper@deploy1001: Started deploy [wdqs/wdqs@870829c]: 0.3.52

2020-10-21

  • 23:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: T266033 (duration: 01m 05s)
  • 23:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: T265751 T265754 (duration: 01m 08s)
  • 21:38 mutante: testreduce1001 assigned 2 more GBs of RAM - rebooting (T257940, T257906)
  • 19:44 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T264963)
  • 19:15 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T264963)
  • 18:13 Urbanecm: Morning B&C window done
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 45312d3: [WikibaseMediaInfo] Fix concept chips array nesting structure (T256431) (duration: 01m 05s)
  • 18:12 mepps: updated payments-wiki-staging from db03677b2d to 5fdd29bc16
  • 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d94e33f: cirrus: Hardcode more_like to codfw cirrus cluster (duration: 01m 05s)
  • 17:56 XioNoX: configure FB PNI in eqdfw
  • 17:43 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.14/skins/WikimediaApiPortal: Backport gerrit:635329, T266021 (duration: 01m 06s)
  • 17:34 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch ParserCache to JSON on testwiki gerrit:635382 (duration: 01m 05s)
  • 17:24 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 08s)
  • 17:21 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 06s)
  • 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:57 mutante: scandium - disabling puppet so that the Parsoid team can run some tests on testreduce1001 today
  • 16:46 effie: restart php-fpm and pool mw2252 and mw2328
  • 15:58 Lucas_WMDE: Deployed patch for T260349
  • 15:34 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:31 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:28 moritzm: updating prometheus-openldap-exporter to 0+git20171128-3 to buster-wikimedia
  • 15:23 jbond42: upgrade puppetlabs-stdlib to 6.5.0 https://gerrit.wikimedia.org/r/c/operations/puppet/+/634278
  • 15:08 moritzm: imported prometheus-openldap-exporter 0+git20171128-3 to buster-wikimedia T264388
  • 15:02 otto@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster (duration: 02m 56s)
  • 15:01 crusnov@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:00 otto@deploy1001: Started deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster
  • 14:56 crusnov@cumin1001: START - Cookbook sre.dns.netbox
  • 14:44 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Set CURLOPT_RETURNTRANSFER true in gerrit handler T242554 (duration: 01m 07s)
  • 14:34 dcausse: restarting blazegraph on codfw servers (T263952)
  • 13:21 moritzm: pooling ldap-replica2003 T264388
  • 13:04 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.14 (duration: 01m 04s)
  • 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.14
  • 11:40 matthiasmullie: EU B&C done
  • 11:33 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [WikibaseMediaInfo] Add config for related terms API (duration: 01m 04s)
  • 11:17 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 785404f: Disable registrations stat on Special:TranslationStats (T264158) (duration: 01m 05s)
  • 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1156742: Enable ContentTranslation in 5 Wikipedias as a default tool (T264737; T264738; T264739; T264740; T264741) (duration: 01m 30s)
  • 11:00 marostegui: Upgrade db2093's mariadb version T266003
  • 10:58 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=rowiki; T246539)
  • 10:37 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; T246539)
  • 10:01 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; T246539)
  • 10:00 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; T246539)
  • 09:59 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 100% - T258405
  • 09:42 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; T246539)
  • 09:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; T246539)
  • 09:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; T246539)
  • 09:37 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=warwiki; T246539
  • 09:30 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; T246539)
  • 09:23 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:22 root@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 09:21 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:52 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; T246539)
  • 08:50 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=cebwiki; T246539
  • 08:46 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium/output]$ mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=apiportalwiki # T246539
  • 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
  • 08:38 root@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
  • 08:33 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 08:10 XioNoX: Upgrade Routinator 3000 to 0.8.0 on rpki1001 - T266001
  • 08:09 XioNoX: add Routinator 3000 0.8.0 to apt - T266001
  • 07:58 elukey: update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/635319
  • 04:35 ryankemper: re-enabled icinga notifications on all wdqs hosts now that `wdqs-updater` is healthy

2020-10-20

  • 22:10 dwisehaupt: frmon2001 upgraded to buster with grafana 7.2.1
  • 21:19 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 21:18 cdanis: ✔️ cdanis@mw2252.codfw.wmnet ~ 🕠🍺 sudo depool
  • 20:57 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 00m 08s)
  • 20:56 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
  • 20:39 cdanis: doing some manual testing on mw2221, depooled and puppet disabled
  • 20:33 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 08m 10s)
  • 20:31 ryankemper: [Temporarily] disabled notifications for all wdqs hosts while we figure out how to unstick the updater process. Impact: new updates will be delayed, but queries are still served as normal. Fixing this is a priority, but there is no availability outage
  • 20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:25 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=canary
  • 19:24 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:56 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:48 effie: depooling mw2328 - T266052
  • 17:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args (duration: 01m 31s)
  • 15:52 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args
  • 15:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:58 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: fee2d3b: Prevent uncaught warnings/exception on Special:AbuseFilter (T265994) (duration: 01m 03s)
  • 14:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: 00ef00f: Prevent uncaught warnings/exception on Special:AbuseFilter (T265994) (duration: 01m 01s)
  • 14:48 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/FileImporter/: 5eee9b7: Set originalRequest (incl. X-Forwarded-For) for remote edits (T265810) (duration: 01m 06s)
  • 14:16 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/FileImporter/: 5f8d3de: Set originalRequest (incl. X-Forwarded-For) for remote edits (T265810) (duration: 01m 09s)
  • 14:15 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master u=)]$ sudo /usr/local/sbin/fix-staging-perms
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13033 and previous config saved to /var/cache/conftool/dbconfig/20201020-135436-root.json
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 80%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13032 and previous config saved to /var/cache/conftool/dbconfig/20201020-133933-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 60%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13031 and previous config saved to /var/cache/conftool/dbconfig/20201020-132430-root.json
  • 13:19 XioNoX: install Routinator 3000 0.8.0 on rpki2001 - T266001
  • 13:16 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.14
  • 13:11 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.14 (duration: 58m 03s)
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13030 and previous config saved to /var/cache/conftool/dbconfig/20201020-130926-root.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13029 and previous config saved to /var/cache/conftool/dbconfig/20201020-125423-root.json
  • 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 12:13 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.14
  • 11:37 liw: 1.36.0-wmf.14 was branched at 1b7b5f7 for T263180
  • 11:35 Lucas_WMDE: EU backport/config window done
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Backport: SearchSatisfaction: Set isAnon field (T259250) (duration: 00m 57s)
  • 11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Set Wikidata MF to collapse sections by default (T239195) (duration: 00m 56s)
  • 11:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove noratelimit from Wikidata bot group (T258354) (duration: 00m 56s)
  • 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 10:04 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 09:59 dcausse: T255399: resuming wdqs-data-reload manually from chunk no 776 on wdqs1009
  • 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 09:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 09:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .

2020-10-19

  • 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 23:56 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:11 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation (duration: 04m 33s)
  • 23:07 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation
  • 23:02 mutante: etherpad got restarted with new config options related to rate limiting - hopefully this fixed T265490
  • 21:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:19 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions (duration: 04m 48s)
  • 21:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions
  • 21:01 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:41 eileen: drush vset match_on_import 1
  • 20:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp2020.codfw.wmnet
  • 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item (duration: 01m 03s)
  • 20:16 mutante: decom'ing wtp201[0-9].codfw.wmnet (pooled=inactive) T265558
  • 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:15 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp201[0-9].codfw.wmnet
  • 20:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item
  • 20:09 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=parsoid,service=canary
  • 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:01 mutante: decom'ing wtp200[1-9].codfw.wmnet (pooled=inactive) T265558
  • 20:00 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp200[1-9].codfw.wmnet
  • 19:57 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:45 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads (duration: 03m 35s)
  • 19:41 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads
  • 19:35 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 19:34 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 19:33 mutante: wtp2001 - sudo confctl decommission
  • 19:29 dzahn@cumin1001: conftool action : set/weight=0; selector: dc=codfw,cluster=parsoid,service=canary
  • 19:01 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Set default variant to D on trwiki (T243445, T265556) (duration: 00m 56s)
  • 18:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 18902aa: Change votewiki language temporarily to fa for fawiki elections (T262689) (duration: 00m 56s)
  • 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on trwiki (T243445) (duration: 00m 57s)
  • 18:29 tzatziki: removing 10 files for legal compliance
  • 18:24 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/MobileFrontend/: Fix mobile diff redirect when curid parameter is present (T265654) (duration: 00m 58s)
  • 18:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable variant C/D for new users (T265556) (duration: 00m 56s)
  • 18:10 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop wgHiddenPrefs hack for VE beta feature (T254349) (duration: 00m 56s)
  • 17:53 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:44 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:16 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:59 Urbanecm: mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=smnwiki --cluster=all
  • 15:31 elukey: update puppet compilers' facts
  • 14:36 bpirkle@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:634841 Add api.wikimedia.org to the list of allowed CORS origins (duration: 00m 57s)
  • 14:32 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 55s)
  • 14:30 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 56s)
  • 14:15 moritzm: installing llvm-toolchain-7 bugfix updates from Buster point release
  • 13:34 Urbanecm: Start of `[urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > output/$wiki.log; done < wikis.dblist` (T246539; wikis.dblist is medium wikis from group2.dblist)
  • 13:33 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:31 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:26 moritzm: import prometheus-openldap-exporter 0+git20171128-2+deb10u1 for buster-wikimedia T264388
  • 12:48 moritzm: installing httpcomponents-client security updates on Buster
  • 12:26 Urbanecm: Creation of smnwiki is done (T264859)
  • 12:25 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 56s)
  • 12:22 urbanecm@deploy1001: Synchronized langlist: Creating smnwiki (T264859) (duration: 00m 56s)
  • 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating smnwiki (T264859) (duration: 00m 55s)
  • 12:16 marostegui: Sanitize smnwiki on db1124:3315 and db2094:3315 - T264900
  • 12:15 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating smnwiki (T264859) (duration: 00m 56s)
  • 12:15 marostegui: Deploy schema change on smnwiki T265321 T264900
  • 12:14 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating smnwiki (T264859)
  • 12:12 urbanecm@deploy1001: Synchronized dblists: Creating smnwiki (T264859) (duration: 00m 55s)
  • 12:11 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating smnwiki (T264859) (duration: 00m 55s)
  • 12:10 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating smnwiki (T264859) (duration: 00m 56s)
  • 11:51 moritzm: updating idp-test1001 to CAS 6.2.4
  • 11:46 moritzm: updating idp-test2001 to CAS 6.2.4
  • 11:43 Urbanecm: End of `[urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist` # T246539 # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
  • 11:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` (T246539)
  • 11:40 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist # T246539 # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
  • 11:31 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:24 Urbanecm: EU B&C window done
  • 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ce92c98: Restore bureaucrat abilities at uzwiki (T265746) (duration: 00m 56s)
  • 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 26b9726: Disable EditorJourney (UnderstandingFirstDay) (T252391) (duration: 01m 10s)
  • 11:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:13 Urbanecm: Manually run `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` for several small group2 wikis (T246539)
  • 10:57 Urbanecm: Start `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` in a tmux session named updateVarDumps at mwmaint2001 (T246539)
  • 10:53 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=jawikivoyage --print-orphaned-records-to=- --progress-markers # T246539
  • 09:09 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 08:40 jayme: updated helm to 2.16.12-1 on deploy*,chartmuseum*,contint*
  • 08:37 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog2001 - T259780
  • 08:31 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 08:26 jayme: updated helm to 2.16.12-1 on deploy2001
  • 08:24 jayme: imported helm 2.16.12-1 to buster-wikimedia stretch-wikimedia jessie-wikimedia - T263616
  • 08:01 godog: re-enable compaction for prometheus[12]003 - T261281
  • 07:53 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 07:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 07:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 ', diff saved to https://phabricator.wikimedia.org/P13022 and previous config saved to /var/cache/conftool/dbconfig/20201019-071614-marostegui.json
  • 06:46 elukey@deploy1001: Finished deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27 (duration: 00m 10s)
  • 06:45 elukey@deploy1001: Started deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27

2020-10-17

  • 13:22 Urbanecm: [urbanecm@mwmaint2001 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Fæ . # T264529

2020-10-16

  • 21:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:43 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:27 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:25 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:39 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:37 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:43 thcipriani: restarting gerrit due to gc thrashing
  • 16:25 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors (duration: 04m 08s)
  • 16:21 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors
  • 15:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 15:36 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 15:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:01 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:41 effie: pooling mw2279.codfw.wmnet T264698
  • 12:11 jiji@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:09 jiji@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:35 reedy@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/ProofreadPage/: Revert excessive escaping T265571 (duration: 01m 12s)
  • 09:23 ema: text@esams (except for cp3050/cp3052): upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 09:19 ema: upload@esams: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 09:08 ema: upload@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 09:03 XioNoX: eqsin, push CR 634473
  • 09:01 ema: text@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 08:53 ema: upload@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 08:52 XioNoX: add BGP_IXP_RS_in to eqsin RS BGP sessions
  • 08:48 ema: text@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 08:29 ema: upload@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 08:24 ema: text@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 08:09 elukey: reboot stat1005/stat1008 to pick up correct GPU settings
  • 08:09 ema: upload@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 07:59 ema: text@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 07:19 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table (duration: 04m 22s)
  • 07:15 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table
  • 06:57 XioNoX: enable cr2-eqdfw:xe-0/1/2
  • 02:14 eileen: civicrm revision changed from 585eb835d8 to 3c3dcf80ae, config revision is f76d7849bc
  • 01:01 ryankemper: Cleaning up a dangling no-longer-puppet-managed udev elasticsearch-readahead rule across all cirrus instances: `sudo cumin -b 36 C:profile::elasticsearch::cirrus 'sudo rm -fv /etc/udev/rules.d/elasticsearch-readahead.rules && sudo /sbin/udevadm control --reload && sudo /sbin/udevadm trigger'`
  • 00:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 00:56 cdanis@cumin1001: START - Cookbook sre.network.cf

2020-10-15

  • 23:49 ryankemper: Began in-place reindex of `eqiad`, `codfw`, and `cloudelastic`. Running on `ryankemper@mwmaint2001` under tmux sessions `inplace_reindex_[eqiad, codfw, cloudelastic]`
  • 23:00 krinkle@deploy1001: Synchronized wmf-config/env.php: I245e84e0b8c (duration: 01m 10s)
  • 22:09 cdanis: previous sre.network.cf invocation was a no-op; just checking status
  • 22:08 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 22:08 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 22:06 mutante: depooled remaining wtp* servers in codfw. old parsoid servers, new servers are parse2* (T265558)
  • 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp2020.codfw.wmnet
  • 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[6-9].codfw.wmnet
  • 21:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[0-5].codfw.wmnet
  • 20:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 20:27 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 19:46 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources (duration: 06m 22s)
  • 19:43 marxarelli: all wikis promoted to 1.36.0-wmf.13 (T263179)
  • 19:39 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources
  • 19:33 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.13
  • 19:30 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:23 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:20 catrope@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing (T265500) (duration: 01m 29s)
  • 19:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing (T265500) (duration: 01m 51s)
  • 19:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/Echo/: Drop text indent in modern Vector (T264339) (duration: 01m 51s)
  • 19:09 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/: Vertically align personal tools (T264339) (duration: 01m 43s)
  • 19:07 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Revert "clientError: Adds is_logged_in tag to aid filtering" (T256173) (duration: 01m 58s)
  • 19:04 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/UploadWizard/: Work around LESS calculating calc() values wrong (T265560) (duration: 02m 07s)
  • 18:32 mutante: depooling wtp2005 through wtp2009 (parsoid, old server generation) T265558
  • 18:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[6-9].codfw.wmnet
  • 18:07 mutante: mx1001/mx2001: made previous live hack official and added benefactors@wikipedia alias, re-enabling puppet
  • 17:51 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:19 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:17 jbond42: deleting old pcc reports in compiler1002 to free disk space
  • 17:12 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 17:05 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 17:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 16:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 16:57 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 16:56 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 16:54 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 16:51 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 16:48 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 16:46 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 16:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 16:14 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 16:11 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:53 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:53 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/CheckUser/includes/specials/: fd94002: Revert "Validate username input before constructing subpage links" (T265606) (duration: 02m 48s)
  • 15:50 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 15:47 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:35 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:19 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:09 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 15:07 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs (duration: 00m 59s)
  • 15:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs
  • 14:51 elukey: roll restart druid-historical daemons on druid1004-1008 to pick up new conn pooling changes
  • 14:51 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 14:45 jbond42: enable puppet post deploy puppetdb change blacklisting dynamic facts
  • 14:41 ema: varnish 6.0.6-1wm2 uploaded to apt.wikimedia.org component/varnish6 T264074
  • 14:38 jbond42: disable puppet to deploy puppetdb change blacklisting dynamic facts
  • 14:21 ema: cp3050: systemctl reload varnishkafka-webrequest.service T264074
  • 14:21 jayme: imported doxygen_1.8.19-1~deb10+wmf1 to component/ci buster-wikimedia - T265579
  • 14:12 ema: cp3050: restart varnishkafka-webrequest w/ libvarnishapi2 6.0.6-1wm2 T264074
  • 14:11 ema: cp3050: upgrade varnish to 6.0.6-1wm2 T264074
  • 14:10 ema: cp3050: upgrade varnish to 6.0.6-1wm2 T26407
  • 12:58 gilles@deploy1001: Finished deploy [performance/navtiming@dff55f8]: (no justification provided) (duration: 00m 05s)
  • 12:58 gilles@deploy1001: Started deploy [performance/navtiming@dff55f8]: (no justification provided)
  • 12:12 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 10:47 vgutierrez: restart ats-backend on cp3050
  • 10:00 akosiaris: T264209. Initiate a docker pull of docker-registry.discovery.wmnet/mwcachedir:0.0.1 from all kubernetes and kubernetes staging nodes.
  • 08:17 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 04:27 ryankemper: Rolling upgrade for cirrus `codfw` complete
  • 04:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 02:18 ryankemper: Rolling upgrade for cirrussearch `codfw` beginning
  • 02:18 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 02:14 ryankemper: Rolling upgrade for cirrussearch `eqiad` is complete
  • 02:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 00:36 ryankemper: Beginning rolling upgrade for cirrussearch `eqiad`. Cookbook will restart elasticsearch on 36 nodes total, 3 nodes at a time
  • 00:36 eileen: tools revision changed from d4e08c52de to a2a91d6c6a
  • 00:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 00:24 twentyafterfour: phabricator update was uneventful
  • 00:13 twentyafterfour: updating phabricator

2020-10-14

  • 23:35 foks: Removing one further file for legal compliance
  • 23:28 foks: Removing nine files for legal compliance
  • 23:11 ebernhardson: Synchronized wmf-config/InitialiseSettings.php to sync reduction of cirrus morelike query cache from 3 back to 1 day
  • 23:08 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 01m 04s)
  • 23:00 dwisehaupt: all payments hosts in eqiad are now running the REL1_35 code.
  • 22:41 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression (duration: 02m 25s)
  • 22:38 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression
  • 22:13 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
  • 22:12 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
  • 22:08 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive (duration: 03m 44s)
  • 22:04 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive
  • 22:01 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/NavigationTiming: BACON: Make attribution source logic more defensive T263599 (duration: 01m 05s)
  • 21:51 dpifke@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling image preconnect in group0 (T123582) (duration: 01m 03s)
  • 21:33 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/resources/skins.vector.styles/Menu.less: BACON: Stylesheet needs to be compatible with cached HTML T265543 (duration: 01m 07s)
  • 20:39 marxarelli: group1 rolled back to 1.36.0-wmf.11 due to malformed html in nav. task incoming (cc: T263179)
  • 20:37 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.11
  • 20:32 marxarelli: rolling back group1 due to malformed html in nav menu
  • 19:46 marxarelli: 1.36.0-wmf.13 promoted to group1. no new or concerning errors or changes in error rates (T263179)
  • 19:39 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
  • 19:38 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
  • 19:33 mutante: mx1001/mx2001 - temp. disabled puppet, live hacking urgent alias change since private repo needs to be fixed
  • 19:14 mutante: depooling 5 of the older parsoid servers in codfw
  • 19:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[1-5].codfw.wmnet
  • 18:28 Urbanecm: wikiadmin@10.192.0.6(wikidatawiki)> DELETE FROM watchlist WHERE wl_user=104889; # T265347
  • 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d6a56bb: Add rollbacker right on uzwiki (T265509) (duration: 01m 04s)
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 0da8999: Add spamblacklistlog as a default right for the CU log user (T239288) (duration: 01m 05s)
  • 16:12 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 15:59 elukey: drain + reboot an-worker1100 to pick up GPU settings - T255138
  • 15:58 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 15:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 15:29 elukey: drain + reboot an-worker110[1,2] to pick up GPU settings - T255138
  • 15:28 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 15:24 jayme: enabled and ran puppet on deploy1001 - T260917
  • 14:56 elukey: drain + reboot an-worker109[8,9] to pick up GPU settings - T255138
  • 14:55 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 14:12 jayme: disable-puppet on deploy1001 to test a change in helmfile puppet on deploy2001 only - T260917
  • 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. T264209
  • 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. T265183
  • 13:53 jbond42: enable puppet fleet wide post - convert puppetdb stockpile queue to tmpfs
  • 13:48 jbond42: disable puppet fleet wide to convert puppetdb stockpile queue to tmpfs
  • 12:46 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 10% - T258405
  • 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:43 moritzm: imported php-memcached, php-redis to component/icu63 T264991
  • 11:25 Urbanecm: EU B&C window completed
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c63632d: Enable DiscussionTools as a beta feature on 30 more wikis (T264693) (duration: 01m 15s)
  • 11:16 moritzm: imported php-igbinary, php-apcu-bc to component/icu63 T264991
  • 09:59 moritzm: imported php-wmerrors, tideways, tideways-xhprof, wikidiff2, xdebug to component/icu63 T264991
  • 08:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:28 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 08:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:09 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12988 and previous config saved to /var/cache/conftool/dbconfig/20201014-071440-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12987 and previous config saved to /var/cache/conftool/dbconfig/20201014-065936-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12986 and previous config saved to /var/cache/conftool/dbconfig/20201014-064433-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12985 and previous config saved to /var/cache/conftool/dbconfig/20201014-062930-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12984 and previous config saved to /var/cache/conftool/dbconfig/20201014-061426-root.json
  • 06:12 marostegui: Change UNIQUE into KEY on enwikivoyage.imagelinks T265445
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 30%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12983 and previous config saved to /var/cache/conftool/dbconfig/20201014-055923-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 10%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12982 and previous config saved to /var/cache/conftool/dbconfig/20201014-054420-root.json

2020-10-13

  • 23:22 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: Revert removal of variant A (T265372) (duration: 01m 04s)
  • 23:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Rename GrowthExperiments help desk on ptwiki (T265214) (duration: 01m 04s)
  • 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable event logging in MediaViewer (T260582) (duration: 01m 04s)
  • 23:07 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry on frwiki, fawiki, dewiki, cswiki (T264780) (duration: 01m 04s)
  • 21:16 mutante: icinga had a gerrit health alert, but I did not notice an issue myself and it was gone by the next check
  • 21:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:44 mutante: bast1002 - apt-get autoremove - cleans up golang and ruby packages
  • 20:44 mutante: bast1002 - apt-get remove nmap (it can be used on netmon hosts and was not consistent with other bast hosts)
  • 20:15 ebernhardson: unban elastic2029 from production-search-psi-codfw
  • 20:14 ebernhardson: restart production-search-psi-codfw on elastic2029 to reset any wonkiness from gc hell
  • 20:06 marxarelli: 1.36.0-wmf.13 promoted to group0. no new or concerning errors or changes in error rates (T263179)
  • 20:03 ebernhardson: add elastic2029-production-search-psi-codfw to cluster.routing.allocation.exclude._name to drain active shards, instance currently in gc hell
  • 19:54 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.13
  • 19:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:40 dduvall@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.13 (duration: 40m 51s)
  • 19:00 dduvall@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.13
  • 18:58 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.9 (duration: 01m 56s)
  • 18:56 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.8 (duration: 02m 10s)
  • 18:53 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.6 (duration: 13m 00s)
  • 18:23 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.11
  • 18:21 marxarelli: 1.36.0-wmf.11 promoted to group1. no new errors (T263177). promoting to all wikis
  • 18:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:09 robh: scs-c1-codfw mgmt firmware updated, updating scs-a1-codfw T238036
  • 18:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:01 robh: scs-c1-codfw firmware update via T238036
  • 17:47 marxarelli: 1.36.0-wmf.13 branched at a6be801 for T263179
  • 17:35 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 07s)
  • 17:34 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
  • 17:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:30 marxarelli: 1.36.0-wmf.11 promoted to group0. no new errors (T263177). preparing to promote to group1
  • 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 17:17 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 17:16 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 16:39 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
  • 16:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc (duration: 05m 29s)
  • 16:26 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc
  • 15:56 papaul: power down ms-be2036 for maintenance
  • 15:02 godog: bounce logstash on logstash1007, GC death
  • 14:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:18 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 5b28fd6: Add setmentor to wgAvailableRights (duration: 00m 59s)
  • 13:42 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 13:40 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 13:15 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=BROKEN --fix # T265336
  • 13:08 moritzm: imported php-mailparse, php-mongodb, php-msgpack to component/icu63 T264991
  • 12:50 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=FIXME --fix # T265336
  • 12:49 Urbanecm: End of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix` # T265336
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2026 for on-site maintenance T263837 ', diff saved to https://phabricator.wikimedia.org/P12975 and previous config saved to /var/cache/conftool/dbconfig/20201013-124940-marostegui.json
  • 12:20 moritzm: imported dh-php, php-apcu, php-imagick to component/icu63 T264991
  • 11:22 moritzm: imported php-defaults, php-excimer, php-luasandbox, php-geoip to component/icu63 T264991
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 90028b4: Add suppressredirect right to reviewers on bnwiki (T265169) (duration: 00m 58s)
  • 11:14 Urbanecm: Start of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix # T265336`
  • 11:13 volans: installed spicerack_0.0.43-1+deb10u1_amd64.deb on cumin2001, need to wait for a long-running cookbook to end to upgrade both hosts
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e61fceb: Add namespace aliases for Turkish Wikipedia (T265336) (duration: 00m 59s)
  • 10:47 jayme: no-change rolling restart of push-notifications in codfw - T265258
  • 10:29 volans: upgrading spicerack on cumin2001 to 0.0.44
  • 10:19 ema: cp3050: clear varnishkafka-webrequest's vut->sighup via stap T264074
  • 10:09 ema: cp3050: *reload* varnishkafka-webrequest T264074
  • 10:04 volans: uploaded spicerack_0.0.44 to apt.wikimedia.org buster-wikimedia
  • 09:55 ema: cp3054: systemctl restart varnishkafka-webrequest.service T264074
  • 09:51 ema: cp3052: systemctl restart varnishkafka-webrequest.service T264074
  • 09:39 kormat: running schema change against s1 in eqiad T259831
  • 09:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:32 ema: cp3050: set grouping by request (vut->g_arg = 2) on varnishkafka-webrequest T264074
  • 08:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:11 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:55 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:55 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:43 kormat: running schema change against s3 in eqiad T259831
  • 07:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:37 moritzm: installing ruby security updates on stretch
  • 07:02 moritzm: installing PHP 7.0 security updates
  • 06:39 moritzm: Installing httpcomponents-client security updates for Stretch
  • 05:35 marostegui: Set global innodb_change_buffering = inserts; on pc2009 T263443

2020-10-12

  • 17:03 jayme: fixed /var/lock/ permission (1777) on ms-be2036 - T265208
  • 15:41 godog: roll-restart logstash5 in codfw
  • 14:44 _joe_: freed 1.5 GB of space on ms-be2036 by running "apt-get clean"
  • 14:05 moritzm: uploaded php7.2 7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1+icu63 to component/icu63 T264991
  • 12:39 moritzm: installing rails security updates on Stretch
  • 12:26 moritzm: installing spice security updates on Buster
  • 11:38 Urbanecm: EU B&C done
  • 11:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fff2532: [testwiki, test2wiki] Allow bureaucrats to grant import rights (duration: 00m 58s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 4966e8a: Enable wgCheckUserLogLogins at all wikis but few large wikis (T253802) (duration: 00m 58s)
  • 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Require autoconfirmed status to edit Wikidata Properties (T254280) (duration: 01m 00s)
  • 10:26 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 10:26 hnowlan: roll-restarting restbase201[345678] for cert refresh
  • 08:50 moritzm: uploaded libxml2 2.9.4+dfsg1-2.2+deb9u3+wmf1 to component/icu63 T264991
  • 07:54 godog: reboot ms-be2036 - T265208
  • 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:53 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime

2020-10-10

2020-10-09

  • 23:44 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on Wikidata (T264799) (duration: 00m 59s)
  • 23:25 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on Commons (T264799) (duration: 00m 59s)
  • 23:13 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL and only related ticket says resolved - powercycling it - boots normally but doesn't have a prod role (T260271)
  • 23:07 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL or tickets
  • 23:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:52 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on group1, except Commons/Wikidata (T264799) (duration: 00m 57s)
  • 22:23 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/includes/: Backport: Log IP/device changes within the same session (T264799) & SessionManager: Always log IP/UA in session-ip (duration: 01m 04s)
  • 22:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on group0 (T264799) (duration: 00m 59s)
  • 22:09 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/: Backport: Log IP/device changes within the same session (T264799) & SessionManager: Always log IP/UA in session-ip (duration: 01m 06s)
  • 22:01 tgr_: rolling out T264799#6533622
  • 21:53 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=dewiki --userlist users.txt # users.txt contains Almeida # T263935
  • 20:41 dwisehaupt: upgrading pay-lvs1001 to buster
  • 20:31 dwisehaupt: upgrading pay-lvs1002 to buster
  • 20:04 dwisehaupt: upgrading payments1001 to buster
  • 19:14 dwisehaupt: upgrading payments1002 to buster
  • 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:44 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:30 dwisehaupt: upgrading payments1003 to buster
  • 17:53 dwisehaupt: upgrading payments1004 to buster
  • 17:52 cstone: civicrm revision changed from b86a15a430 to 585eb835d8, config revision is 57843925bb
  • 16:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:40 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:41 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 14:32 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 14:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 13:48 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:45 jayme: helm rollback push-notification in eqiad to revision 8
  • 13:31 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 13:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:12 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 12:55 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 12:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:16 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 12:13 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:38 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 11:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 11:13 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:13 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 10:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 10:41 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 10:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 09:55 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 09:53 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 09:47 elukey: roll restart of hadoop-yarn-nodemanager on all hadoop workers to pick up new settings
  • 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 09:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:07 XioNoX: remove user from all network devices
  • 08:22 marostegui: Restart dbstore1005 mysql to pick up new buffer pool sizes
  • 08:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:11 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:36 moritzm: installing xen security updates for buster (libs only)
  • 07:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:34 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.decommission

2020-10-08

  • 23:42 ryankemper: `cloudelastic1006` done. Writes thawed, maintenance window lifted; restarts are done for `cloudelastic`
  • 23:37 ryankemper: `cloudelastic1005` done
  • 23:31 ryankemper: `cloudelastic1004` done
  • 23:27 ryankemper: `cloudelastic1003` done
  • 23:23 ryankemper: `cloudelastic1002` done
  • 23:16 tgr_: Evening deploys done
  • 23:16 ryankemper: `cloudelastic1001` is done restarting and cluster is green again. Proceeding to `cloudelastic1002`
  • 23:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes everywhere (T264793) (duration: 01m 01s)
  • 23:04 ryankemper: Beginning cluster restarts one server at a time. For each server, the process is depool->restart elasticsearch services->wait for services to restart and then pool->wait for cluster to return to green status before starting next server
  • 23:01 ryankemper: Writes are frozen for `cloudelastic`: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic` on `mwmaint2001` => `Applied cluster-wide freeze`
  • 22:56 ryankemper: `sudo apt policy wmf-elasticsearch-search-plugins` shows correct state: `Installed: 6.5.4-4~stretch`
  • 22:56 ryankemper: `sudo -E cumin -b 6 C:role::elasticsearch::cloudelastic 'DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install wmf-elasticsearch-search-plugins'`
  • 22:54 ryankemper: About to start plugin upgrade followed by restarts of `cloudelastic`. Maintenance window set for the next 2 hours on `cloudelastic100[1-6]`
  • 21:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data (duration: 01m 04s)
  • 21:53 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data
  • 21:52 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session/SessionBackend.php: Deduplicate SessionBackend::logPersistenceChange calls - T264793 (duration: 01m 01s)
  • 21:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 21:00 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 20:50 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:45 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 20:43 volans: deploying Netbox DNS zone consolidation - T264273
  • 20:11 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name (duration: 01m 09s)
  • 19:23 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name
  • 18:57 volker-e@deploy1001: Finished deploy [design/style-guide@b1166af]: Deploy design/style-guide: (duration: 00m 06s)
  • 18:57 volker-e@deploy1001: Started deploy [design/style-guide@b1166af]: Deploy design/style-guide:
  • 18:17 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Special:Investigate by default on production (T264357) (duration: 01m 06s)
  • 17:50 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data (duration: 11m 55s)
  • 17:44 root@cumin1001: START - Cookbook sre.dns.netbox
  • 17:37 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data
  • 17:31 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:30 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:23 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:16 shdubsh: install prometheus-rsyslog-exporter_0.0.0+git20201008 on centrallog1001 - T210137
  • 16:25 mutante: rebooting cloudvirt1023 - trying PXE boot
  • 16:19 hashar: Restarting CI Jenkins
  • 16:15 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:09 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:08 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 16:08 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:21 marostegui: Set global innodb_change_buffering = all; on pc2009 T263443
  • 14:17 moritzm: importing icu 63.1-6+deb10u1~wmf5 to component/icu63 T264991
  • 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:29 kart_: Updated cxserver to 2020-10-08-053343-production (T264407, T264859)
  • 12:26 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:24 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:21 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:54 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:52 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1030.eqiad.wmnet
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1030.eqiad.wmnet
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1030.eqiad.wmnet
  • 10:37 moritzm: installing Postgres security updates on netboxdb1001
  • 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1029.eqiad.wmnet
  • 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1029.eqiad.wmnet
  • 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1029.eqiad.wmnet
  • 10:32 moritzm: installing Postgres security updates on netboxdb2001
  • 10:29 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
  • 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
  • 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan: pooling restbase1028,restbase1029,restbase1030
  • 10:22 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:14 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:40 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 09:10 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:09 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:38 godog: roll-restart swift-object-replicator on ms-be2* - T261633
  • 08:19 kormat: running schema change against s8 in eqiad T259831
  • 08:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:06 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:04 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:02 gehel: repooling wdqs2002
  • 07:55 marostegui: Rebuild db2125 from snapshots - T260670
  • 07:45 marostegui: Stop MySQL on db1077 to build it from s1 snapshot
  • 07:40 gehel: depooled wdqs2002 to catch up on lag
  • 07:29 jayme: updated envoyproxy to 1.15.1-2 on all codfw hosts
  • 07:23 moritzm: installing pyzmq updates from Buster point release
  • 07:00 dcausse: depooling wdqs2002 (catching-up lag)
  • 06:57 dcausse: restart blazegraph on wdqs2002 (stuck) T242453
  • 06:51 _joe_: enable notifications for wdqs-ssl-codfw
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:05 ejegg: updated fundraising python tools from 5515923ef7 to d4e08c52de
  • 00:31 tgr_: evening deploys done
  • 00:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes in group1 (T264793) (again, forgot to rebase the previous time) (duration: 00m 59s)
  • 00:15 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes in group1 (T264793) (duration: 00m 57s)
  • 00:03 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes in group0 (T264793) (duration: 00m 58s)
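
The 23:04 entry above describes the `cloudelastic` restart as a per-server loop: depool, restart the elasticsearch services, wait for them to come back, pool, then wait for the cluster to return to green before moving on. A minimal sketch of that loop follows. It is not the tooling that was actually run: `rolling_restart`, `wait_until`, and every callable passed in (`depool`, `restart`, `is_up`, `pool`, `is_green`) are hypothetical names chosen for illustration, and the timeout and polling interval are assumed values.

```python
import time
from typing import Callable, Iterable, List


def wait_until(check: Callable[[], bool], timeout: float, interval: float,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll check() until it returns True; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while not check():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        sleep(interval)


def rolling_restart(hosts: Iterable[str],
                    depool: Callable[[str], None],
                    restart: Callable[[str], None],
                    is_up: Callable[[str], bool],
                    pool: Callable[[str], None],
                    is_green: Callable[[], bool],
                    timeout: float = 600.0,
                    interval: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Restart hosts one at a time, in the order described in the log entry.

    All callables are supplied by the caller (hypothetical hooks):
      depool/restart/pool act on a single host,
      is_up reports whether the host's services are back,
      is_green reports whether the whole cluster is healthy.
    Returns the list of hosts that completed the full cycle.
    """
    finished: List[str] = []
    for host in hosts:
        depool(host)                      # take the host out of rotation
        restart(host)                     # restart its services
        wait_until(lambda: is_up(host), timeout, interval, sleep)
        pool(host)                        # put the host back in rotation
        wait_until(is_green, timeout, interval, sleep)  # gate the next host
        finished.append(host)
    return finished
```

Passing the actions in as callables keeps the ordering logic separate from whatever commands actually perform each step. A timeout on either wait raises `TimeoutError` and stops the loop, so no further host is touched while the cluster is not green.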

2020-10-07

  • 23:58 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session: Backport: Log when SessionManager is emitting cookies (T264793) (duration: 01m 00s)
  • 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 23:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 21:55 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 21:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 21:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 20:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 20:09 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset (duration: 03m 23s)
  • 20:05 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset
  • 19:36 mutante: blog post: The latest addition to our family of Wikimedia languages is "Inari Sami" with language code "smn". It is a Sami language spoken by the Inari Sami of Finland and has about 400 native speakers. It's in the Uralic language family. Wikipedia will be created in T264859. https://en.wikipedia.org/wiki/Inari_Sami | https://iso639-3.sil.org/code/smn |
  • 18:30 ryankemper: search team's backport deploy is complete
  • 18:30 ryankemper@deploy1001: Synchronized wmf-config/ProductionServices.php: Config: cloudelastic: envoy sits in front now (T263073) (duration: 00m 58s)
  • 18:29 ryankemper: Above tests are as expected, syncing changes everywhere: `scap sync-file wmf-config/ProductionServices.php 'Config: cloudelastic: envoy sits in front now (T263073)'`
  • 18:27 ryankemper: `scap pull`ed onto `mwdebug2001`; talking to cloudelastic via mediawiki from codfw has the expected decrease in latency due to the tls connection pooling
  • 18:24 ryankemper: `scap pull`ed onto `mwdebug1002`. Talking to cloudelastic on localhost (which routes thru envoy), 6105 is `cloudelastic-chi-eqiad`, 6106 is `cloudelastic-omega-eqiad`, and 6107 is `cloudelastic-psi-eqiad` as expected
  • 18:20 ryankemper: (backport) HEAD set to 834b457 as expected
  • 18:12 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/HeaderCallback.php: Preload class used in HeaderCallback - T261260 (duration: 01m 01s)
  • 17:58 hashar: Pulled https://gerrit.wikimedia.org/r/c/mediawiki/core/+/632680 on deployment staging area and mw2001
  • 17:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:39 jgleeson: updated civicrm from 39b4f954ed to b86a15a430
  • 16:35 mutante: switching webproxy service names to the new local install servers in esams/eqsin/ulsfo T242602
  • 15:12 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog1001 - T259780
  • 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:33 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:22 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 14:04 hoo: Ran "mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P1820 --new-data-type external-id" on mwmaint2001 (T263986)
  • 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 14:03 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 14:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 13:42 jayme: updated envoyproxy to 1.15.1-2 on all eqiad hosts
  • 13:39 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 13:18 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 04s)
  • 13:18 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
  • 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:24 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:22 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 11:55 _joe_: rolling restart of restbase due to running puppet with changed config-vars (a noop for the actual configuration)
  • 11:22 Urbanecm: EU B&C window done
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: f85bc30: Enable bot passwords at all fishbowl and private wikis (T258356) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 5729736: Fix OAuthRateLimiter rate limit configuration (duration: 00m 59s)
  • 11:14 urbanecm@deploy1001: sync-file aborted: 5729736: Fix OAuthRateLimiter rate limit configuration (duration: 00m 02s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 6cdeea2: Set CXMTThresholdForPublish to 95% for Vietnamese Wikipedia (T264161) (duration: 00m 59s)
  • 10:58 marostegui: Set innodb_change_buffering = inserts on pc2009 T263443
  • 09:53 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from mw load groups T259831', diff saved to https://phabricator.wikimedia.org/P12945 and previous config saved to /var/cache/conftool/dbconfig/20201007-095355-kormat.json
  • 09:44 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 100%: 75', diff saved to https://phabricator.wikimedia.org/P12944 and previous config saved to /var/cache/conftool/dbconfig/20201007-094412-kormat.json
  • 09:21 moritzm: imported icu63 63.1-6+deb10u1~wmf1 to component/icu63 for stretch-wikimedia
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 T264755 ', diff saved to https://phabricator.wikimedia.org/P12943 and previous config saved to /var/cache/conftool/dbconfig/20201007-090943-marostegui.json
  • 08:39 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12942 and previous config saved to /var/cache/conftool/dbconfig/20201007-083903-kormat.json
  • 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:32 godog: roll-restart statsd-exporter across ms-be* after puppet run - T264588
  • 08:09 jayme: updated envoyproxy to 1.15.1-2 on all non mw and restbase hosts
  • 08:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:58 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2015 from dbctl T264700', diff saved to https://phabricator.wikimedia.org/P12941 and previous config saved to /var/cache/conftool/dbconfig/20201007-074951-marostegui.json
  • 07:14 marostegui: Stop MySQL es2015 for decommissioning T264700
  • 05:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 02:37 eileen: civicrm revision changed from a30da7f92a to 39b4f954ed, config revision is 0ca9a3a055
  • 01:00 cdanis: repool esams; cr2-esams router upgrade complete
  • 00:43 cdanis: T259621 cdanis@re1.cr2-esams> request chassis routing-engine master switch
  • 00:40 cdanis: T259621 cdanis@re1.cr2-esams> request system reboot other-routing-engine
  • 00:36 cdanis: T259621 cdanis@re1.cr2-esams> request system software add /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz re0 no-validate
  • 00:26 cdanis: T259621 cdanis@re0.cr2-esams> request chassis routing-engine master switch
  • 00:22 cdanis: T259621 cdanis@re0.cr2-esams> request system reboot other-routing-engine
  • 00:15 cdanis: T259621 cdanis@re0.cr2-esams> request system software add re1 no-validate /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz
  • 00:01 mutante: reinstalling testvm[345]001 to confirm OS installs work as normal after switching DHCP servers in POPs (T252526)

2020-10-06

  • 23:55 mutante: 🖧 switched DHCP server for eqsin from install2003 to install5001 - homer deployed to cr*eqsin* (T252526) 🖧
  • 23:53 mutante: 🖧 switched DHCP server for ulsfo from install2003 to install4001 - homer deployed to cr*ulsfo* (T252526) 🖧
  • 23:52 mutante: 🖧 switched DHCP server for esams from install1003 to install3001 - homer deployed to cr*esams* (T252526) 🖧
  • 23:43 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 23:11 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 23:07 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 22:32 ryankemper: Restart of `wdqs-categories` done. WDQS deploy is complete
  • 21:57 ryankemper: Restarting `wdqs-categories` across production instances one-at-a-time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
  • 21:57 ryankemper: Restarting `wdqs-categories` across all test instances (not public facing): `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 21:56 ryankemper: Restarting `wdqs-updater` across the fleet: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 21:55 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@e56a20e]: 0.3.51 (duration: 13m 09s)
  • 21:43 ryankemper: All tests passing on canary `wdqs1003`, proceeding to rest of fleet
  • 21:42 ryankemper@deploy1001: Started deploy [wdqs/wdqs@e56a20e]: 0.3.51
  • 21:14 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:632535 (duration: 01m 00s)
  • 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:40 Urbanecm: Morning B&C done
  • 18:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/skins/MinervaNeue/: 2118d26: Hot fix: Use display for hiding/showing sidebar on OS 14_0 (T264376) (duration: 01m 00s)
  • 18:37 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/skins/MinervaNeue/: d428ccb: Hot fix: Use display for hiding/showing sidebar on OS 14_0 (T264376) (duration: 01m 03s)
  • 18:25 ppchelko@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase.php gerrit:631775 T263493 T259622 (duration: 00m 58s)
  • 18:23 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: IS.php gerrit:631775 T263493 T259622 (duration: 00m 59s)
  • 18:19 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632516 T264043 (duration: 00m 59s)
  • 18:15 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632323 T264637 (duration: 00m 58s)
  • 18:12 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632484 T264637 (duration: 00m 58s)
  • 15:41 godog: centrallog* delete archived logs from old, single file, organization
  • 15:23 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:23 jayme: updated envoyproxy to 1.15.1-2 on mw-canary and restbase-canary
  • 14:57 sukhe: upload dnsdist_1.5.0-1wm1 to apt.wm.o (buster) - T263789
  • 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12936 and previous config saved to /var/cache/conftool/dbconfig/20201006-144701-kormat.json
  • 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:45 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:45 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 5% - T262946
  • 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:40 jayme: updated envoyproxy to 1.15.1-2 on mw2295.codfw.wmnet,restbase2017.codfw.wmnet
  • 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-backend,name=restbase2009.codfw.wmnet
  • 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase2009.codfw.wmnet
  • 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2009.codfw.wmnet
  • 14:36 hnowlan: repooling restbase2009
  • 14:31 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12935 and previous config saved to /var/cache/conftool/dbconfig/20201006-143157-kormat.json
  • 14:19 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 05s)
  • 14:19 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
  • 14:15 jayme: installed envoyproxy 1.15.1-2 on mwdebug1001
  • 14:08 marostegui: Reboot db1076 for kernel upgrade T264755
  • 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 14:03 marostegui: Power cycle db1076 T264755
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 ', diff saved to https://phabricator.wikimedia.org/P12934 and previous config saved to /var/cache/conftool/dbconfig/20201006-135810-marostegui.json
  • 13:41 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12932 and previous config saved to /var/cache/conftool/dbconfig/20201006-134149-kormat.json
  • 13:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from dump/vslow, add to all other contributions/logpager/recentchanges*/watchlist temporarily T259831', diff saved to https://phabricator.wikimedia.org/P12931 and previous config saved to /var/cache/conftool/dbconfig/20201006-134020-kormat.json
  • 13:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:14 jayme: pushed docker-registry.discovery.wmnet/envoy:1.15.1-2 - T264157
  • 13:04 marostegui: Change innodb_change_buffering = inserts on db2075 db2089 db2099 db2111 db2128 T263443
  • 12:55 godog: swift codfw-prod: bump weight for ms-be2057 - T261633
  • 12:20 elukey: update HDFS Namenode GC/Heap settings on an-master100[1,2]
  • 12:13 jayme: imported envoyproxy_1.15.1-2 to buster-wikimedia and stretch-wikimedia
  • 12:08 jbond42: deploy puppetlabs-stdlib 5.2
  • 11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:34 Urbanecm: EU B&C window done
  • 11:34 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=arbcom_ruwiki --fix # T264430 # P12930
  • 11:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 07c19f9: arbcom_ruwiki: Set AK as alias for NS_PROJECT (T264430) (duration: 00m 58s)
  • 11:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7e4e811: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons (T264430) (duration: 00m 58s)
  • 11:30 urbanecm@deploy1001: Synchronized static/favicon/arbcom_ruwiki.ico: 7e4e811: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons (T264430) (duration: 00m 58s)
  • 11:20 XioNoX: push L3 prep work to cloudsw1-c8-eqiad
  • 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b1a4fa: ruewiki: Add rollbacker, grantable and revokable by sysops (T264147) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5cc7027: Allow bureaucrats to remove sysop permissions on Commons (T261481) (duration: 00m 58s)
  • 11:07 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 03m 14s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5f9721b: GrowthExperiments: Change Help Page URL for kowiki (T254364) (duration: 01m 00s)
  • 11:04 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
  • 11:02 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 00m 12s)
  • 11:02 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
  • 11:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:48 effie: set mw2279.codfw.wmnet as inactive T264698
  • 10:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2279.codfw.wmnet
  • 10:45 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
  • 10:44 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
  • 10:43 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
  • 10:41 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
  • 10:37 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009 (duration: 00m 15s)
  • 10:37 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009
  • 10:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:33 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: (no justification provided) (duration: 03m 01s)
  • 10:31 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:30 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: (no justification provided)
  • 10:01 marostegui: Restart mysql on dbstore1004 to pick up new buffer pool sizes
  • 09:59 effie: enable puppet on mc20*
  • 09:41 effie: enable puppet on mc10*
  • 09:38 effie: disable puppet on mc*
  • 09:27 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:26 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 08:33 jayme: imported envoyproxy_1.15.1-1+deb9u1 to stretch-wikimedia
  • 08:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:02 volans: removing unused ms-fe and ms-fe-thumbs svc records from DNS (gerrit/628086)
  • 07:53 marostegui: Change innodb_change_buffering = inserts on db2087:3316 db2089:3316 db2076 db2097:3316 db2114 T263443
  • 07:39 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 07:35 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 07:31 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 07:17 marostegui: Remove es2015 and es2017 from tendril and zarcillo T264700 T264386
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 T264700 ', diff saved to https://phabricator.wikimedia.org/P12926 and previous config saved to /var/cache/conftool/dbconfig/20201006-071451-marostegui.json
  • 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2017 from dbctl T264386', diff saved to https://phabricator.wikimedia.org/P12925 and previous config saved to /var/cache/conftool/dbconfig/20201006-052849-marostegui.json

2020-10-05

  • 23:11 ejegg: updated payments staging from 52704ffe24 to db03677b2d
  • 22:27 mutante: removing shinken puppet module and role
  • 22:01 ebernhardson: restore wikidatawiki_content enwiki_content enwiki_general and commonswiki_file to default index.merge.policy.deletes_pct_allowed on eqiad cirrus cluster T264053
  • 21:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:26 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (32 sector, 16kB) readahead settings T264053
  • 20:13 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (64 sector, 32kB) readahead settings T264053
  • 19:56 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2050 to take reduced (128kB) readahead settings T264053
  • 19:31 mutante: ran sre.dns.netbox to push addition of an-worker1113 which was committed in prod repo but not in netbox data
  • 19:30 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:27 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 18:59 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 00m 08s)
  • 18:59 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
  • 18:58 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 12m 08s)
  • 18:46 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
  • 18:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 18:17 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 18:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 18:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 18:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 18:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 17:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 15:15 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:56 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:41 elukey: shutdown stat1005 and stat1008 for ram expansion (1005 again)
  • 14:36 ppchelko@deploy1001: Finished deploy [restbase/deploy@366a543]: T263133 T264035 (duration: 22m 23s)
  • 14:25 elukey: shutdown an-master1001 for ram expansion
  • 14:13 ppchelko@deploy1001: Started deploy [restbase/deploy@366a543]: T263133 T264035
  • 14:01 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:58 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:55 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:54 elukey: shutdown stat1005 for ram upgrade
  • 13:31 elukey: shutdown an-master1002 for ram expansion (64 -> 128G)
  • 12:39 moritzm: installing curl security updates on remaining hosts
  • 11:34 hoo@deploy1001: Synchronized wmf-config/: Revert "Remove $wgExtraLanguageNames from Wikidata and Commons" (T264295) (duration: 00m 59s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: be73f15: Move changetags right from users to sysop [trwiki] (T264508) (duration: 00m 59s)
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cd30b62: wgSkipSkins: Exclude contenttranslation skin from skin options for users (T263093) (duration: 00m 59s)
  • 11:05 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 11:04 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:34 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 10:32 ema: cp3052: pool with varnish 5.1.3-1wm15 T264398
  • 10:28 ema: cp3052: depool and downgrade varnish to 5.1.3-1wm15 T264398
  • 10:08 moritzm: installing ldap-replica1002 T264390
  • 09:52 moritzm: installing ldap-replica1001 T264390
  • 09:22 moritzm: installing ldap-replica2003 T264390
  • 09:02 hnowlan: bootstrapping restbase1030-b
  • 08:57 moritzm: installing ldap-replica2004 T264390
  • 08:40 kormat@cumin1001: dbctl commit (dc=all): 'db2073 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12918 and previous config saved to /var/cache/conftool/dbconfig/20201005-084022-kormat.json
  • 08:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:38 kormat@cumin1001: dbctl commit (dc=all): 'Add db2119 to s4 dump/vslow temporarily T259831', diff saved to https://phabricator.wikimedia.org/P12917 and previous config saved to /var/cache/conftool/dbconfig/20201005-083822-kormat.json
  • 08:23 godog: prometheus codfw/ops, add 100G to the LV
  • 08:06 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 07:46 marostegui: Stop mysql on es2017 T264386
  • 07:30 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 06:52 XioNoX: add static NAT to pfw3-eqiad - T264356
  • 06:33 elukey: reboot stat1005 to resolve weird GPU state (scheduled last week)
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2017 T264386 ', diff saved to https://phabricator.wikimedia.org/P12916 and previous config saved to /var/cache/conftool/dbconfig/20201005-050636-marostegui.json

2020-10-03

  • 15:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: emergency: 840545f: Restrict flow-hide right to autoconfirmed users on zhwiki (T264489) (duration: 01m 17s)
  • 00:08 ejegg: updated fundraising CiviCRM from 256adda03c to a30da7f92a

2020-10-02

  • 22:00 mutante: depooling mw2271 because Icinga alerts about memcached and SAL shows there were ongoing tests of some kind on it
  • 21:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=mw2271.codfw.wmnet
  • 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 21:26 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 19:14 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:35 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:27 effie: enable puppet on mw2271
  • 18:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events (duration: 02m 01s)
  • 18:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events
  • 17:15 mutante: submitted puppet refactoring change on maps servers
  • 16:49 effie: disable puppet on mw2271 and briefly depool it
  • 15:39 _joe_: restarting redis on rdb2003, instance 6380
  • 15:28 hnowlan: bootstrapping restbase1030-a
  • 15:25 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
  • 14:45 cdanis@deploy1001: Synchronized docroot/wikimediafoundation.org: Separate foundation.wikimedia.org docroot & add .well-known/matrix/server T261531 4573776bd 2fb4c20ae (duration: 01m 01s)
  • 14:19 moritzm: installing LLVM 7 bugfix updates from Buster point release
  • 14:08 effie: enable puppet on mwdebug1001
  • 14:08 moritzm: purging some unused kernels on ping* (these only have 3GB "disks")
  • 13:37 Urbanecm: Create bot_passwords table at fishbowl wikis (T258356)
  • 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12905 and previous config saved to /var/cache/conftool/dbconfig/20201002-133545-kormat.json
  • 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12904 and previous config saved to /var/cache/conftool/dbconfig/20201002-132042-kormat.json
  • 13:00 moritzm: installing Linux 4.19.146 on Buster updates (from latest Buster point release, at this point only installing the updates, no reboots (yet))
  • 12:26 effie: disable puppet on mwdebug1001
  • 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db2140 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12903 and previous config saved to /var/cache/conftool/dbconfig/20201002-121830-kormat.json
  • 12:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:08 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12902 and previous config saved to /var/cache/conftool/dbconfig/20201002-120825-kormat.json
  • 12:05 hnowlan: bootstrapping restbase1029-c
  • 11:53 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12901 and previous config saved to /var/cache/conftool/dbconfig/20201002-115322-kormat.json
  • 11:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:59 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 10:57 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 10:47 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 10:47 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 10:44 kormat@cumin1001: dbctl commit (dc=all): 'db2110 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12900 and previous config saved to /var/cache/conftool/dbconfig/20201002-104453-kormat.json
  • 10:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:43 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12899 and previous config saved to /var/cache/conftool/dbconfig/20201002-104320-kormat.json
  • 10:40 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 10:36 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:28 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 67%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12898 and previous config saved to /var/cache/conftool/dbconfig/20201002-102817-kormat.json
  • 10:13 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 33%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12897 and previous config saved to /var/cache/conftool/dbconfig/20201002-101313-kormat.json
  • 10:06 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 09:56 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 09:48 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 09:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:27 kormat@cumin1001: dbctl commit (dc=all): 'db2106 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12896 and previous config saved to /var/cache/conftool/dbconfig/20201002-092715-kormat.json
  • 09:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:19 jayme: running ipvsadm -D -t 10.2.1.20:10042; ipvsadm -D -t 10.2.1.16:1969 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255875 T255869
  • 09:18 jayme: running ipvsadm -D -t 10.2.2.20:10042; ipvsadm -D -t 10.2.2.16:1969 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255875 T255869
  • 09:17 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255875 T255869
  • 09:14 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255875 T255869
  • 09:12 jayme: running puppet on lvs servers - T255875 T255869
  • 09:11 arturo: added helm3 package to buster-wikimedia/thirdparty/kubeadm-k8s-1-17 (T264221)
  • 09:09 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:08 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 09:08 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:07 hnowlan: bootstrapping restbase1029-b cassandra
  • 09:05 hashar: gerrit: running garbage collector
  • 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:59 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:54 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 03s)
  • 08:54 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
  • 08:42 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 34s)
  • 08:41 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
  • 08:30 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 00m 33s)
  • 08:30 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
  • 08:29 moritzm: installing pyzmq bugfix update from buster point release
  • 08:24 moritzm: installing nginx security updates on puppetdb*
  • 08:17 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 01m 35s)
  • 08:16 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
  • 07:42 moritzm: installing libcommons-compress-java security updates
  • 07:35 godog: swift codfw-prod bump weight for ms-be2057 - T261633
  • 07:29 godog: prometheus codfw/k8s, add 50G to the LV
  • 07:23 moritzm: installing libx11 security updates on buster
  • 06:51 _joe_: restarting php-fpm on all appservers in eqiad, in batches of 10%, for testing the procedure suggested at T264362
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2011 from dbctl T264261', diff saved to https://phabricator.wikimedia.org/P12893 and previous config saved to /var/cache/conftool/dbconfig/20201002-053020-marostegui.json

2020-10-01

  • 23:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 34s)
  • 23:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
  • 23:33 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 23:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 24s)
  • 23:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
  • 23:07 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:36 James_F: Manually created mediawiki/extensions.git REL1_35 at 7ab9a74 for T264365
  • 22:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 22:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 22:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:29 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 as well T264363
  • 21:29 James_F: Manually created mediawiki/skins.git REL1_35 at 796693c for T264365
  • 21:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group1
  • 20:48 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 refs T263177 (duration: 01m 06s)
  • 20:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11 refs T263177
  • 20:19 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
  • 20:08 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.11/includes/parser/: sync ParserCache patches to unblock the train T264257 T263177 (duration: 00m 59s)
  • 18:40 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: increase more_like recommendation cache from one to three days T264053 (duration: 00m 59s)
  • 17:49 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339 (duration: 13m 42s)
  • 17:35 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339
  • 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:24 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339 (duration: 01m 34s)
  • 17:24 mutante: etherpad1002 - attempted to upgrade Etherpad to newer version but wasn't working, reverted to previous one
  • 17:22 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339
  • 17:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:46 volans: migrating esams DNS records to the autogenerated ones from Netbox - T258729
  • 16:19 bblack: rebooting lvs1016 to a fresh state for interface config and error counters, etc - T264227
  • 15:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:53 bblack: lvs1016: re-disabled puppet with ticket ref in comment, downed interface enp5s0f0 since it's flapping furiously - T264227
  • 15:53 bblack: lvs1016: re-disabled puppet with ticket ref in comment, downed interface enp5s0f0 since it's flapping furiously
  • 14:55 jayme: running ipvsadm -D -t 10.2.2.10:8081; ipvsadm -D -t 10.2.2.47:8889 on lvs1015.eqiad.wmnet - T244843 T255878
  • 14:55 moritzm: installing npm security updates on buster
  • 14:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:53 jayme: running ipvsadm -D -t 10.2.1.10:8081; ipvsadm -D -t 10.2.1.47:8889 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T244843 T255878
  • 14:52 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:50 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T244843 T255878
  • 14:48 jayme: restarting pybal on lvs2010.codfw.wmnet - T244843 T255878
  • 14:42 jayme: running puppet on lvs servers - T244843 T255878
  • 14:35 Urbanecm: Create bot_passwords table at all private wikis (T258356)
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:21 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12886 and previous config saved to /var/cache/conftool/dbconfig/20201001-142156-kormat.json
  • 14:14 andrewbogott: reimaging cloudvirt-wdqs1001 to buster
  • 14:12 effie: enable puppet on mw2271
  • 14:08 moritzm: installing pillow security updates
  • 14:06 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 67%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12885 and previous config saved to /var/cache/conftool/dbconfig/20201001-140653-kormat.json
  • 13:59 moritzm: installing nginx security updates on schema*
  • 13:51 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 33%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12884 and previous config saved to /var/cache/conftool/dbconfig/20201001-135149-kormat.json
  • 13:50 klausman: rebooting an-worker1096 for cluster maintenance
  • 13:49 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:49 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:43 vgutierrez: use synthetic warning for 2% of ECDHE-ECDSA-AES128-SHA pageviews - T258405
  • 13:29 moritzm: restarting mw canaries to pick up curl update
  • 13:22 moritzm: installing curl security updates on stretch
  • 12:57 kormat@cumin1001: dbctl commit (dc=all): 'db2136 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12883 and previous config saved to /var/cache/conftool/dbconfig/20201001-125707-kormat.json
  • 12:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12882 and previous config saved to /var/cache/conftool/dbconfig/20201001-123925-kormat.json
  • 12:24 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12881 and previous config saved to /var/cache/conftool/dbconfig/20201001-122422-kormat.json
  • 12:15 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: 500d0c7: Prevent returning the full templatelinks table in TemplateFilter (T264029) (duration: 00m 59s)
  • 12:12 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: 500d0c7: Prevent returning the full templatelinks table in TemplateFilter (T264029) (duration: 01m 00s)
  • 12:09 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12880 and previous config saved to /var/cache/conftool/dbconfig/20201001-120919-kormat.json
  • 11:54 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12879 and previous config saved to /var/cache/conftool/dbconfig/20201001-115415-kormat.json
  • 11:14 arturo: pulling packages into reprepro for buster-wikimedia/thirdparty/kubeadm-k8s-1-17 (T263284)
  • 11:09 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=kuwiktionary --fix # T262046
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 58a8c82: kuwiktionary: Create Jinûvesazî namespace (T262046) (duration: 01m 01s)
  • 10:47 kormat@cumin1001: dbctl commit (dc=all): 'db2119 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12878 and previous config saved to /var/cache/conftool/dbconfig/20201001-104716-kormat.json
  • 10:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:55 hnowlan: adding buster host restbase1028-b to cassandra
  • 08:53 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:38 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2109', diff saved to https://phabricator.wikimedia.org/P12877 and previous config saved to /var/cache/conftool/dbconfig/20201001-083321-marostegui.json
  • 08:28 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:27 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:25 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 08:22 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:16 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109 ', diff saved to https://phabricator.wikimedia.org/P12875 and previous config saved to /var/cache/conftool/dbconfig/20201001-081308-marostegui.json
  • 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091', diff saved to https://phabricator.wikimedia.org/P12874 and previous config saved to /var/cache/conftool/dbconfig/20201001-071442-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091 ', diff saved to https://phabricator.wikimedia.org/P12873 and previous config saved to /var/cache/conftool/dbconfig/20201001-071413-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12872 and previous config saved to /var/cache/conftool/dbconfig/20201001-071347-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12871 and previous config saved to /var/cache/conftool/dbconfig/20201001-071321-marostegui.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2083', diff saved to https://phabricator.wikimedia.org/P12870 and previous config saved to /var/cache/conftool/dbconfig/20201001-071241-marostegui.json
  • 07:12 elukey: restart hdfs namenodes on an-worker100[1,2] to pick up new hadoop workers settings
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2083', diff saved to https://phabricator.wikimedia.org/P12869 and previous config saved to /var/cache/conftool/dbconfig/20201001-071155-marostegui.json
  • 06:42 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 06:40 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Make es2033 master of es2 T261717', diff saved to https://phabricator.wikimedia.org/P12867 and previous config saved to /var/cache/conftool/dbconfig/20201001-063104-marostegui.json
  • 06:18 jayme: imported envoyproxy 1.15.1 to buster-wikimedia, stretch-wikimedia - T264157
  • 05:45 marostegui: Stop MySQL on es2011 T264261
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2011 T264261', diff saved to https://phabricator.wikimedia.org/P12866 and previous config saved to /var/cache/conftool/dbconfig/20201001-054335-marostegui.json
  • 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:29 marostegui: Deploy schema change on s3 (testwikidatawiki) T264109
  • 05:19 marostegui: Repool labsdb1011
  • 04:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:27 krinkle@deploy1001: Synchronized php-1.36.0-wmf.10/includes/parser/: Ia3357b2f593c (duration: 00m 58s)
  • 01:12 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: 1721d2aa0 - Reject ParserCache entries from the last wmf.11 deployment (duration: 05m 13s)

2020-09-30

  • 22:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:10 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:46 cdanis: depool mw2356 and mw2319
  • 21:45 eileen: civicrm revision changed from 5a53bfe6ed to 256adda03c, config revision is 646817a2c0
  • 21:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 also
  • 21:19 ejegg: updated fundraising CiviCRM from 6e843649ac to 5a53bfe6ed
  • 21:04 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback
  • 21:00 twentyafterfour@deploy1001: scap failed: average error rate on 5/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
  • 20:58 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 20s)
  • 20:56 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
  • 20:47 mutante: temp disabling puppet on C:profile::swift::stats_reporter hosts, applying gerrit:631158 refactoring change
  • 20:36 mutante: temp disabling puppet on swift::storage (swift-be) hosts, applying gerrit:631157 refactoring change
  • 19:21 mutante: activating DHCP and squid on install[345]001.wikimedia.org
  • 19:12 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
  • 19:01 effie: disable puppet on mw2271 and use onhost memcached - T263958
  • 19:00 hoo@deploy1001: Synchronized wmf-config/: Revert "labs: Turn on termbox v2 on wikidatawiki" (T264066) (duration: 00m 58s)
  • 18:58 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Revert "labs: Turn on termbox v2 on wikidatawiki" (T264066) (duration: 00m 58s)
  • 18:38 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on svwiki (T257220) (duration: 00m 58s)
  • 18:36 bblack: lvs1016 pybal diff alerts downtimed in icinga for ~48h to reduce annoying flappy alert spam, with reference to https://phabricator.wikimedia.org/T264227
  • 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments for newcomers on ptwiki (T225027) (duration: 00m 58s)
  • 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Put search in header for anons on all wikis, not just desktop-improvements wikis (T263032) (duration: 00m 59s)
  • 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable clientError on Wikidata and all Wikipedias except enwiki (T255585) (duration: 00m 58s)
  • 18:08 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move search in header for anons (T263032) (duration: 00m 59s)
  • 17:52 bblack: lvs1016: restart pybal
  • 17:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:01 hnowlan: finished adding restbase2018-a to the cassandra cluster
  • 16:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:33 cicalese@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Add beta config for API Portal/OAuth communications (duration: 00m 58s)
  • 16:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:21 mutante: re-enabled puppet on install2003
  • 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:28 moritzm: removed librsvg 2.40.20-3+wmf1+stretch1 from component/thumbor, superseded by 2.40.21-0+deb9u1 released via stretch-security
  • 14:23 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:10 cmjohnson1: powering down ores100[3-9] to upgrade memory in each T259909
  • 14:05 elukey: create thirdparty/amd-rocm33 for stretch-wikimedia
  • 14:03 cmjohnson1: powering down ores1002 to upgrade memory T259909
  • 13:55 cmjohnson1: powering down ores1001 to upgrade memory T259909
  • 13:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:12 hnowlan: started bootstrapping restbase1028-a, first buster restbase host
  • 12:39 marostegui: Deploy schema change on db2080, db2081 T264109
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2081', diff saved to https://phabricator.wikimedia.org/P12858 and previous config saved to /var/cache/conftool/dbconfig/20200930-123851-marostegui.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P12857 and previous config saved to /var/cache/conftool/dbconfig/20200930-123824-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080', diff saved to https://phabricator.wikimedia.org/P12856 and previous config saved to /var/cache/conftool/dbconfig/20200930-123753-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080', diff saved to https://phabricator.wikimedia.org/P12855 and previous config saved to /var/cache/conftool/dbconfig/20200930-123659-marostegui.json
  • 11:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:33 effie: enable puppet P:mediawiki::mcrouter_wancache for 630845 - T244340
  • 11:21 nikerabbit@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: Enable Special:TranslationStats (T263004) (duration: 00m 59s)
  • 11:06 effie: disable puppet on P:mediawiki::mcrouter_wancache for 630845 - T244340
  • 10:57 moritzm: installing librsvg security updates
  • 10:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:21 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:07 kormat: deploying schema change to s4/eqiad T259831
  • 10:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:50 jayme: imported envoyproxy 1.15.1 to buster-wikimedia component/envoy-future - T264157
  • 09:12 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:10 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:45 kormat: deploying schema change to s7/eqiad T259831
  • 08:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2016 from dbctl T264156', diff saved to https://phabricator.wikimedia.org/P12853 and previous config saved to /var/cache/conftool/dbconfig/20200930-080817-marostegui.json
  • 08:06 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 08:00 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 07:56 akosiaris: upgrade termbox to latest chart, fixing various prometheus-statsd-export configuration minor issues.
  • 07:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 07:55 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1131 on s6 eqiad master T263227, also give weight to db1093 as new API host', diff saved to https://phabricator.wikimedia.org/P12852 and previous config saved to /var/cache/conftool/dbconfig/20200930-074417-marostegui.json
  • 07:41 marostegui: Starting s6 eqiad failover from db1093 to db1131 - T263227
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 T263227', diff saved to https://phabricator.wikimedia.org/P12851 and previous config saved to /var/cache/conftool/dbconfig/20200930-071841-marostegui.json
  • 07:05 marostegui: Stop mysql on es2016 before decommissioning T264156
  • 07:01 elukey@deploy1001: Finished deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2 (duration: 00m 49s)
  • 07:00 elukey@deploy1001: Started deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2016 T264156', diff saved to https://phabricator.wikimedia.org/P12850 and previous config saved to /var/cache/conftool/dbconfig/20200930-065838-marostegui.json
  • 06:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 06:19 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2082', diff saved to https://phabricator.wikimedia.org/P12849 and previous config saved to /var/cache/conftool/dbconfig/20200930-061036-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2082', diff saved to https://phabricator.wikimedia.org/P12848 and previous config saved to /var/cache/conftool/dbconfig/20200930-061005-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318', diff saved to https://phabricator.wikimedia.org/P12847 and previous config saved to /var/cache/conftool/dbconfig/20200930-060754-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318', diff saved to https://phabricator.wikimedia.org/P12846 and previous config saved to /var/cache/conftool/dbconfig/20200930-060705-marostegui.json
  • 05:43 marostegui: Remove es2019 from tendril and zarcillo T264063
  • 05:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:29 marostegui: Reduce busy-time from 3600 to 1800 on labsdb1010
  • 02:30 eileen: process-control config revision is 646817a2c0
  • 00:41 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/GrowthExperiments/: Backport: Ensure variant A homepage sidebar is always at least 300px (T263905) (duration: 01m 01s)

2020-09-29

  • 23:35 mutante: created testvm3001.esams.wmnet to test install3001
  • 23:31 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 23:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Echo app push on all Wikipedias (T262936) (duration: 00m 59s)
  • 23:20 Urbanecm: Evening B&C window completed
  • 23:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 68d7af9: Enable watchlist expiry feature (wikisource; T260461) (duration: 00m 58s)
  • 23:18 eileen: process-control config revision is 8b39770e93
  • 23:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bc6dda2: Enable watchlist expiry feature (T260461) (duration: 00m 58s)
  • 23:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:52 eileen: process-control config revision is 16a6dcafd6
  • 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:48 eileen: civicrm revision changed from 035ad1c351 to 06a5289d1a, config revision is 2622fd2c09
  • 22:45 eileen: process-control config revision is 2622fd2c09 jobs disabled
  • 22:33 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:26 mutante: phab1001 - re-enabled puppet and running it
  • 22:24 ejegg: CiviCRM rolled back from 4aa0aeccd1 to 035ad1c351
  • 22:16 eileen: civicrm revision changed from 035ad1c351 to 4aa0aeccd1, config revision is b9120969bf
  • 21:59 mutante: temp. disabled puppet on phab1001
  • 21:49 mutante: restarted aphlict service on aphlict1001
  • 21:47 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.10 (duration: 13m 45s)
  • 21:34 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.10
  • 21:30 mutante: started DHCP service on install2003 again
  • 21:22 mutante: temp stopping DHCP service on install2003 for a test
  • 21:09 mutante: rebooting testvm5001 for install test after switching DHCP/TFTP in eqsin to new dedicated VM
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:00 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:54 cdanis@cumin1001: dbctl commit (dc=all): 'depool db2125', diff saved to https://phabricator.wikimedia.org/P12843 and previous config saved to /var/cache/conftool/dbconfig/20200929-205453-cdanis.json
  • 20:51 mutante: DHCP server for EQSIN switched from bast5001 to install5001 (T252526)
  • 20:45 twentyafterfour@deploy1001: Finished scap: testwikis to 1.36.0-wmf.11 refs T263177 (duration: 69m 57s)
  • 19:44 andrewbogott: apt-get update && apt-get upgrade on wikitech-static
  • 19:40 mutante: temp. disabling puppet on ms-fe (swift-proxy) hosts, applying puppet refactoring change carefully
  • 19:35 twentyafterfour@deploy1001: Started scap: testwikis to 1.36.0-wmf.11 refs T263177
  • 19:29 twentyafterfour: Checked out mediawiki 1.36.0-wmf.11 on deploy1001 see T263177
  • 17:30 hnowlan: ported cassandra-tools-wmf to wikimedia-buster
  • 17:12 jbond42: update libdbi-perl on dbmonitor1001 and helium
  • 17:02 jbond42: re-enable puppet post deploy of puppetdb change
  • 16:57 jbond42: disable puppet to deploy puppetdb change
  • 16:34 chaomodus: deploying eqsin automated DNS
  • 15:51 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:47 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:39 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:23 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:02 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:00 vgutierrez: restarting acme-chief on acmechief1001
  • 14:48 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:41 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:32 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 14:30 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 14:30 bblack: switching eqsin and esams public-facing unified certs to letsencrypt - https://gerrit.wikimedia.org/r/c/operations/puppet/+/630847
  • 14:06 moritzm: installing facter updates from Buster 10.6 point release
  • 13:57 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:49 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2126 from dump/vslow T259831', diff saved to https://phabricator.wikimedia.org/P12841 and previous config saved to /var/cache/conftool/dbconfig/20200929-134926-kormat.json
  • 13:47 ema: text@esams: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 13:40 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12840 and previous config saved to /var/cache/conftool/dbconfig/20200929-134018-kormat.json
  • 13:36 ema: upload@esams: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 13:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:28 moritzm: installing lua5.3 security updates
  • 13:25 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12839 and previous config saved to /var/cache/conftool/dbconfig/20200929-132515-kormat.json
  • 13:10 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12838 and previous config saved to /var/cache/conftool/dbconfig/20200929-131011-kormat.json
  • 12:56 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 12:55 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12837 and previous config saved to /var/cache/conftool/dbconfig/20200929-125508-kormat.json
  • 12:53 moritzm: installing QT security updates
  • 12:29 kormat@cumin1001: dbctl commit (dc=all): 'db2108 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12836 and previous config saved to /var/cache/conftool/dbconfig/20200929-122914-kormat.json
  • 12:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:28 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db2126 to dump/vslow T259831', diff saved to https://phabricator.wikimedia.org/P12835 and previous config saved to /var/cache/conftool/dbconfig/20200929-122811-kormat.json
  • 12:05 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 11:54 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 11:28 vgutierrez: disabling DHE-RSA-AES128-SHA support - T258405
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12834 and previous config saved to /var/cache/conftool/dbconfig/20200929-111804-root.json
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12833 and previous config saved to /var/cache/conftool/dbconfig/20200929-110300-root.json
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12832 and previous config saved to /var/cache/conftool/dbconfig/20200929-104757-root.json
  • 10:42 XioNoX: re-enable TFTP ALGs on all mr
  • 10:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:40 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:39 moritzm: installing libdbi-perl security updates for stretch/buster
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12831 and previous config saved to /var/cache/conftool/dbconfig/20200929-103253-root.json
  • 10:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:07 kormat@cumin1001: dbctl commit (dc=all): 'Promote db1104 on s8 eqiad master T239238', diff saved to https://phabricator.wikimedia.org/P12830 and previous config saved to /var/cache/conftool/dbconfig/20200929-100723-kormat.json
  • 10:05 kormat: Starting s8 eqiad failover from db1109 to db1104 - T239238
  • 10:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:59 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:59 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:51 kormat@cumin1001: dbctl commit (dc=all): 'Set db1104 with weight 0 T239238', diff saved to https://phabricator.wikimedia.org/P12829 and previous config saved to /var/cache/conftool/dbconfig/20200929-095135-kormat.json
  • 09:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:17 marostegui: Depool labsdb1010 from web role
  • 09:08 jbond42: update rails on puppetmasters
  • 08:21 jayme: switching esams pybal back to conf1006 - T196487
  • 08:01 ema: cp3050: varnish upgrade to 6.0.6-1wm1 T263557
  • 07:55 gehel: badblocks check on wdqs1009 - T263125
  • 07:46 marostegui: Stop MySQL on es2019 before decommissioning T264063
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2019 from dbctl T264063', diff saved to https://phabricator.wikimedia.org/P12825 and previous config saved to /var/cache/conftool/dbconfig/20200929-074602-marostegui.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2019 T264063', diff saved to https://phabricator.wikimedia.org/P12824 and previous config saved to /var/cache/conftool/dbconfig/20200929-060538-marostegui.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2034 as es3 master in codfw T261717', diff saved to https://phabricator.wikimedia.org/P12823 and previous config saved to /var/cache/conftool/dbconfig/20200929-060253-marostegui.json
  • 05:13 marostegui: Stop mysql and reboot es2026 - T263837
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2026 T263837', diff saved to https://phabricator.wikimedia.org/P12822 and previous config saved to /var/cache/conftool/dbconfig/20200929-051236-marostegui.json
  • 05:10 marostegui: Remove es2013 from tendril and zarcillo T263740
  • 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 03:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 03:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 03:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:32 tgr_: B&C done
  • 00:31 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/CacheDecorator.php: Backport: Add (and increment) CacheDecorator cache version ([PHABRICATOR-TASK]) (duration: 00m 58s)
  • 00:09 mutante: TFTP/install server for eqsin switched from bast5001 to install5001 - T252526

2020-09-28

  • 23:56 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T264053: Remove commonswiki from sidebar search (duration: 01m 09s)
  • 23:42 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/ConfigurationLoader/PageConfigurationLoader.php: Backport: Properly handle namespaces in tasktype template configuration (T264029) (duration: 01m 03s)
  • 22:27 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:25 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:24 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 22:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:58 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:25 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:22 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:51 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:46 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:45 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:17 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:17 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:13 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:13 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:10 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 19:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:16 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:12 ejegg: updated staging payments-wiki from 43470629cc to 885d87a905
  • 18:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:15 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:15 Urbanecm: Morning B&C done
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c7e08bc: Enable search in header A/B test for logged in users (T263032) (duration: 00m 58s)
  • 17:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:15 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:58 ejegg: updated payments-wiki from b2eb456ed1 to 2083498811
  • 16:34 cdanis@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 16:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:24 cdanis@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:20 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 16:20 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 16:20 cdanis@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 16:20 cdanis@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:08 hnowlan: reimaging new restbase hosts - restbase1028, restbase1029, restbase1030
  • 16:08 XioNoX: push pfw policies - T264013
  • 15:51 papaul: poweroff elastic2037 for DIMM replacing
  • 15:26 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1114 T196487', diff saved to https://phabricator.wikimedia.org/P12818 and previous config saved to /var/cache/conftool/dbconfig/20200928-152635-kormat.json
  • 15:25 hashar: Restarting CI Jenkins for plugins uninstallation T260565
  • 15:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:13 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:13 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:12 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:12 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:59 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:49 moritzm: installing glib-networking security updates
  • 14:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:40 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1006.eqiad.wmnet
  • 14:33 XioNoX: repool eqiad
  • 14:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:05 moritzm: uploaded libdbi-perl 1.631-3+wmf1 for jessie-wikimedia T259102
  • 13:58 XioNoX: asw2-d-eqiad# run request system power-off member 4
  • 13:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:46 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1006.eqiad.wmnet
  • 13:45 XioNoX: downtiming all eqiad row D hosts - T196487
  • 13:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:38 godog: roll restart object-replicator on ms-be2* for higher concurrency - T261633
  • 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:20 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:19 moritzm: reimaging sretest1001 to validate puppetised sources.list with a new installation T158562
  • 13:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 12:57 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:37 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 12:31 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 12:29 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript resetUserEmail.php --wiki=arbcom_ruwiki 'Adamant.pwn' 'adamant.pwn@hotmail.com' # T262812
  • 12:28 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript createAndPromote.php --wiki=arbcom_ruwiki --bureaucrat --sysop 'Adamant.pwn' <PASSWORD REDACTED> # T262812
  • 12:26 Urbanecm: arbcom_ruwiki is created (T262812)
  • 12:26 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 48s)
  • 12:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating arbcom_ruwiki (T262812) (duration: 00m 56s)
  • 12:23 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating arbcom_ruwiki (T262812) (duration: 00m 56s)
  • 12:21 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating arbcom_ruwiki (T262812)
  • 12:20 urbanecm@deploy1001: Synchronized dblists: Creating arbcom_ruwiki (T262812) (duration: 00m 57s)
  • 12:19 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating arbcom_ruwiki (T262812) (duration: 00m 57s)
  • 12:17 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating arbcom_ruwiki (T262812) (duration: 00m 56s)
  • 12:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:59 kormat@cumin1001: dbctl commit (dc=all): 'db1114 depooling: prep for rack switch upgrade T196487', diff saved to https://phabricator.wikimedia.org/P12815 and previous config saved to /var/cache/conftool/dbconfig/20200928-115904-kormat.json
  • 11:43 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 483beb2: ContentTranslation: Do not use wikishared DB for testwiki (T263417; follow-up af09303 also included in this sync) (duration: 00m 56s)
  • 11:34 Urbanecm: EU B&C window done
  • 11:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 61eac95: Creation of patroller group on arz.wikipedia (T262218) (duration: 00m 57s)
  • 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 483beb2: ContentTranslation: Do not use wikishared DB for testwiki (T263417; follow-up af09303 also included in this sync) (duration: 00m 57s)
  • 10:45 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:37 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:35 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:35 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:33 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:32 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:25 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:48 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 09:48 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 09:48 ema: upload@codfw: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 09:29 ema: text@codfw: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 09:17 _joe_: changing the restbase public TLS certs to include restbase-async.discovery.wmnet
  • 09:17 XioNoX: restart bird on dns2001 - T262372
  • 09:15 jynus: restart db1077 for upgrade and cleanup T187984
  • 09:06 XioNoX: restart bird on centrallog2001 - T262372
  • 09:02 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:00 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:56 dcausse: T263970: recovering lost apifeature indices (copying eqiad indices -> codfw)
  • 08:55 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:53 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:46 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 08:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:37 elukey: decommission the hadoop test cluster (analytics1028->41)
  • 08:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:36 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 08:35 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:34 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:34 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:32 ema: text@eqiad: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 08:28 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12813 and previous config saved to /var/cache/conftool/dbconfig/20200928-082825-kormat.json
  • 08:21 ema: upload@eqiad: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 08:21 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2113 from contributions/logpager/recentchanges*/watchlist T263842', diff saved to https://phabricator.wikimedia.org/P12812 and previous config saved to /var/cache/conftool/dbconfig/20200928-082114-kormat.json
  • 08:13 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12811 and previous config saved to /var/cache/conftool/dbconfig/20200928-081321-kormat.json
  • 08:07 jayme: restarting pybal on lvs3005 for switching to conf1005 - T196487
  • 08:06 jayme: restarting pybal on lvs3006 for switching to conf1005 - T196487
  • 08:02 jayme: restarting pybal on lvs3007 for switching to conf1005 - T196487
  • 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 07:58 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12810 and previous config saved to /var/cache/conftool/dbconfig/20200928-075817-kormat.json
  • 07:54 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 07:43 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12809 and previous config saved to /var/cache/conftool/dbconfig/20200928-074313-kormat.json
  • 07:29 _joe_: restarting pybal on the LVS primaries
  • 07:24 dcausse: T263970: forcing allocation of enwiki_general_1587198756 (chi@eqiad)
  • 07:18 _joe_: restarting pybal on the backup LVS in eqiad, codfw to pick up the new wikifeeds endpoint
  • 07:17 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
  • 07:09 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2028 as es1 master in codfw T261717', diff saved to https://phabricator.wikimedia.org/P12806 and previous config saved to /var/cache/conftool/dbconfig/20200928-065938-marostegui.json
  • 06:15 marostegui: Set innodb_change_buffering = inserts; on db2089 (s5), db2106 (s4), db2108 (s2), db2085 (s1), db2085 (s8), db2087 (s7), db2087 (s6), db2109 (s3) T263443
  • 05:55 marostegui: Stop MySQL on es2013 before decommissioning it T263740
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2013 from dbctl T263740', diff saved to https://phabricator.wikimedia.org/P12805 and previous config saved to /var/cache/conftool/dbconfig/20200928-055410-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2013 T263740', diff saved to https://phabricator.wikimedia.org/P12804 and previous config saved to /var/cache/conftool/dbconfig/20200928-054846-marostegui.json
  • 05:22 marostegui: Decrease labsdb1011 weight

2020-09-27

  • 06:36 elukey: powercycle analytics1048

2020-09-26

  • 19:20 chrisalbon: sudo service uwsgi-ores restart
  • 02:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 02:04 cdanis@cumin2001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=eqiad
  • 02:04 cdanis@cumin2001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=codfw
  • 01:56 cdanis: ❌cdanis@cumin2001.codfw.wmnet ~ 🕙🍺 sudo cumin 'A:ores and A:codfw' 'systemctl restart celery-ores-worker.service uwsgi-ores.service '
  • 01:48 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=codfw
  • 01:48 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
  • 01:17 cdanis: ❌cdanis@ores2001.codfw.wmnet ~ 🕤🍺 sudo systemctl restart uwsgi-ores.service
  • 01:11 cdanis: ✔️ cdanis@ores2001.codfw.wmnet ~ 🕘🍺 sudo systemctl restart celery-ores-worker.service
  • 00:56 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:46 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm

2020-09-25

  • 23:03 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a135388]: correct scap variable reference in airflow_variables (duration: 26m 57s)
  • 22:36 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a135388]: correct scap variable reference in airflow_variables
  • 22:17 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity (duration: 10m 42s)
  • food: updated fundraising CiviCRM from eb90dbcfd3 to 035ad1c351
  • 22:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity
  • 21:23 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment (duration: 11m 33s)
  • 21:11 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment
  • 20:26 effie: installing memcached 1.4.33-1+deb9u1 on mwdebug1001
  • 19:34 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1 (duration: 53m 58s)
  • 18:40 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1
  • 17:47 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/MobileFrontend/: Backport: Make all section `collapsible` during server side rendering (T263832) (duration: 00m 59s)
  • 17:37 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3 (duration: 02m 01s)
  • 17:35 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3
  • 16:35 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import (duration: 01m 10s)
  • 16:34 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import
  • 16:33 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Promote 1.35.0 to stable in extensiondistributor (duration: 00m 57s)
  • 16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:23 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:23 jynus: fixing enwikivoyage ipblocks inconsistency cluster-wide T263842
  • 14:54 elukey: install linux-image-4.19-amd64 on an-worker1096 + reboot
  • 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:13 kormat@cumin1001: dbctl commit (dc=all): 'Add db2113 to various groups T263842', diff saved to https://phabricator.wikimedia.org/P12797 and previous config saved to /var/cache/conftool/dbconfig/20200925-121332-kormat.json
  • 11:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:23 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:10 moritzm: reimaging sretest1001 to validate puppetised sources.list with a new installation T158562
  • 10:42 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:28 moritzm: reimaging sretest1002 to validate puppetised sources.list with a new installation T158562
  • 09:58 moritzm: restarting archiva to pick up Java security update
  • 09:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:22 ema: upload@eqsin: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 09:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:02 ema: text@eqsin: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 06:50 elukey: shutdown ganeti5002 (mistakenly powercycled it without seeing T261130)
  • 06:40 elukey: powercycle ganeti5002 (no instances running on it, mgmt console shows no tty usable)
  • 06:34 elukey: reboot stat1004 to pick up kernel settings
  • 03:10 ejegg: updated payments-wiki from f89c594e12 to b2eb456ed1
  • 02:29 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: new codfw, T263798 (duration: 09m 05s)
  • 02:27 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 00m 07s)
  • 02:27 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
  • 02:20 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: new codfw, T263798
  • 02:20 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: eqiad-only, T263798 (duration: 06m 09s)
  • 02:14 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: eqiad-only, T263798

2020-09-24

  • 23:39 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 01m 58s)
  • 23:37 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
  • 21:40 mutante: mw1349 - systemctl reset-failed
  • 21:03 cdanis: reprepro: add backported ipvsadm 1:1.31-1+deb10u1 to buster-wikimedia
  • 21:00 andrew@deploy1001: Finished deploy [horizon/deploy@404e205]: (no justification provided) (duration: 01m 05s)
  • 20:59 andrew@deploy1001: Started deploy [horizon/deploy@404e205]: (no justification provided)
  • 20:41 andrew@deploy1001: Finished deploy [horizon/deploy@24368a5]: (no justification provided) (duration: 02m 10s)
  • 20:39 andrew@deploy1001: Started deploy [horizon/deploy@24368a5]: (no justification provided)
  • 20:35 andrew@deploy1001: Finished deploy [horizon/deploy@85125d1]: (no justification provided) (duration: 00m 52s)
  • 20:34 andrew@deploy1001: Started deploy [horizon/deploy@85125d1]: (no justification provided)
  • 19:57 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 19:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 19:54 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 19:47 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: Revert: cloudelastic: envoy sits in front now (duration: 00m 59s)
  • 19:41 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: (no justification provided) (duration: 00m 36s)
  • 19:41 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: (no justification provided)
  • 19:39 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: (no justification provided) (duration: 01m 08s)
  • 19:38 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: (no justification provided)
  • 19:30 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: dev (duration: 00m 44s)
  • 19:29 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: dev
  • 19:08 dancy@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.10
  • 19:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bcf9fcb: Enable mobile block notice tracking in MobileFrontend (T260218) (duration: 01m 04s)
  • 18:58 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Special:Investigate on itwiki and svwiki (T262436) (duration: 01m 05s)
  • 18:01 mutante: temp. disabled puppet on install4001/install5001 - applying install_server role to new servers, starting with install3001
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:24 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:21 jbond42: enable puppet fleet wide post update puppetdb postgres logging
  • 17:19 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:17 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 17:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 17:15 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:15 jbond42: disable puppet fleet wide to update puppetdb postgres logging
  • 17:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 17:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:11 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 17:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 17:09 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:04 mutante: syncing facts to puppet compiler hosts
  • 17:01 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 17:00 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:56 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:26 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 16:26 robh: properly pooled mw1360 this time T262151
  • 16:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 16:04 XioNoX: pfw3-eqiad> restart security-log gracefully
  • 15:58 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/AbuseFilter/includes/Hooks/AbuseFilterHookRunner.php: 5e88c36: HookRunner: onAbuseFilterGenerateUserVars should run generateUserVars (T263750) (duration: 01m 06s)
  • 15:46 Urbanecm: Run `mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=simplewiki --username="Oversight~simplewiki"` (T263760)
  • 15:44 Urbanecm: Run `mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=enwiki --username=Oversight` (T263760)
  • 15:43 Urbanecm: Rename all local Oversight accounts but enwiki to Oversight~dbname, see task for full list (T263760)
  • 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12794 and previous config saved to /var/cache/conftool/dbconfig/20200924-152626-root.json
  • 15:15 robh: mw1360 scap and repooled post work via T262151
  • 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 66%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12793 and previous config saved to /var/cache/conftool/dbconfig/20200924-151120-root.json
  • 15:10 jayme: switched zotero service-proxy listener to use TLS - T255869
  • 15:00 XioNoX: repool eqiad - T256112
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 33%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12792 and previous config saved to /var/cache/conftool/dbconfig/20200924-145617-root.json
  • 14:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:52 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:28 XioNoX: [Netops] In window: turn VC-ports on/off for proper cabling: - T256112
  • 14:19 XioNoX: remove damping on anycast group for cr2-codfw
  • 14:18 jayme: restart pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255869
  • 14:16 jayme: restart pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255869
  • 14:16 XioNoX: [Netops] Disable unused VC ports to not risk them going online at connect: - T256112
  • 14:09 jayme: running puppet on lvs servers - T255869
  • 14:09 cmjohnson1: removing the cable connected to FPC1:1/0 (DAC 3m) FPC8:1/0 (DAC 3m)
  • 13:58 moritzm: upgrading mariadb on cloudcontrol-2001/2003/2004
  • 13:52 XioNoX: depool eqiad for row D recabling - T256112
  • 13:32 ottomata: Increased retention time for *.mediawiki.job.processMediaModeration topics in kafka main-eqiad and main-codfw to 31 days (as per request from Pchelolo )
  • 13:22 elukey: moved the hadoop cluster to puppet TLS certificates - T253957
  • 13:17 XioNoX: add damping to anycast BGP - T262372
  • 12:58 jayme: switched mathoid service-proxy listener to use TLS - T255875
  • 12:50 moritzm: upgrading bird on centrallog1001
  • 12:43 gehel: restarting wdqs-categories on wdqs1009
  • 12:43 moritzm: installing netty-3.9 security updates
  • 12:42 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 12:30 ema: upload@ulsfo: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 12:29 godog: swift codfw-prod: rebalance only, no weight change
  • 12:27 kormat: powering off db2125 for maintenance T260670
  • 12:25 moritzm: installing xorg-server security updates
  • 12:09 ema: text@ulsfo: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 12:02 ema: cp4022: upgrade varnish to 6.0.6-1wm1 T263557
  • 11:40 Urbanecm: EU B&C window done
  • 11:34 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/Translate/tag/TPSection.php: fa4900e: Fix validation of translation unit section names (T263546) (duration: 01m 07s)
  • 11:25 jbond42: re-enable puppet fleet wide
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fdab74c: Enable ContentTranslation in Bashkir, Urdu and Welsh WPs as a default tool (T258504; T260022; T260024) (duration: 01m 05s)
  • 11:21 jbond42: disable puppet fleet wide to reduce log level on puppetdb
  • 11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 90c7291: Move DiscussionTools out of beta on arwiki, cswiki, huwiki (T249394); d8553f3: Simplify DiscussionTools config (duration: 01m 11s)
  • 11:06 moritzm: installing imagemagick security updates on stretch
  • 11:02 jbond42: re-enable puppet fleet wide
  • 10:51 jbond42: disable puppet fleet wide to deploy a puppetmaster change
  • 10:49 moritzm: installing libproxy security updates
  • 10:23 volans: uploaded python3-wmflib_0.0.2 to apt.wikimedia.org buster-wikimedia
  • 10:20 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12789 and previous config saved to /var/cache/conftool/dbconfig/20200924-102025-kormat.json
  • 10:05 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12788 and previous config saved to /var/cache/conftool/dbconfig/20200924-100521-kormat.json
  • 10:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:50 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12787 and previous config saved to /var/cache/conftool/dbconfig/20200924-095018-kormat.json
  • 09:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 09:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 09:48 jayme: restart pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255875
  • 09:46 jayme: restart pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255875
  • 09:43 jayme: running puppet on lvs servers - T255875
  • 09:35 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12786 and previous config saved to /var/cache/conftool/dbconfig/20200924-093514-kormat.json
  • 09:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 09:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 09:20 ema: cp4021: repool with varnish 6.0.6-1wm1 T263557
  • 09:19 ema: cp4021: redepool with varnish to 6.0.6-1wm1 T263557
  • 09:14 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12785 and previous config saved to /var/cache/conftool/dbconfig/20200924-091445-kormat.json
  • 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:14 ema: cp4021: depool and upgrade varnish to 6.0.6-1wm1 T263557
  • 09:05 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 09:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 08:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 08:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2127 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12784 and previous config saved to /var/cache/conftool/dbconfig/20200924-082443-marostegui.json
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 100%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12783 and previous config saved to /var/cache/conftool/dbconfig/20200924-082319-root.json
  • 08:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 08:15 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:15 XioNoX: configure vrrp_master_pinning in codfw - T263212
  • 08:10 moritzm: installing mariadb-10.1/mariadb-10.3 updates (packaged version from Debian, not the wmf-mariadb variants we used for mysqld)
  • 08:09 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:08 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 66%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12782 and previous config saved to /var/cache/conftool/dbconfig/20200924-080816-root.json
  • 07:58 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:57 marostegui: Remove es2018 from tendril and zarcillo T263613
  • 07:57 XioNoX: configure vrrp_master_pinning in eqiad - T263212
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 33%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12781 and previous config saved to /var/cache/conftool/dbconfig/20200924-075312-root.json
  • 07:52 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:49 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:49 godog: roll restart logstash codfw, gc death
  • 07:25 XioNoX: push pfw policies - T263674
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Place db2073 into vslow, not api in s4', diff saved to https://phabricator.wikimedia.org/P12780 and previous config saved to /var/cache/conftool/dbconfig/20200924-064018-marostegui.json
  • 06:22 elukey: powercycle elastic2037 (host stuck, no mgmt serial console working, DIMM errors in racadm getsel)
  • 05:57 marostegui: Remove es2012 from tendril and zarcillo T263613
  • 05:41 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2012 and es2018 from dbctl - T263615 T263613', diff saved to https://phabricator.wikimedia.org/P12778 and previous config saved to /var/cache/conftool/dbconfig/20200924-053001-marostegui.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12777 and previous config saved to /var/cache/conftool/dbconfig/20200924-052207-marostegui.json
  • 01:25 ryankemper: Root cause of sigkill of `elasticsearch_5@production-logstash-eqiad.service` appears to be OOMKill of the java process: `Killed process 1775 (java) total-vm:8016136kB, anon-rss:4888232kB, file-rss:0kB, shmem-rss:0kB`. Service appears to have restarted itself and is healthy again
  • 01:21 ryankemper: Observed that `elasticsearch_5@production-logstash-eqiad.service` is in a `failed` state since `Thu 2020-09-24 00:53:53 UTC`; appears the process received a SIGKILL - not sure why
  • 01:19 ryankemper: Getting `connection refused` when trying to `curl -X GET 'http://localhost:9200/_cluster/health'` on `logstash1009`
  • 01:16 ryankemper: (after) `{"cluster_name":"production-elk7-codfw","status":"green","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":868,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
  • 01:16 ryankemper: Ran `curl -X POST 'http://localhost:9200/_cluster/reroute?retry_failed=true'`, cluster status is green again
  • 01:15 ryankemper: (before) `{"cluster_name":"production-elk7-codfw","status":"yellow","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":866,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":2,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
  • 01:14 ryankemper: (before) `{"cluster_name":"production-elk7-codfw","status":"yellow","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":866,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":2,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`

2020-09-23

  • 23:52 mutante: alert1001 - systemctl restart ircecho because icinga-wm left the chat
  • 23:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cbd77e3: Add new Racine namespace to frwiktionary (T263525) (duration: 01m 05s)
  • 23:44 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:40 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:37 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 22382a9: remove wtp2005 from wgLinterSubmitterWhitelist (T257903) (duration: 01m 04s)
  • 23:14 eileen: civicrm revision changed from 32a82aa1b7 to eb90dbcfd3, config revision is 2a55766237
  • 23:13 eileen: civicrm revision is 32a82aa1b7, config revision is 2a55766237
  • 23:10 mutante: ganeti5003 - rebooting install5001 - OS install on 3001/4001/5001 T263684
  • 23:04 mutante: ganeti4003 - rebooting install4001
  • 22:51 mutante: ganeti5003 - rebooting install5001
  • 22:27 mutante: ganeti5003 - gnt-instance start install5001
  • 21:40 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:38 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:30 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.10 (duration: 01m 04s)
  • 21:29 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.10
  • 21:24 dancy@deploy1001: Finished scap: (no justification provided) (duration: 42m 52s)
  • 21:12 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:06 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 20:57 mepps: updated payments-wiki from 7bb99ce03a to f89c594e12
  • 20:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 20:42 dancy: dancy@deploy1001 Started scap: Deploying fixes for T263601 and T263675 to 1.36.0-wmf.10
  • 20:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:41 dancy@deploy1001: Started scap: (no justification provided)
  • 20:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:36 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:36 eileen: civicrm revision changed from a789afd79b to 32a82aa1b7, config revision is 2a55766237
  • 20:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:30 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 20:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:28 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 20:27 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 20:22 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 20:18 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 20:15 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 20:08 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 20:06 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 20:02 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 19:42 robh: ganeti5002 firmware update before hw testing via T261130
  • 18:57 ryankemper: (Above deploy complete)
  • 18:54 ryankemper: `scap sync-file wmf-config/ProductionServices.php 'Config: cloudelastic: envoy sits in front now (T263073)'` from `ryankemper@deploy1001:/srv/mediawiki-staging`
  • 18:47 ryankemper: Above deploy appears successful, test requests seem to be taking 40ms instead of the previous 140ms
  • 18:31 ryankemper: HEAD of `/srv/mediawiki-staging` is now at 7a96d63 as expected
  • 18:13 Urbanecm: End of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # T263628
  • 18:13 Urbanecm: Start of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # T263628
  • 18:12 Urbanecm: urbanecm@deploy1001: scap sync-file wmf-config/InitialiseSettings.php 'b1554f36be68106c9364f4aa2fd70d759ad74356: Set $wgCategoryCollation = uca-tr on trwikiquote (T263628)'
  • 18:11 Urbanecm: Logmsgbot seems to be down
  • 17:29 robh: migrating ganeti instances off ganeti5002 for troubleshooting per T261130
  • 16:37 sukhe: upload dnsdist_1.4.0-1~deb10u2 to apt.wm.o (buster) - T252132
  • 16:00 herron: switching icinga over from icinga1001 to alert1001 T247966
  • 16:00 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2088:3312 from api now that db2104/db2126 are done T259831', diff saved to https://phabricator.wikimedia.org/P12775 and previous config saved to /var/cache/conftool/dbconfig/20200923-160010-kormat.json
  • 15:58 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12774 and previous config saved to /var/cache/conftool/dbconfig/20200923-155819-kormat.json
  • 15:57 robh: updating firmware on mw1360, troubleshooting nic failure issue T262151
  • 15:57 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/includes/specials/SpecialBlock.php: 3234fad: SpecialUnblock: Allow getTargetAndType to accept null $par (T263642) (duration: 01m 07s)
  • 15:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/includes/specials/SpecialUnblock.php: 3234fad: SpecialUnblock: Allow getTargetAndType to accept null $par (T263642) (duration: 01m 08s)
  • 15:53 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:51 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:48 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:48 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:45 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:45 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:44 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 15:44 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 15:43 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:43 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12773 and previous config saved to /var/cache/conftool/dbconfig/20200923-154315-kormat.json
  • 15:40 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:37 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:33 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:30 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:28 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12772 and previous config saved to /var/cache/conftool/dbconfig/20200923-152812-kormat.json
  • 15:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 15:21 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:13 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12771 and previous config saved to /var/cache/conftool/dbconfig/20200923-151308-kormat.json
  • 14:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:44 kormat@cumin1001: dbctl commit (dc=all): 'db2126 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12770 and previous config saved to /var/cache/conftool/dbconfig/20200923-144441-kormat.json
  • 14:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:37 herron: grew prometheus1004 prometheus-ops filesystem to 1.6T
  • 14:35 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable repo config propagateChangeVisibility everywhere, 2/2 (duration: 01m 06s)
  • 14:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Enable repo config propagateChangeVisibility everywhere, 1/2 (duration: 01m 06s)
  • 13:50 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12769 and previous config saved to /var/cache/conftool/dbconfig/20200923-135028-kormat.json
  • 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12768 and previous config saved to /var/cache/conftool/dbconfig/20200923-133525-kormat.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 100%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12766 and previous config saved to /var/cache/conftool/dbconfig/20200923-132918-root.json
  • 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12765 and previous config saved to /var/cache/conftool/dbconfig/20200923-132022-kormat.json
  • 13:20 moritzm: installing ruby-json security updates
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 75%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12764 and previous config saved to /var/cache/conftool/dbconfig/20200923-131414-root.json
  • 13:05 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12763 and previous config saved to /var/cache/conftool/dbconfig/20200923-130518-kormat.json
  • 13:04 moritzm: installing multipath-tools bugfix updates from buster 10.5 point release
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12762 and previous config saved to /var/cache/conftool/dbconfig/20200923-125911-root.json
  • 12:49 moritzm: installing libunwind bugfix updates from buster 10.5 point release
  • 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2104 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12761 and previous config saved to /var/cache/conftool/dbconfig/20200923-123922-kormat.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074', diff saved to https://phabricator.wikimedia.org/P12760 and previous config saved to /var/cache/conftool/dbconfig/20200923-123806-marostegui.json
  • 12:37 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:37 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:36 kormat@cumin1001: dbctl commit (dc=all): 'Add db2088:3312 to api while db2104 gets depooled T259831', diff saved to https://phabricator.wikimedia.org/P12759 and previous config saved to /var/cache/conftool/dbconfig/20200923-123649-kormat.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly db2074 ', diff saved to https://phabricator.wikimedia.org/P12758 and previous config saved to /var/cache/conftool/dbconfig/20200923-123528-root.json
  • 12:22 ema: cp4027: repool with varnish 6.0.6-1wm1 T263557
  • 12:09 ema: cp4027: depool and upgrade varnish to 6.0.6-1wm1 T263557
  • 11:52 moritzm: installing GNUTLS bugfix updates from buster 10.5 point release
  • 11:51 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.Homepage.GrowthTasksApi.js: 73b5ce8: Fix GrowthTasksApi lazy-loading flags for pages with no views (T263611) (duration: 01m 05s)
  • 11:49 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/help/ext.growthExperiments.PostEdit.js: 1ab31a9: Mark pageviews as not used in the mobile postedit notice (T263611) (duration: 01m 06s)
  • 11:38 Urbanecm: Revert https://gerrit.wikimedia.org/r/c/mediawiki/core/+/629188 and fetch to deploy1001 to unblock EU B&C deployment (T237467; cc twentyafterfour)
  • 11:27 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12756 and previous config saved to /var/cache/conftool/dbconfig/20200923-112712-kormat.json
  • 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12755 and previous config saved to /var/cache/conftool/dbconfig/20200923-111209-kormat.json
  • 11:08 Urbanecm: Create ContentTranslation tables at testwiki using SQL files from `/srv/mediawiki/php-1.36.0-wmf.10/extensions/ContentTranslation/sql` (T263417)
  • 10:57 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12754 and previous config saved to /var/cache/conftool/dbconfig/20200923-105705-kormat.json
  • 10:42 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12753 and previous config saved to /var/cache/conftool/dbconfig/20200923-104202-kormat.json
  • 10:21 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12752 and previous config saved to /var/cache/conftool/dbconfig/20200923-102120-kormat.json
  • 10:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12751 and previous config saved to /var/cache/conftool/dbconfig/20200923-100156-marostegui.json
  • 10:01 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Configure entityDataCachePaths for Wikibase (duration: 01m 05s)
  • 09:59 elukey: update puppet compiler's facts
  • 09:57 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wgExtraLanguageNames from Wikidata and Commons (T260118), part 2/2 (production no-op) (duration: 01m 04s)
  • 09:55 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wgExtraLanguageNames from Wikidata and Commons (T260118), part 1/2 (duration: 01m 16s)
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12750 and previous config saved to /var/cache/conftool/dbconfig/20200923-094511-marostegui.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12748 and previous config saved to /var/cache/conftool/dbconfig/20200923-083200-marostegui.json
  • 08:29 moritzm: installing dbus security updates on buster
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12747 and previous config saved to /var/cache/conftool/dbconfig/20200923-080651-marostegui.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12746 and previous config saved to /var/cache/conftool/dbconfig/20200923-071129-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 to re-add change_revision_id index T262856', diff saved to https://phabricator.wikimedia.org/P12745 and previous config saved to /var/cache/conftool/dbconfig/20200923-070926-marostegui.json
  • 06:34 marostegui: Stop MySQL on es2012 and es2018 T263613 T263615
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2018 T263615', diff saved to https://phabricator.wikimedia.org/P12744 and previous config saved to /var/cache/conftool/dbconfig/20200923-063140-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2012 for decommissioning', diff saved to https://phabricator.wikimedia.org/P12743 and previous config saved to /var/cache/conftool/dbconfig/20200923-060812-marostegui.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index removal T262856', diff saved to https://phabricator.wikimedia.org/P12742 and previous config saved to /var/cache/conftool/dbconfig/20200923-055850-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 T262856', diff saved to https://phabricator.wikimedia.org/P12741 and previous config saved to /var/cache/conftool/dbconfig/20200923-055531-marostegui.json
  • 05:37 marostegui: Purge global_status_log table on tendril - T252331
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:03 marostegui: Remove triggers from db2094:3313 for MCR schema change T238966
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12739 and previous config saved to /var/cache/conftool/dbconfig/20200923-050234-marostegui.json
  • 04:25 eileen: civicrm revision changed from 8f32b6301f to a789afd79b, config revision is 9933605187

2020-09-22

  • 23:27 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: clientError: enable on ja,es,de,ru,it,zh,pt wikipedias (T255585) (duration: 01m 04s)
  • 23:24 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry feature (T261249) (duration: 01m 06s)
  • 21:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:46 ebernhardson: T259539 enabled adaptive replica selection on elasticsearch at search.svc.eqiad.wmnet:9[246]43
  • 20:44 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:43 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.10
  • 20:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:31 dancy@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.10 (duration: 42m 21s)
  • 20:30 mutante: gerrit2001 (gerrit-replica) restarting gerrit service
  • 19:49 dancy@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.10
  • 19:44 dancy@deploy1001: Pruned MediaWiki: 1.36.0-wmf.5 (duration: 17m 59s)
  • 19:31 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:29 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 16:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:00 robh: running dell epsa test on down host mw1360 per T262151
  • 14:34 moritzm: installing nginx security updates on buster
  • 14:33 shdubsh: restart apache on prometheus nodes to pick up new ext endpoint
  • 14:24 ema: upload libvmod-re2 1.5.3-1 to buster-wikimedia component/varnish6 T261632
  • 14:24 papaul: rebooting ms-be2019
  • 14:15 XioNoX: upgrade FNM on netflow2001 - T257035
  • 14:12 jayme: running ipvsadm -D -t 10.2.1.19:1970; ipvsadm -D -t 10.2.1.21:24766 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255868 T255877
  • 14:12 jayme: running ipvsadm -D -t 10.2.2.19:1970; ipvsadm -D -t 10.2.2.21:24766 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255868 T255877
  • 14:11 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255868 T255877
  • 14:10 XioNoX: upgrade FNM on netflow5001 - T257035
  • 14:09 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255868 T255877
  • 14:09 shdubsh: restart statsv on webperf[1-2]001 to route metrics through statsd-exporter
  • 14:09 XioNoX: upgrade FNM on netflow1001 - T257035
  • 14:06 XioNoX: upgrade FNM on netflow3001 - T257035
  • 14:05 jayme: running puppet on lvs servers - T255868 T255877
  • 14:03 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 14:02 hnowlan: roll-restarting restbase codfw for java updates
  • 13:59 XioNoX: add fastnetmon_1.1.7 to buster-wikimedia repo - T257035
  • 13:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:55 ema: upload varnish-modules 0.15.0-1+wmf1 to buster-wikimedia component/varnish6 T261632
  • 13:49 marostegui: Deploy MCR change on db2098:3313 - T238966
  • 13:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:39 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:35 ema: upload libvmod-netmapper 1.8-1 to buster-wikimedia component/varnish6 T261632
  • 12:54 ema: upload varnishkafka 1.1.0-1 to buster-wikimedia component/varnish6 T261632
  • 12:11 moritzm: installing python3.7 security updates on Buster
  • 12:09 moritzm: installing bundler updates on buster
  • 11:59 Urbanecm: Reset password for SUL User:Freibo
  • 11:58 Lucas_WMDE: EU backport&config window done
  • 11:56 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2001:~$ mwscript namespaceDupes.php trwikisource --fix | tee T263358.fix # 1350 to fix, 1350 resolvable, 0 deleted
  • 11:55 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2001:~$ mwscript namespaceDupes.php trwikisource | tee T263358.dryrun # 1350 to fix, 1350 resolvable, 0 deleted
  • 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Create Portal and Portal_talk namespaces on trwikisource, and fix an incorrect alias (T263358) (duration: 00m 57s)
  • 11:47 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Removing Wikipedia store link from enwiki (T262329) (duration: 00m 57s)
  • 11:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Set timezone for wikis of the CWIRP to Europe/Rome (T263123) (duration: 00m 59s)
  • 11:35 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 11:35 hnowlan: roll-restarting restbase eqiad for java updates
  • 11:25 ema: upload varnish 6.0.6-1wm1 to buster-wikimedia component/varnish6 T261632
  • 11:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:13 moritzm: installing intel-microcode 3.20200616.1 on Buster baremetal servers (compared to the currently installed packages, this reverts microcode changes for some Skylake CPUs we don't use)
  • 11:00 moritzm: installing intel-microcode 3.20200616.1 on Stretch baremetal servers (compared to the currently installed packages, this reverts microcode changes for some Skylake CPUs we don't use)
  • 10:51 XioNoX: Add policy-options for primary IXPs to all routers - T262517
  • 10:48 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 10:48 hnowlan: roll-restarting sessionstore for java security updates
  • 10:44 moritzm: installing bacula security updates on stretch
  • 10:33 moritzm: installing remaining libx11 security updates
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12733 and previous config saved to /var/cache/conftool/dbconfig/20200922-101342-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12732 and previous config saved to /var/cache/conftool/dbconfig/20200922-101324-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12731 and previous config saved to /var/cache/conftool/dbconfig/20200922-101308-root.json
  • 10:00 kormat: deploying schema change to s2 in eqiad. labsdb will have s2 lag until this finishes. T259831
  • 09:59 jayme: running ipvsadm -D -t 10.2.1.45:34192; ipvsadm -D -t 10.2.1.42:35192 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255873 T255870
  • 09:59 jayme: running ipvsadm -D -t 10.2.2.45:34192; ipvsadm -D -t 10.2.2.42:35192 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255873 T255870
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12730 and previous config saved to /var/cache/conftool/dbconfig/20200922-095839-root.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 75%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12729 and previous config saved to /var/cache/conftool/dbconfig/20200922-095821-root.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12728 and previous config saved to /var/cache/conftool/dbconfig/20200922-095805-root.json
  • 09:57 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255873 T255870
  • 09:55 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255873 T255870
  • 09:51 jayme: running puppet on lvs servers - T255873 T255870
  • 09:46 jbond@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-password (exit_code=99)
  • 09:46 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 50%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12727 and previous config saved to /var/cache/conftool/dbconfig/20200922-094336-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 50%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12726 and previous config saved to /var/cache/conftool/dbconfig/20200922-094317-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 50%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12725 and previous config saved to /var/cache/conftool/dbconfig/20200922-094302-root.json
  • 09:30 volans: repooling ulsfo after merging DNS migration to Netbox zonefiles - T258729
  • 09:30 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.uptime (exit_code=0)
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 25%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12724 and previous config saved to /var/cache/conftool/dbconfig/20200922-092832-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 25%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12723 and previous config saved to /var/cache/conftool/dbconfig/20200922-092814-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12722 and previous config saved to /var/cache/conftool/dbconfig/20200922-092758-root.json
  • 09:26 jbond@cumin1001: START - Cookbook sre.pdus.uptime
  • 09:24 XioNoX: replace BGP_IXP_in with BGP_IXP_PRIMARY_in on cr3-ulsfo IX BGP group - T262517
  • 09:22 XioNoX: add BGP_IXP_PRIMARY_in to cr3-ulsfo - T262517
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12721 and previous config saved to /var/cache/conftool/dbconfig/20200922-091329-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12720 and previous config saved to /var/cache/conftool/dbconfig/20200922-091310-root.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12719 and previous config saved to /var/cache/conftool/dbconfig/20200922-091255-root.json
  • 09:11 jbond42: update snmp string on ps1-a8-codfw
  • 09:05 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12718 and previous config saved to /var/cache/conftool/dbconfig/20200922-090520-kormat.json
  • 08:58 _joe_: restart pybal on lvs2009
  • 08:56 _joe_: restarting pybal on lvs2010
  • 08:54 _joe_: restarted pybal on lvs1015
  • 08:50 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12717 and previous config saved to /var/cache/conftool/dbconfig/20200922-085017-kormat.json
  • 08:36 _joe_: restarting pybal low-traffic in eqiad to pick up lvs changes
  • 08:35 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12715 and previous config saved to /var/cache/conftool/dbconfig/20200922-083514-kormat.json
  • 08:22 volans: migrating ulsfo public DNS records to the Netbox-generated ones - T258729
  • 08:20 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12714 and previous config saved to /var/cache/conftool/dbconfig/20200922-082010-kormat.json
  • 08:13 kormat: uploaded wmfmariadbpy v0.5 to apt. deploying now to fleet
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2032, es2033 and es2034 for the first time with minimal weight T261717', diff saved to https://phabricator.wikimedia.org/P12713 and previous config saved to /var/cache/conftool/dbconfig/20200922-081154-marostegui.json
  • 07:57 volans: migrating ulsfo private DNS records to the Netbox-generated ones - T258729
  • 07:54 kormat@cumin1001: dbctl commit (dc=all): 'db2076 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12712 and previous config saved to /var/cache/conftool/dbconfig/20200922-075429-kormat.json
  • 07:51 jayme: running ipvsadm -D -t 10.2.1.18:8080; ipvsadm -D -t 10.2.1.46:3030 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255879 T254581
  • 07:49 jayme: running ipvsadm -D -t 10.2.2.18:8080; ipvsadm -D -t 10.2.2.46:3030 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255879 T254581
  • 07:46 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255879 T254581
  • 07:42 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255879 T254581
  • 07:39 jayme: running puppet on lvs servers - T255879 T254581
  • 07:34 volans: depooling ulsfo to merge DNS migration to Netbox zonefiles - T258729
  • 07:24 marostegui: Stop MySQL on es2014 - host will be decommissioned T262889
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2014 from dbctl T262889', diff saved to https://phabricator.wikimedia.org/P12711 and previous config saved to /var/cache/conftool/dbconfig/20200922-071435-marostegui.json
  • 07:11 XioNoX: cr1-codfw# run clear bfd session address fe80::f27c:c7ff:fe11:2c1b
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2014 for decommissioning T262889', diff saved to https://phabricator.wikimedia.org/P12710 and previous config saved to /var/cache/conftool/dbconfig/20200922-061815-marostegui.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 100%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12709 and previous config saved to /var/cache/conftool/dbconfig/20200922-054455-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 100%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12708 and previous config saved to /var/cache/conftool/dbconfig/20200922-054438-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 100%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12707 and previous config saved to /var/cache/conftool/dbconfig/20200922-054430-root.json
  • 05:41 marostegui: Log remove triggers on revision table on db1124:3313 T238966
  • 05:39 marostegui: Deploy MCR schema change on s3 eqiad, this will generate lag on s3 on labsdb T238966
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2032, es2033 and es2034 into dbctl T261717', diff saved to https://phabricator.wikimedia.org/P12706 and previous config saved to /var/cache/conftool/dbconfig/20200922-053346-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 75%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12705 and previous config saved to /var/cache/conftool/dbconfig/20200922-052951-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 75%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12704 and previous config saved to /var/cache/conftool/dbconfig/20200922-052935-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 75%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12703 and previous config saved to /var/cache/conftool/dbconfig/20200922-052926-root.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 50%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12702 and previous config saved to /var/cache/conftool/dbconfig/20200922-051448-root.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 50%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12701 and previous config saved to /var/cache/conftool/dbconfig/20200922-051431-root.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 50%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12700 and previous config saved to /var/cache/conftool/dbconfig/20200922-051423-root.json
  • 05:00 marostegui: Add es2032 es2033 and es2034 to tendril and zarcillo T261717
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 25%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12699 and previous config saved to /var/cache/conftool/dbconfig/20200922-045944-root.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 25%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12698 and previous config saved to /var/cache/conftool/dbconfig/20200922-045928-root.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 25%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12697 and previous config saved to /var/cache/conftool/dbconfig/20200922-045919-root.json
  • 01:35 ryankemper: `sudo cumin C:profile::services_proxy::envoy 'enable-puppet "adding cloudelastic to the service proxy --rkemper"'` done
  • 01:35 ryankemper: woot! `curl -X GET -s 'http://localhost:6105/_cluster/health'` gives a response as expected. (As do 6106 and 6107). Re-enabling puppet across the fleet...
  • 01:32 ryankemper: `sudo run-puppet-agent -e "adding cloudelastic to the service proxy --rkemper"` on `mwdebug1002.eqiad.wmnet`
  • 01:28 ryankemper: `sudo puppet-merge` done, now will run puppet on a single eqiad appserver and verify we can curl `localhost:610{5,6,7}`
  • 01:17 ryankemper: Disabling puppet on affected nodes via `sudo cumin C:profile::services_proxy::envoy 'disable-puppet "adding cloudelastic to the service proxy --rkemper"'`
  • 01:17 ryankemper: Going to test patch to stick envoy in front of `cloudelastic`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/628243

2020-09-21

  • 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:39 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:37 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:36 mutante: debmonitor2002 - systemctl reset-failed
  • 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 22:57 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 22:55 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 22:20 mutante: releases.wikimedia.org has been converted to an active-active service with geodns/backends in both DCs
  • 21:56 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 21:54 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 21:51 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 21:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:18 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:49 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: adjust enwiktionary completion search ranking (duration: 00m 57s)
  • 20:47 ebernhardson@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/CirrusSearch/: Remove pages from completion search by page id (duration: 01m 00s)
  • 20:04 herron: moving prometheus instance from bast3004 to prometheus3001 T243057
  • 19:46 herron: moving prometheus instance from bast4002 to prometheus4001 T243057
  • 19:38 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Push notifications deployment (4/5) (duration: 00m 57s)
  • 19:34 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Push notifications deployment (3/5) (duration: 00m 57s)
  • 19:28 mholloway-shell@deploy1001: Synchronized wmf-config/ProductionServices.php: Push notifications deployment (2/5) (duration: 00m 57s)
  • 19:26 mholloway-shell@deploy1001: Synchronized wmf-config/LabsServices.php: Push notifications deployment (1/5) (duration: 00m 57s)
  • 19:19 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 19:18 mepps: updated crm to 8f32b6301f
  • 19:15 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 19:14 ejegg: updated fundraising CiviCRM from e5ebf9d18a to 8f32b6301f
  • 19:13 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 18:59 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622863 T249745 (duration: 00m 56s)
  • 18:57 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update I336365 (duration: 06m 54s)
  • 18:53 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on plwiki (T254239) and ptwiki (T255027) (duration: 00m 56s)
  • 18:50 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update I336365
  • 18:33 mepps: updated crm from cc1f7e6d13 to e5ebf9d18a
  • 18:26 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Define Chinese logo variants for Modern Vector (no-op) (part 2) (T261153) (duration: 00m 56s)
  • 18:25 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Define Chinese logo variants for Modern Vector (no-op) (T261153) (duration: 00m 57s)
  • 18:21 catrope@deploy1001: Synchronized static/images/mobile/copyright/: Update Chinese logo variants for Modern Vector (T261153) (duration: 00m 56s)
  • 18:08 XioNoX: add NAT rule to pfw3-codfw - T263488
  • 17:42 papaul: rebooting ps1-a8-codfw for firmware upgrade
  • 16:46 papaul: shutting down ms-be2019 for BBU replacement
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12696 and previous config saved to /var/cache/conftool/dbconfig/20200921-162433-root.json
  • 16:17 papaul: replacing msw-c8-codfw
  • 16:16 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12695 and previous config saved to /var/cache/conftool/dbconfig/20200921-160929-root.json
  • 16:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 50%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12694 and previous config saved to /var/cache/conftool/dbconfig/20200921-155426-root.json
  • 15:51 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/: Introduce and use StatsdMonitoring trait in term store (T262923), Part I (duration: 00m 56s)
  • 15:50 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/Util/StatsdMonitoring.php: Introduce and use StatsdMonitoring trait in term store (T262923), Part I (duration: 00m 59s)
  • 15:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12693 and previous config saved to /var/cache/conftool/dbconfig/20200921-153923-root.json
  • 15:24 hnowlan: roll-restarting restbase-dev for java security updates
  • 15:24 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Take db2124 back out of dump/vslow T259831', diff saved to https://phabricator.wikimedia.org/P12692 and previous config saved to /var/cache/conftool/dbconfig/20200921-151210-kormat.json
  • 15:10 moritzm: rolling restart of mw canaries in codfw to pick up libx11 update
  • 15:07 moritzm: installing libx11 security updates on stretch
  • 15:02 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12691 and previous config saved to /var/cache/conftool/dbconfig/20200921-150233-kormat.json
  • 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12690 and previous config saved to /var/cache/conftool/dbconfig/20200921-144729-kormat.json
  • 14:40 moritzm: installing qemu security updates on ganeti* stretch nodes
  • 14:37 papaul: firmware upgrade on db2127
  • 14:36 moritzm: installing qemu security updates on ganeti2011 and gnt-instance reboot debmonitor2001
  • 14:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:32 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12689 and previous config saved to /var/cache/conftool/dbconfig/20200921-143226-kormat.json
  • 14:30 herron: moving prometheus from bast5001 to prometheus5001 T243057
  • 14:24 papaul: disconnecting mgmt on msw-c1-codfw to re-do cable end T263138
  • 14:21 marostegui: Set innodb_change_buffering = inserts; on db2125 (s2 slave) for performance testing T263443
  • 14:17 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12688 and previous config saved to /var/cache/conftool/dbconfig/20200921-141722-kormat.json
  • 14:11 papaul: disconnecting mgmt on msw-d6-codfw to re-do cable end T263138
  • 14:00 moritzm: installing Java security updates on restbase/sessionstore*
  • 13:58 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2117 for schema change, add db2124 to dump/vslow in the interim T259831', diff saved to https://phabricator.wikimedia.org/P12687 and previous config saved to /var/cache/conftool/dbconfig/20200921-135821-kormat.json
  • 13:21 moritzm: installing glib-networking security updates for Stretch
  • 13:21 marostegui: Set innodb_change_buffering = inserts; on db2081 (s8 slave) for performance testing T263443
  • 12:59 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=codfw
  • 12:38 XioNoX: set same OSPF metric on both eqiad/codfw links - T263230
  • 12:26 marostegui: Set innodb_change_buffering = all; on db2071 (s1 slave) for performance testing T263443
  • 12:26 marostegui: Set innodb_change_buffering = all; on db2129 (s6 master) for performance testing T263443
  • 11:38 effie: restart pybal on lvs2009 and lvs1015 - T256973
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - crashed', diff saved to https://phabricator.wikimedia.org/P12684 and previous config saved to /var/cache/conftool/dbconfig/20200921-113708-marostegui.json
  • 11:35 Urbanecm: EU B&C done
  • 11:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend/includes/Transforms/MoveLeadParagraphTransform.php: 3fab588: Simplify lead paragraph check (duration: 00m 59s)
  • 11:22 effie: restart pybal on lvs2010 and lvs1016 - T256973
  • 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a62212a: Allow local steward group members to bigdelete (duration: 00m 57s)
  • 11:12 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=shnwiktionary --fix # T256348 # P12683
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1cf4664: Set WT namespace alias to NS_PROJECT in shn.wiktionary (T256348) (duration: 00m 57s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 01ba828: Add archive.wul.waseda.ac.jp to the wgCopyUploadDomains (T261037) (duration: 00m 57s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bd51f47: Add *.70yearsindonesiaaustralia.com to the wgCopyUploadsDomains allowlist of commonswiki (T262238) (duration: 00m 57s)
  • 11:02 effie: restart pybal on lvs2010 and lvs1016 - T256973
  • 10:36 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:35 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 12s)
  • 09:03 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12682 and previous config saved to /var/cache/conftool/dbconfig/20200921-090343-kormat.json
  • 08:48 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12681 and previous config saved to /var/cache/conftool/dbconfig/20200921-084840-kormat.json
  • 08:48 marostegui: Stop MySQL on db2127 for on-site maintenance - T262247
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2127 T262247', diff saved to https://phabricator.wikimedia.org/P12680 and previous config saved to /var/cache/conftool/dbconfig/20200921-084730-marostegui.json
  • 08:33 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12679 and previous config saved to /var/cache/conftool/dbconfig/20200921-083337-kormat.json
  • 08:21 godog: swift codfw-prod: bump weight for ms-be2057 - T261633
  • 08:18 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12678 and previous config saved to /var/cache/conftool/dbconfig/20200921-081833-kormat.json
  • 08:15 godog: roll-restart swift-object-replicator in codfw and eqiad for increased concurrency
  • 07:53 hashar: Upgrading all CI Jenkins jobs to Quibble 0.0.45
  • 07:05 XioNoX: upgrade FNM to 1.1.7 in ulsfo - T257035
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12677 and previous config saved to /var/cache/conftool/dbconfig/20200921-060053-marostegui.json
  • 05:48 marostegui: Set innodb_change_buffering = inserts; on db2129 (s6 master) for performance testing
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12676 and previous config saved to /var/cache/conftool/dbconfig/20200921-054730-marostegui.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12675 and previous config saved to /var/cache/conftool/dbconfig/20200921-052704-marostegui.json
  • 05:18 marostegui: Stop mysql on: es2013 es2016 es2019 to clone es2032 es2033 es2034 - T261717
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12674 and previous config saved to /var/cache/conftool/dbconfig/20200921-050632-marostegui.json
  • 05:06 marostegui: Deploy MCR schema change on s8 eqiad master, lag will appear on s8 (wikidata) on labsdb hosts T238966
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2013,es2016 and es2019 to clone new hosts T261717', diff saved to https://phabricator.wikimedia.org/P12673 and previous config saved to /var/cache/conftool/dbconfig/20200921-050305-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2015 as es2 codfw master T261717', diff saved to https://phabricator.wikimedia.org/P12672 and previous config saved to /var/cache/conftool/dbconfig/20200921-050228-marostegui.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12671 and previous config saved to /var/cache/conftool/dbconfig/20200921-045919-marostegui.json
  • 04:37 marostegui: Set innodb_change_buffering = inserts; on db2116 for performance testing
  • 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 for the first time with minimal weight T261717', diff saved to https://phabricator.wikimedia.org/P12670 and previous config saved to /var/cache/conftool/dbconfig/20200921-043154-marostegui.json

2020-09-20

  • 08:46 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Tepig10102020' 'Davidfromtheworld' # T263317
  • 07:42 gehel: depooling wdqs2002 to catch up on lag
  • 07:36 gehel: restarting blazegraph + updater on wdqs2002

2020-09-19

  • 19:03 ariel@deploy1001: Finished deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed (duration: 00m 04s)
  • 19:02 ariel@deploy1001: Started deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed
  • 16:49 ejegg: reverted PayPal failmail diversion - IPN verification is working again
  • 16:27 ejegg: Diverted SmashPig PayPal failmail to eeggleston only

2020-09-18

  • 21:48 tzatziki: changed password for Millennium bug@ptwiki
  • 19:28 eileen: process-control config revision is 739ea754ca
  • 18:52 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:46 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 18:44 ryankemper: `sudo kill 254017 254018 254028 254029` to kill some dangling serdi / gzip processes, all the wikidata cleanup should be complete
  • 18:38 ryankemper: `sudo kill 126121 126122 126124 126128 249520 249521 254016 254027` on `snapshot1008` to terminate wikidata dump jobs that are in a bad state
  • 18:10 ryankemper: Removed stale `wikidatardf-dumps` crontab entry from `dumpsgen@snapshot1008`, stored backup of previous state of crontab in the (admittedly verbose) `/tmp/dumpsgen_crontab_before_removing_stale_wikidata_dump_entry_see_gerrit_puppet_patch_622342`
  • 17:15 mutante: lists1001 - apt-get install pwgen to generate passwords (this was installed on previous list server but apparently not puppetized, puppet patch coming up)
  • 16:23 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:21 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:09 mutante: restarting gerrit service to apply gerrit::628338 to make it dump heap if out of memory (T263008)
  • 14:15 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync (T261488) (duration: 00m 56s)
  • 14:13 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync (T261488) (duration: 01m 00s)
  • 13:02 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:00 kormat@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:48 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
  • 12:41 kormat: reimaging db2125 T263244
  • 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12665 and previous config saved to /var/cache/conftool/dbconfig/20200918-123947-kormat.json
  • 12:24 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12664 and previous config saved to /var/cache/conftool/dbconfig/20200918-122444-kormat.json
  • 12:09 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12663 and previous config saved to /var/cache/conftool/dbconfig/20200918-120940-kormat.json
  • 11:54 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12662 and previous config saved to /var/cache/conftool/dbconfig/20200918-115437-kormat.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125', diff saved to https://phabricator.wikimedia.org/P12661 and previous config saved to /var/cache/conftool/dbconfig/20200918-113509-marostegui.json
  • 11:15 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12660 and previous config saved to /var/cache/conftool/dbconfig/20200918-111529-kormat.json
  • 10:56 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12659 and previous config saved to /var/cache/conftool/dbconfig/20200918-105645-kormat.json
  • 10:45 jiji@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:41 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12658 and previous config saved to /var/cache/conftool/dbconfig/20200918-104141-kormat.json
  • 10:35 jiji@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:34 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:28 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:26 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12657 and previous config saved to /var/cache/conftool/dbconfig/20200918-102638-kormat.json
  • 10:11 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12656 and previous config saved to /var/cache/conftool/dbconfig/20200918-101135-kormat.json
  • 09:55 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12655 and previous config saved to /var/cache/conftool/dbconfig/20200918-095554-kormat.json
  • 09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:47 twentyafterfour: deployed hotfix for T263063 to phab1001
  • 09:47 jayme: deleting some random pods in kubernetes staging to rebalance load back on kubestage1001 - T262527
  • 09:46 jayme: uncordoned kubestage1001 - T262527
  • 09:46 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12654 and previous config saved to /var/cache/conftool/dbconfig/20200918-094608-kormat.json
  • 09:31 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 80%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12653 and previous config saved to /var/cache/conftool/dbconfig/20200918-093105-kormat.json
  • 09:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:22 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:16 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 60%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12652 and previous config saved to /var/cache/conftool/dbconfig/20200918-091601-kormat.json
  • 09:00 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 40%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12651 and previous config saved to /var/cache/conftool/dbconfig/20200918-090058-kormat.json
  • 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:56 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:56 jayme: reboot kubestage1001 for clean state - T262527
  • 08:54 elukey: change analytics-in4/in6 filters on cr1/cr2 after https://gerrit.wikimedia.org/r/628300
  • 08:47 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:45 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 20%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12650 and previous config saved to /var/cache/conftool/dbconfig/20200918-084554-kormat.json
  • 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:43 jayme: reboot kubestage1001 for kernel upgrade - T262527
  • 08:30 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:25 jayme: reboot kubestage1001 for clean state testing - T262527
  • 08:22 kormat@cumin1001: dbctl commit (dc=all): 'db2124 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12648 and previous config saved to /var/cache/conftool/dbconfig/20200918-082223-kormat.json
  • 08:16 klausman: reinstalling stat1004 with Buster
  • 07:17 moritzm: installing xdg-utils security updates
  • 07:14 XioNoX: push pfw policies - T263168
  • 07:12 jayme: draining kubestage1001 for kernel upgrade - T262527
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12647 and previous config saved to /var/cache/conftool/dbconfig/20200918-062127-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12646 and previous config saved to /var/cache/conftool/dbconfig/20200918-060815-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1131 after rack move', diff saved to https://phabricator.wikimedia.org/P12645 and previous config saved to /var/cache/conftool/dbconfig/20200918-060724-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12644 and previous config saved to /var/cache/conftool/dbconfig/20200918-060103-marostegui.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12643 and previous config saved to /var/cache/conftool/dbconfig/20200918-053758-marostegui.json
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2029 and es2030 to dbctl depooled - T261717', diff saved to https://phabricator.wikimedia.org/P12642 and previous config saved to /var/cache/conftool/dbconfig/20200918-053604-marostegui.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12641 and previous config saved to /var/cache/conftool/dbconfig/20200918-052608-marostegui.json
  • 05:15 marostegui: Restart wikibugs

2020-09-17

  • 23:41 ejegg: updated payments-wiki from 86c997fdb2 to 7bb99ce03a
  • 23:01 ejegg: updated payments-wiki from 1e5a52ed26 to 86c997fdb2
  • 20:47 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: 19b9b98: Fix APCOND_FR_NEVERBLOCKED handling (part 3; T262970) (duration: 00m 57s)
  • 19:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:02 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=wikidatawiki --logwiki=metawiki 'Filomena ciavarella' 'Filomena Ciavarella' #T262657
  • 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:29 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 18:11 Urbanecm: Morning B&C done
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 40591d3: Enable DiscussionTools beta on jawiki & viwiki (T261654; T262109) (duration: 00m 56s)
  • 18:06 Urbanecm: Move /srv/mediawiki-stagging/grep (owned by tstarling) to /home/urbanecm to make working directory clean (cc TimStarling)
  • 17:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 17:20 rzl: repooled eqiad at 17:11
  • 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:12 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:03 papaul: restarting ps1-d8-codfw
  • 16:45 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 01m 12s)
  • 16:44 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
  • 16:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 02m 50s)
  • 16:41 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
  • 16:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 07m 26s)
  • 16:33 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
  • 16:33 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema (duration: 06m 14s)
  • 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:27 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema
  • 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:21 marostegui: Restart wikibugs
  • 16:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:15 papaul: replacing msw-d8-codfw
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1131 IP after moving it to a different rack T262901', diff saved to https://phabricator.wikimedia.org/P12639 and previous config saved to /var/cache/conftool/dbconfig/20200917-160540-marostegui.json
  • 16:03 marostegui: Recreate db1131 on tendril T262901
  • 15:59 marostegui: Update rack location on zarcillo for db1131 T262901
  • 15:57 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 100% T259831', diff saved to https://phabricator.wikimedia.org/P12638 and previous config saved to /var/cache/conftool/dbconfig/20200917-155708-kormat.json
  • 15:44 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 75% T259831', diff saved to https://phabricator.wikimedia.org/P12637 and previous config saved to /var/cache/conftool/dbconfig/20200917-154431-kormat.json
  • 15:43 mepps: updated payments-wiki from 3c073a6a56 to 1e5a52ed26
  • 15:35 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 50% T259831', diff saved to https://phabricator.wikimedia.org/P12636 and previous config saved to /var/cache/conftool/dbconfig/20200917-153514-kormat.json
  • 15:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 25% T259831', diff saved to https://phabricator.wikimedia.org/P12635 and previous config saved to /var/cache/conftool/dbconfig/20200917-152019-kormat.json
  • 15:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12634 and previous config saved to /var/cache/conftool/dbconfig/20200917-151347-marostegui.json
  • 15:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12633 and previous config saved to /var/cache/conftool/dbconfig/20200917-150234-marostegui.json
  • 15:02 jynus: deploying extended grants for admin account on sys/p_s at s8@codfw T195578
  • 15:00 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:00 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:54 kormat@cumin1001: dbctl commit (dc=all): 'db2114: depool for schema change T259831', diff saved to https://phabricator.wikimedia.org/P12632 and previous config saved to /var/cache/conftool/dbconfig/20200917-145451-kormat.json
  • 14:49 cmjohnson1: ending pdu maintenance in eqiad
  • 14:40 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12631 and previous config saved to /var/cache/conftool/dbconfig/20200917-143914-marostegui.json
  • 14:32 papaul: replacing msw-d1,d2,d3,d4,d5 and d6
  • 14:31 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12630 and previous config saved to /var/cache/conftool/dbconfig/20200917-141825-marostegui.json
  • 14:02 marostegui: Start mysql on db1125 after PDU maintenance T261459
  • 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12629 and previous config saved to /var/cache/conftool/dbconfig/20200917-140014-marostegui.json
  • 13:33 jayme: ran ipvsadm -D -t 10.2.2.14:8888 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
  • 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:32 jayme: ran ipvsadm -D -t 10.2.2.31:8748 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
  • 13:32 jayme: ran ipvsadm -D -t 10.2.1.31:8748 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
  • 13:32 jayme: ran ipvsadm -D -t 10.2.1.14:8888 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
  • 13:25 kormat@cumin1001: dbctl commit (dc=all): 'Start depooling db2114 T259831', diff saved to https://phabricator.wikimedia.org/P12628 and previous config saved to /var/cache/conftool/dbconfig/20200917-132513-kormat.json
  • 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:19 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet
  • 13:17 marostegui: Stop MySQL on db2125 for on-site maintenance T260670
  • 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:13 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet
  • 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.9
  • 12:18 cmjohnson1: pdu swap maintenance beginning now for racks D1, D2 and C1 eqiad
  • 11:24 matthiasmullie: End Euro B&C
  • 11:24 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/NavigationTiming/: Account for empty layout shift sources array (duration: 01m 05s)
  • 11:22 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/WikimediaEvents/: Disable MediaSearch A/B test (duration: 01m 08s)
  • 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12627 and previous config saved to /var/cache/conftool/dbconfig/20200917-111028-marostegui.json
  • 11:06 vgutierrez: update to acme-chief 0.29 on acmechief[12]001 - T263006
  • 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:04 vgutierrez: upload acme-chief 0.29 to apt.wm.o (buster) - T263006
  • 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:03 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=eqiad
  • 10:58 marostegui: Stop mysql on db1125 for PDU maintenance, lag will appear on s2, s4, s6 and s7 on labsdb hosts T261459
  • 10:58 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=codfw
  • 10:51 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=codfw
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12626 and previous config saved to /var/cache/conftool/dbconfig/20200917-104816-marostegui.json
  • 10:46 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
  • 10:40 oblivian@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=wikifeeds
  • 10:34 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:27 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 10:22 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 10:20 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 10:18 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:17 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 09:14 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 08:49 jayme: deleting some random pods in kubernetes staging to rebalance load back on kubestage1002 - T262527
  • 08:43 jayme: uncordoned kubestage1002 after kernel upgrade - T262527
  • 08:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:37 godog: graphite compress /var/log/carbon logs older than 2d
  • 08:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:25 jayme: reboot kubestage1002 for kernel upgrade - T262527
  • 08:24 godog: graphite add 300G to /srv
  • 07:55 jayme: draining kubestage1002 for kernel upgrade - T262527
  • 07:55 jayme: cordoning kubestage1002 for kernel upgrade - T262527
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12624 and previous config saved to /var/cache/conftool/dbconfig/20200917-070145-marostegui.json
  • 06:55 hashar: Taking a heap dump of Gerrit JVM
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12623 and previous config saved to /var/cache/conftool/dbconfig/20200917-061931-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12622 and previous config saved to /var/cache/conftool/dbconfig/20200917-060312-marostegui.json
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12621 and previous config saved to /var/cache/conftool/dbconfig/20200917-055219-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for on-site maintenance', diff saved to https://phabricator.wikimedia.org/P12620 and previous config saved to /var/cache/conftool/dbconfig/20200917-055158-marostegui.json
  • 05:46 marostegui: Stop mysql on db1131 - T262901
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2031 on es2 for the first time with minimal weight T261717', diff saved to https://phabricator.wikimedia.org/P12619 and previous config saved to /var/cache/conftool/dbconfig/20200917-054226-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12618 and previous config saved to /var/cache/conftool/dbconfig/20200917-053503-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12617 and previous config saved to /var/cache/conftool/dbconfig/20200917-052347-marostegui.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2011 as es1 master and es2017 as es3 master and then depool es2018 and es2012 to clone es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12616 and previous config saved to /var/cache/conftool/dbconfig/20200917-051741-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12615 and previous config saved to /var/cache/conftool/dbconfig/20200917-050739-marostegui.json
  • 04:53 marostegui: Deploy schema change on s1 eqiad primary master - T238966
  • 01:22 Krinkle: krinkle@mwmaint1002 synced docroot/noc – https://gerrit.wikimedia.org/r/620138
  • 01:22 Krinkle: krinkle@mwmaint2001 synced docroot/noc – https://gerrit.wikimedia.org/r/620138

2020-09-16

  • 23:41 catrope@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs: T262970 (duration: 01m 06s)
  • 23:40 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs: T262970 (duration: 01m 06s)
  • 23:37 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/GrowthExperiments/: Fix styling for mobile start module (T258008); Revert wider task card on desktop (T263042, T258704); Fix width of sidebar modules in narrow mode in variant A (T263068) (duration: 01m 09s)
  • 22:24 shdubsh: install prometheus-icinga-exporter 0.11 on icinga2001
  • 20:19 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 20:19 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 20:10 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:04 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Vector search in header on testwiki and officewiki (T262207) (duration: 01m 04s)
  • 18:00 brennen@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend: Backport: Check $coords matched some nodes before comparing contents (T263034) (duration: 01m 06s)
  • 17:58 joal@deploy1001: Finished deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0] (duration: 00m 08s)
  • 17:58 joal@deploy1001: Started deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0]
  • 17:51 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:50 joal@deploy1001: Started deploy [analytics/refinery@07056b0]: Regular analytics weekly train [analytics/refinery@07056b0]
  • 17:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:11 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:03 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:45 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:40 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:13 marostegui: Start mysql on db1093, db1109 and db1123 after pdu work is done
  • 16:12 ryankemper: `wdqs` deploy complete, service is healthy
  • 16:09 elukey: reinstall buster on an-tool1009 after a lot of tests (ganeti VM, so it is manual work)
  • 16:00 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:58 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:49 ryankemper: sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'; sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'
  • 15:49 ryankemper: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 15:48 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b7e2d0b]: 0.3.48 (duration: 14m 40s)
  • 15:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Rename wmgWikibaseClientLocalEntitySourceName to wmgWikibaseClientItemAndPropertySourceName on Beta (T258060) (production no-op) (duration: 01m 04s)
  • 15:35 ryankemper: Canary `wdqs1003` query tests look good, proceeding to wdqs deploy for rest of fleet
  • 15:33 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b7e2d0b]: 0.3.48
  • 15:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove `wmgWikibaseClientLocalEntitySourceName` from InitialiseSettings.php (T258060) (duration: 01m 05s)
  • 15:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Use `wmgWikibaseClientItemAndPropertySourceName` instead of `wmgWikibaseClientLocalEntitySourceName` in Wikibase.php (T258060) (duration: 01m 02s)
  • 15:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add `wmgWikibaseClientItemAndPropertySourceName` to InitialiseSettings.php (T258060) (duration: 01m 06s)
  • 14:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 14:41 volans: uploaded spicerack_0.0.43 to apt.wikimedia.org buster-wikimedia
  • 14:39 cmjohnson1: pdu swap rack d7-eqiad, missed this in earlier log entry
  • 14:34 jiji@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 14:02 Urbanecm: Change email address of User:Oversight@enwiki to oversight-en-wp@wikipedia.org as OTRS is back up (T262733)
  • 13:48 marostegui: Start mysql on db1121 after PDU work
  • 13:46 James_F: Restarting CI Jenkins for T262827
  • 13:08 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2256.codfw.wmnet
  • 13:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.9
  • 12:58 elukey: upload hue_4.7.1-1+deb10u1 to buster-wikimedia
  • 12:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 12:56 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 12:49 cmjohnson1: start pdu swap in racks c6 and c7, d8
  • 12:36 moritzm: powercycling mw2256 (went down with overheated CPU)
  • 12:29 moritzm: restarting exim on MXes to pick up GNUTLS update
  • 11:29 moritzm: restarting slapd on LDAP replicas to pick up GNUTLS update
  • 11:18 moritzm: installing gnutls28 security updates on remaining stretch hosts
  • 11:12 jforrester@deploy1001: Synchronized php-1.36.0-wmf.9/includes/filerepo/file: T263014 Revert "Remove support for (Archived|OldLocal)File::userCan without a user" (duration: 01m 04s)
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2027 and es2028 T261717', diff saved to https://phabricator.wikimedia.org/P12606 and previous config saved to /var/cache/conftool/dbconfig/20200916-103324-marostegui.json
  • 10:20 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.9
  • 10:14 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.9 (duration: 46m 07s)
  • 10:10 ema: upload python-acme 0.31.0-2wm1 to buster-wikimedia T263006
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12605 and previous config saved to /var/cache/conftool/dbconfig/20200916-100548-marostegui.json
  • 10:01 akosiaris: T187984 Shutdown mendelevium.
  • 09:43 jynus: deploying max_allowed_packet change to m3 instances, too
  • 09:28 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.9
  • 09:26 liw: moving train 1.36.0-wmf.9 to testwikis
  • 09:22 jynus: restarting gerrit service on gerrit1001, unresponsive
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12603 and previous config saved to /var/cache/conftool/dbconfig/20200916-091535-marostegui.json
  • 09:13 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 0 - T262290
  • 09:08 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 1 - T262290
  • 08:52 marostegui: Stop mysql on db1121, db1123, db1093 and db1109 for PDU work T261454 T261457
  • 08:52 XioNoX: asw-d-codfw> request system snapshot slice alternate all-members - T262290
  • 08:50 jynus: deploy new max_allowed_packet configuration to m1, m2 and m5 dbs
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12601 and previous config saved to /var/cache/conftool/dbconfig/20200916-084916-marostegui.json
  • 08:42 awight: finished security backport for https://phabricator.wikimedia.org/T262628
  • 08:41 awight@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FileImporter/src/Services/ImportPlanValidator.php: Security patch for T262628 (duration: 00m 59s)
  • 08:41 XioNoX: asw-c-codfw> request system snapshot slice alternate all-members - T262290
  • 08:27 XioNoX: asw-b-codfw> request system snapshot slice alternate all-members - T262290
  • 08:26 awight: beginning security backport for https://phabricator.wikimedia.org/T262628
  • 08:17 XioNoX: asw-a-codfw> request system snapshot slice alternate all-members - T262290
  • 08:04 akosiaris: T187984 Validated that ticket.wikimedia.org works, proceeding with a wider announcement
  • 08:02 XioNoX: asw2-d-eqiad> request system snapshot slice alternate all-members - T262290
  • 07:49 akosiaris: T187984 Switch over ticket.discovery.wmnet to otrs1001
  • 07:48 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:44 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 07:40 XioNoX: asw2-c-eqiad> request system snapshot slice alternate all-members - T262290
  • 07:37 akosiaris: T187984 Tested inbound email successfully
  • 07:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:26 akosiaris: T187984 Tested outbound email, switching inbound email configuration and performing tests
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12600 and previous config saved to /var/cache/conftool/dbconfig/20200916-072614-marostegui.json
  • 07:22 jayme@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:22 jayme@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 07:21 jayme@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:12 akosiaris: T187984 Disable gravatar in system configuration to avoid leaking agent PII through a 3rd party service
  • 07:03 akosiaris: T187984 validated that the OTRS installation is functional over SSH
  • 07:02 akosiaris: T187984 migration script done. Config updates, rebuilds, package upgrades/reinstall and index rebuilds done
  • 06:28 godog: codfw-prod: bump weight for ms-be2057 - T261633
  • 06:20 kart_: Updated cxserver to 2020-08-30-011854-production (T253439, T260557)
  • 06:20 XioNoX: asw2-b-eqiad> request system snapshot slice alternate all-members - T262290
  • 06:15 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:11 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 for the first time with minimum weight T261717', diff saved to https://phabricator.wikimedia.org/P12599 and previous config saved to /var/cache/conftool/dbconfig/20200916-061013-marostegui.json
  • 06:08 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12598 and previous config saved to /var/cache/conftool/dbconfig/20200916-060717-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 to clone es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12597 and previous config saved to /var/cache/conftool/dbconfig/20200916-055535-marostegui.json
  • 05:53 XioNoX: asw2-a-eqiad> request system snapshot slice alternate all-members - T262290
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12596 and previous config saved to /var/cache/conftool/dbconfig/20200916-055108-marostegui.json
  • 05:50 XioNoX: msw1-codfw> request system snapshot slice alternate - T262290
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2027 and es2028 to dbctl T261717', diff saved to https://phabricator.wikimedia.org/P12595 and previous config saved to /var/cache/conftool/dbconfig/20200916-053918-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12594 and previous config saved to /var/cache/conftool/dbconfig/20200916-053507-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 into vslow', diff saved to https://phabricator.wikimedia.org/P12593 and previous config saved to /var/cache/conftool/dbconfig/20200916-052343-marostegui.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12592 and previous config saved to /var/cache/conftool/dbconfig/20200916-052241-marostegui.json
  • 05:07 marostegui: Repool labsdb1010
  • 02:22 mutante: deneb - sudo systemctl start package_builder_Clean_up_build_directory to fix icinga alert after failed build attempts

2020-09-15

  • 23:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: 1c0b0d1: Fix APCOND_FR_NEVERBLOCKED handling (T262970) (duration: 00m 56s)
  • 23:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: 5beace3: Fix APCOND_FR_NEVERBLOCKED handling (T262970) (duration: 00m 58s)
  • 23:14 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: ac8bd38: flaggedrevs: Remove non-existent config options (duration: 00m 58s)
  • 23:07 urbanecm@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 23:00 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 62b21d5: Revert "Remove abusefilter-view right grant from wmf-config" (T255506) (duration: 00m 59s)
  • 20:44 brennen: removing extraneous recursive symlink /srv/mediawiki-staging/php-1.36.0-wmf.9/php-1.36.0-wmf.8
  • 18:32 Urbanecm: Morning B&C done
  • 18:28 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 084729b: Remove abusefilter-view right grant from wmf-config (T255506) (duration: 00m 56s)
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1d34565: Enable MediaWiki client errors on frwiki (T255585) (duration: 00m 57s)
  • 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 79004b7: Enable the reverted tag on all wikis (T164307) (duration: 00m 56s)
  • 17:59 krinkle@deploy1001: Synchronized src/ServiceConfig.php: If727ae4335 (duration: 00m 56s)
  • 17:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out (duration: 37m 42s)
  • 17:05 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out
  • 17:05 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint (duration: 86m 46s)
  • 17:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:57 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:38 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint
  • 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:26 shdubsh: manual install prometheus-icinga-exporter upgrade on icinga2001
  • 14:53 godog: switch grafana to eqiad - T259143
  • 14:48 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:42 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:38 XioNoX: remove old SNMP community from all network devices
  • 14:23 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:22 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventlogging_TemplateWizard - T251609 (duration: 00m 56s)
  • 14:21 otto@deploy1001: sync-file aborted: wgEventStreams: Set canary_events_enabled: true for eventlogging_TemplateWizard - T251609 (duration: 00m 06s)
  • 14:01 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:01 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:51 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:51 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:18 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:14 cmjohnson1: beginning work inside racks c2, c3, c4 and c5 eqiad
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 from vslow, s8, add db1092 temporarily', diff saved to https://phabricator.wikimedia.org/P12589 and previous config saved to /var/cache/conftool/dbconfig/20200915-121849-marostegui.json
  • 12:18 jbond42: update libxml2 on stretch and jessie
  • 12:08 jbond42: rolling restart of php7.2-fpm
  • 12:05 elukey: roll restart cassandra on aqs* to pick up openjdk upgrades
  • 12:05 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 11:44 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 294931f: Revert "Disable DynamicPageList on ruwikinews" (T262240; T262391) (duration: 00m 58s)
  • 11:17 effie: roll out scap 3.15.0-1 to all - T261234
  • 11:12 XioNoX: mass update SCS SNMP community in LibreNMS - T246890
  • 10:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:56 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:54 XioNoX: mass update PDU SNMP community in LibreNMS - T246890
  • 10:48 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 10:36 moritzm: uploaded libxml2 2.9.1+dfsg1-5+deb8u8+wmf1 for jessie-wikimedia
  • 10:33 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:22 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "testwikiswikis to 1.36.0-wmf.9"
  • 10:12 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 09:22 marostegui: Stop MySQL on s5 and s8 eqiad primary master - lag will show up on labsdb hosts T261455
  • 09:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 09:08 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 09:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 09:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:04 gehel: restart elasticsearch on elastic2029 (high GC)
  • 09:01 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 08:59 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 08:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 08:53 elukey: roll restart druid zookeeper clusters for openjdk upgrades
  • 08:53 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 08:52 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:13 marostegui: Stop MySQL on labsdb1010 for PDU maintenance T261456
  • 08:05 liw@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_498180604" --store-class=LCStoreCDB --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 11m 10s)
  • 08:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:01 akosiaris: T187984 migration script on otrs1001 proceeding as expected. Still in step 31/44, but that's what we saw in the test migration
  • 07:54 liw@deploy1001: Started scap: testwikis to 1.36.0-wmf.9
  • 07:24 godog: swift codfw add ms-be2057 at object weight 100 - T261633
  • 07:19 elukey: roll restart druid cluster to pick up openjdk updates
  • 07:19 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 07:16 XioNoX: pre-configure SGIX port on cr2-eqsin
  • 06:57 liw: 1.36.0-wmf.9 was branched at 7269b6b for T257977
  • 06:08 marostegui: Stop mysql on es2011 to clone es2028
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2011 to clone es2028', diff saved to https://phabricator.wikimedia.org/P12585 and previous config saved to /var/cache/conftool/dbconfig/20200915-060623-marostegui.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2012 as es1 codfw master T261717', diff saved to https://phabricator.wikimedia.org/P12584 and previous config saved to /var/cache/conftool/dbconfig/20200915-060508-marostegui.json
  • 05:33 marostegui: Depool labsdb1010 for PDU maintenance
  • 05:10 marostegui: Restart sanitarium hosts on eqiad and codfw T262832

2020-09-14

  • 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:49 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:49 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:45 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 21:34 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:32 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:30 cdanis: T257527 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo cumin 'R:Class ~ "(?i)profile::logstash::collector7"' 'enable-puppet "cdanis rolling out Ifa3c68e4"'
  • 21:24 cdanis: T257527 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo cumin 'R:Class ~ "(?i)profile::logstash::collector7"' 'disable-puppet "cdanis rolling out Ifa3c68e4"'
  • 21:05 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:03 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:38 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:36 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:26 cdanis@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a588eb0c6 T262087 modify wgEventStreams to reference NEL schema (duration: 00m 56s)
  • 19:00 Urbanecm: Morning B&C done
  • 18:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a5d56ed: e2f4798: Enable Special:Investigate on eswiki (T262436) (duration: 00m 56s)
  • 18:49 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:47 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:38 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 7d19393: Remove investigate from $wgAvailableRights (T260175) (duration: 00m 56s)
  • 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d2fa653: Remove the investigate right from testwiki and frwiki (T260175) (duration: 00m 56s)
  • 18:30 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/EventStreamConfig/includes/: a4c8608: Default to using API json formatversion=2 (T251609) (duration: 00m 57s)
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 27ba5a1: add new parse* servers to $wgLinterSubmitterWhitelist (T247441) (duration: 00m 56s)
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: 720e6cb: flaggedrevs: Move setting of wgFlaggedRevsAutopromote and wgFlaggedRevsAutoconfirm out of wgExtensionFunctions (T237191) (duration: 00m 56s)
  • 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 699f5e8: Add logo Wordmark and Tagline for hywiki (T259985) (duration: 00m 55s)
  • 18:08 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: 699f5e8: Add logo Wordmark and Tagline for hywiki (T259985) (duration: 00m 56s)
  • 17:51 mutante: all new parse* parsoid hardware pooled now and set to active in netbox, deploy in 10 min will add to $wgLinterSubmitterWhitelist (T247441)
  • 17:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
  • 17:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse200[0-9].codfw.wmnet
  • 17:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2002.codfw.wmnet
  • 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 16:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 16:36 mutante: pooled the first of the new parsoid servers - parse2001 (T247441)
  • 16:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 16:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
  • 16:04 elukey: completed the rollout of restrictive kafka ferm rules on the Kafka jumbo cluster
  • 16:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse200[0-9].codfw.wmnet
  • 16:01 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=parsoid,name=parse20[0-2][0-9].codfw.wmnet
  • 15:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 15:58 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
  • 15:54 moritzm: restarting apache on webperf* to pick up GNU TLS security update
  • 15:45 moritzm: restarting apache/FPM on mw2271/mw2272 (codfw canaries) to pick up GNU TLS update
  • 15:35 moritzm: installing gnutls28 security updates on stretch
  • 15:23 elukey: enable stricter ferm rules on kafka-jumbo1007 and kafka-jumbo1005
  • 15:17 cicalese@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Allow public access to API Portal main page for private launch (duration: 00m 57s)
  • 15:17 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:11 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:11 cmjohnson1: completed pdu swap in eqiad racks d5/d6
  • 14:55 elukey: ferm rules added to kafka-jumbo1009, 1006 and 1008 up to now
  • 14:24 milimetric@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:24 milimetric@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:16 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:14 milimetric@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:14 milimetric@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:11 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:09 milimetric@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:09 milimetric@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:42 moritzm: installing dbus security updates on stretch
  • 13:42 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:32 moritzm: installing websockify stretch updates
  • 13:10 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 12:51 cmjohnson1: correction: it's replacing the pdu's in racks d5 and d6
  • 12:50 Amir1: ladsgroup@mwmaint2001:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P1438 --new-data-type external-id (T262198)
  • 12:49 cmjohnson1: replacing pdu's in racks d4 and d5 eqiad
  • 12:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:32 ayounsi@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-snmp (exit_code=1)
  • 12:30 ayounsi@cumin1001: START - Cookbook sre.pdus.rotate-snmp
  • 12:30 XioNoX: rotate SNMP community on all the PDUs - T246890
  • 12:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:24 moritzm: rebooting sodium for kernel update
  • 12:09 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 12:08 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:06 akosiaris: T187984 migration script on otrs1001 now in step 31/44
  • 12:03 volans@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fea8861: Follow-up 0ee0d8f: [frwiktionary] Create `conj` alias (T262298) (duration: 00m 56s)
  • 11:50 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:48 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:48 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:46 volans@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:45 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:41 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:41 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:40 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:39 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:36 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:35 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:27 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for MCR', diff saved to https://phabricator.wikimedia.org/P12578 and previous config saved to /var/cache/conftool/dbconfig/20200914-112648-marostegui.json
  • 11:24 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:20 marostegui: Remove triggers from db1124:3311 - T238966
  • 11:19 marostegui: Deploy MCR schema change on s1, this will generate lag on s1 labsdb - T238966
  • 11:13 Urbanecm: EU B&C window done
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 47fe87c: [itwiki] Increase $wgAutoConfirmAge and $wgAutoConfirmCount (T262738) (duration: 00m 56s)
  • 11:09 marostegui: Stop MySQL on s5 and s8 eqiad primary master - lag will show up on labsdb hosts T261455
  • 11:05 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=frwiktionary --fix # T262298 # P12576
  • 11:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0ee0d8f: [frwiktionary] Create new namespace "Conjugaison" & associated talk (T262298) (duration: 00m 56s)
  • 11:00 volans: Mass importing IPs from PuppetDB into Netbox T244153
  • 10:59 XioNoX: create LACP bundle to labtestvirt2003
  • 10:50 jbond42: enable git protocol version2 fleet wide
  • 10:43 effie: deploy scap 3.15.0-1 to canaries - T261234
  • 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 09:27 akosiaris: T187984 migration script on otrs1001 now in step 8/44 (correction)
  • 09:26 akosiaris: T187984 migration script on otrs1001 now in step 8/41
  • 09:09 akosiaris: db1077. stop slave ; show slave status > /home/akosiaris/show_slave_status; reset slave all T187984
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2026 on es2 T261717', diff saved to https://phabricator.wikimedia.org/P12575 and previous config saved to /var/cache/conftool/dbconfig/20200914-085842-marostegui.json
  • 08:49 akosiaris: start the OTRS upgrade to 6.0.29 T187984
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12574 and previous config saved to /var/cache/conftool/dbconfig/20200914-084509-marostegui.json
  • 08:42 moritzm: upgrading remaining stretch systems to git 2.20 T262244
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12573 and previous config saved to /var/cache/conftool/dbconfig/20200914-083525-marostegui.json
  • 08:17 _joe_: restarting pybal on lvs2009
  • 08:16 _joe_: repooling mw2297
  • 08:14 _joe_: restarting php on mw2297, php-fpm stuck in SIGILL
  • 08:14 marostegui: Stop MySQL on db2125 for on-site maintenance - T260670
  • 08:12 _joe_: restarting pybal on lvs2010
  • 08:09 _joe_: restarting pybal on lvs1015
  • 08:05 godog: prometheus codfw ops, extend the lv by 100G
  • 08:04 marostegui: Stop MySQL on es2017 to clone es2027
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2017 to clone es2027 - T261717', diff saved to https://phabricator.wikimedia.org/P12572 and previous config saved to /var/cache/conftool/dbconfig/20200914-080344-marostegui.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2018 as es3 codfw master T261717', diff saved to https://phabricator.wikimedia.org/P12571 and previous config saved to /var/cache/conftool/dbconfig/20200914-080239-marostegui.json
  • 07:58 _joe_: restarting pybal on lvs1015
  • 07:52 _joe_: restarting pybal on lvs1016
  • 07:40 jayme: shutting down etcd100[1-3] (scheduled for decommission, replaced by kubetcd100[4-6])
  • 07:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:39 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12570 and previous config saved to /var/cache/conftool/dbconfig/20200914-073919-marostegui.json
  • 06:56 elukey: slowly roll out ferm rules on Kafka-Jumbo hosts (see https://gerrit.wikimedia.org/r/611168)
  • 06:19 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 05:54 elukey: execute "gnt-instance modify -B vcpus=4 an-tool1009.eqiad.wmnet" on ganeti1011 - T258768
  • 05:54 marostegui: Truncate tendril.general_log_sampled on db1115 - T262782
  • 05:47 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:43 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 for the first time with minimum weight T261717', diff saved to https://phabricator.wikimedia.org/P12569 and previous config saved to /var/cache/conftool/dbconfig/20200914-053844-marostegui.json

2020-09-13

  • 23:47 Urbanecm: Change email address of User:Oversight@enwiki to oversight-l@lists.wikimedia.org as part of OTRS downtime preparation (T262733)
  • 05:51 effie: sudo -i depool mw2297

2020-09-12

  • 01:07 mutante: people2001 - rsyncing user home dirs from people1002
  • 00:38 mutante: all issues with hosts doing stuff "on every run" have been fixed except one: analytics1034

2020-09-11

  • 22:54 mutante: starting people2001 VM
  • 17:30 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:29 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:22 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:12 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 12:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:47 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:27 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:49 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:55 jynus: starting snapshot of m2 from db1117
  • 08:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:00 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 07:59 XioNoX: remove BGP to AS64271 in AMS-IX (see peering@ email)
  • 07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:17 moritzm: rebooting ldap-corp server for kernel update
  • 07:02 moritzm: remove git-core from stretch systems, it's a transition package no longer provided by the 2.20 backport from Buster
  • 02:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:54 mutante: downtimes 48h for parse* hosts not in production yet but getting icinga checks from applied role
  • 01:53 mutante: ACKed alerts for eqiad power switches after making T262629
  • 01:53 mutante: initial puppet runs on parse2010 - parse2020, staggered, not in production yet, new hardware, setup WIP (T247441)
  • 01:45 mutante: mw2296 - restarted php7.2-fpm
  • 01:42 mutante: mw2296 - systemctl restart apache2 - rescheduled icinga alerts for apache and php-fpm
  • 01:33 mutante: initial puppet runs on parse2001 - parse2010, staggered, not in production yet, new hardware, setup WIP (T247441)
  • 01:32 milimetric@deploy1001: Finished deploy [analytics/refinery@6057f20] (thin): Simple hql syntax fix (duration: 00m 07s)
  • 01:32 milimetric@deploy1001: Started deploy [analytics/refinery@6057f20] (thin): Simple hql syntax fix
  • 01:32 milimetric@deploy1001: Finished deploy [analytics/refinery@6057f20]: Simple hql syntax fix (duration: 08m 09s)
  • 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:24 milimetric@deploy1001: Started deploy [analytics/refinery@6057f20]: Simple hql syntax fix
  • 00:41 milimetric@deploy1001: Finished deploy [analytics/refinery@7f5a6ca] (thin): Regular analytics weekly train THIN [analytics/refinery@7f5a6ca] (duration: 00m 08s)
  • 00:41 milimetric@deploy1001: Started deploy [analytics/refinery@7f5a6ca] (thin): Regular analytics weekly train THIN [analytics/refinery@7f5a6ca]
  • 00:40 milimetric@deploy1001: Finished deploy [analytics/refinery@7f5a6ca]: Regular analytics weekly train [analytics/refinery@7f5a6ca] (duration: 08m 25s)
  • 00:38 mutante: generating mcrouter certs for parse2001 - parse2019 - mcrouter_generate_certs on puppetmaster1001 (T247441)
  • 00:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:31 milimetric@deploy1001: Started deploy [analytics/refinery@7f5a6ca]: Regular analytics weekly train [analytics/refinery@7f5a6ca]
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:01 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime

2020-09-10

  • 23:44 ejegg: updated payments-wiki from e41ab173e0 to 3c073a6a56
  • 23:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 22:50 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 22:43 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 22:33 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:31 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 22:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 22:11 ejegg: updated payments-wiki from be81063168 to e41ab173e0
  • 22:06 mutante: added mcrouter cert for parse2020, ran mcrouter_generate_certs
  • 21:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 21:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:25 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.8
  • 20:23 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:21 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:20 longma: correction: T257976 - 1.36.0-wmf.8 to all wikis
  • 20:20 longma: deploying 1.36.0-wmf.8 to all wikis
  • 20:02 krinkle@deploy1001: Synchronized php-1.36.0-wmf.8/includes/resourceloader/ResourceLoaderSkinModule.php: Ibe2c9f8d024f6 (duration: 01m 05s)
  • 19:44 Urbanecm: End of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwiktionary --previous-collation=uppercase # T262163
  • 19:12 mholloway-shell@deploy1001: Started restart [recommendation-api/deploy@db7fd80]: (no justification provided)
  • 19:07 Urbanecm: Start of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwiktionary --previous-collation=uppercase # T262163
  • 19:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 95d2b57: Set $wgCategoryCollation = uca-tr on trwiktionary (T262163) (duration: 01m 05s)
  • 18:58 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=frwiktionary --fix # T262398
  • 18:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 09e487e: Add a new namespace to frwiktionary (T262398) (duration: 01m 04s)
  • 18:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/includes/EditPage.php: 8240944: EditPage: Fix member call on boolean when undo is impossible (T262463) (duration: 01m 03s)
  • 18:37 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.6/includes/EditPage.php: 8240944: EditPage: Fix member call on boolean when undo is impossible (T262463) (duration: 01m 07s)
  • 18:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:24 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: 0cde0b1: Add throttle rule for Czech senior citizens course (T262415) (duration: 01m 05s)
  • 18:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 18:00 mutante: helium (former backup host) is being removed from ferm rules on all hosts, it was replaced by backup1001 (T260717)
  • 17:33 bblack: dns servers: upgrading remainder of fleet to gdnsd-3.3.0-1~wmf1
  • 16:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:25 bblack: authdns1001 - upgrade gdnsd to 3.3.0-1~wmf1
  • 16:06 bblack: dns4001 - upgrade gdnsd to 3.3.0-1~wmf1
  • 16:04 bblack: reprepro: uploaded gdnsd-3.3.0-1~wmf1 - T261340
  • 15:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:04 volans: uploaded cumin_4.0.0 to apt.wikimedia.org buster-wikimedia (no code changes)
  • 13:58 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:52 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:42 moritzm: rebooting etherpad1002 (etherpad.wikimedia.org) for kernel update
  • 13:24 moritzm: installing rake security updates on stretch
  • 13:10 ebernhardson: delete lldwiki_{content|general} indices from search.svc.{eqiad|codfw}.wmnet:9643 (psi), they should be on 9443 (omega)
  • 12:57 klausman: Ran puppet-merge to get my dotfiles from https://gerrit.wikimedia.org/r/c/operations/puppet/+/626367 out
  • 12:34 moritzm: installing firejail updates on maps/thumbor/restbase
  • 12:01 moritzm: upgrading deployment servers to git 2.20 T262244
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P12557 and previous config saved to /var/cache/conftool/dbconfig/20200910-113758-marostegui.json
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317, db1101:3318', diff saved to https://phabricator.wikimedia.org/P12556 and previous config saved to /var/cache/conftool/dbconfig/20200910-113426-marostegui.json
  • 11:13 matthiasmullie: Euro B&C done
  • 11:13 moritzm: uploaded git 2.20.1-2+deb10u3~wmf1 to stretch-wikimedia/main T262244
  • 11:11 mlitn@deploy1001: Synchronized php-1.36.0-wmf.8//extensions/WikimediaEvents/: WikimediaEvents: Enable MediaSearch A/B test (duration: 01m 06s)
  • 10:42 duesen_: daniel@mwmaint2001:~$ mwscript maintenance/findBadBlobs.php jvwiki --revisions 214173 --mark T262457
  • 10:34 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:32 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:28 XioNoX: move VRRP master to cr2-esams
  • 10:21 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 09:45 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 09:43 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:42 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12555 and previous config saved to /var/cache/conftool/dbconfig/20200910-093106-marostegui.json
  • 09:26 dcausse: creating missing cirrus indices for jawikivoyage T262518
  • 09:24 dcausse: creating missing cirrus indices for jawikivoyage T260228
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12554 and previous config saved to /var/cache/conftool/dbconfig/20200910-091335-marostegui.json
  • 08:49 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:47 jynus@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12551 and previous config saved to /var/cache/conftool/dbconfig/20200910-082304-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12550 and previous config saved to /var/cache/conftool/dbconfig/20200910-073107-marostegui.json
  • 07:03 elukey: resize search-loader vms (+4 vcores +4GB of ram) on Ganeti - T262385
  • 05:29 marostegui: Deploy schema change on s3 master - T260476
  • 00:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@00b0e20]: Update to current master (duration: 06m 42s)
  • 00:24 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@00b0e20]: Update to current master
  • 00:23 twentyafterfour: done. Phabricator update complete
  • 00:23 twentyafterfour: applying database migrations to phabricator db
  • 00:09 twentyafterfour: deploying phabricator update 2020-09-10 https://phabricator.wikimedia.org/project/view/4755/

2020-09-09

  • 23:51 dpifke@deploy1001: Finished deploy [performance/arc-lamp@55fccc6]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/622915 (duration: 00m 05s)
  • 23:51 dpifke@deploy1001: Started deploy [performance/arc-lamp@55fccc6]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/622915
  • 23:37 ebernhardson@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/CirrusSearch/includes/Search/InterleavedResultSet.php: Repair passing interleaved search metrics from backend to frontend (duration: 01m 04s)
  • 20:13 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:625914 (duration: 01m 03s)
  • 20:03 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:626190 T261425 (duration: 01m 03s)
  • 20:01 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.8/skins/WikimediaApiPortal: Backport gerrit:626044, T261425 (duration: 01m 12s)
  • 19:11 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.8 (duration: 01m 03s)
  • 19:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.8
  • 18:19 _joe_: banning urls ^/api/rest_v1/page/mobile-html-offline-resources/ from varnish caches
  • 18:19 Urbanecm: Morning B&C window done
  • 18:17 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:17 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b226330: Enable $wgAllowCrossOrigin on all wikis (T262425) (duration: 01m 04s)
  • 18:15 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
  • 18:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 85e36ae: Enable MediaWiki client errors on commonswiki and metawiki (T255585) (duration: 01m 06s)
  • 18:10 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 18:02 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout (duration: 02m 55s)
  • 17:59 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout
  • 17:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout (duration: 06m 47s)
  • 17:52 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout
  • 17:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, take 2 (duration: 09m 38s)
  • 17:42 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, take 2
  • 17:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 T262437 (duration: 06m 00s)
  • 17:35 ppchelko@deploy1001: Started deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 T262437
  • 17:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:28 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:25 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:24 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:22 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:18 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:15 marostegui: Stop mysql on db2125 for on-site maintenance T260670
  • 16:10 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 3] (duration: 00m 11s)
  • 16:10 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 3]
  • 16:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 16:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 16:06 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 16:06 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 16:06 bd808: scap3 of Striker to labweb1001 failing. Will investigate.
  • 16:05 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 2] (duration: 00m 11s)
  • 16:05 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 2]
  • 16:04 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) (duration: 01m 21s)
  • 16:03 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111)
  • 15:54 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:48 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:26 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:26 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:11 herron: prometheus1003: systemctl restart thanos-sidecar@ops.service
  • 14:29 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:22 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:02 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:02 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:00 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:57 marostegui: Restart mysql on db1115 T231769
  • 13:54 bblack: deployed https://gerrit.wikimedia.org/r/626153
  • 12:47 _joe_: restarting php-fpm on wtp2003
  • 12:46 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 12:37 cmjohnson1: beginning scheduled PDU maintenance racks D5 and D6 in eqiad
  • 12:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12545 and previous config saved to /var/cache/conftool/dbconfig/20200909-123634-kormat.json
  • 12:31 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12544 and previous config saved to /var/cache/conftool/dbconfig/20200909-123109-kormat.json
  • 12:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:11 moritzm: installing zeromq security updates on Buster
  • 12:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:37 awight: EU Bacon complete
  • 11:34 awight@deploy1001: Synchronized wmf-config: Config: api-portal: required extended configuration (T261425) (duration: 01m 08s)
  • 11:15 moritzm: added Tobias Klausmann to pwstore
  • 11:14 jiji@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 11:03 marostegui: Stop MySQL on s2 eqiad master to prepare for the PDU maintenance (this will generate lag on s2 on labsdb) T261453
  • 10:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:28 volans: restarting ferm on failed hosts: an-test-master1001.eqiad.wmnet,an-worker1116.eqiad.wmnet,db[1075,1101,1116].eqiad.wmnet,labstore1007.wikimedia.org,logstash[1025,1030].eqiad.wmnet leftover from yesterday's network issue
  • 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:11 klausman: Rebooting stat1005 for clearing GPU status and testing new DKMS driver (T260442)
  • 10:09 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:01 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12542 and previous config saved to /var/cache/conftool/dbconfig/20200909-100157-kormat.json
  • 09:52 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12541 and previous config saved to /var/cache/conftool/dbconfig/20200909-095219-kormat.json
  • 09:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:33 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12540 and previous config saved to /var/cache/conftool/dbconfig/20200909-093353-kormat.json
  • 09:26 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12539 and previous config saved to /var/cache/conftool/dbconfig/20200909-092621-kormat.json
  • 09:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:11 moritzm: installing qemu security updates on Buster
  • 09:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 08:53 _joe_: restarting restbase on rb2009 (depooled)
  • 08:53 godog: upgrade kibana to 7.9.1 on the logstash7 cluster
  • 08:51 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12538 and previous config saved to /var/cache/conftool/dbconfig/20200909-085147-kormat.json
  • 08:44 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12537 and previous config saved to /var/cache/conftool/dbconfig/20200909-084433-kormat.json
  • 08:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:40 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:40 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 08:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12536 and previous config saved to /var/cache/conftool/dbconfig/20200909-083616-kormat.json
  • 08:34 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 08:34 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:30 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12535 and previous config saved to /var/cache/conftool/dbconfig/20200909-083038-kormat.json
  • 08:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 07:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable DynamicPageList on ruwikinews (T262240) (duration: 01m 22s)
  • 07:25 elukey: restart varnishkafka-webrequest on cp5010 and cp5012, delivery reports errors happening since yesterday's network outage
  • 06:21 XioNoX: push new pfw policies - T262297
  • 01:58 eileen: civicrm revision changed from 4e40a59d42 to cc1f7e6d13, config revision is 4845a229dc

2020-09-08

  • 23:47 eileen: civicrm revision is 4e40a59d42, config revision is d26334fa36
  • 23:25 eileen: civicrm revision changed from 5e7352e2c3 to 4e40a59d42, config revision is 3cf0913789
  • 22:14 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:12 andrew@deploy1001: Finished deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update (duration: 03m 35s)
  • 22:08 andrew@deploy1001: Started deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update
  • 22:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:57 andrew@deploy1001: Finished deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks (duration: 00m 13s)
  • 21:57 andrew@deploy1001: Started deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks
  • 19:19 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.8
  • 19:12 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.8 (duration: 71m 45s)
  • 18:22 elukey: rm /srv/prometheus/ops/targets/mjolnir_msearch_eqiad.yaml on prometheus100[3,4] as cleanup after https://gerrit.wikimedia.org/r/621988 - T260305
  • 18:00 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.8
  • 17:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:57 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 17:54 Amir1: Deployed patch for T262240
  • 17:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:23 andrewbogott: rebooting cloudvirt1033
  • 17:03 klausman: attempted to add rock-dkms_3.3-19_all.deb to thirdparty/amd-rocm33 for use on analytics servers with GPUs
  • 16:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventgate test streams and eventlogging_Test - T251609 (duration: 00m 58s)
  • 16:34 herron: increased elk5 logstash JVM heaps to 2g (to help decrease kafka-logging consumer lag)
  • 16:12 longma: 1.36.0-wmf.8 was branched at e81e81e for T257976
  • 16:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 16:03 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 16:02 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:34 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1004.*
  • 15:32 jayme@cumin1001: conftool action : set/pooled=yes; selector: service=kubesvc,name=kubernetes1013.*
  • 15:30 elukey: roll restart of hadoop master daemons on an-master100[1,2] after the cookbook failed
  • 15:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 15:20 _joe_: restarted celery-ores-worker.service on ores1007
  • 15:19 _joe_: restarted ferm on wdqs1011
  • 15:18 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 15:16 _joe_: starting wdqs-updater on wdqs1005
  • 15:15 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1090.eqiad.wmnet
  • 15:14 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp108[789].eqiad.wmnet
  • 15:14 bblack: repool cp1087-90 (eqiad row D)
  • 15:13 herron: rolling restart of elk5 logstashes
  • 15:10 marostegui: Start mysql on db1106 after PDU maintenance is done
  • 15:03 jayme@cumin1001: conftool action : set/pooled=inactive; selector: service=kubesvc,name=kubernetes1013.*
  • 15:03 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=kubernetes1004.*
  • 15:03 XioNoX: request virtual-chassis vc-port set pic-slot 1 member 4 port 0
  • 15:03 XioNoX: request virtual-chassis vc-port set pic-slot 0 member 2 port 50
  • 15:02 XioNoX: request virtual-chassis vc-port set pic-slot 1 member 1 port 1
  • 14:53 marostegui: Reload dbproxy1016 to recover the alert
  • 14:45 jynus: restarting bacula-dir @ backup1001
  • 14:44 XioNoX: reboot asw2-d3-eqiad
  • 14:33 moritzm: bouncing ferm on hosts where ferm.service failed due to DNS resolution issues for prometheus hosts
  • 14:31 volans: restarted ssh on mc1033 from console
  • 14:16 XioNoX: request virtual-chassis vc-port delete pic-slot 1 member 4 port 0
  • 14:16 XioNoX: request virtual-chassis vc-port delete pic-slot 0 member 2 port 50
  • 14:13 akosiaris: drain kubernetes1013, kubernetes1004. They are on row D
  • 14:13 bblack: dns1002 - disable puppet + bird service (stop advertising recdns from row D)
  • 14:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1090.eqiad.wmnet
  • 13:59 bblack: depooling cp1087-1090
  • 13:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp108[789].eqiad.wmnet
  • 13:57 XioNoX: asw2-d-eqiad> request system reboot member 3
  • 13:35 cmjohnson1: the power cable was not properly seated and asw2-d3-eqiad lost power
  • 13:34 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
  • 13:30 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 13:26 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:26 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:25 mateusbs17: Restarted puppetdb on deployment-puppetdb03 (T248041)
  • 13:24 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:24 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:20 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 13:18 cmjohnson1: swapping PDUs in eqiad, mgmt for racks d3 and d4 will go down
  • 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 13:17 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 13:17 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'staging' .
  • 13:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 13:14 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 13:13 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 13:12 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:09 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 13:09 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 13:08 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:08 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:04 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:04 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 12:47 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 12:35 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12523 and previous config saved to /var/cache/conftool/dbconfig/20200908-123546-kormat.json
  • 12:34 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 12:27 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12522 and previous config saved to /var/cache/conftool/dbconfig/20200908-122702-kormat.json
  • 12:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:11 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12521 and previous config saved to /var/cache/conftool/dbconfig/20200908-121139-kormat.json
  • 12:04 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12520 and previous config saved to /var/cache/conftool/dbconfig/20200908-120419-kormat.json
  • 12:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:34 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:33 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 11:33 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 11:18 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:15 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:53 marostegui: Deploy schema change on s3 eqiad master - T253276
  • 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:20 marostegui: Deploy schema change on s4 eqiad master - T253276
  • 10:14 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 10:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:11 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 10:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:08 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12519 and previous config saved to /var/cache/conftool/dbconfig/20200908-100852-kormat.json
  • 09:52 akosiaris: enable puppet, run it on all k8s eqiad nodes and double check that calico-node is fine T239835
  • 09:43 akosiaris: stopped calico-node and kube-apiserver on k8s nodes/masters T239835
  • 09:43 marostegui: Stop mysql on es2014 to clone es2026 T261717
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2014 - T261717', diff saved to https://phabricator.wikimedia.org/P12517 and previous config saved to /var/cache/conftool/dbconfig/20200908-093957-marostegui.json
  • 09:37 volans: running homer 'cr*eqiad*' commit "Update debmonitor IPs (#2), T261489"
  • 09:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:33 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:28 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12515 and previous config saved to /var/cache/conftool/dbconfig/20200908-092755-kormat.json
  • 09:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:20 jayme: disabling puppet on argon.eqiad.wmnet,chlorine.eqiad.wmnet,kubernetes[1001-1016].eqiad.wmnet - Reinitialize eqiad k8s cluster with new etcd - T239835
  • 08:55 marostegui: Deploy schema change on s7 eqiad master - T253276
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2127's weight', diff saved to https://phabricator.wikimedia.org/P12514 and previous config saved to /var/cache/conftool/dbconfig/20200908-084834-marostegui.json
  • 08:45 volans: running homer 'cr*eqiad*' commit "Update debmonitor IPs, T261489"
  • 08:23 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=blubberoid,name=eqiad
  • 08:22 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
  • 08:21 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=codfw
  • 08:20 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 08:16 moritzm: installing 4.19.132 kernel on buster systems (only installing the deb, reboots separately)
  • 07:44 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Revert "Update T250887 mitigations" (T250887; T262242) (duration: 00m 59s)
  • 07:44 elukey: roll restart kafka daemons on kafka-jumbo100[7-9] to pick up openjdk upgrades
  • 07:40 XioNoX: move HE from ix to transit BGP group on cr3-eqsin
  • 07:00 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:58 marostegui: Deploy schema change on s2 eqiad master - T253276
  • 06:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:56 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P12513 and previous config saved to /var/cache/conftool/dbconfig/20200908-065022-marostegui.json
  • 06:47 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:31 marostegui: Deploy schema change on s5 eqiad master - T253276
  • 06:23 elukey: roll restart of Hadoop master daemons on an-master100[1,2] to pick up new openjdk settings
  • 06:14 marostegui: Stop MySQL on db1106 for PDU maintenance T261452
  • 05:34 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime

2020-09-07

  • 23:35 Reedy: Deployed patch for T262213
  • 21:19 reedy@deploy1001: Synchronized private/PrivateSettings.php: Remove old mitigation (duration: 00m 55s)
  • 18:04 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 56s)
  • 16:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:38 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12511 and previous config saved to /var/cache/conftool/dbconfig/20200907-153857-kormat.json
  • 15:32 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12510 and previous config saved to /var/cache/conftool/dbconfig/20200907-153206-kormat.json
  • 15:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12509 and previous config saved to /var/cache/conftool/dbconfig/20200907-152117-kormat.json
  • 15:17 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12508 and previous config saved to /var/cache/conftool/dbconfig/20200907-151718-kormat.json
  • 15:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:09 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12507 and previous config saved to /var/cache/conftool/dbconfig/20200907-150901-kormat.json
  • 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:03 moritzm: rebooting poolcounter1004/1005
  • 15:03 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12506 and previous config saved to /var/cache/conftool/dbconfig/20200907-150310-kormat.json
  • 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1133 from dbctl T253217', diff saved to https://phabricator.wikimedia.org/P12504 and previous config saved to /var/cache/conftool/dbconfig/20200907-143507-marostegui.json
  • 14:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:25 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:48 _joe_: restarting pybal in codfw to pick up the new mobileapps TLS endpoint
  • 13:44 _joe_: restarting pybal in eqiad to pick up the new mobileapps TLS endpoint
  • 13:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:28 hashar@deploy1001: Finished deploy [integration/docroot@e4e3af9]: Support published documents outside of the git checkout # T149924 (duration: 00m 05s)
  • 13:27 hashar@deploy1001: Started deploy [integration/docroot@e4e3af9]: Support published documents outside of the git checkout # T149924
  • 13:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:22 hashar@deploy1001: Finished deploy [integration/docroot@11ab4a0]: (no justification provided) (duration: 00m 10s)
  • 13:22 hashar@deploy1001: Started deploy [integration/docroot@11ab4a0]: (no justification provided)
  • 13:14 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 13:04 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 12:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 12:43 kormat@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
  • 12:42 kormat@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:29 marostegui: Upgrade and reboot db2094 and db2095 (sanitarium hosts in codfw)
  • 12:18 gehel: restarting elasticsearch on elastic2029 (high GC)
  • 12:01 volans: restart uwsgi on debmonitor1002 to test db reconnection
  • 11:58 marostegui: Reboot pc1008 for upgrade
  • 11:36 Urbanecm: EU B&C done
  • 11:30 urbanecm@deploy1001: Synchronized docroot/noc/index.html: bbfe2ce: noc: Remove link to outdated blog (T259978) (duration: 00m 57s)
  • 11:27 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: ff9f104: Update help URL (T256623) (duration: 00m 56s)
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b512d3: [hewiktionary] Enable wikilove (T262181) (duration: 00m 57s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 35224f4: [eswiki] Create an `abusefilter` user group (T262174; 2/2) (duration: 00m 57s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 35224f4: [eswiki] Create an `abusefilter` user group (T262174; 1/2) (duration: 01m 20s)
  • 11:02 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewiktionary wikilove # T262181
  • 11:01 marostegui: Reboot pc1007 for upgrade
  • 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:36 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 09:30 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 09:12 dcausse@deploy1001: Finished deploy [wdqs/wdqs@c96b49e]: deploy wdqs-0.3.47 to wdqs1009 (test server) (duration: 00m 33s)
  • 09:11 dcausse@deploy1001: Started deploy [wdqs/wdqs@c96b49e]: deploy wdqs-0.3.47 to wdqs1009 (test server)
  • 09:10 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:02 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 08:53 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 08:49 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 08:29 jayme@deploy2001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:19 marostegui: Upgrade and restart pc1010
  • 08:18 jayme@deploy2001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:10 jayme@deploy2001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:03 marostegui: Compress InnoDB on s8 eqiad master (db1109) - T232446
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 after MCR schema change', diff saved to https://phabricator.wikimedia.org/P12501 and previous config saved to /var/cache/conftool/dbconfig/20200907-051157-marostegui.json
  • 04:56 marostegui: Compress InnoDB on s1 eqiad master - this will generate a few days of lag on s1 and labsdb for enwiki T254462
  • 04:53 marostegui: Deploy schema change on db1109 (eqiad wikidata master) - T256685

2020-09-06

  • 19:45 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2127's weight a bit', diff saved to https://phabricator.wikimedia.org/P12496 and previous config saved to /var/cache/conftool/dbconfig/20200906-194512-marostegui.json
  • 08:20 elukey: powercycle mw1360 (mgmt console available, network errors while running anything)
  • 08:04 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1360.eqiad.wmnet
  • 08:01 elukey: executed "sudo ipmitool -I lanplus -H mw1360.mgmt.eqiad.wmnet -U root mc reset cold" from cumin (mgmt not available for mw1360)

2020-09-05

  • 00:23 foks: removing 2 files for legal compliance

2020-09-04

  • 22:15 ryankemper: wdqs deploy complete, service is healthy
  • 21:54 ryankemper: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
  • 21:52 ryankemper: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 21:49 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@c7e6b35]: 0.3.47 (duration: 12m 55s)
  • 21:37 ryankemper: Tests on canary `wdqs1003` passing, beginning full wdqs deploy
  • 21:36 ryankemper@deploy1001: Started deploy [wdqs/wdqs@c7e6b35]: 0.3.47
  • 21:31 ryankemper: `ryankemper@wdqs2002:~$ sudo systemctl restart wdqs-blazegraph`
  • 21:06 mutante: apt1001 - removed all libnginx-mod* packages except libnginx-mod-http-echo ; sudo apt-get autoremove ; run puppet ; restarted nginx - apt.wikimedia.org switched to nginx-light (T261962)
  • 21:02 mutante: apt1001 - remove all libnginx-mod* packages except libnginx-mod-http-echo
  • 20:59 mutante: apt2001 - sudo apt-get autoremove
  • 20:51 mutante: apt2001 - apt-get remove --purge libnginx* and run puppet to replace nginx-full with nginx-light (T261962)
  • 20:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:36 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:22 mutante: Icinga - ACKing with sticky - alerts on test and dev hosts
  • 18:10 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing (duration: 07m 35s)
  • 18:02 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing
  • 10:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12492 and previous config saved to /var/cache/conftool/dbconfig/20200904-102955-marostegui.json
  • 10:28 marostegui: Deploy MCR schema change on db1087 (sanitarium master), this will generate lag (probably a few days) on s8 labsdb hosts T238966
  • 09:48 marostegui: Restart prometheus-mysqld-exporter on db2125
  • 09:11 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 08:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 08:31 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 08:29 elukey: roll restart of the hadoop workers (test and analytics cluster) for openjdk upgrades
  • 08:08 moritzm: installing 4.19.132 kernel on buster systems (only installing the deb, reboots separately)
  • 07:30 moritzm: installing 4.9.228 kernel on stretch systems (only installing the deb, reboots separately)
  • 05:13 marostegui: Deploy MCR schema change on s4 eqiad master T238966
  • 01:51 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints (duration: 63m 18s)
  • 01:35 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:30 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 01:23 ryankemper: (Following the restart of blazegraph, service has been restored to `wdqs2003`. See https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1599182219699&to=1599182547699)
  • 01:16 ryankemper: Glancing at https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1599170628749&to=1599182011243, looks like `wdqs2003`'s blazegraph isn't happy based off the null data entries. Restarting blazegraph: `ryankemper@wdqs2003:~$ sudo systemctl restart wdqs-blazegraph`
  • 00:48 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints
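
Note on the 21:54 entry above: the command chains `depool`, a 60-second wait, a service restart, a 30-second wait, and `pool` with `&&`, and runs it with `-b 1`, so each later step runs only if the earlier ones succeeded. The sketch below is a rough Python illustration of that pattern, not cumin's implementation. The names `rolling_restart`, `STEPS`, and the `run` callback are hypothetical, and how the real tool reacts after a host fails is not modeled.

```python
import time
from typing import Callable, Iterable, List

# Steps mirror the shell chain from the log entry: (kind, argument).
STEPS = [
    ("run", "depool"),
    ("sleep", 60),
    ("run", "systemctl restart wdqs-categories"),
    ("sleep", 30),
    ("run", "pool"),
]


def rolling_restart(
    hosts: Iterable[str],
    run: Callable[[str, str], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Apply STEPS to one host at a time.

    `run(host, command)` is a caller-supplied (hypothetical) executor that
    returns True on success. Returns the hosts where every step completed.
    """
    completed = []
    for host in hosts:
        ok = True
        for kind, arg in STEPS:
            if kind == "sleep":
                sleep(arg)
            elif not run(host, arg):
                # Like `&&` in the shell: skip the remaining steps on failure,
                # which here leaves the host depooled.
                ok = False
                break
        if ok:
            completed.append(host)
    return completed
```

Passing a no-op `sleep` and a fake `run` makes the ordering easy to check without touching real hosts.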

2020-09-03

  • 23:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9394739: Start logging log-ins on select wikis (T253802) (duration: 00m 56s)
  • 21:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:55 milimetric@deploy1001: deploy aborted: AQS: Deploying new geoeditors endpoints (duration: 00m 13s)
  • 19:54 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints
  • 19:07 milimetric@deploy1001: Finished deploy [analytics/refinery@e4d5149] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d5149] (duration: 00m 08s)
  • 19:07 milimetric@deploy1001: Started deploy [analytics/refinery@e4d5149] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d5149]
  • 19:06 milimetric@deploy1001: Finished deploy [analytics/refinery@e4d5149]: Regular analytics weekly train [analytics/refinery@e4d5149] (duration: 09m 06s)
  • 18:57 milimetric@deploy1001: Started deploy [analytics/refinery@e4d5149]: Regular analytics weekly train [analytics/refinery@e4d5149]
  • 17:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:46 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:36 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:36 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:28 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:02 papaul: power down ores2009 for DIMM upgrade
  • 16:45 papaul: power down ores2008 for DIMM upgrade
  • 16:33 papaul: power down ores2007 for DIMM upgrade
  • 16:24 elukey: roll restart aqs on aqs1* to pick up new druid settings
  • 16:05 papaul: power down ores2006 for DIMM upgrade
  • 15:51 papaul: power down ores2005 for DIMM upgrade
  • 15:33 papaul: power down ores2004 for DIMM upgrade
  • 15:30 moritzm: installing nginx updates on apt* and htmldumper1001
  • 15:25 moritzm: installing firejail update (along with restarts) on thumbor1001, maps1001, restbase1016 (and -dev)
  • 15:22 papaul: power down ores2003 for DIMM upgrade
  • 15:17 moritzm: installing firejail security updates on parsoid servers
  • 15:08 papaul: power down ores2002 for DIMM upgrade
  • 14:53 papaul: power down ores2001 for DIMM upgrade
  • 14:36 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:30 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:29 jmm@deploy1001: Finished deploy [debmonitor/deploy@fb64c52]: deploy to new buster host (duration: 00m 06s)
  • 14:29 jmm@deploy1001: Started deploy [debmonitor/deploy@fb64c52]: deploy to new buster host
  • 14:13 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:11 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:00 marostegui: Failover m5 (wikitech) master - T260324
  • 13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:43 jmm@deploy1001: Finished deploy [debmonitor/deploy@fb64c52]: deploy to new buster host (duration: 00m 18s)
  • 13:43 jmm@deploy1001: Started deploy [debmonitor/deploy@fb64c52]: deploy to new buster host
  • 13:40 jmm@deploy1001: Finished deploy [debmonitor/deploy@25dbd20]: deploy to new buster host, now the --force is with me (duration: 01m 29s)
  • 13:39 jmm@deploy1001: Started deploy [debmonitor/deploy@25dbd20]: deploy to new buster host, now the --force is with me
  • 13:32 jmm@deploy1001: Finished deploy [debmonitor/deploy@25dbd20]: deploy to new buster host (duration: 00m 05s)
  • 13:32 jmm@deploy1001: Started deploy [debmonitor/deploy@25dbd20]: deploy to new buster host
  • 13:08 marostegui: Start pre m5 failover steps T260324
  • 12:46 marostegui: Deploy MCR schema change on s7 eqiad master (lag might show up) - T238966
  • 12:30 hnowlan: enabling puppet on appservers, finished rollout of api.wikimedia.org https://gerrit.wikimedia.org/r/c/operations/puppet/+/623833
  • 12:19 kormat@cumin1001: dbctl commit (dc=all): 'Shift weights in s2 codfw to account for db2125 being down T260670', diff saved to https://phabricator.wikimedia.org/P12485 and previous config saved to /var/cache/conftool/dbconfig/20200903-121916-kormat.json
  • 12:17 moritzm: installing openexr security updates for stretch
  • 12:03 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2125 after hw issue', diff saved to https://phabricator.wikimedia.org/P12483 and previous config saved to /var/cache/conftool/dbconfig/20200903-120304-kormat.json
  • 11:45 moritzm: installing net-snmp security updates on Stretch
  • 11:45 moritzm: installing net-snmp security updates on Buster
  • 11:33 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage --fix | phaste # T260320 # P12481
  • 11:28 moritzm: installing PHP 7.0 security updates
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 04281a0: Add extra namespaces for jawikivoyage (T260320) (duration: 01m 01s)
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: 976d735: Lift IP cap on 2020-09-08 for Senior Citizen Write Wikipedia course - cs.wikipedia (T261882) (duration: 01m 01s)
  • 11:21 gilles@deploy1001: Synchronized static/images/project-logos: T252108 Deploying lossily optimised Wikipedia logos (duration: 01m 20s)
  • 10:50 hnowlan: disabling apache on appservers for rollout of https://gerrit.wikimedia.org/r/c/operations/puppet/+/623833
  • 10:38 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:07 XioNoX: re-apply vlan 1118 firewall filter and update OSPF/bootp on cr1/2-eqiad - T261866
  • 09:57 XioNoX: rectification: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 on cr1-eqiad - T261866
  • 09:56 XioNoX: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12480 and previous config saved to /var/cache/conftool/dbconfig/20200903-095510-marostegui.json
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12479 and previous config saved to /var/cache/conftool/dbconfig/20200903-095015-marostegui.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12478 and previous config saved to /var/cache/conftool/dbconfig/20200903-094857-marostegui.json
  • 09:48 XioNoX: move VRRP master from cr1-eqiad:ae2.1118 to cr2-eqiad:xe-3/0/4.1118 - T261866
  • 09:46 XioNoX: move vlan 1118 IPv4 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12477 and previous config saved to /var/cache/conftool/dbconfig/20200903-094435-marostegui.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12476 and previous config saved to /var/cache/conftool/dbconfig/20200903-094043-marostegui.json
  • 09:38 XioNoX: move vlan 1118 IPv6 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12475 and previous config saved to /var/cache/conftool/dbconfig/20200903-093629-marostegui.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12474 and previous config saved to /var/cache/conftool/dbconfig/20200903-093454-marostegui.json
  • 09:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:31 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12473 and previous config saved to /var/cache/conftool/dbconfig/20200903-092549-marostegui.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316 db2087:3317 T261917', diff saved to https://phabricator.wikimedia.org/P12472 and previous config saved to /var/cache/conftool/dbconfig/20200903-092028-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12471 and previous config saved to /var/cache/conftool/dbconfig/20200903-091834-marostegui.json
  • 09:13 XioNoX: rolled back: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2122', diff saved to https://phabricator.wikimedia.org/P12470 and previous config saved to /var/cache/conftool/dbconfig/20200903-090901-marostegui.json
  • 09:06 XioNoX: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P12469 and previous config saved to /var/cache/conftool/dbconfig/20200903-090419-marostegui.json
  • 09:01 XioNoX: force ae2.1118 VRRP master on cr1-eqiad - T261866
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317, db1098:3316', diff saved to https://phabricator.wikimedia.org/P12468 and previous config saved to /var/cache/conftool/dbconfig/20200903-090007-marostegui.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1090:3317', diff saved to https://phabricator.wikimedia.org/P12467 and previous config saved to /var/cache/conftool/dbconfig/20200903-085838-marostegui.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12466 and previous config saved to /var/cache/conftool/dbconfig/20200903-085708-marostegui.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12465 and previous config saved to /var/cache/conftool/dbconfig/20200903-084910-marostegui.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1090:3312', diff saved to https://phabricator.wikimedia.org/P12464 and previous config saved to /var/cache/conftool/dbconfig/20200903-084836-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3317, db1090:3312', diff saved to https://phabricator.wikimedia.org/P12463 and previous config saved to /var/cache/conftool/dbconfig/20200903-084358-marostegui.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12462 and previous config saved to /var/cache/conftool/dbconfig/20200903-084147-marostegui.json
  • 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122 T261917', diff saved to https://phabricator.wikimedia.org/P12461 and previous config saved to /var/cache/conftool/dbconfig/20200903-082956-marostegui.json
  • 08:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:28 moritzm: rebooting mwmaint1002 for kernel update
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12460 and previous config saved to /var/cache/conftool/dbconfig/20200903-082655-marostegui.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12459 and previous config saved to /var/cache/conftool/dbconfig/20200903-082034-marostegui.json
  • 08:16 marostegui: Upgrade db1101 (s7 and s8)
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12458 and previous config saved to /var/cache/conftool/dbconfig/20200903-081543-marostegui.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318, db1101:3317', diff saved to https://phabricator.wikimedia.org/P12457 and previous config saved to /var/cache/conftool/dbconfig/20200903-081503-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12456 and previous config saved to /var/cache/conftool/dbconfig/20200903-081337-marostegui.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12455 and previous config saved to /var/cache/conftool/dbconfig/20200903-080714-marostegui.json
  • 08:06 marostegui: Upgrade and reboot db1127
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12454 and previous config saved to /var/cache/conftool/dbconfig/20200903-080634-marostegui.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12453 and previous config saved to /var/cache/conftool/dbconfig/20200903-080024-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12452 and previous config saved to /var/cache/conftool/dbconfig/20200903-075443-marostegui.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12451 and previous config saved to /var/cache/conftool/dbconfig/20200903-074922-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3317 T261917', diff saved to https://phabricator.wikimedia.org/P12450 and previous config saved to /var/cache/conftool/dbconfig/20200903-074827-marostegui.json
  • 07:45 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 07:45 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 07:45 marostegui: Upgrade and reboot db1094
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12449 and previous config saved to /var/cache/conftool/dbconfig/20200903-074426-marostegui.json
  • 07:38 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 07:38 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12448 and previous config saved to /var/cache/conftool/dbconfig/20200903-073718-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12447 and previous config saved to /var/cache/conftool/dbconfig/20200903-073116-marostegui.json
  • 07:29 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12446 and previous config saved to /var/cache/conftool/dbconfig/20200903-072716-marostegui.json
  • 07:24 hashar: contint2001: restarting CI Jenkins for plugins upgrade
  • 07:19 marostegui: Deploy schema change on s8 eqiad master T237120
  • 07:18 marostegui: Stop slave on s8 eqiad master (lag will appear on s8 eqiad) - T237120
  • 07:02 marostegui: Stop db2100:3317 and db2121 in sync to reload metawiki.content T261869
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12445 and previous config saved to /var/cache/conftool/dbconfig/20200903-070104-marostegui.json
  • 06:56 hashar: contint2001: restarting CI Jenkins
  • 06:56 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:56 _joe_: deployment of mobileapps to pick up changes to envoy config, new helmfile layout
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12444 and previous config saved to /var/cache/conftool/dbconfig/20200903-065105-marostegui.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12443 and previous config saved to /var/cache/conftool/dbconfig/20200903-064804-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12442 and previous config saved to /var/cache/conftool/dbconfig/20200903-064623-marostegui.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12441 and previous config saved to /var/cache/conftool/dbconfig/20200903-064334-marostegui.json
  • 06:24 marostegui: Disconnect eqiad -> codfw replication

2020-09-02

  • 22:55 shdubsh: restart rsyslog on centrallog[12]001
  • 22:27 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart wdqs-blazegraph.service"`
  • 22:26 ryankemper: Puppet finished on all external wdqs codfw nodes, nginx automatically reloaded as intended
  • 22:24 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo run-puppet-agent"`
  • 21:48 bd808@deploy1001: Finished deploy [striker/deploy@3c2090a]: Deploying r20200902 tag (T198114, T223610, T245804, T144111, T261810) (duration: 01m 34s)
  • 21:46 bd808@deploy1001: Started deploy [striker/deploy@3c2090a]: Deploying r20200902 tag (T198114, T223610, T245804, T144111, T261810)
  • 21:10 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart wdqs-blazegraph.service"`
  • 21:10 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart nginx.service"`
  • 21:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:01 ryankemper: Restarted nginx on `wdqs2007`
  • 21:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:47 ryankemper: restarted blazegraph on `wdqs2001` as well
  • 20:46 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal and not P{wdqs2001.codfw.wmnet}' "sudo systemctl restart wdqs-blazegraph.service"` (restarted everything but 2001, will restart 2001 next)
  • 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:20 robh: scs-c1-eqiad firmware update complete and back online T238036
  • 19:14 robh: updating firmware on scs-c1-eqiad per T238036
  • 19:14 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Revert "Update T250887 mitigations" (duration: 00m 32s)
  • 18:58 herron: freeing some disk space on centrallog1001 with 'tune2fs -m 0 /dev/centrallog1001-vg/data'
  • 18:43 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622898 Install OAuthRateLimiter III: Install where enabled, ouch, forgot to rebase (duration: 00m 55s)
  • 18:40 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622898 Install OAuthRateLimiter III: Install where enabled (duration: 00m 55s)
  • 18:38 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka jumbo-eqiad (for consistency with main) - T261865
  • 18:37 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka main-codfw - T261865
  • 18:36 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:622897 Install OAuthRateLimiter extension II: Add flag to IS (duration: 00m 56s)
  • 18:34 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka main-eqiad - T261865
  • 18:33 ppchelko@deploy1001: Synchronized wmf-config/extension-list: (no justification provided) (duration: 00m 54s)
  • 18:32 ottomata: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka jumbo-eqiad (for consistency with main) - T261865
  • 18:28 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/DiscussionTools/: Backport Fix parsing localised digits in PHP discussion parser (duration: 00m 56s)
  • 18:19 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/DiscussionTools/: Backport Re-apply new reply API patches (again) (duration: 00m 58s)
  • 17:34 bstorm: re-enabled puppet on labsdb10[09-12]
  • 17:28 bstorm: disabled puppet on labsdb10[09-12]
  • 17:18 herron: restarted elasticsearch on logstash1012
  • 16:39 Pchelolo: creating oauth_ratelimit_client_tier table T258711
  • 15:55 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
  • 15:55 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
  • 15:55 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
  • 15:32 hnowlan: Temporarily disabling apache for configuration change T246945
  • 15:24 godog: prometheus codfw lvextend --resizefs --size +50G /dev/mapper/vg--ssd-prometheus--k8s
  • 15:19 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
  • 15:18 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
  • 15:18 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 15:17 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:16 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 15:15 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
  • 15:15 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main
  • 15:11 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:31 elukey: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka-main eqiad - T261865
  • 14:29 elukey: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka-main codfw - T261865
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12434 and previous config saved to /var/cache/conftool/dbconfig/20200902-141854-marostegui.json
  • 13:05 elukey: run kafka preferred-replica-election on kafka-main codfw
  • 12:07 XioNoX: move vrrp master from cr2-codfw to cr1-codfw
  • 11:52 duesen__: daniel@mwmaint2001:/srv/mediawiki/php-1.36.0-wmf.6$ mwscript findBadBlobs.php testwiki --mark T251778
  • 11:36 Urbanecm: EU B&C done
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 796b4fa: Add title for apiportalwiki (T246945) (duration: 00m 56s)
  • 11:34 Urbanecm: Fetched extra commits to deploy1001's staging dir; commit messages explain it's an accident, continuing; cc Krinkle
  • 11:31 duesen__: Deployed second security fix for T260485
  • 11:07 XioNoX: repool cr1-eqiad
  • 10:58 XioNoX: cr1-eqiad:request chassis routing-engine master switch
  • 10:49 XioNoX: reboot cr1-eqiad:re0 (backup)
  • 10:45 jbond42: install apache updates on buster
  • 10:36 XioNoX: cr1-eqiad:request chassis routing-engine master switch
  • 10:35 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
  • 10:34 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
  • 10:32 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
  • 10:31 jbond42: install apache updates on jessie
  • 10:27 XioNoX: reboot cr1-eqiad:re1 (backup)
  • 10:18 XioNoX: move VRRP master from cr1 to cr2
  • 10:16 XioNoX: drain cr1-eqiad transit/transport/IX
  • 10:13 XioNoX: drain cr1-eqiad-pfw3-eqiad link
  • 10:04 XioNoX: repool cr2-eqiad
  • 09:55 XioNoX: cr2-eqiad:request chassis routing-engine master switch - T259621
  • 09:46 XioNoX: reboot cr2-eqiad:re0 (backup) - T259621
  • 09:28 XioNoX: cr2-eqiad:request chassis routing-engine master switch - T259621
  • 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:18 XioNoX: reboot cr2-eqiad:re1 (backup) - T259621
  • 09:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:13 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:13 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
  • 09:12 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:11 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:08 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
  • 09:07 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
  • 09:01 elukey: reimage kafka-jumbo1004 to Buster
  • 08:58 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1128 from s10 - T260324', diff saved to https://phabricator.wikimedia.org/P12432 and previous config saved to /var/cache/conftool/dbconfig/20200902-085705-marostegui.json
  • 08:52 XioNoX: deactivate cr2-eqiad transit/IX - T259621
  • 08:50 XioNoX: drain cr2-eqiad transport links - T259621
  • 08:20 XioNoX: activate Telia BGP in eqiad
  • 07:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:38 elukey: reimage kafka-jumbo1003 to buster
  • 07:28 marostegui: Reboot dbstore1003 for kernel upgrade - T261389
  • 07:12 XioNoX: configure cr2-eqiad:ae5 as single LACP link to Telia
  • 07:05 marostegui: Drop unused grants on m5 T261152
  • 07:02 elukey: reboot kafka-jumbo1002 to pick up new kernel settings
  • 07:00 XioNoX: deactivate Telia BGP in eqiad
  • 06:38 elukey: powercycle analytics1059 - cpu soft locks on multiple CPUs
  • 06:30 elukey: reboot kafka-jumbo1001 to pick up new kernel settings
  • 06:30 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 06:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'production' .

2020-09-01

  • 22:39 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=sysop_itwiki Pierpao (T261722)
  • 17:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:36 ryankemper: wdqs [canary] rollback complete, tests passing now. Will need to dig into source of failure
  • 17:35 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@7920fbe]: 0.3.46 (duration: 03m 43s)
  • 17:35 ryankemper: `wdqs1003` (the canary instance) is failing tests now, going to rollback
  • 17:32 ryankemper@deploy1001: Started deploy [wdqs/wdqs@7920fbe]: 0.3.46
  • 17:30 ryankemper: Starting wdqs deploy
  • 15:56 chasemp: labsdb* puppet agent --test; sudo /usr/local/sbin/maintain-views --all-databases --table user --replace-all; sudo /usr/local/sbin/maintain-views --all-databases --table user_old --replace-all
  • 15:25 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 14:28 _joe_: restarting envoy on all eqiad jobrunners
  • 14:22 _joe_: restarted confd on mwmaint1002
  • 14:18 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 14:18 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 14:17 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:15 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2083 weight', diff saved to https://phabricator.wikimedia.org/P12429 and previous config saved to /var/cache/conftool/dbconfig/20200901-141521-marostegui.json
  • 14:15 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 14:14 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:07 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:07 rzl@cumin1001: MediaWiki read-only period ends at: 2020-09-01 14:07:36.305500
  • 14:07 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:04 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 14:02 rzl@cumin1001: MediaWiki read-only period starts at: 2020-09-01 14:02:04.851006
  • 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 13:58 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 13:58 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:51 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:50 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:45 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:44 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:40 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:39 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 13:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 13:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 13:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 10:37 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:48 XioNoX: reserve cr2-eqiad:xe-3/3/7 for new Telia port
  • 09:38 jayme: systemctl restart docker-reporter-releng-images.service on deneb to clear out alert because of temporary HTTP 504 from debmonitor
  • 09:01 moritzm: installing Java 8 sec updates on contint*
  • 08:51 moritzm: uploaded apache 2.4.10-10+deb8u16+wmf1 for jessie-wikimedia
  • 07:11 moritzm: installing 4.9.228 kernel on stretch systems (only installing the deb, reboots separately)
  • 07:05 moritzm: restarting jenkins on releases1002 to pick up Java security updates
  • 06:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:44 elukey: reimage kafka-jumbo1002 to Buster
  • 06:20 marostegui: Install query killers on db2137:3314 T243373
  • 01:17 chaomodus: updated the pynetbox package to 5.0.7 and uploaded to buster
  • 00:02 mutante: wb2-grrrri was not running and wikibugs had posted no Gerrit updates for a while
  • 00:01 mutante: restarting wikibugs

2020-08-31

  • 23:38 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 (final) (duration: 00m 17s)
  • 23:38 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 (final)
  • 23:37 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox2001 (duration: 01m 12s)
  • 23:36 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox2001
  • 23:36 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox1001 (duration: 00m 58s)
  • 23:35 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox1001
  • 23:31 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next pt2 (duration: 00m 05s)
  • 23:31 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next pt2
  • 23:31 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next (duration: 00m 57s)
  • 23:30 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next
  • 23:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable (future) mw-reverted tag for all wikis except testwiki (T254074) (duration: 00m 57s)
  • 21:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:00 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:20 ryankemper: `sudo systemctl restart elasticsearch_6@production-search-psi-eqiad.service` on `elastic1052.eqiad.wmnet`
  • 18:38 Urbanecm: Morning B&C done
  • 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 16197aa: Add two domains to wgCopyUploadsDomains for commonswiki (T261562; T261575) (duration: 00m 54s)
  • 18:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bb28e9d: itwiki: Assign patrol right to autopatrolled instead of autoconfirmed (T261587) (duration: 00m 53s)
  • 18:23 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: a1b0d6e: b609cd5: CommonSettings.php: limit new Echos `push-subscription-manager` group to Meta-Wiki (T261625) (duration: 00m 54s)
  • 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 846c544: wgEventStreams: Stream for MEP-iOS pilot (T260382) (duration: 00m 55s)
  • 17:21 volans: uploaded spicerack_0.0.42 to apt.wikimedia.org buster-wikimedia
  • 15:50 rzl@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
  • 15:49 ejegg: updated payments-wiki from ef7ebd08cb to be81063168
  • 15:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 15:33 rzl@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 15:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=99)
  • 15:32 rzl@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 14:58 ema: Traffic: depool eqiad from user traffic T243316
  • 14:38 moritzm: installing rake security updates on stretch
  • 14:33 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 14:21 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 14:20 rzl@cumin1001: Switching services apertium, termbox, search, api-gateway, ores, sessionstore, eventgate-main, graphoid, eventstreams, wikifeeds, wdqs, parsoid, eventgate-logging-external, wdqs-internal, echostore, mathoid, mobileapps, proton, restbase, kartotherian, recommendation-api, eventgate-analytics-external, restbase-async, citoid, schema, cxserver, eventgate-analytics, zotero: eqiad => codfw
  • 14:20 rzl@cumin1001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 14:18 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 14:13 rzl@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 14:12 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=99)
  • 14:11 rzl@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 13:41 andrewbogott: dropping many databases from m5, as per T261152
  • 13:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:07 marostegui: Failover m3 (phabricator) proxy from dbproxy1016 to dbproxy1020 - T261459
  • 13:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:54 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 12:54 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 12:53 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 12:53 oblivian@cumin2001: Switching services parsoid: eqiad => codfw
  • 12:53 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 12:53 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 12:48 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 12:45 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 12:45 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 12:44 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 12:44 oblivian@cumin2001: Switching services restbase-async: eqiad => codfw
  • 12:44 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 12:43 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 12:37 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 12:14 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 12:14 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 12:13 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 12:13 oblivian@cumin2001: Switching services restbase-async: eqiad => codfw
  • 12:13 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 12:10 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 12:05 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 11:58 elukey: reimage kafka-jumbo1001 to Buster
  • 11:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: 5d583d9: Disable MediaSearch A/B test (duration: 00m 55s)
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 81f88fd: Enable Signature button on Wikiproject for hywiki (T261550) (duration: 00m 54s)
  • 11:22 jbond42: removing old hiera version 1 and 3 backends
  • 11:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b74893f: Enable sitenotice on mobile for closed wikis (T261357) (duration: 00m 56s)
  • 11:02 volans: upgraded spicerack to 0.0.41 on cumin hosts
  • 10:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:51 elukey: executed /srv/phab/phabricator/bin/remove destroy @klausman on phab1001 (following https://wikitech.wikimedia.org/wiki/Phabricator#Delete_a_user) to clear inconsistent state of new account (wrong email address)
  • 08:43 moritzm: installing bind9 security updates on stretch/buster (client-side tools/libs only)
  • 07:53 volans: uploaded spicerack_0.0.41 to apt.wikimedia.org buster-wikimedia
  • 07:30 moritzm: installing squid security updates
  • 07:24 moritzm: installing openexr security updates on buster
  • 07:12 marostegui: Sanitize jawikivoyage on db2094:3325 and db1124:3325 T260482
  • 06:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:06 elukey: reimage kafka-jumbo1005 to Debian Buster
  • 05:21 marostegui: Reload haproxy on dbproxy1017 and dbproxy1021 to test db1128

2020-08-30

  • 16:13 herron: restarted eqiad v5 logstashes

2020-08-29

  • 18:05 Amir1: end of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T261451)
  • 17:45 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T261451)

2020-08-28

  • 21:53 ryankemper: `sudo systemctl reload nginx.service` on `cloudelastic100[5,6].wikimedia.org` to try to resolve certificate warning issues
  • 19:11 andrewbogott: rebooting cloudvirt1006. It's a spare, unused system but showing a bus error and icinga alerts; not worth saving if it needs saving
  • 17:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:39 mutante: shutting down mw2196
  • 17:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:40 rzl: switchdc live test complete
  • 16:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 16:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 16:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 16:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 16:33 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 16:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 16:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 16:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 16:29 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-08-28 16:29:24.432463
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 16:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 16:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 16:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 16:28 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-08-28 16:28:07.882663
  • 16:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 16:19 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 16:19 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 16:13 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 16:12 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 16:09 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 16:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 16:08 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 16:08 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 16:07 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 16:07 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 16:06 rzl: starting one more live test of the data center switchover automation, no production impact is expected but there will be some SAL noise
  • 14:22 moritzm: installing Java security updates on kafka/main and Logstash(5) clusters
  • 13:35 hashar@deploy1001: Finished deploy [integration/docroot@65ec92c]: noop, sync up for README.md (duration: 00m 07s)
  • 13:35 hashar@deploy1001: Started deploy [integration/docroot@65ec92c]: noop, sync up for README.md
  • 13:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:07 elukey: stop kafka on kafka-jumbo1006 and reimage to buster
  • 12:56 moritzm: installing debmonitor1002 T261492
  • 12:46 moritzm: installing debmonitor2002 T261492
  • 11:50 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:40 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:27 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 11:27 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:27 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 09:48 jayme: updated helm to 2.16.9-3 on chartmuseum*, contint*, deploy*
  • 09:19 jayme: imported helm_2.16.9-3 to buster-wikimedia, stretch-wikimedia, jessie-wikimedia
  • 08:22 kormat: enabling replication from db2112 to db1083 (s1) T243373
  • 07:41 jynus: restart backup2001,backup1002
  • 07:10 jynus: restart db2139
  • 07:07 marostegui: Warm up parsercache in codfw - T260042
  • 06:47 jynus: restart db2102
  • 06:28 jynus: restart db2100
  • 06:07 jynus: restart db2099
  • 05:50 jynus: restart db2098
  • 00:06 eileen: process-control config revision is dd541a25dc

2020-08-27

  • 23:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:48 eileen: civicrm revision changed from a942537984 to 3d501e71d9, config revision is dd541a25dc
  • 22:54 eileen: civicrm revision changed from 481ab742db to a942537984, config revision is e2ab4d7c1f
  • 22:28 tzatziki: removing one file for legal compliance
  • 22:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 22:18 volans: uploaded spicerack_0.0.40-1_amd64.deb to apt.wikimedia.org buster-wikimedia
  • 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:57 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:41 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:29 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:25 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:22 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:17 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 21:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:16 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 21:14 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:10 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 21:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:50 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw221[0-4].codfw.wmnet
  • 20:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:49 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw220[0-9].codfw.wmnet
  • 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw214[0-7].codfw.wmnet
  • 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:47 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw213[0-9].codfw.wmnet
  • 20:43 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Streams for testing MEP-based analytics instruments - T259714 (duration: 00m 55s)
  • 19:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:57 marxarelli: 1.36.0-wmf.6 promoted to all wikis (T257974). new errors appear to be related to T261345 but are known since 1.36.0-wmf.5
  • 19:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=appserver,name=mw21[8-9][0-9]*.codfw.wmnet
  • 19:41 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.6
  • 19:22 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 11s)
  • 19:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:19 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:16 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating apiportalwiki (T246945)
  • 19:15 urbanecm@deploy1001: Synchronized dblists: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:14 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:13 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:11 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 18:54 mforns@deploy1001: Finished deploy [analytics/refinery@e85191b] (thin): Regular analytics weekly train THIN [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] (duration: 00m 08s)
  • 18:54 mforns@deploy1001: Started deploy [analytics/refinery@e85191b] (thin): Regular analytics weekly train THIN [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9]
  • 18:53 mforns@deploy1001: Finished deploy [analytics/refinery@e85191b]: Regular analytics weekly train [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] (duration: 10m 01s)
  • 18:43 mforns@deploy1001: Started deploy [analytics/refinery@e85191b]: Regular analytics weekly train [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9]
  • 18:43 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Assign all homepage users to variant A (duration: 01m 03s)
  • 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on ruwiki (T257490) (duration: 01m 03s)
  • 18:17 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=jobrunner,name=mw2250.codfw.wmnet,service=canary
  • 18:17 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=jobrunner,name=mw2249.codfw.wmnet,service=canary
  • 18:16 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 18:16 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 18:14 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=jobrunner,name=mw1318.eqiad.wmnet
  • 18:07 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw229[1-9].codfw.wmnet,cluster=api_appserver
  • 18:06 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2290.codfw.wmnet,cluster=api_appserver
  • 18:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw22[6-8][0-9].codfw.wmnet,cluster=api_appserver
  • 18:03 Urbanecm: Creating jawikivoyage is done (T260320)
  • 18:02 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 59s)
  • 18:02 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw225[0-9].codfw.wmnet,cluster=api_appserver
  • 18:00 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating jawikivoyage (T260320) (duration: 01m 02s)
  • 17:59 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw224[4-5].codfw.wmnet,service=canary
  • 17:59 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw224[4-5].codfw.wmnet
  • 17:59 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating jawikivoyage (T260320) (duration: 01m 03s)
  • 17:58 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating jawikivoyage (T260320)
  • 17:57 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw222[0-3].codfw.wmnet
  • 17:56 urbanecm@deploy1001: Synchronized dblists: Creating jawikivoyage (T260320) (duration: 00m 58s)
  • 17:56 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw221[5-9].codfw.wmnet,service=canary
  • 17:55 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw221[5-9].codfw.wmnet
  • 17:55 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating jawikivoyage (T260320) (duration: 01m 03s)
  • 17:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw221[0-4].codfw.wmnet
  • 17:54 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw221[0-4].codfw.wmnet
  • 17:54 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating jawikivoyage (T260320) (duration: 01m 07s)
  • 17:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw220[1-9].codfw.wmnet
  • 17:52 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw220[1-9].codfw.wmnet
  • 17:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2200.codfw.wmnet
  • 17:50 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2200.codfw.wmnet
  • 17:48 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw214[0-7].codfw.wmnet
  • 17:47 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw213[5-9].codfw.wmnet
  • 17:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw214[0-7].codfw.wmnet
  • 17:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw213[5-9].codfw.wmnet
  • 17:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw23[0-7][0-9].codfw.wmnet
  • 17:31 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[0-7].codfw.wmnet,service=canary
  • 17:30 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw227[0-7].codfw.wmnet
  • 17:29 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 17:29 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 17:18 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:17 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw226[8-9].codfw.wmnet
  • 17:13 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw225[4-8].codfw.wmnet
  • 17:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:11 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw224[0-2].codfw.wmnet
  • 17:04 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw223[2-9].codfw.wmnet
  • 17:01 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2231.codfw.wmnet
  • 16:59 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2230.codfw.wmnet
  • 16:54 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw222[4-9].codfw.wmnet
  • 16:49 mutante: re-weighted appservers and api appservers in eqiad - hardware type G = weight 25, all other types = weight 30 (T261159)
  • 16:48 mutante: depooling mw2187 - mw2199 - old codfw appservers of type A to be decom'ed, previously weight 10 (T260654)
  • 16:47 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw219[0-9].codfw.wmnet
  • 16:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw218[7-9].codfw.wmnet
  • 16:35 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1297.eqiad.wmnet
  • 16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:21 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw127[0-5].eqiad.wmnet
  • 16:19 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw126[1-5].eqiad.wmnet,service=canary
  • 16:14 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw126[1-9].eqiad.wmnet
  • 16:12 elukey: remove some old/stale terms from analytics-in4 on cr1/cr2-eqiad (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622746, https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622744)
  • 16:09 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw127[6-9].eqiad.wmnet,service=canary
  • 16:08 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw127[6-9].eqiad.wmnet
  • 16:06 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1290.eqiad.wmnet
  • 16:05 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw128[0-9].eqiad.wmnet
  • 15:52 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1290.eqiad.wmnet
  • 15:51 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw128[0-9].eqiad.wmnet
  • 15:43 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw127[7-9].eqiad.wmnet,service=canary
  • 15:43 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1276.eqiad.wmnet,service=canary
  • 15:41 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw127[6-9].eqiad.wmnet
  • 15:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1297.eqiad.wmnet
  • 15:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1269.eqiad.wmnet
  • 15:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1267.eqiad.wmnet
  • 14:48 moritzm: installing Java security updates on aqs, hadoop and kafka-jumbo
  • 14:44 moritzm: restarting tomcat on idp-test* hosts to pick up Java update
  • 14:42 elukey: add eventgate-related terms to analytics-in4 filter on cr1/cr2-eqiad (ref https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622705)
  • 14:37 moritzm: imported openjdk 8u265-b01-1~deb10u1 to buster-wikimedia (forward port of latest Java 8 security update)
  • 14:31 papaul: replacing msw-c5,c6,c7 and fmsw-c8
  • 13:58 kormat: disabling GTID on pc2007 (pc1), pc2008 (pc2), pc2009 (pc3) T243373
  • 13:56 kormat: disabling GTID on db2096 (x1), es2021 (es4), es2023 (es5) T243373
  • 13:54 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:53 kormat: disabling GTID on db2129 (s6), db2118 (s7), db2079 (s8) T243373
  • 13:52 kormat: disabling GTID on db2123 (s5) T243373
  • 13:52 kormat: disabling GTID on db2090 (s4) T243373
  • 13:51 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:51 kormat: disabling GTID on db2105 (s3) T243373
  • 13:50 kormat: disabling GTID on db2107 (s2) T243373
  • 13:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:29 elukey: restart jvm daemons on analytics1042, aqs1004, kafka-jumbo1001 to pick up new openjdk upgrades (canaries)
  • 13:18 kormat: enabling replication from db2107 to db1122 (s2) T243373
  • 13:14 kormat: enabling replication from db2096 to db1103 (x1) T243373
  • 13:10 jynus: restart db2097
  • 13:07 jbond42: deploy python3.4 security update to kraz
  • 13:03 jbond42: deploy python3.4 security update to canaries on jessie
  • 13:01 kormat: enabling replication from db2118 to db1086 (s7) T243373
  • 12:52 jynus: restart db1140
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s8 weights T243373', diff saved to https://phabricator.wikimedia.org/P12402 and previous config saved to /var/cache/conftool/dbconfig/20200827-124338-marostegui.json
  • 12:35 jynus: restart db1139
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s7 weights T243373', diff saved to https://phabricator.wikimedia.org/P12401 and previous config saved to /var/cache/conftool/dbconfig/20200827-123028-marostegui.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s7 weights T243373', diff saved to https://phabricator.wikimedia.org/P12400 and previous config saved to /var/cache/conftool/dbconfig/20200827-123003-marostegui.json
  • 12:24 marostegui: Fix password format for in db2129 (s6 codfw master) T243373
  • 12:14 kormat: enabling replication from db2129 to db1093 (s6) T243373
  • 12:13 jynus: restart db1095
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s6 weights T243373', diff saved to https://phabricator.wikimedia.org/P12399 and previous config saved to /var/cache/conftool/dbconfig/20200827-120816-marostegui.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 codfw weights T243373', diff saved to https://phabricator.wikimedia.org/P12398 and previous config saved to /var/cache/conftool/dbconfig/20200827-120211-marostegui.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 eqiad weights T243373', diff saved to https://phabricator.wikimedia.org/P12397 and previous config saved to /var/cache/conftool/dbconfig/20200827-115934-marostegui.json
  • 11:56 Urbanecm: Lift range blocks exceeding wgBlockCIDRLimit via custom script from F32197596 (ruwiki, ruwikiquote; T243980)
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s4 codfw weights T243373', diff saved to https://phabricator.wikimedia.org/P12396 and previous config saved to /var/cache/conftool/dbconfig/20200827-115110-marostegui.json
  • 11:49 moritzm: uploaded python3.4 3.4.2-1+deb8u7+wmf1 for jessie-wikimedia T259102
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 codfw weights T243373', diff saved to https://phabricator.wikimedia.org/P12395 and previous config saved to /var/cache/conftool/dbconfig/20200827-114509-marostegui.json
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust db2126 weight T243373', diff saved to https://phabricator.wikimedia.org/P12394 and previous config saved to /var/cache/conftool/dbconfig/20200827-112213-marostegui.json
  • 11:12 Urbanecm: EU B&C done
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 34994d3: Add $wgTranslateMessageNamespaces[] = NS_MEDIAWIKI; for commonswiki (T131300) (duration: 01m 03s)
  • 10:57 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:56 godog: bounce grafana to apply new settings
  • 10:51 kormat: enabling replication from db2123 to db1100 (s5) T243373
  • 10:48 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:30 kormat: enabling replication from es2023 to es1024 (es5) T243373
  • 10:28 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:23 kormat: enabling replication from es2021 to es1021 (es4) T243373
  • 10:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 10:19 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 10:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 10:03 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 09:54 moritzm: installing Java security updates on IDP* hosts
  • 09:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:44 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:44 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:43 elukey: decommissioning vms schema[12]00[12] (replaced previously by schema[12]00[34] buster vms)
  • 09:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:41 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:39 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:20 kormat: enabling replication from db2105 to db1123 (s3) T243373
  • 09:15 kormat: enabling replication from db2079 to db1109 (s8) T243373
  • 09:07 kormat: enabling replication from db2090 to db1081 (s4) T243373
  • 08:53 kormat: enabling replication from pc2009 to pc1009 (pc3) T243373
  • 08:44 kormat: enabling replication from pc2008 to pc1008 (pc2) T243373
  • 08:13 marostegui: Enable replication codfw -> eqiad on pc1 T243373
  • 08:01 gehel: manual cleanup of stale wdqs deploy crontab on wdqs1009
  • 07:35 marostegui: Move pc2010 under pc2007 T243373
  • 07:16 moritzm: installing ghostscript security updates on stretch
  • 06:50 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 06:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:46 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:45 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:31 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12392 and previous config saved to /var/cache/conftool/dbconfig/20200827-060652-marostegui.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12391 and previous config saved to /var/cache/conftool/dbconfig/20200827-055815-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12390 and previous config saved to /var/cache/conftool/dbconfig/20200827-055522-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12389 and previous config saved to /var/cache/conftool/dbconfig/20200827-055126-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12388 and previous config saved to /var/cache/conftool/dbconfig/20200827-055104-marostegui.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P12387 and previous config saved to /var/cache/conftool/dbconfig/20200827-054259-marostegui.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1074 db1085 db1078', diff saved to https://phabricator.wikimedia.org/P12386 and previous config saved to /var/cache/conftool/dbconfig/20200827-054114-marostegui.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P12385 and previous config saved to /var/cache/conftool/dbconfig/20200827-053814-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12384 and previous config saved to /var/cache/conftool/dbconfig/20200827-053558-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074, db1085, db1079', diff saved to https://phabricator.wikimedia.org/P12383 and previous config saved to /var/cache/conftool/dbconfig/20200827-053509-marostegui.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074, db1085, db1079', diff saved to https://phabricator.wikimedia.org/P12382 and previous config saved to /var/cache/conftool/dbconfig/20200827-053100-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P12381 and previous config saved to /var/cache/conftool/dbconfig/20200827-052925-marostegui.json
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P12380 and previous config saved to /var/cache/conftool/dbconfig/20200827-052818-marostegui.json
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074', diff saved to https://phabricator.wikimedia.org/P12379 and previous config saved to /var/cache/conftool/dbconfig/20200827-052413-marostegui.json
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P12378 and previous config saved to /var/cache/conftool/dbconfig/20200827-051609-marostegui.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12377 and previous config saved to /var/cache/conftool/dbconfig/20200827-051546-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12376 and previous config saved to /var/cache/conftool/dbconfig/20200827-050754-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P12375 and previous config saved to /var/cache/conftool/dbconfig/20200827-050727-marostegui.json
  • 04:53 marostegui: Stop db1074 and db2107 in sync to fix drifts on s2 change_tag - T260042
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P12374 and previous config saved to /var/cache/conftool/dbconfig/20200827-045329-marostegui.json
  • 04:04 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=cloudelastic1006.wikimedia.org
  • 04:03 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=cloudelastic1005.wikimedia.org
  • 04:01 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cloudelastic1005.wikimedia.org
  • 02:03 mutante: shutting down install3001,install4001,install5001 VMs (no OS yet, but please also don't delete, debugging in progress, shutting them down until I continue on T254157)
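
The cookbook entries above follow a regular `START` / `END (PASS|FAIL)` shape, so their outcomes can be tallied mechanically. The following is a minimal sketch, assuming that shape holds for every cookbook line; the pattern and the function name `tally_cookbook_results` are illustrative and not part of any Wikimedia tooling.

```python
import re
from collections import Counter

# Assumed shape of an END entry, inferred only from the lines in this log.
END_RE = re.compile(
    r"END \((?P<result>PASS|FAIL)\) - Cookbook (?P<name>\S+) "
    r"\(exit_code=(?P<code>\d+)\)"
)


def tally_cookbook_results(lines):
    """Count END entries per (cookbook name, result); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = END_RE.search(line)
        if match:
            counts[(match.group("name"), match.group("result"))] += 1
    return counts


# Sample entries copied from the 2020-08-27 log above.
sample = [
    "06:50 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)",
    "06:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)",
    "06:46 elukey@cumin1001: START - Cookbook sre.hosts.downtime",
    "06:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)",
]
results = tally_cookbook_results(sample)
```

The sketch counts only `END` lines. It does not try to pair each `END` with its `START`, because overlapping runs (such as the 06:45 and 06:46 starts) make that pairing ambiguous from the log text alone.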

2020-08-26

  • 23:35 eileen: civicrm revision changed from d2e80f7522 to 481ab742db, config revision is e2ab4d7c1f
  • 23:00 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:36 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:30 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:26 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:51 XioNoX: standardize pfw3-eqiad
  • 19:33 marxarelli: 1.36.0-wmf.6 promoted to group1 (T257974). logs show no new errors
  • 19:24 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.6 (duration: 01m 03s)
  • 19:23 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.6
  • 18:21 Urbanecm: Morning B&C done
  • 18:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 945b97c: Added import sources for mlwiktionary (T260716) (duration: 01m 05s)
  • 18:12 Urbanecm: Purge Thai and Greek taglines, URLs are at P12372 (T258552)
  • 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 4009289: Update Thai and Greek taglines (T258552) (duration: 01m 03s)
  • 18:09 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: 4009289: Update Thai and Greek taglines (T258552) (duration: 01m 05s)
  • 18:08 herron: upgraded eqiad elk v7 cluster from 7.8.0 to 7.9.0 T234854
  • 18:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:41 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable client side error logging on hewiki (T255585) (duration: 01m 04s)
  • 17:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Documentation-only change; sync for line sanity (duration: 01m 04s)
  • 17:12 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T254349 Set wgVisualEditorEnableBetaFeature true on wikis that need it (duration: 01m 03s)
  • 15:59 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 15:53 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 15:41 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 15:11 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for MCR change', diff saved to https://phabricator.wikimedia.org/P12371 and previous config saved to /var/cache/conftool/dbconfig/20200826-145612-marostegui.json
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12370 and previous config saved to /var/cache/conftool/dbconfig/20200826-145531-marostegui.json
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12369 and previous config saved to /var/cache/conftool/dbconfig/20200826-144750-marostegui.json
  • 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema1002.eqiad.wmnet
  • 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema1001.eqiad.wmnet
  • 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema2001.codfw.wmnet
  • 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema2002.codfw.wmnet
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12368 and previous config saved to /var/cache/conftool/dbconfig/20200826-143623-marostegui.json
  • 14:34 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema2003.codfw.wmnet
  • 14:34 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema2004.codfw.wmnet
  • 14:33 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema1004.eqiad.wmnet
  • 14:33 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema1003.eqiad.wmnet
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12367 and previous config saved to /var/cache/conftool/dbconfig/20200826-142746-marostegui.json
  • 14:25 jgleeson: updated civicrm from 0f195c6cca to d2e80f7522
  • 14:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:20 marostegui: Upgrade mysql on db1091 after MCR changes
  • 14:13 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:37 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 100% T261276', diff saved to https://phabricator.wikimedia.org/P12366 and previous config saved to /var/cache/conftool/dbconfig/20200826-133753-kormat.json
  • 13:18 duesen: daniel@mwmaint1002:/srv/mediawiki/php-1.36.0-wmf.5$ mwscript maintenance/findBadBlobs.php dewiki --mark T205936 --revisions - < ~/T205936-dewiki-20050512070000.ids # marking known bad revisions for T205936
  • 13:17 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 75% T261276', diff saved to https://phabricator.wikimedia.org/P12365 and previous config saved to /var/cache/conftool/dbconfig/20200826-131732-kormat.json
  • 13:16 duesen: daniel@mwmaint1002:/srv/mediawiki/php-1.36.0-wmf.5$ mwscript maintenance/findBadBlobs.php oswiki --mark T205936 --revisions - < ~/T205936-oswiki-20090309200000.ids # marking known bad revisions for T205936
  • 13:07 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 50% T261276', diff saved to https://phabricator.wikimedia.org/P12364 and previous config saved to /var/cache/conftool/dbconfig/20200826-130735-kormat.json
  • 13:06 vgutierrez: serve a synthetic warn page to DHE-RSA-AES128-SHA users - T258405
  • 12:47 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 30% T261276', diff saved to https://phabricator.wikimedia.org/P12363 and previous config saved to /var/cache/conftool/dbconfig/20200826-124700-kormat.json
  • 12:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 20% T261276', diff saved to https://phabricator.wikimedia.org/P12362 and previous config saved to /var/cache/conftool/dbconfig/20200826-122059-kormat.json
  • 12:12 godog: upgrade nagios-nrpe-server to 2.15-2 on jessie hosts - T261198
  • 11:58 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling db1110 T261276', diff saved to https://phabricator.wikimedia.org/P12361 and previous config saved to /var/cache/conftool/dbconfig/20200826-115850-kormat.json
  • 11:56 mlitn@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/WikibaseMediaInfo: MediaSearchQueryBuilder should support keyword only queries (duration: 01m 00s)
  • 11:55 mlitn@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/WikibaseMediaInfo: MediaSearchQueryBuilder should support keyword only queries (duration: 01m 08s)
  • 11:53 kart_: Finished manual run of ContentTranslation/scripts/purge-unpublished-drafts.php script on mwmaint1002 (T261189)
  • 11:39 kart_: Started manual run of ContentTranslation/scripts/purge-unpublished-drafts.php script on mwmaint1002 (T261189)
  • 11:29 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Enable propagateChangeVisibility for testwikidata, part 2 (duration: 01m 03s)
  • 11:26 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable propagateChangeVisibility for testwikidata, part 1 (duration: 01m 19s)
  • 10:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 XioNoX: re-enable IPv6 BGP to Init7 in knams
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 replication broken', diff saved to https://phabricator.wikimedia.org/P12360 and previous config saved to /var/cache/conftool/dbconfig/20200826-084044-marostegui.json
  • 08:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 for MCR change', diff saved to https://phabricator.wikimedia.org/P12358 and previous config saved to /var/cache/conftool/dbconfig/20200826-054557-marostegui.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12357 and previous config saved to /var/cache/conftool/dbconfig/20200826-054409-marostegui.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12356 and previous config saved to /var/cache/conftool/dbconfig/20200826-053345-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12355 and previous config saved to /var/cache/conftool/dbconfig/20200826-052355-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12354 and previous config saved to /var/cache/conftool/dbconfig/20200826-050849-marostegui.json
  • 05:03 marostegui: Update db1135 and db1114 after MCR changes
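
The `dbctl commit` entries above share one layout: a commit message in single quotes, a Phabricator paste URL for the diff, and a path to the saved previous config. As a rough sketch, assuming that layout is stable, the fields could be extracted as shown below; the regular expression and the helper name `parse_dbctl_entry` are hypothetical and inferred from these log lines only.

```python
import re

# Hypothetical pattern matching the dbctl commit lines as they appear in this log.
DBCTL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<user>[\w-]+)@(?P<host>\w+): "
    r"dbctl commit \(dc=(?P<dc>\w+)\): '(?P<message>[^']*)', "
    r"diff saved to (?P<diff_url>\S+) "
    r"and previous config saved to (?P<backup>\S+)"
)


def parse_dbctl_entry(line):
    """Return the entry's fields as a dict, or None if the line is not a dbctl commit."""
    match = DBCTL_RE.search(line)
    return match.groupdict() if match else None


# Entry copied from the 2020-08-26 log above.
entry = parse_dbctl_entry(
    "05:45 marostegui@cumin1001: dbctl commit (dc=all): "
    "'Depool db1091 for MCR change', diff saved to "
    "https://phabricator.wikimedia.org/P12358 and previous config saved to "
    "/var/cache/conftool/dbconfig/20200826-054557-marostegui.json"
)
```

Grouping the parsed entries by host name in `message` would give a per-database depool and repool timeline. That step is left out here because the message text is free-form.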

2020-08-25

  • 21:51 mutante: xhgui1001/xhgui2001 - Unpacking xhgui (0.12.0-2-wmf1) over (0.9.0-1-wmf1) (T260397)
  • 21:50 mutante: xhgui1001 - Unpacking xhgui (0.12.0-2-wmf1) over (0.9.0-1-wmf1) ...
  • 21:46 mutante: importing xhgui 0.12.0-2-wmf1 to buster-wikimedia APT repo (T260397)
  • 19:40 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@125cb6d]: test: Add wikidata ttl import (duration: 00m 54s)
  • 19:39 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@125cb6d]: test: Add wikidata ttl import
  • 19:15 marxarelli: 1.36.0-wmf.6 promoted to group0 (T257974). no new errors
  • 19:09 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.6
  • 19:05 moritzm: installing Java security updates on cloudelastic* hosts
  • 19:02 moritzm: installing Java security updates on elastic* hosts
  • 18:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:58 dduvall@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.6 (duration: 41m 58s)
  • 17:30 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@bc2f7f1]: test: Add wikidata ttl import (duration: 01m 52s)
  • 17:28 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@bc2f7f1]: test: Add wikidata ttl import
  • 17:17 dduvall@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.6
  • 17:08 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.4 (duration: 01m 40s)
  • 17:01 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.3 (duration: 19m 12s)
  • 17:01 herron: imported logstash, elasticsearch, and kibana 7.9.0 -oss packages into buster-wikimedia thirdparty/elastic79
  • 16:42 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@89b4f74]: test: Add wikidata ttl import (duration: 00m 49s)
  • 16:41 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@89b4f74]: test: Add wikidata ttl import
  • 16:21 shdubsh: restart logstash on logstash1007 -- gc duration outlier
  • 16:08 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae6dd8d]: test: Add wikidata ttl import (duration: 00m 54s)
  • 16:07 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@ae6dd8d]: test: Add wikidata ttl import
  • 16:00 gehel: repool wdqs1005 - caught up on lag
  • 15:47 elukey: restart mariadb@analytics_meta on db1108 to apply a replication filter (exclude superset_staging database from replication)
  • 15:44 jgleeson: fundraising-tools updated from dcad0bfe75 to 3fe3a23114
  • 15:41 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@cbf2f9d]: Add wikidata ttl import (duration: 01m 38s)
  • 15:39 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@cbf2f9d]: Add wikidata ttl import
  • 15:22 liw: testing upcoming Scap release on beta
  • 14:56 moritzm: installing rake security updates on stretch
  • 14:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 14:32 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 05s)
  • 14:32 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
  • 14:26 XioNoX: disable IPv6 BGP to Init7 in knams
  • 14:10 andrew@deploy1001: Finished deploy [horizon/deploy@7a3221d]: add hostname checking --bug T207538 (duration: 03m 50s)
  • 14:06 andrew@deploy1001: Started deploy [horizon/deploy@7a3221d]: add hostname checking --bug T207538
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for MCR change', diff saved to https://phabricator.wikimedia.org/P12347 and previous config saved to /var/cache/conftool/dbconfig/20200825-135248-marostegui.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'fully repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12346 and previous config saved to /var/cache/conftool/dbconfig/20200825-134736-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12345 and previous config saved to /var/cache/conftool/dbconfig/20200825-133734-marostegui.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12344 and previous config saved to /var/cache/conftool/dbconfig/20200825-132027-marostegui.json
  • 13:17 moritzm: installing firejail security updates on remaining mw* servers in eqiad
  • 12:56 godog: upgrade nagios-nrpe-server on scb2* and mwlog* - T261198
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12343 and previous config saved to /var/cache/conftool/dbconfig/20200825-125108-marostegui.json
  • 12:45 marostegui: Update MySQL on db1111 after MCR change
  • 12:39 marostegui: alter table sites on s6, directly on the primary master T260476
  • 12:39 godog: test nagios-nrpe-server with dh 2048 on scb2001 - T261198
  • 12:35 moritzm: imported ceph packages from stretch-backports to component/ceph T256877
  • 12:10 moritzm: installing ruby-json security updates
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 MCR change', diff saved to https://phabricator.wikimedia.org/P12341 and previous config saved to /var/cache/conftool/dbconfig/20200825-120708-marostegui.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12340 and previous config saved to /var/cache/conftool/dbconfig/20200825-120211-marostegui.json
  • 11:59 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12339 and previous config saved to /var/cache/conftool/dbconfig/20200825-114938-marostegui.json
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12338 and previous config saved to /var/cache/conftool/dbconfig/20200825-113758-marostegui.json
  • 11:36 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:32 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12337 and previous config saved to /var/cache/conftool/dbconfig/20200825-112859-marostegui.json
  • 11:25 marostegui: Upgrade mysql on db1118 after MCR change
  • 11:16 Urbanecm: EU B&C done
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d869e30: Enable ContentTranslation as a default tool in Assamese and Burmese WPs (T258503; T258505) (duration: 01m 00s)
  • 10:59 moritzm: installing remaining libx11 security updates
  • 10:37 arturo: import all binary packages from tesseract-ocr-lang into stretch-wikimedia/component/tesseract-410-bpo (T247422)
  • 10:32 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:28 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 10:28 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:23 moritzm: removed fermium.wikimedia.org from debmonitor
  • 09:45 marostegui: Create missing table cx_notification_log on x1 wikishared T261190
  • 08:50 XioNoX: re-activate eqord peering/transit - T259593
  • 08:19 XioNoX: reconfigure eqord to be AS65020 - T259593
  • 08:18 XioNoX: deactivate eqord peering/transit - T259593
  • 07:22 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
  • 07:13 marostegui: Upgrade MySQL on dbstore1004
  • 07:09 dcausse: depooling wdqs1005 (high lag)
  • 07:04 dcausse: restarting blazegraph on wdqs1005 (T242453)
  • 06:20 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111, db1118 for MCR change', diff saved to https://phabricator.wikimedia.org/P12336 and previous config saved to /var/cache/conftool/dbconfig/20200825-053856-marostegui.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12335 and previous config saved to /var/cache/conftool/dbconfig/20200825-053801-marostegui.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12334 and previous config saved to /var/cache/conftool/dbconfig/20200825-052602-marostegui.json
  • 05:21 moritzm: installing Java security updates on relforge*
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12333 and previous config saved to /var/cache/conftool/dbconfig/20200825-051327-marostegui.json
  • 05:11 marostegui: Remove revisions triggers from db2094:3311 T238966
  • 05:10 marostegui: Deploy MCR schema change on s1 codfw, this will create lag on s1 codfw - T238966
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12332 and previous config saved to /var/cache/conftool/dbconfig/20200825-050451-marostegui.json
  • 04:02 ejegg: updated fundraising python tools from 305f2a4438 to dcad0bfe75
  • 01:49 eileen: civicrm revision changed from ce28723709 to 0f195c6cca, config revision is 96839009f1
  • 01:39 eileen: civicrm revision is ce28723709, config revision is 96839009f1
  • 01:30 eileen: civicrm revision is ce28723709, config revision is 54c8c7abf2
  • 01:17 cdanis: repool esams
  • 01:11 cdanis: T259621 wrong junos version was staged on cr2-esams, abandoning this attempt and putting back in service
  • 01:07 cdanis: cdanis@re0.cr2-esams> request system software add validate re1 /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz
  • 00:56 cdanis: T259621 ❌cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 homer 'cr*' commit 'drain cr2-esams transport link'
  • 00:36 cdanis: T259621 cdanis@re1.cr3-esams> request chassis routing-engine master switch
  • 00:30 cdanis: T259621 cdanis@re1.cr3-esams> request vmhost reboot re0
  • 00:24 cdanis: T259621 cdanis@re1.cr3-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz re0
  • 00:18 cdanis: T259621 cdanis@re0.cr3-esams> request chassis routing-engine master switch
  • 00:14 cdanis: T259621 cdanis@re0.cr3-esams> request vmhost reboot re1
  • 00:08 cdanis: T259621 cdanis@re0.cr3-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz re1

2020-08-24

  • 23:46 cdanis: depool esams T259621
  • 23:16 Urbanecm: Evening B&C window done
  • 23:06 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 778f710: Alternate configuration mechanism for Parsoid (T241961) (duration: 00m 58s)
  • 22:13 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 22:10 rzl@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:29 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Deployed additional mitigations for T257687 (duration: 00m 58s)
  • 20:29 rzl: re-enabled puppet on 'R:File = /etc/nutcracker/nutcracker.yml' T261154
  • 19:25 rzl: disabling puppet on 'R:File = /etc/nutcracker/nutcracker.yml' to swap mc2028 out for mc2037 T261154
  • 18:10 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Increase weight of grants and research namespaces in metawiki search (duration: 00m 58s)
  • 15:20 jynus: shutdown backup2001 T260764
  • 15:13 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:08 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:04 vgutierrez: rolling restart of ats-tls to disable ECDHE-RSA-AES128-SHA - T258405
  • 14:58 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:55 rzl: switchover test complete, puppet re-enabled on cumin1001
  • 14:54 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 14:53 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 14:53 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:52 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:52 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 14:52 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:48 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 14:48 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:47 godog: powercycle ganeti5002 -- host down and nothing in console
  • 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:43 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-08-24 14:43:35.570234
  • 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 14:42 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=99)
  • 14:42 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:42 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 14:41 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-08-24 14:41:55.754938
  • 14:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 14:41 dcausse: creating cirrus indices for lldwiki
  • 14:39 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 14:39 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 14:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 14:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 14:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 14:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:24 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 14:24 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:24 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 14:24 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 14:22 moritzm: installing libexif security updates on stretch
  • 14:18 rzl: disabling puppet on cumin1001 and starting a test of the DC switchover automation, expect some SAL noise but no production impact
  • 14:08 duesen: Deployed patch for T260485
  • 13:59 marostegui: Stop mysql on db1117:3325 to clone db1128 - T260324
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for MCR change', diff saved to https://phabricator.wikimedia.org/P12327 and previous config saved to /var/cache/conftool/dbconfig/20200824-135538-marostegui.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3318 after MCR change', diff saved to https://phabricator.wikimedia.org/P12326 and previous config saved to /var/cache/conftool/dbconfig/20200824-133032-marostegui.json
  • 13:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12325 and previous config saved to /var/cache/conftool/dbconfig/20200824-131305-marostegui.json
  • 13:05 moritzm: installing imagemagick security updates on stretch
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12323 and previous config saved to /var/cache/conftool/dbconfig/20200824-130024-marostegui.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12322 and previous config saved to /var/cache/conftool/dbconfig/20200824-125131-marostegui.json
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for MCR change', diff saved to https://phabricator.wikimedia.org/P12321 and previous config saved to /var/cache/conftool/dbconfig/20200824-122848-marostegui.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311 after MCR change', diff saved to https://phabricator.wikimedia.org/P12320 and previous config saved to /var/cache/conftool/dbconfig/20200824-122752-marostegui.json
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12319 and previous config saved to /var/cache/conftool/dbconfig/20200824-122050-marostegui.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12318 and previous config saved to /var/cache/conftool/dbconfig/20200824-121200-marostegui.json
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12317 and previous config saved to /var/cache/conftool/dbconfig/20200824-120310-marostegui.json
  • 12:01 Urbanecm: EU B&C window completed
  • 12:01 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 8c380d6: Enable tewiki as import source for tewikibooks (T260107) (duration: 00m 57s)
  • 11:58 XioNoX: test advertise CF tunnel endpoint on cr1-eqiad - T259036
  • 11:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5a6d025: Add retrobibliothek.de to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T261012) (duration: 00m 56s)
  • 11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e1ae39a: Enable mapframe at trwiki (T260594) (duration: 00m 58s)
  • 11:43 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: 1066ecb: Enable MediaSearch A/B test (T254388) (duration: 00m 56s)
  • 11:42 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/ContentTranslation/modules/publish/ext.cx.wikibase.link.js: 74a8718: Publish: Fix broken wikidata linking (T249458) (duration: 00m 58s)
  • 11:39 Urbanecm: Purge 13 URLs with purgeList.php, see P12316 for list of them (T260908; T258552; T261076; T261110)
  • 11:34 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:32 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:32 arturo: add liblept5 1.76.0-1~bpo9+1 (and leptonica-progs) to stretch-wikimedia/component/tesseract-410-bpo (T247422)
  • 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fe0449d: 74220d0: 7db8a19: Update Chinese wordmarks and taglines, update zhwikisource project logo (T260908; T258552; T261076; T261110) (duration: 00m 59s)
  • 11:29 urbanecm@deploy1001: Synchronized static/images/: fe0449d: 74220d0: 7db8a19: Update Chinese wordmarks and taglines, update zhwikisource project logo (T260908; T258552; T261076; T261110) (duration: 00m 58s)
  • 11:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:46 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:45 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:43 moritzm: installing ruby2.3 security updates
  • 10:12 moritzm: installing firejail security updates on mw canaries
  • 09:58 oblivian@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=appserver,service=canary
  • 09:46 XioNoX: add PNI to CF on cr1-eqiad with import/export NONE - T259036
  • 09:18 moritzm: restarting mw canaries to pick up libx11 update
  • 09:13 moritzm: installing libx11 security updates on stretch
  • 09:10 vgutierrez: repool cp5002
  • 09:08 _joe_: restarting php-fpm on mw1344 (stuck in SIGILL for new children)
  • 09:00 vgutierrez: restart ats-tls on cp5002
  • 08:54 moritzm: installing net-snmp security updates on buster
  • 08:52 ema: depool cp5002 due to icinga errors
  • 08:24 moritzm: installing json-c security updates on buster
  • 07:36 XioNoX: push new pfw policies - T261007
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318, db1105:3311 for MCR change', diff saved to https://phabricator.wikimedia.org/P12315 and previous config saved to /var/cache/conftool/dbconfig/20200824-052916-marostegui.json

2020-08-23

  • 20:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:23 gehel: repool wdqs1006 - caught up on lag

2020-08-22

  • 19:33 ryankemper: depooled wdqs1006 (still has 2.5 hours to catch up on)
  • 19:31 ryankemper: pooled wdqs1006 now that lag has dissipated
  • 07:36 gehel: restart blazegraph on wdqs1006 + depool to catch up on lag
  • 05:24 legoktm: legoktm@mwmaint1002:~$ echo "https://releases.wikimedia.org/mediawiki/1.35/" | mwscript purgeList.php --wiki=aawiki

2020-08-21

  • 17:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:39 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:17 zpapierski@deploy1001: Finished deploy [search/mjolnir/deploy@c80e2e7]: .. redeploy after theory verification (duration: 00m 50s)
  • 16:16 zpapierski@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: .. redeploy after theory verification
  • 16:15 zpapierski@deploy1001: deploy aborted: .. (duration: 00m 01s)
  • 16:15 zpapierski@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: ..
  • 13:25 jayme@cumin1001: conftool action : set/pooled=True; selector: dnsdisc=termbox,name=codfw
  • 13:25 jayme@cumin1001: conftool action : set/pooled=False; selector: dnsdisc=termbox,name=codfw
  • 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 09:02 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:01 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 01:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime

2020-08-20

  • 22:31 eileen: civicrm revision changed from 27d5900f7d to ce28723709, config revision is 706cf3c898
  • 22:20 eileen: civicrm revision is 27d5900f7d, config revision is 706cf3c898
  • 22:20 mutante: permanently shut down tungsten.eqiad.wmnet T260395 T158837 T180761 T224549
  • 22:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:35 ejegg: updated fundraising CiviCRM from 958a79f660 to 27d5900f7d
  • 20:53 cdanis: repool eqsin
  • 20:37 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:36 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:25 cdanis: cdanis@cr2-eqsin> request vmhost reboot
  • 20:17 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:16 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:13 cdanis: cdanis@cr2-eqsin> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-18.2R3-S5.3.tgz
  • 20:11 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:02 cdanis: depool eqsin for router upgrade
  • 19:57 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
  • 19:37 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:34 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:24 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:17 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 19:17 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 19:17 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.5 refs T257973
  • 19:08 mutante: restarted apache on cont2001 for integration.wikimedia.org docroot change
  • 19:07 mutante: switching document root of integration.wikimedia.org to scap (T149924)
  • 19:02 twentyafterfour: 1.36.0-wmf.5 has no known blockers and logspam is cleaned up, time to roll group2 wikis to wmf.5
  • 18:42 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 18:42 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 18:19 mutante: ores1004 - starting failed celery-ores-worker
  • 18:18 mutante: testreduce1001 - rt_client and vd_client now properly stopped by puppet T257906
  • 17:29 shdubsh: restart elasticsearch on logstash1012 (not 1020) -- high gc runtimes
  • 17:28 shdubsh: restart elasticsearch on logstash1020 -- high gc runtimes
  • 17:23 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 17:23 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
  • 17:23 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 17:22 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 17:22 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 16:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:48 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:46 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:45 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:43 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:40 _joe_: restarted apache2 on icinga1001
  • 16:13 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:11 shdubsh: restart elasticsearch on logstash1011 -- long gc runs
  • 16:10 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:08 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:02 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:06 oblivian@deploy1001: Finished deploy [ores/deploy@8540eec]: various configuration fixes (duration: 09m 03s)
  • 13:57 oblivian@deploy1001: Started deploy [ores/deploy@8540eec]: various configuration fixes
  • 13:53 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:53 oblivian@deploy1001: Finished deploy [ores/deploy@e860508]: switch everything to use envoy as a service proxy T244843 (duration: 14m 00s)
  • 13:39 oblivian@deploy1001: Started deploy [ores/deploy@e860508]: switch everything to use envoy as a service proxy T244843
  • 13:26 oblivian@deploy1001: Finished deploy [ores/deploy@74677b6]: switch testwiki to use envoy as a service proxy T244843 (take 2) (duration: 11m 37s)
  • 13:14 oblivian@deploy1001: Started deploy [ores/deploy@74677b6]: switch testwiki to use envoy as a service proxy T244843 (take 2)
  • 13:11 oblivian@deploy1001: Finished deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy T244843 (duration: 11m 19s)
  • 13:09 gehel: repool wdqs1007 - caught up on lag
  • 13:00 oblivian@deploy1001: Started deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy T244843
  • 12:51 oblivian@deploy1001: Finished deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy T244843 (duration: 07m 03s)
  • 12:44 oblivian@deploy1001: Started deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy T244843
  • 11:49 Lucas_WMDE: EU backport window done
  • 11:44 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/AbuseFilter/includes/AbuseFilterHooks.php: d762e7b: Use $user param when filtering edits (T258717) (duration: 01m 05s)
  • 11:41 eileen: civicrm revision changed from 6c9441a18e to 958a79f660, config revision is 706cf3c898
  • 11:38 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/AbuseFilter/includes/AbuseFilterHooks.php: 00da39b: Use $user param when filtering edits (T258717) (duration: 01m 05s)
  • 11:32 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/Wikibase/client/data-bridge/dist/: Backport: Don't try to load source maps in production (T260852) (duration: 01m 07s)
  • 11:07 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix testwikidata depicts id & CirrusSearchUserTesting config (duration: 01m 06s)
  • 11:07 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=trwiki editor # T260899
  • 10:58 XioNoX: re-pool codfw - T259621
  • 10:53 XioNoX: un-drain cr1-codfw - T259621
  • 10:45 XioNoX: cr1-codfw> request chassis routing-engine master switch - T259621
  • 10:26 hashar: Restarted zuul-merger instances on contint1001 and contint2001
  • 10:24 hashar@deploy1001: Finished deploy [zuul/deploy@8a05b4d]: Support Gerrit replication events (duration: 00m 24s)
  • 10:24 hashar@deploy1001: Started deploy [zuul/deploy@8a05b4d]: Support Gerrit replication events
  • 10:21 XioNoX: cr1-codfw> request chassis routing-engine master switch - T259621
  • 10:12 XioNoX: reboot cr1-codfw:re1 (backup) for upgrade - T259621
  • 09:57 XioNoX: bump cr1-codfw OSPF metrics - T259621
  • 09:51 XioNoX: enable transit/peering and re-set normal OSPF values on cr2-codfw - T259621
  • 09:41 XioNoX: cr2-codfw> request chassis routing-engine master switch - T259621
  • 09:36 eileen: civicrm revision changed from cf9fadbeed to 6c9441a18e, config revision is 706cf3c898
  • 09:33 XioNoX: reboot cr2-codfw:re0 (backup) for upgrade - T259621
  • 09:18 XioNoX: cr2-codfw> request chassis routing-engine master switch - T259621
  • 09:18 kormat: stress-testing db2125 T260670
  • 09:08 XioNoX: reboot cr2-codfw:re1 (backup) for upgrade - T259621
  • 09:03 kormat@cumin1001: dbctl commit (dc=all): 'Repool db2125 after host failure T260670', diff saved to https://phabricator.wikimedia.org/P12303 and previous config saved to /var/cache/conftool/dbconfig/20200820-090313-kormat.json
  • 08:52 kormat: removing /usr/bin/check_mariadb.py from all db hosts T259516
  • 08:52 XioNoX: disable transit/peering on cr2-codfw - T259621
  • 08:48 XioNoX: bump cr2-codfw OSPF metrics - T259621
  • 08:44 jynus: running analyze table on db1115's tendril.global_status_log, may cause some stalls on tendril/dbtree T260876
  • 08:41 XioNoX: depool codfw for routers upgrade - T259621
  • 08:31 XioNoX: enable transit/peering on cr3-knams - T259621
  • 08:21 XioNoX: reboot cr3-knams for upgrade - T259621
  • 08:07 XioNoX: disable transit/peering on cr3-knams - T259621
  • 07:39 hashar: contint2001: restarted zuul
  • 07:29 hashar: contint1001: restarted zuul-merger
  • 07:29 hashar@deploy1001: Finished deploy [zuul/deploy@5989ed0]: Upgrade gear from 0.7.0 to 1.15.1+wmf1 - T258630 (duration: 00m 13s)
  • 07:28 hashar@deploy1001: Started deploy [zuul/deploy@5989ed0]: Upgrade gear from 0.7.0 to 1.15.1+wmf1 - T258630
  • 01:54 ejegg: re-enabled fundraising scheduled jobs
  • 00:51 mutante: ms-be1039 - started failed ferm service
  • 00:35 ejegg: stopped fundraising scheduled jobs
  • 00:27 eileen: civicrm revision changed from c442a09153 to cf9fadbeed, config revision is 3cdffd4fc2

2020-08-19

  • 23:20 Urbanecm: Evening B&C window closed
  • 23:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a808999: Enable VisualEditor in namespaces Draft and Wikiproject on hywiki (T260825) (duration: 01m 05s)
  • 22:41 eileen: civicrm revision changed from 34f95a3311 to c442a09153, config revision is 3cdffd4fc2
  • 21:27 eileen: civicrm revision changed from 154519cc1f to 34f95a3311, config revision is 3cdffd4fc2
  • 21:17 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 21:17 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 20:39 dpifke@deploy1001: Finished deploy [performance/arc-lamp@2ef1af7]: Deploy fixes for notifications and OOM prevention (T259167) (duration: 00m 06s)
  • 20:39 dpifke@deploy1001: Started deploy [performance/arc-lamp@2ef1af7]: Deploy fixes for notifications and OOM prevention (T259167)
  • 19:43 ebernhardson: restart mjolnir-kafka-bulk-daemon on search-loader2001 with debug logging
  • 19:20 mutante: testreduce1001 - re-enabled puppet, confirmed parsoid-rt service was now stopped properly by puppet while it runs as before on scandium, the previous parsoid-testing host. switching it over is now a Hiera one-liner. (T257906)
  • 19:15 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.5 refs T257973 (duration: 01m 04s)
  • 19:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.5 refs T257973
  • 19:02 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 60af096: Add autopatrolled group at arzwiki (T260761) (duration: 01m 04s)
  • 18:52 mutante: testreduce1001 - disable puppet; stop parsoid-rt service
  • 18:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 924a03b: Add clinton.presidentiallibraries.us to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T259927) (duration: 01m 04s)
  • 18:45 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 83b34e1: ClosedWikiProvider: Use testUserForCreation rather than testForAuthentication (T258695) (duration: 01m 04s)
  • 18:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 95d45f6: Dont index Draft (118) and Draft talk (119) on hywiki (T260804) (duration: 01m 04s)
  • 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 803cb1a: Update taglines for various projects (T258552) (duration: 01m 04s)
  • 18:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: 803cb1a: Update taglines for various projects (T258552) (duration: 01m 06s)
  • 18:25 mutante: rebooting webperf1002 VM on ganeti level (outside OS) to upgrade from 8 to 16GB RAM (T260192)
  • 18:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bb4aa44: Configure namespaces on commons to include categories (T198716) (duration: 01m 04s)
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b904333: Update project wordmarks (T254788; sync 2/2) (duration: 01m 04s)
  • 18:19 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: b904333: Update project wordmarks (T254788; sync 1/2) (duration: 01m 06s)
  • 18:15 mutante: rebooting webperf2002 VM on ganeti level (outside OS) to upgrade from 8 to 16GB RAM (T260192)
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a6f8354: Enable $wgMFNoindexPages for all wikis (T255458) (duration: 01m 07s)
  • 18:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:13 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:38 mutante: decom'ing releases2001.codfw.wmnet (
  • 17:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:39 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:37 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:41 rzl: finished exercising the switchdc cookbooks with --live-test for now, all changes reverted including re-enabling puppet on cumin1001
  • 15:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 15:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 15:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 15:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 15:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:31 jbond42: update java.security https://gerrit.wikimedia.org/r/c/operations/puppet/+/593467
  • 15:30 oblivian@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=api-rw
  • 15:26 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 15:26 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 15:22 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 15:22 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:18 godog: prometheus codfw lvextend --resizefs --size +80G /dev/mapper/vg--ssd-prometheus--ops
  • 15:17 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 15:17 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 15:16 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 15:16 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 15:14 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 15:14 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:08 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 15:08 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:50 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 14:50 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:50 rzl: running the switchdc cookbooks with --live-test, simulating a switch to eqiad where we're already running, no production impact is expected
  • 14:47 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 14:47 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 14:41 rzl: disable puppet on cumin1001 for switchdc testing
  • 14:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:33 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:27 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:38 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:34 gehel: depooling wdqs1007 and restarting blazegraph
  • 13:29 _joe_: depooling and disabling puppet on restbase1024 for further investigation
  • 13:27 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:26 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:25 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:10 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:03 _joe_: building and uploading fluent-bit, ratelimit images
  • 13:01 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 12:57 _joe_: building a new version of the base docker images
  • 11:29 awight: EU bacon finished
  • 11:28 effie: restart mwdebug* servers
  • 11:08 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Fix typos in flaggedrevs comments () (duration: 01m 19s)
  • 09:22 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 08:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:36 XioNoX: update firewall policies on pfw - T260585
  • 08:35 jayme: running puppet on A:all-mw-eqiad
  • 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:20 godog: switch grafana.w.o to grafana 7 in codfw - T259143
  • 08:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:18 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:14 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:06 jayme: running puppet on A:all-mw-eqiad
  • 07:46 godog: upgrade to grafana 7 on cloudmetrics hosts - T259143
  • 07:15 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 07:10 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 06:39 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 06:13 eileen: tools revision changed from b4ebd1e564 to 0b9d971bc4
  • 06:07 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 06:04 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 06:03 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 06:00 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 05:55 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 05:53 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 05:47 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 05:37 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 05:31 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 03:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:53 cstone: civicrm revision changed from f5469d0a4c to 154519cc1f
  • 02:00 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 01:05 dpifke@deploy1001: Synchronized wmf-config/profiler.php: Deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/620139 (duration: 01m 18s)
  • 00:49 dpifke@deploy1001: Synchronized wmf-config/ProductionServices.php: Disabling old XHGui backend (T180761) (duration: 05m 13s)
  • 00:15 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster

2020-08-18

  • 23:45 catrope@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/GrowthExperiments: Only fetch task card data for users in variant C/D (T258021) (duration: 01m 05s)
  • 23:44 catrope@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/GrowthExperiments: Only fetch task card data for users in variant C/D (T258021) (duration: 01m 06s)
  • 23:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1301.eqiad.wmnet
  • 23:34 Urbanecm: Run scap pull at mw1301
  • 23:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable static maps on testwiki, disable them on test2wiki (duration: 03m 22s)
  • 23:32 mutante: rebooting mw1301 via mgmt
  • 23:22 mutante: killed reboot-cluster on cumin1001
  • 23:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ac34f72: Enable subpages in NS:0 in techconductwiki (T260350) (duration: 05m 14s)
  • 23:04 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1300.eqiad.wmnet
  • 22:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 22:41 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 22:09 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:07 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:06 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 21:39 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:37 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:34 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 21:24 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 21:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:27 hashar: https://releases-jenkins.wikimedia.org/ changed agent from releases1001 to releases1002
  • 20:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.5 refs T257973
  • 20:11 mutante: running puppet on cp-ats-ulsfo and switching releases-jenkins backend
  • 20:07 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.5 refs T257973 (duration: 53m 12s)
  • 20:00 mutante: releases1001 rm /etc/rsync.d/frag* & run puppet
  • 19:54 mutante: rsyncing /var/lib/jenkins from releases1001 to releases1002/2002 with --delete T256164
  • 19:47 ejegg: updated payments-wiki from a7ee1790e0 to ef7ebd08cb
  • 19:44 hashar: Deleting old jobs from https://releases-jenkins.wikimedia.org/ # T256164
  • 19:41 hashar: releases1001: deleting old legacy mediawiki snapshots under /var/lib/jenkins/{REL1_27,REL1_29,REL1_30} # T256164
  • 19:14 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.5 refs T257973
  • 19:13 twentyafterfour: Promote testwikis from 1.36.0-wmf.4 to 1.36.0-wmf.5 refs T257973
  • 17:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:12 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw14(09|11|13).*
  • 16:03 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 15:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 15:30 jayme@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 15:02 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 14:56 papaul: replacing msw-c1,c2 and c4
  • 14:55 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104', diff saved to https://phabricator.wikimedia.org/P12293 and previous config saved to /var/cache/conftool/dbconfig/20200818-145337-marostegui.json
  • 14:48 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(55|64|65).*
  • 14:46 XioNoX: move v4 HE on cr3-ulsfo from peering to transit bgp group
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12292 and previous config saved to /var/cache/conftool/dbconfig/20200818-144415-marostegui.json
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12291 and previous config saved to /var/cache/conftool/dbconfig/20200818-143758-marostegui.json
  • 14:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12290 and previous config saved to /var/cache/conftool/dbconfig/20200818-142937-marostegui.json
  • 14:28 marostegui: Stop MYSQL on db2125 for on-site maintenance - T260670
  • 13:54 marostegui: Revoke DELETE and CREATE from xhgui user on m2 T260640
  • 13:53 XioNoX: bump Zayo v4 BGP session in eqiad
  • 13:49 XioNoX: move v4 HE on cr2-eqord from peering to transit bgp group
  • 13:37 XioNoX: move v4 cr1-eqiad from peering to transit bgp group
  • 13:04 kormat: disabling puppet on all db machines T259516
  • 12:57 _joe_: rebooting appservers in eqiad, 3 at a time
  • 12:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 12:37 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 12:34 kormat: deploying wmfmariadbpy 0.4
  • 12:21 jayme@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:53 XioNoX: add new icinga hosts to mr policies - T260533
  • 11:40 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:36 Lucas_WMDE: EU backport&config done
  • 11:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Wikisource wordmark for trwikisource (T260658), part 2 (duration: 00m 55s)
  • 11:32 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf '%s\n' 'https://en.wikipedia.org/static/images/mobile/copyright/wikisource-wordmark-tr.svg' | mwscript purgeList.php # T260658
  • 11:32 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/mobile/copyright/wikisource-wordmark-tr.svg: Config: Add Wikisource wordmark for trwikisource (T260658), part 1 (duration: 00m 55s)
  • 11:24 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Data Bridge on Catalan Wikipedia (T232584) (duration: 01m 01s)
  • 11:06 jbond42: deploy net-snmp update to buster
  • 10:56 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw229.*
  • 10:55 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 10:54 marostegui: Reboot db2125 after running a full upgrade - T260670
  • 10:46 marostegui: Powercycle db2125 from the idrac T260670
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - host down T260670', diff saved to https://phabricator.wikimedia.org/P12288 and previous config saved to /var/cache/conftool/dbconfig/20200818-100718-marostegui.json
  • 09:45 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 09:43 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
  • 09:40 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw214[234].*
  • 09:40 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 09:35 kart_: Update cxserver to 2020-08-17-090424-production (T259980)
  • 09:32 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 09:29 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 09:28 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 09:28 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw214[02].*
  • 09:26 volans: upgraded spicerack to v0.0.39 on cumin hosts
  • 09:25 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 09:21 volans: uploaded spicerack_0.0.39-1+deb10u1 to apt.wikimedia.org buster-wikimedia
  • 09:05 hashar: Restarting CI Jenkins
  • 08:44 vgutierrez: restart ats-tls on cp5006
  • 08:24 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 08:17 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:16 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 08:10 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1089', diff saved to https://phabricator.wikimedia.org/P12284 and previous config saved to /var/cache/conftool/dbconfig/20200818-080256-marostegui.json
  • 07:58 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 07:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 07:45 godog: VictorOps ack'd incidents will re-trigger after 24h if not resolved - T259465
  • 07:44 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12283 and previous config saved to /var/cache/conftool/dbconfig/20200818-074325-marostegui.json
  • 07:42 _joe_: performing rolling reboot of all codfw api servers
  • 07:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12282 and previous config saved to /var/cache/conftool/dbconfig/20200818-072349-marostegui.json
  • 07:19 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw213[5-9].codfw.wmnet
  • 07:16 jynus: update rest of phabricator passwords T250361
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12281 and previous config saved to /var/cache/conftool/dbconfig/20200818-071121-marostegui.json
  • 07:08 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 07:07 godog: prometheus eqiad: add 100G to prometheus/global
  • 07:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 07:01 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 07:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 06:53 twentyafterfour: phabricator maintenance successful
  • 06:48 jynus: deploy another password change to phabricator service (potentially disruptive) T250361
  • 06:41 XioNoX: add cloudflare PNI IPs in eqiad - T259036
  • 06:21 jynus: deploy password change to phabricator service T146055
  • 06:06 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 06:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 05:52 _joe_: running puppet on mc1020 T260622
  • 05:02 twentyafterfour: phabricator appears to be fully functional
  • 05:01 twentyafterfour: phabricator read-only ended
  • 05:00 twentyafterfour: phabricator is now read-only
  • 05:00 marostegui: Failover m3 (phabricator) database master from db1128 to db1132 - T259589
  • 04:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1088', diff saved to https://phabricator.wikimedia.org/P12279 and previous config saved to /var/cache/conftool/dbconfig/20200818-043241-marostegui.json
  • 01:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1376.eqiad.wmnet
  • 01:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1343.eqiad.wmnet
  • 01:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1344.eqiad.wmnet
  • 01:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 00:48 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1341.eqiad.wmnet
  • 00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1340.eqiad.wmnet
  • 00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1339.eqiad.wmnet
  • 00:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 00:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 00:15 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1315.eqiad.wmnet
  • 00:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)

2020-08-17

  • 23:59 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1313.eqiad.wmnet
  • 23:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 23:41 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1312.eqiad.wmnet
  • 23:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 23:30 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1297.eqiad.wmnet
  • 23:26 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 23:25 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:11 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1288.eqiad.wmnet
  • 23:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:47 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1286.eqiad.wmnet
  • 22:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:41 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 22:37 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1285.eqiad.wmnet
  • 22:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:26 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:26 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1284.eqiad.wmnet
  • 22:25 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:23 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:09 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1282.eqiad.wmnet
  • 22:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:02 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1281.eqiad.wmnet
  • 22:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:57 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki=aawiktionary --site-group wiktionary (T259360)
  • 21:56 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 21:56 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 21:53 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add api-gateway.request stream config T259736, one host timed out (duration: 00m 55s)
  • 21:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:48 ppchelko@deploy1001: sync-file aborted: Add api-gateway.request stream config T259736 (duration: 05m 01s)
  • 21:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1278.eqiad.wmnet
  • 21:46 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 21:43 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1279.eqiad.wmnet
  • 21:42 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 21:38 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Further mitigations for T257687 (duration: 00m 57s)
  • 21:38 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:34 effie: blocking temporarily traffic to mc1020
  • 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1276.eqiad.wmnet
  • 21:12 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2240.codfw.wmnet
  • 21:08 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 20:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 20:47 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:38 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 20:20 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 20:02 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 19:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 19:30 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:28 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:22 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:01 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 3 (duration: 02m 57s)
  • 18:58 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 3
  • 18:58 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 2 (duration: 11m 19s)
  • 18:46 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 2
  • 18:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002 (duration: 131m 17s)
  • 18:46 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:43 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:39 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:32 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 808c17d: Change logo for lldwiki to match the requested one (T259432) (duration: 00m 56s)
  • 18:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: 67e8f88: Add logo files for lldwiki (T259432) (duration: 00m 56s)
  • 17:17 cdanis@cumin1001: conftool action : set/pooled=yes; selector: name=mw1359.*
  • 17:06 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 17:04 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=codfw,name=mw2246.codfw.wmnet
  • 17:01 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 16:36 jynus: restart backup2001, backup1001 one after the other
  • 16:35 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002
  • 16:31 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 16:27 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 56s)
  • 16:23 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - remove unneeded override for SearchSatisfaction - T259163 (duration: 00m 56s)
  • 16:22 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:21 oblivian@cumin1001: conftool action : set/pooled=inactive; selector: cluster=jobrunner,dc=codfw,name=mw2250.codfw.wmnet
  • 16:20 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=codfw
  • 16:20 cdanis@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:14 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1359.*
  • 16:12 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 16:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 15:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 3. feeds timed out (duration: 01m 31s)
  • 15:43 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 3. feeds timed out
  • 15:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 2. feeds timed out (duration: 20m 40s)
  • 15:36 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ homer 'cr*' commit 'revert skipping RPKI validation for Jio AS55836 I0fd4683 T260452'
  • 15:30 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ homer 'cr*-codfw*' commit 'revert skipping RPKI validation for Jio AS55836 I0fd4683 T260452'
  • 15:22 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 2. feeds timed out
  • 15:22 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054 (duration: 02m 30s)
  • 15:19 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054
  • 15:08 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:06 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:04 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - all wikis (take 2) - T254606 (duration: 00m 53s)
  • 14:57 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 14:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 14:44 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - all wikis - T254606 (duration: 00m 55s)
  • 14:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - group0 - T254606 (duration: 00m 56s)
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12277 and previous config saved to /var/cache/conftool/dbconfig/20200817-141449-marostegui.json
  • 14:09 marostegui: Sanitize thankyouwiki on db1124:3315, db2094:3315 - T260551
  • 14:03 marostegui: Sanitize lldwiki on db1124:3315 and db2094:3315 T259436
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12276 and previous config saved to /var/cache/conftool/dbconfig/20200817-140229-marostegui.json
  • 13:58 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T259432)
  • 13:54 Urbanecm: Creating thankyouwiki and lldwiki is done
  • 13:54 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 52s)
  • 13:54 Urbanecm: Create account Pcoombe (WMF) at thankyouwiki, email set to pcoombe@wikimedia.org (T259002)
  • 13:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:51 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:49 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating thankyouwiki (T259002)
  • 13:48 urbanecm@deploy1001: Synchronized dblists: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:47 marostegui: Deploy MCR change on db1104
  • 13:47 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating thankyouwiki (T259002) (duration: 00m 56s)
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 for MCR change', diff saved to https://phabricator.wikimedia.org/P12275 and previous config saved to /var/cache/conftool/dbconfig/20200817-134701-marostegui.json
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12274 and previous config saved to /var/cache/conftool/dbconfig/20200817-134619-marostegui.json
  • 13:46 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12273 and previous config saved to /var/cache/conftool/dbconfig/20200817-134604-marostegui.json
  • 13:41 jayme: imported td-agent-bit_1.5.3-0 to buster-wikimedia - T260536
  • 13:40 jayme: imported !log imported to buster-wikimedia
  • 13:39 marostegui: Upgrade db1088 (s6) to a newer mysql version (10.4.14)
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for mysql upgrade', diff saved to https://phabricator.wikimedia.org/P12272 and previous config saved to /var/cache/conftool/dbconfig/20200817-133905-marostegui.json
  • 13:34 jbond42: deploy json-c security update to buster
  • 13:33 marostegui: Restart mysql on db2102 (testing new package)
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12271 and previous config saved to /var/cache/conftool/dbconfig/20200817-133043-marostegui.json
  • 13:29 urbanecm@deploy1001: Synchronized langlist: Creating lldwiki (T259432) (duration: 00m 54s)
  • 13:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating lldwiki (T259432) (duration: 00m 55s)
  • 13:27 urbanecm@deploy1001: sync-file aborted: Creating lldwiki (T259432) (duration: 00m 00s)
  • 13:26 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating lldwiki (T259432) (duration: 00m 53s)
  • 13:25 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating lldwiki (T259432)
  • 13:23 urbanecm@deploy1001: Synchronized dblists: Creating lldwiki (T259432) (duration: 00m 56s)
  • 13:22 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating lldwiki (T259432) (duration: 00m 56s)
  • 13:20 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating lldwiki (T259432) (duration: 00m 55s)
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12270 and previous config saved to /var/cache/conftool/dbconfig/20200817-131307-marostegui.json
  • 13:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:09 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12269 and previous config saved to /var/cache/conftool/dbconfig/20200817-130127-marostegui.json
  • 12:58 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for MCR change', diff saved to https://phabricator.wikimedia.org/P12268 and previous config saved to /var/cache/conftool/dbconfig/20200817-124458-marostegui.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12267 and previous config saved to /var/cache/conftool/dbconfig/20200817-124409-marostegui.json
  • 12:44 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 12:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 12:35 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:27 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12266 and previous config saved to /var/cache/conftool/dbconfig/20200817-122234-marostegui.json
  • 12:21 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:20 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:19 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:19 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12265 and previous config saved to /var/cache/conftool/dbconfig/20200817-121600-marostegui.json
  • 12:05 Lucas_WMDE: EU backport window done
  • 12:02 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bjnwiki --fix | tee T259429-fix
  • 12:02 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bjnwiki | tee T259429-dryrun
  • 12:01 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Set Portal and Portal_talk namespaces in bjnwiki as an extra namespace. (T259429) (duration: 00m 55s)
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12264 and previous config saved to /var/cache/conftool/dbconfig/20200817-115741-marostegui.json
  • 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Wiktionary wordmark for eswiktionary (T254059), part 2 (duration: 00m 57s)
  • 11:53 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/mobile/copyright/wiktionary-wordmark-es.svg\n' | mwscript purgeList.php # T254059
  • 11:53 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/mobile/copyright/wiktionary-wordmark-es.svg: Config: Add Wiktionary wordmark for eswiktionary (T254059), part 1 (duration: 00m 56s)
  • 11:46 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/zh_classicalwiki%s.png\n' '-1.5x' '-2x' | mwscript purgeList.php # T259006
  • 11:45 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/project-logos/: Config: Change the logo of lzh Wikipedia (T259006) (duration: 00m 55s)
  • 11:40 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Turkish powered by MW and Wikimedia project icons for Turkish Wikiquote, Turkish Wiktionary, Turkish Wikisource and Turkish Wikibooks (T260493) (duration: 00m 55s)
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Turkish powered by MW and Wikimedia project icons (T260492) (duration: 00m 57s)
  • 11:25 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:14 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:09 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] configure mediasearch A/B test (duration: 01m 08s)
  • 11:08 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:54 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:52 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:52 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:52 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:51 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:49 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:47 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:42 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:36 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:35 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:35 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:30 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:14 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:13 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:13 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:06 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:56 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:55 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:52 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:45 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:42 jynus: updating compiler facts for cloud puppet compiler project to include new host dbprov2003
  • 09:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:38 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:38 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:28 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:27 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:23 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:22 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:18 _joe_: running a full apt-get upgrade on mw1379-1380
  • 09:18 _joe_: re-upgrading imagemagick on mw1378
  • 09:16 _joe_: upgrading packages on mw1377
  • 09:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:06 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:06 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:05 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:25 jayme: forcing a puppet run on all mw-api servers in eqiad - T260329
  • 07:52 _joe_: repooling mw1382
  • 07:37 _joe_: running the same test on mw1382 T260329
  • 07:34 _joe_: repooling mw1381
  • 07:15 _joe_: running the same test on mw1381 T260329
  • 07:15 _joe_: repooled mw1281
  • 06:26 _joe_: stop testing on mw1281, T260329
  • 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:28 marostegui: Stop mysql on db1099:3311, db1099:3318 for reimage
  • 05:28 _joe_: depooling mw1281 for testing for T260329
  • 05:25 marostegui: Deploy schema change on db1139:3311
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311, db1099:3318 for reimage and MCR change', diff saved to https://phabricator.wikimedia.org/P12263 and previous config saved to /var/cache/conftool/dbconfig/20200817-052147-marostegui.json
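
The cookbook entries above follow a regular "END (STATUS) - Cookbook NAME (exit_code=N)" shape, so a day's outcomes can be tallied mechanically. The following is a minimal sketch, not an official tool: the regular expression and the helper name `tally_cookbook_results` are assumptions made for illustration, based only on the line format visible in this log.

```python
import re
from collections import Counter

# Assumed pattern, derived from log lines such as:
#   12:44 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
END_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<who>\S+): END \((?P<status>[A-Z]+)\) - "
    r"Cookbook (?P<cookbook>\S+) \(exit_code=(?P<code>\d+)\)"
)


def tally_cookbook_results(lines):
    """Count END entries per (cookbook, status) pair; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = END_RE.search(line)
        if match:
            counts[(match["cookbook"], match["status"])] += 1
    return counts


# Sample entries copied from the 2020-08-17 log above.
sample = [
    "  • 12:44 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)",
    "  • 12:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single",
    "  • 12:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)",
    "  • 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)",
]
results = tally_cookbook_results(sample)
for (cookbook, status), count in sorted(results.items()):
    print(f"{cookbook} {status}: {count}")
```

START lines carry no host name in this log, so the sketch counts only END entries and does not try to pair a START with its END.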

2020-08-16

  • 11:12 gehel: repooling wdqs1004 - caught up on lag

2020-08-15

  • 21:18 gehel: depooling wdqs1004 and restarting services, will wait to catch up on lag before repooling

2020-08-14

  • 19:41 effie: restart mwdebug1002
  • 16:58 cdanis: done deploying 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8' to all routers T260449
  • 16:44 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr2-esams*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
  • 16:39 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr1-codfw*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
  • 16:36 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr2-codfw*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
  • 02:41 eileen: tools revision changed from 9a89f45974 to b4ebd1e564

2020-08-13

  • 23:39 tzatziki: removing 3 files for legal compliance
  • 22:03 mutante: switching xhgui from tungsten to xhgui1001 - ran puppet on webperf*001 - T180761 T158837
  • 21:54 andrew@deploy1001: Finished deploy [horizon/deploy@f3dcb29]: fix proxy in project-local domain --bug T260388 (duration: 03m 53s)
  • 21:50 andrew@deploy1001: Started deploy [horizon/deploy@f3dcb29]: fix proxy in project-local domain --bug T260388
  • 21:11 mutante: rsyncing /var/lib/jenkins from releases1001 to releases1002 and then all other releases* servers. 57GB, overwriting existing data from manual config (T247652)
  • 20:53 kormat: dropping xhgui.xhgui on m2
  • 19:35 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/DiscussionTools: Revert new reply API (again) T259855 (duration: 00m 57s)
  • 18:06 herron: restarted ES on logstash1010
  • 18:05 dpifke@deploy1001: Synchronized wmf-config/ProductionServices.php: Enabling new XHGui backend (T180761) (duration: 00m 56s)
  • 17:16 hnowlan: deployed ATS and varnish rules to route api.wikimedia.org
  • 16:26 hnowlan: created api.wikimedia.org
  • 15:49 hnowlan: moving api-gateway service to state production. critical set to false
  • 15:41 herron: restart ES on logstash1012
  • 14:56 fdans@deploy1001: Finished deploy [analytics/refinery@ba1a439]: Regular analytics weekly train (duration: 11m 34s)
  • 14:45 ema: repool mw1382 with kernel memory accounting disabled T260281
  • 14:45 fdans@deploy1001: Started deploy [analytics/refinery@ba1a439]: Regular analytics weekly train
  • 14:41 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:40 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:38 ema: reboot mw1382 with kernel memory accounting disabled T260281
  • 14:34 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:34 _joe_: rebooting mw1381 with a newer kernel, mw1383 as control with the old kernel T260329
  • 14:33 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:31 _joe_: installing kernel 4.19.0-0.bpo.9 on mw1381 T260329
  • 14:05 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 14:00 elukey: create schema[12]00[34] in ganeti - T260347
  • 13:59 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:58 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:53 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:51 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:46 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:45 hnowlan: moving api-gateway service to monitoring_setup
  • 13:44 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:44 hashar: Gracefully restarting Zuul
  • 13:39 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:10 _joe_: forcing a puppet run on the api appservers in eqiad T260329
  • 13:07 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: revert enabling of lilypond (again) T257091 T260329 (duration: 00m 59s)
  • 11:24 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:20 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:09 hnowlan: restarting pybal on lvs2010 T254908
  • 11:09 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:06 hnowlan: restarting pybal on lvs2009 T254908
  • 11:05 hnowlan: restarting pybal on lvs1016 T254908
  • 11:05 jayme: depool mw1380 for downgrade of poppler-utils,libpoppler-glib8,libpoppler64,curl,libcurl3,libcurl3-gnutls,libpython3.5,python3.5,libpython3.5-stdlib,python3.5-minimal,libpython3.5-minimal,imagemagick-6-common,libmagickcore-6.q16-3,libmagickwand-6.q16-3,imagemagick-6.q16,imagemagick,e2fslibs,e2fsprogs,libcomerr2,libss2 and reboot - T260329
  • 11:05 hnowlan: restarting pybal on lvs1015 T254908
  • 11:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:42 hnowlan: Moving api-gateway service from service_setup to lvs_setup and running puppet on LVS servers
  • 10:17 jayme: depool mw1379 for downgrade of poppler-utils,libpoppler-glib8,libpoppler64,curl,libcurl3,libcurl3-gnutls,libpython3.5,python3.5,libpython3.5-stdlib,python3.5-minimal,libpython3.5-minimal,imagemagick-6-common,libmagickcore-6.q16-3,libmagickwand-6.q16-3,imagemagick-6.q16,imagemagick,e2fslibs,e2fsprogs,libcomerr2,libss2 and reboot - T260329
  • 10:04 XioNoX: re-order OSPF interfaces on all routers (now partially netbox driven)
  • 09:37 ayounsi@deploy1001: Finished deploy [homer/deploy@89636df]: Homer release v0.2.5 (duration: 03m 03s)
  • 09:34 ayounsi@deploy1001: Started deploy [homer/deploy@89636df]: Homer release v0.2.5
  • 08:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 08:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082', diff saved to https://phabricator.wikimedia.org/P12247 and previous config saved to /var/cache/conftool/dbconfig/20200813-085547-marostegui.json
  • 08:45 _joe_: downgrading imagemagick on mw1378 T260329
  • 08:43 _joe_: downgrading imagemagick on mw1378 T260281
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 07:55 _joe_: downgrading curl/libcurl3/libcurl3-gnutls on mw1377 T260329
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12246 and previous config saved to /var/cache/conftool/dbconfig/20200813-074528-marostegui.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12244 and previous config saved to /var/cache/conftool/dbconfig/20200813-071943-marostegui.json
  • 07:16 marostegui: Stop replication on db1082 to remove triggers on sanitarium for MCR changes
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P12243 and previous config saved to /var/cache/conftool/dbconfig/20200813-071545-marostegui.json
  • 06:48 marostegui: Deploy MCR change on dbstore1003:3311
  • 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126', diff saved to https://phabricator.wikimedia.org/P12242 and previous config saved to /var/cache/conftool/dbconfig/20200813-060135-marostegui.json
  • 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:43 marostegui: Stop MySQL on db2135 (codfw master), haproxy irc alert will fire T260324
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12241 and previous config saved to /var/cache/conftool/dbconfig/20200813-052859-marostegui.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12240 and previous config saved to /var/cache/conftool/dbconfig/20200813-051222-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12239 and previous config saved to /var/cache/conftool/dbconfig/20200813-050107-marostegui.json
  • 02:56 mutante: testreduce1001 - systemctl reset-failed ; fix parsoid-vd systemd state and icinga alert
  • 00:37 mutante: removing jenkins_service_running checks from secondary servers where it's stopped, manually from icinga config, running puppet on icinga
  • 00:14 mutante: re-enabling puppet on releases* servers
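
Each dbctl commit entry above records a commit message, a Phabricator paste with the diff, and the path of the saved previous config. The sketch below shows one way to pull those fields out of a log line. The regular expression and the function name `parse_dbctl_commit` are hypothetical, inferred only from the entries in this log and not from dbctl itself.

```python
import re

# Assumed pattern, derived from log lines such as:
#   dbctl commit (dc=all): 'Fully repool db1082', diff saved to <URL>
#   and previous config saved to <PATH>
DBCTL_RE = re.compile(
    r"dbctl commit \(dc=(?P<dc>[^)]+)\): '(?P<message>.*)', "
    r"diff saved to (?P<diff>\S+) and previous config saved to (?P<config>\S+)"
)


def parse_dbctl_commit(line):
    """Return the commit fields as a dict, or None if the line is not a dbctl commit."""
    match = DBCTL_RE.search(line)
    return match.groupdict() if match else None


# Entry copied from the 2020-08-13 log above.
entry = (
    "  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082', "
    "diff saved to https://phabricator.wikimedia.org/P12247 and previous config saved to "
    "/var/cache/conftool/dbconfig/20200813-085547-marostegui.json"
)
parsed = parse_dbctl_commit(entry)
print(parsed["message"], parsed["diff"], parsed["config"], sep="\n")
```

The message group is greedy on purpose, so that commit messages containing commas (for example a depool of several instances at once) are still captured whole.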

2020-08-12

  • 23:44 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:41 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:40 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:39 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:37 wkandek: reboot mw1372
  • 23:36 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:36 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:32 wkandek: reboot mw1373
  • 23:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:31 wkandek: reboot mw1371
  • 23:31 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:31 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:30 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:28 wkandek: reboot mw1384
  • 23:27 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:27 wkandek: reboot mw1385
  • 23:26 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:25 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:24 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:22 wkandek: reboot mw1370
  • 23:22 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:19 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:18 wkandek: reboot mw1369
  • 23:18 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:17 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:17 wkandek: reboot mw1387
  • 23:16 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:16 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:16 wkandek: reboot mw1389
  • 23:15 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:14 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:09 wkandek: reboot mw1368
  • 23:09 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:08 wkandek: reboot mw1367
  • 23:08 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:07 wkandek: reboot mw1391
  • 23:07 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:05 ejegg: updated Fundraising CiviCRM from 72452e28a9 to f5469d0a4c
  • 23:05 wkandek: reboot mw1393
  • 23:04 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:04 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:01 wkandek: reboot mw1395
  • 23:01 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:00 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:53 wkandek: reboot mw1397
  • 22:53 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:52 wkandek: reboot mw1366
  • 22:52 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:52 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:52 wkandek: reboot mw1365
  • 22:51 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:51 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:51 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:47 wkandek: reboot mw1399
  • 22:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:46 wkandek: reboot mw1364
  • 22:46 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:45 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:44 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:42 wkandek: reboot mw1401
  • 22:42 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:41 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:41 wkandek: reboot mw1355
  • 22:40 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:40 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:38 wkandek: reboot mw1354
  • 22:38 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:36 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:36 wkandek: reboot mw1396
  • 22:36 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:35 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:32 wkandek: reboot mw1353
  • 22:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:31 wkandek: reboot mw1352
  • 22:31 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:31 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:30 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:29 wkandek: reboot mw1348
  • 22:29 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:28 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:26 wkandek: reboot mw1347
  • 22:26 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:25 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:23 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:22 wkandek: reboot mw1350
  • 22:22 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:21 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:20 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:19 wkandek: reboot mw1346
  • 22:19 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:18 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:14 wkandek: reboot mw1345
  • 22:13 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:12 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:12 wkandek: reboot mw1349
  • 22:12 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:11 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:08 wkandek: reboot mw1333
  • 22:07 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:07 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1330.eqiad.wmnet
  • 22:03 wkandek: reboot mw1344
  • 22:03 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:02 wkandek: reboot mw1343
  • 22:02 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:02 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:00 wkandek: reboot mw1332
  • 22:00 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:56 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:55 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:53 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:50 wkandek: reboot mw1331
  • 21:50 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:48 wkandek: reboot mw1342
  • 21:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:46 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:46 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1340.eqiad.wmnet
  • 21:41 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:40 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:39 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:39 wkandek: reboot mw1341
  • 21:39 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:37 wkandek@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
  • 21:37 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:36 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:33 wkandek: reboot mw1329
  • 21:33 wkandek: reboot mw1328
  • 21:32 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:29 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:28 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:28 ejegg: updated payments-wiki from 77ff5d70fc to a7ee1790e0
  • 21:25 wkandek: reboot mw1340
  • 21:25 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:23 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:21 wkandek: reboot mw1339
  • 21:20 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:20 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:15 wkandek: reboot mw1327
  • 21:15 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:14 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:13 wkandek: reboot mw1326
  • 21:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:11 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:11 wkandek: reboot mw1317
  • 21:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:10 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:05 wkandek: reboot mw1316
  • 21:04 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:03 wkandek: reboot mw1325
  • 21:03 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:02 wkandek: reboot mw1324
  • 21:02 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:02 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:01 wkandek: reboot mw1315
  • 21:01 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:57 wkandek: reboot mw1323
  • 20:57 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:54 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:52 wkandek: reboot mw1322
  • 20:52 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:51 wkandek: reboot mw1314
  • 20:51 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:50 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:50 wkandek: reboot mw1313
  • 20:50 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:49 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:48 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:44 wkandek: reboot mw1312
  • 20:44 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:43 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:43 wkandek: reboot mw1321
  • 20:42 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:41 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:40 wkandek: reboot mw1297
  • 20:40 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:39 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:39 wkandek: reboot mw1320
  • 20:39 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:38 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:34 wkandek: reboot mw1290
  • 20:34 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:33 wkandek: reboot mw1319
  • 20:33 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:32 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:29 wkandek: reboot mw1275
  • 20:29 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:28 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:26 wkandek: reboot mw1289
  • 20:25 wkandek: reboot mw1288
  • 20:25 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:25 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:23 wkandek: reboot mw1274
  • 20:23 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:23 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:22 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:20 wkandek: reboot mw1273
  • 20:20 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:16 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:13 wkandek: reboot mw1287
  • 20:13 wkandek: reboot mw1286
  • 20:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:11 wkandek: reboot mw1272
  • 20:11 wkandek: reboot mw1271
  • 19:41 hashar: Upgrading Jenkins on contint2001 (primary)
  • 19:25 hashar: contint1001: sudo systemctl mask jenkins # spare server
  • 19:25 mutante: all releases* servers except 1001 - disable puppet; stop jenkins, mask jenkins
  • 19:22 mutante: releases1002 - stopped and masked jenkins service
  • 19:22 mutante: releases2001 - stopped and masked jenkins service
  • 19:20 mutante: upgrading jenkins on releases*001
  • 19:19 hashar: Upgrading Jenkins on contint1001 (spare)
  • 19:16 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.4
  • 19:13 mutante: uploaded new jenkins version to APT repo; upgrading jenkins on releases1002/2002
  • 19:08 effie: pool mw1396
  • 19:06 effie: repool mw1395 mw1397 mw1399
  • 18:56 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 18:55 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 18:50 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/Wikibase/client/includes/Store/Sql/DirectSqlStore.php: Set caching of CachingEntityRevisionLookup to CACHE_NONE in client (duration: 02m 13s)
  • 18:47 wkandek: reboot mw1270
  • 18:47 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:45 wkandek: reboot mw1269
  • 18:41 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:39 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:39 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:38 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 18:25 wkandek: reboot mw1268
  • 18:25 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 18:25 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 18:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:22 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 18:22 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:17 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
  • 18:17 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:16 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on hewiki (T255020) (duration: 01m 03s)
  • 18:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:07 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:07 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:06 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:04 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: Set caching of CachingEntityRevisionLookup to CACHE_NONE in repo (duration: 01m 06s)
  • 18:02 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 18:02 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 18:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:56 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:52 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 17:52 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 17:52 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:51 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:51 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:51 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:49 effie: reboot mw1265 mw1282 mw1283
  • 17:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:45 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:37 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 17:36 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 17:30 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:19 effie: reboot mw1263 mw1264 mw1279 and mw1281
  • 17:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 17:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 17:16 cdanis: for posterity: mw1359 has a bunch of special packages installed (previously recorded in SAL) and also has `sudo memleak-bpfcc -o 60000 -z 31 -Z 33 30` running in a tmux in an attempt to understand what's causing the page fragmentation in the appserver fleet
  • 17:16 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:16 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:13 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 17:13 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 17:00 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 16:57 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Additional mitigations for T257687 (duration: 01m 03s)
  • 16:53 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:52 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:48 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 16:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:35 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:35 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:32 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:31 effie: reboot mw1277 mw1278 && mw1261 mw1262
  • 16:29 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 16:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:04 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: I3726a6364d, T257079 (duration: 01m 02s)
  • 15:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:52 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:50 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:48 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:48 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:42 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:37 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:36 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:32 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:32 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:26 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:22 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:21 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:15 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:12 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo apt install linux-headers-4.9.0-12-amd64
  • 15:10 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo apt install python3-netaddr ieee-data
  • 15:09 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo dpkg -i bpfcc-tools_0.12.0-2_all.deb libbpfcc_0.12.0-2_amd64.deb python3-bpfcc_0.12.0-2_all.deb
  • 15:08 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 15:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 15:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:54 cdanis: again un-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports
  • 14:53 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 14:52 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:44 cdanis: temporarily re-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports, original in my homedir
  • 14:37 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:37 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:35 cdanis: un-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports
  • 14:32 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:31 cdanis: temporarily kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports, original in my homedir
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:02 kormat: uploaded wmfmariadbpy 0.3 to apt
  • 13:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:42 effie: restart mw1383 & mw1386
  • 13:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:27 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.4 (duration: 01m 16s)
  • 13:25 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.4
  • 13:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:19 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:15 cdanis: ✔️ cdanis@mw1357.eqiad.wmnet ~ 🕘☕ sudo sysctl -w vm/compact_memory=1
  • 13:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:07 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:04 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:59 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:52 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:50 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:33 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 12:27 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:15 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:15 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 12:15 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:51 ema: pool mw1363 after reboot
  • 11:49 jynus: creating artificial low replication lag on db2130 to test icinga alerts T253120
  • 11:41 ema@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:37 ema@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:30 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:28 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:21 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:17 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:13 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:10 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:08 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:07 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 11:07 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:00 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 11:00 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:55 _joe_: rebooting mw1361
  • 10:51 jayme: rebooting mw1356
  • 10:49 _joe_: rebooting mw1378
  • 09:45 _joe_: repooling mw1377
  • 09:40 _joe_: rebooting mw1377
  • 09:22 _joe_: depool mw1357 too
  • 09:14 _joe_: depooling mw1377 for inspection
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1110', diff saved to https://phabricator.wikimedia.org/P12220 and previous config saved to /var/cache/conftool/dbconfig/20200812-091211-marostegui.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12219 and previous config saved to /var/cache/conftool/dbconfig/20200812-090831-marostegui.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12218 and previous config saved to /var/cache/conftool/dbconfig/20200812-085021-marostegui.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12217 and previous config saved to /var/cache/conftool/dbconfig/20200812-083548-marostegui.json
  • 07:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for reimage', diff saved to https://phabricator.wikimedia.org/P12215 and previous config saved to /var/cache/conftool/dbconfig/20200812-073130-marostegui.json
  • 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for MCR change', diff saved to https://phabricator.wikimedia.org/P12214 and previous config saved to /var/cache/conftool/dbconfig/20200812-045157-marostegui.json
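
Note: the many sre.hosts.reboot-single START/END entries above are easier to review when tallied by outcome. The following is a minimal sketch, not an official SAL tool. The entry shape it matches is inferred from the lines in this log, and `tally_cookbook_results` and `END_RE` are hypothetical names introduced here for illustration.

```python
import re
from collections import Counter

# Assumed shape of a cookbook result entry, inferred from the entries above:
#   "• HH:MM user@host: END (STATUS) - Cookbook NAME (exit_code=N)"
END_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<user>[^@\s]+)@(?P<host>\S+): "
    r"END \((?P<status>[A-Z]+)\) - Cookbook (?P<name>\S+) "
    r"\(exit_code=(?P<code>\d+)\)"
)


def tally_cookbook_results(lines):
    """Count END entries per (cookbook, status); START and other entries are ignored."""
    counts = Counter()
    for line in lines:
        match = END_RE.search(line)
        if match:
            counts[(match["name"], match["status"])] += 1
    return counts


# Sample entries copied from the 18:xx block of this log.
sample = [
    "  • 18:39 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)",
    "  • 18:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)",
    "  • 18:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single",
    "  • 18:17 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)",
]

results = tally_cookbook_results(sample)
for (name, status), n in sorted(results.items()):
    print(f"{name} {status}: {n}")
```

The log lines do not name the target host, so START and END entries cannot be paired reliably from the text alone. The sketch therefore counts only END entries.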

2020-08-11

  • 23:41 Urbanecm: Evening B&C window completed
  • 23:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0f238f7: Update wgMFRemovableClasses (T231160) (duration: 01m 03s)
  • 23:36 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/MobileFrontend/extension.json: c22d65f: Hide vertical nav-boxes on mobile domain (T231160) (duration: 01m 03s)
  • 23:34 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/MobileFrontend/extension.json: 81d54b0: Hide vertical nav-boxes on mobile domain (T231160) (duration: 01m 05s)
  • 23:07 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 28faa27: Switching to updated license definition (duration: 01m 04s)
  • 21:52 krinkle@deploy1001: Synchronized php-1.36.0-wmf.3/includes/skins/SkinMustache.php: Ibe1f07346, T259872, T259858 (duration: 01m 04s)
  • 19:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 19:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 19:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add streams for eventgate-main - T251935 (duration: 01m 04s)
  • 19:21 ejegg: updated payments-wiki from f199c071c3 to 77ff5d70fc
  • 18:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:48 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 18:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Grant investigate right to checkuser group on frwiki (T260171) (duration: 01m 04s)
  • 18:18 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Beta-only: Configured additional settings for API Portal beta wiki gerrit:619339 (duration: 01m 03s)
  • 18:05 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Direct GrowthExperiments help panel questions to mentors on cswiki (T250235) (duration: 01m 03s)
  • 17:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Remove extraneous mediawiki.api-request stream - T251935 (duration: 01m 01s)
  • 17:53 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:53 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:43 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:43 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:38 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:33 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:28 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:25 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:58 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:53 hashar@deploy1001: Synchronized php-1.36.0-wmf.4/skins/MinervaNeue/: Revert "ServiceWiring: Avoid usage of deprecated Title::getSubjectPage()" - T260155 (duration: 01m 06s)
  • 16:12 herron: migrating lists.wikimedia.org services from fermium to lists1001 T224586
  • 15:36 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.4
  • 15:27 hashar@deploy1001: Finished scap: (no justification provided) (duration: 30m 51s)
  • 14:59 marostegui: Deploy MCR change on db1116:3318
  • 14:56 hashar@deploy1001: Started scap: (no justification provided)
  • 14:56 hashar@deploy1001: Pruned MediaWiki: 1.36.0-wmf.2 (duration: 04m 15s)
  • 14:55 jayme: updated helmfile to 0.125.2-1 on contint* and deploy*
  • 14:52 otto@deploy1001: Finished deploy [analytics/refinery@35c4430]: Deploying to an-launcher1002 to get camus wrapper script changes - T251935 (duration: 01m 14s)
  • 14:51 otto@deploy1001: Started deploy [analytics/refinery@35c4430]: Deploying to an-launcher1002 to get camus wrapper script changes - T251935
  • 14:50 hashar@deploy1001: Pruned MediaWiki: 1.36.0-wmf.1 (duration: 02m 07s)
  • 14:48 jayme: imported helmfile_0.125.2-1 to buster-wikimedia, jessie-wikimedia, stretch-wikimedia
  • 14:47 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.41 (duration: 04m 20s)
  • 14:40 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.40 (duration: 10m 24s)
  • 14:37 papaul: replacing msw-b5,b6,b7 and b8
  • 14:30 hashar: Cleaning old MediaWiki versions that were never removed
  • 14:27 hashar@deploy1001: sync aborted: testwikis wikis to 1.36.0-wmf.4 (duration: 72m 36s)
  • 14:10 hashar: mw1319: scap pull
  • 13:27 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 13:27 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:23 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:16 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:14 hashar@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.4
  • 13:12 hashar: Applied 1.36.0-wmf.4 security patches # T257972
  • 13:03 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 13:03 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:52 kormat: uploaded wmfmariadbpy 0.2 packages to apt1001
  • 12:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 11:54 marostegui: Install new MariaDB 10.4.14 on db2102
  • 11:42 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:18 Urbanecm: EU B&C window done
  • 11:08 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 619255|Enable ContentTranslation in Sundanese WP as a default tool (T258502) (duration: 00m 59s)
  • 10:39 volans: migrating *all* eqiad mgmt DNS records to the autogenerated ones via Netbox - T233183
  • 10:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:34 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:20 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0)
  • 10:01 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh
  • 10:00 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 09:51 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 09:29 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:25 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:11 marostegui: Rename tables on muswiki and mhwiktionary on s3 master (db1123) without replication T260112
  • 09:01 volans: renewed puppet certificate on scb1001.eqiad.wmnet
  • 08:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e6ec237: Revert "Turn muswiki and mhwiktionary to read-only" (T259004) (duration: 00m 58s)
  • 08:45 urbanecm@deploy1001: Synchronized dblists/: 81f4594: Point muswiki and mhwiktionary to s5 (T259004; 3/3) (duration: 00m 58s)
  • 08:44 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: 81f4594: Point muswiki and mhwiktionary to s5 (T259004; 2/3) (duration: 00m 58s)
  • 08:43 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: 81f4594: Point muswiki and mhwiktionary to s5 (T259004; 1/3) (duration: 01m 02s)
  • 08:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a04bc1f: Turn muswiki and mhwiktionary to read-only (T259004) (duration: 01m 01s)
  • 08:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 06:54 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 06:45 XioNoX: Re-prioritize peering over transit eqiad/esams - T259614
  • 01:59 tstarling@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: enabling fast stale mode T250248 (duration: 00m 58s)
  • 00:33 dpifke@deploy1001: Finished deploy [performance/arc-lamp@fc5f1c6]: Deploying latest attempt to fix T259167 (duration: 01m 03s)
  • 00:31 dpifke@deploy1001: Started deploy [performance/arc-lamp@fc5f1c6]: Deploying latest attempt to fix T259167
  • 00:24 mutante: reverting switch of releases.wikimedia.org for today since releases-jenkins.wikimedia.org is tied to it and new jenkins still needs some config and plugins (T247652)
  • 00:08 mutante: releases-jenkins.wikimedia.org currently under maintenance (T247652)

2020-08-10

  • 23:56 eileen: tools revision changed from 22550f38c5 to 9a89f45974
  • 23:53 mutante: https://releases.wikimedia.org switched to new backends running Debian buster. files have been synced. httpbb tests have been created and pass. (T247652)
  • 23:52 mutante: https://releases.wikimedia.org switched to new backends running Debian buster. files have been synced of course.
  • 20:13 hashar: Updated container for Jenkins job operations-puppet-tests-buster-docker https://gerrit.wikimedia.org/r/c/integration/config/+/619359/
  • 20:10 ejegg: updated payments-wiki from 932aacde54 to f199c071c3
  • 18:32 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@3e12dbb]: 0.3.44 (duration: 15m 18s)
  • 18:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:17 ryankemper@deploy1001: Started deploy [wdqs/wdqs@3e12dbb]: 0.3.44
  • 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Special:Investigate on frwiki (T257891) (duration: 00m 58s)
  • 18:07 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Explicitly disable nativeGallery in Parsoid settings (no-op) (duration: 00m 58s)
  • 18:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump the weight of near match for search (T257922) (duration: 00m 59s)
  • 17:56 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:52 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:49 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-analytics streams - T251935 (duration: 01m 02s)
  • 17:46 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:38 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:38 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:34 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:34 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:31 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:31 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:14 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:12 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:04 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:59 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:59 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:59 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:55 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:17 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:11 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:01 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:48 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:15 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:15 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:14 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:55 XioNoX: Re-prioritize peering over transit - codfw - T259614
  • 12:34 XioNoX: Re-prioritize peering over transit - eqsin - T259614
  • 12:07 XioNoX: standardize cr1-eqiad interfaces
  • 11:56 Urbanecm: EU B&C window done
  • 11:55 Urbanecm: Run `mwscript namespaceDupes.php --wiki=tiwiki --fix` at mwmaint1002 (T259295)
  • 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 14b2897: Define Portal namespace for tiwiki (T259295) (duration: 00m 59s)
  • 11:49 urbanecm@deploy1001: Synchronized static/images/project-logos/: bbbf701: Regenerate Bengali Wikipedia logo from source SVG (T259292) (duration: 00m 59s)
  • 11:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0d8366f: Search Work NS by default at bnwikisource (T258982) (duration: 00m 59s)
  • 11:37 Urbanecm: Run `mwscript namespaceDupes.php --wiki=hywiki --fix` at mwmaint1002 (T259987)
  • 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1771487: add two extra namespaces for hywiki (T259987) (duration: 00m 59s)
  • 11:28 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/shnwiktionary*.png with purgeList.php (T260010)
  • 11:27 XioNoX: standardize cr2-eqiad interfaces
  • 11:27 urbanecm@deploy1001: Synchronized static/images/project-logos/: c5c96ca: Regenerate shnwiktionary logo from source svg (T260010) (duration: 00m 58s)
  • 11:21 XioNoX: repool ulsfo
  • 11:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a15e3a2: Increase autoconfirmed threshold for Chinese Wikinews to 7 days and 20 edits at least (T259869) (duration: 00m 58s)
  • 11:13 XioNoX: Re-prioritize peering over transit - ulsfo - T259614
  • 11:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ba0b2ab: Create TemplateEditor group on zhwiki (T260012) (duration: 00m 58s)
  • 11:09 Urbanecm: Run mwscript namespaceDupes.php --wiki=ptwikinews --fix --add-prefix=T259959 (T259959)
  • 11:09 Urbanecm: Run mwscript namespaceDupes.php --wiki=ptwikinews --fix (T259959)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 010f63e: Add WN as an alias to project namespace in Portuguese Wikinews (T259959) (duration: 00m 58s)
  • 11:06 urbanecm@deploy1001: sync-file aborted: 010f63e: Add WN as an alias to project namespace in Portuguese Wikinews (T259959) (duration: 00m 00s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 01s)
  • 10:42 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.pool (exit_code=0)
  • 10:37 jayme@cumin1001: START - Cookbook sre.discovery.pool
  • 10:36 jayme@cumin1001: END (FAIL) - Cookbook sre.discovery.pool (exit_code=99)
  • 10:36 jayme@cumin1001: START - Cookbook sre.discovery.pool
  • 10:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:29 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.depool (exit_code=0)
  • 10:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:23 jayme@cumin1001: START - Cookbook sre.discovery.depool
  • 10:19 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.pool (exit_code=0)
  • 10:18 jayme@cumin1001: START - Cookbook sre.discovery.pool
  • 10:14 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:10 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:56 hashar: Updated container for Jenkins job operations-dns-lint-docker https://gerrit.wikimedia.org/r/619267
  • 09:55 hashar: Updated container for Jenkins job operations-puppet-tests-buster-docker https://gerrit.wikimedia.org/r/619266
  • 09:54 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.depool (exit_code=0)
  • 09:49 jayme@cumin1001: START - Cookbook sre.discovery.depool
  • 09:21 marostegui: Promote dbproxy1019 back T255408
  • 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:43 marostegui: Remove revision triggers from db2094:3318 T238966
  • 06:42 marostegui: Stop replication on s8 codfw master to deploy MCR change, this will generate lag on s8 codfw T238966
  • 04:46 marostegui: Depool dbproxy1019 for reimage T255408

2020-08-09

  • 21:58 ejegg: updated payments-wiki from cd012f37f1 to 932aacde54
  • 03:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)

2020-08-08

  • 02:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 02:21 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload

2020-08-07

  • 16:42 jforrester@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/DiscussionTools/: T259855 Revert new reply API (duration: 01m 06s)
  • 15:01 volans: import DNS names for network devices in Netbox - T258729
  • 13:27 godog: bounce pybal on lvs1016 and then lvs1015 to reset state, logstash1025 reported down but actually up
  • 10:27 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:27 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:02 elukey: reboot deneb via ganeti2021 (hostname config pointing to recdns for some reason)
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1092', diff saved to https://phabricator.wikimedia.org/P12195 and previous config saved to /var/cache/conftool/dbconfig/20200807-091527-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12194 and previous config saved to /var/cache/conftool/dbconfig/20200807-084747-marostegui.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12193 and previous config saved to /var/cache/conftool/dbconfig/20200807-080719-marostegui.json
  • 07:50 godog: prometheus codfw lvextend --resize --size +60G /dev/mapper/vg--hdd-prometheus--global
  • 07:49 godog: prometheus codfw lvextend --resize --size +30G /dev/mapper/vg--ssd-prometheus--k8s
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12192 and previous config saved to /var/cache/conftool/dbconfig/20200807-074658-marostegui.json
  • 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for upgrade', diff saved to https://phabricator.wikimedia.org/P12191 and previous config saved to /var/cache/conftool/dbconfig/20200807-063431-marostegui.json

2020-08-06

  • 23:21 catrope@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/GrowthExperiments/: Fixes for WelcomeSurvey language question (T232410) (duration: 00m 59s)
  • 23:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change GrowthExperiments mentor list on fawiki (T253291) (duration: 00m 59s)
  • 21:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:39 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:35 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:33 brennen@deploy1001: Synchronized php-1.36.0-wmf.3/vendor: Update git submodules (vendor) (T259832) (duration: 01m 08s)
  • 21:32 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 20:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 20:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 20:47 shdubsh: restart logstash -- pipeline appears stuck
  • 20:38 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:38 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 20:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 20:19 brennen: manually updating the vendor submodule on 1.36.0 for T259832
  • 20:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 20:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:47 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:45 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - wgEventStreams - fix another typo in eventgate stream config - T251935 (duration: 00m 58s)
  • 19:40 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - wgEventStreams - fix typo in eventgate stream config - T251935 (duration: 00m 59s)
  • 19:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:04 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.3
  • 18:58 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:57 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:29 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 18:29 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 18:21 Urbanecm: Morning B&C window was completed
  • 18:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/GrowthExperiments/modules/: fb4a808: Fix "Ask mentor" help panel button styling (T250235) (duration: 01m 07s)
  • 18:11 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 18:11 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9db9659: Remove temporary logging for mediamoderation (T259742) (duration: 01m 07s)
  • 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9695811: Enable DiscussionTools as a beta feature on 8 more wikis ("phase 1") (T259574) (duration: 01m 06s)
  • 17:42 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.3 (duration: 01m 06s)
  • 17:41 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.3
  • 17:37 brennen: train 1.36.0-wmf.3: proceeding to group1
  • 17:36 brennen@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/WikibaseMediaInfo/src/View/MediaInfoEntityTermsView.php: Backport: Fix array unpacking as argument list (T259745) (duration: 01m 07s)
  • 16:32 chrisalbon@deploy1001: Finished deploy [ores/deploy@f3c44be]: T258435 (duration: 14m 12s)
  • 16:18 dpifke@deploy1001: Finished deploy [performance/arc-lamp@7838c88]: Deploying fixes for T259167 (duration: 00m 05s)
  • 16:18 dpifke@deploy1001: Started deploy [performance/arc-lamp@7838c88]: Deploying fixes for T259167
  • 16:18 chrisalbon@deploy1001: Started deploy [ores/deploy@f3c44be]: T258435
  • 15:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:10 fdans@deploy1001: Finished deploy [analytics/refinery@97a02a3]: Regular analytics weekly train [analytics/refinery@97a02a3] (duration: 20m 01s)
  • 14:50 fdans@deploy1001: Started deploy [analytics/refinery@97a02a3]: Regular analytics weekly train [analytics/refinery@97a02a3]
  • 14:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-* test.event streams - T251935 (duration: 01m 08s)
  • 13:32 jayme: updated helm to 2.16.9-2 on contint*, deploy* and chartmuseum*
  • 13:24 jayme: imported helm_2.16.9-2 and tiller_2.16.9-2 to buster-wikimedia, jessie-wikimedia and stretch-wikimedia
  • 12:06 kart_: Updated cxserver to 2020-08-05-070016-production (T258919, T199523, T257943, T256194)
  • 12:03 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 11:59 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 11:57 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 11:54 Lucas_WMDE: EU backport window done
  • 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/Flow/: Backport: Pass jQuery objects into jqueryMsg (duration: 01m 09s)
  • 11:53 XioNoX: reboot cr2-eqord - T259621
  • 11:37 XioNoX: drain traffic away cr2-eqord - T259621
  • 11:27 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/Wikibase/lib/: Backport: Fix CachingFallbackLabelDescriptionLookup failing in edge-cases (T259744) (duration: 01m 10s)
  • 11:22 XioNoX: reboot cr2-eqdfw - T259621
  • 11:13 XioNoX: drain traffic away cr2-eqdfw - T259621
  • 10:52 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:48 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:45 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:23 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:16 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:14 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:12 jynus@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:11 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1127', diff saved to https://phabricator.wikimedia.org/P12188 and previous config saved to /var/cache/conftool/dbconfig/20200806-084406-marostegui.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12187 and previous config saved to /var/cache/conftool/dbconfig/20200806-083743-marostegui.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12186 and previous config saved to /var/cache/conftool/dbconfig/20200806-083033-marostegui.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12185 and previous config saved to /var/cache/conftool/dbconfig/20200806-081416-marostegui.json
  • 07:03 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:57 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 06:57 marostegui: Truncate tables on zerowiki T227717
  • 06:53 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:47 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 06:43 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:37 elukey: roll restart of druid clusters' zookeeper and an-conf* zookeeper for openjdk-11 upgrades
  • 06:36 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for MCR', diff saved to https://phabricator.wikimedia.org/P12184 and previous config saved to /var/cache/conftool/dbconfig/20200806-050743-marostegui.json
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079', diff saved to https://phabricator.wikimedia.org/P12182 and previous config saved to /var/cache/conftool/dbconfig/20200806-045622-marostegui.json
  • 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12181 and previous config saved to /var/cache/conftool/dbconfig/20200806-045107-marostegui.json
  • 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12180 and previous config saved to /var/cache/conftool/dbconfig/20200806-044608-marostegui.json
  • 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12179 and previous config saved to /var/cache/conftool/dbconfig/20200806-043758-marostegui.json
  • 03:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2019.codfw.wmnet
  • 02:24 eileen: process-control config revision is 525eb71235 turn off delete deleted contacts
  • 01:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:17 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:35 mutante: wtp2019 - reimaging - parsoid service does not work, unlike on all other wtp*, making sure it's clean
  • 00:00 mutante: LDAP - removed demon from nda group

2020-08-05

  • 23:57 eileen: civicrm revision changed from 150c3476c4 to 72452e28a9, config revision is b6ece03513
  • 23:02 shdubsh: logstash in codfw looks stuck -- restarting
  • 19:41 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.2
  • 19:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:13 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.3 (duration: 01m 44s)
  • 19:11 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.3
  • 18:26 Lucas_WMDE: Morning backport window done
  • 18:25 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/ContentTranslation/: Backport: Pass jQuery objects into jqueryMsg (duration: 01m 11s)
  • 18:14 mutante: test !log
  • 18:11 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Re-enable growth study quick survey (T257015) (duration: 01m 12s)
  • 17:30 shdubsh: test prometheus-icinga-exporter upgrade on icinga2001
  • 16:50 elukey: powercycle stat1005 after GPU issue
  • 15:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-logging-external streams and destination_event_service settings - T251935 (duration: 01m 05s)
  • 15:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:11 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 godog: bounce logstash on logstash100[789] - udp loss reported
  • 15:05 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 14:48 elukey: reboot stat1008 for unexpected maintenance (GPU stuck)
  • 14:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:32 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:27 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:27 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:25 moritzm: installing nmap bugfix updates from buster point release
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:20 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:20 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:14 moritzm: installing pillow security updates
  • 14:03 moritzm: installing node-minimist security updates
  • 13:51 moritzm: installing Linux update to 4.19.132 from buster point update (no reboots, just the package updates)
  • 13:32 jayme: updated helmfile to 0.125.2-0 and helm-diff to 3.1.2-1 on contint* and deploy*
  • 13:28 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:24 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:04 elukey: restart yarn resource managers on an-master100[12] to pick up new Yarn settings - https://gerrit.wikimedia.org/r/c/operations/puppet/+/618529
  • 13:00 moritzm: installing libjpeg-turbo security updates on stretch
  • 12:52 XioNoX: netmon1002:/srv/deployment/librenms/librenms$ sudo -u librenms ./lnms migrate
  • 12:49 jayme: imported helm-diff_3.1.2-1 to buster-wikimedia, jessie-wikimedia and stretch-wikimedia
  • 12:46 moritzm: installing imagemagick security updates on buster
  • 12:33 moritzm: installing net-snmp security updates on icinga hosts
  • 11:36 awight: EU Bacon reclosed
  • 11:36 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Switch test wikis to new version of vector by default (3/3) (T254227) (duration: 01m 07s)
  • 11:29 awight: EU Bacon reopened
  • 11:28 awight: EU Bacon complete
  • 11:26 awight@deploy1001: Synchronized wmf-config: Config: FileImporter: full default deployment (T232542) (duration: 01m 04s)
  • 11:23 jayme: imported helm-diff_3.1.2-0 to jessie-wikimedia and stretch-wikimedia
  • 11:22 jayme: imported helm-diff_3.1.2-0 to buster-wikimedia
  • 11:19 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add import sources for lijwikisource (T259633) (duration: 01m 07s)
  • 11:13 awight@deploy1001: sync-file aborted: Config: Add import sources for lijwikisource (T259633) (duration: 00m 13s)
  • 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Data Bridge on Test Wikidata clients (T232584) (duration: 01m 20s)
  • 10:39 XioNoX: reboot cr3-ulsfo - T259621
  • 10:28 XioNoX: drain traffic away cr3-ulsfo - T259621
  • 10:21 moritzm: installing libssh security updates
  • 10:18 XioNoX: reboot cr4-ulsfo - T259621
  • 09:58 XioNoX: drain traffic away cr4-ulsfo
  • 09:53 XioNoX: depool ulsfo - T259621
  • 09:32 elukey: set ticket max renewable lifetime to 7d on all kerberos clients (was zero, the default)
  • 09:07 jayme: imported helmfile_0.125.2-0 to jessie-wikimedia
  • 09:07 jayme: imported helmfile_0.125.2-0 to stretch-wikimedia
  • 09:05 jayme: imported helmfile_0.125.2-0 to buster-wikimedia
  • 08:39 marostegui: Remove revision triggers on db1125:3317
  • 08:39 marostegui: Stop replication on db1079 for MCR, this will generate lag on s7 on labsdb
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for MCR', diff saved to https://phabricator.wikimedia.org/P12173 and previous config saved to /var/cache/conftool/dbconfig/20200805-083916-marostegui.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P12172 and previous config saved to /var/cache/conftool/dbconfig/20200805-083833-marostegui.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12171 and previous config saved to /var/cache/conftool/dbconfig/20200805-082908-marostegui.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12170 and previous config saved to /var/cache/conftool/dbconfig/20200805-082138-marostegui.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12169 and previous config saved to /var/cache/conftool/dbconfig/20200805-081237-marostegui.json
  • 07:49 marostegui: Stop mysql on db1117:3323 (this will generate haproxy irc alerts) T259589
  • 07:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 07:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:26 moritzm: installing perl security updates on buster
  • 07:20 moritzm: installing libexif security updates on buster
  • 07:14 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 07:13 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 07:04 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 07:04 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 06:50 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:50 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:46 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:46 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 05:53 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for MCR', diff saved to https://phabricator.wikimedia.org/P12167 and previous config saved to /var/cache/conftool/dbconfig/20200805-050907-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1136', diff saved to https://phabricator.wikimedia.org/P12166 and previous config saved to /var/cache/conftool/dbconfig/20200805-050808-marostegui.json
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12165 and previous config saved to /var/cache/conftool/dbconfig/20200805-050308-marostegui.json
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12164 and previous config saved to /var/cache/conftool/dbconfig/20200805-045334-marostegui.json
  • 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12163 and previous config saved to /var/cache/conftool/dbconfig/20200805-043346-marostegui.json

2020-08-04

  • 22:41 brennen: restarting php7.2-fpm on mw1404 for opcache issues
  • 21:45 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:34 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:03 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 21:03 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:52 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:27 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@c80e2e7]: use provided ca certs for elasticsearch (duration: 02m 22s)
  • 20:25 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: use provided ca certs for elasticsearch
  • 20:15 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@b17bfd4]: Move mjolnir daemons from cirrus hosts to dedicated instances (duration: 02m 07s)
  • 20:12 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@b17bfd4]: Move mjolnir daemons from cirrus hosts to dedicated instances
  • 19:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.3
  • 19:11 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.3 (duration: 91m 03s)
  • 19:03 brennen: current 1.36.0-wmf.3 train status (T257971): mid scap-cdb-rebuild for testwiki sync; will proceed with group0 when finished.
  • 18:55 sukhe: upload pdns-recursor_4.3.3-1~deb10u1 to apt.wm.o (buster) - T252132
  • 18:49 mutante: letting puppet install envoy on all ores1* hosts
  • 18:46 mutante: letting puppet install envoy on all ores2* hosts
  • 18:37 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:19 mutante: temp disabling puppet on all ores hosts to add envoy
  • 17:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:40 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.3
  • 17:36 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:17 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:05 brennen: 1.36.0-wmf.3 was branched at 2d0cf09cdf for T257971
  • 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:49 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:24 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:15 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 15:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Set default topic_prefixes - T255888 (duration: 00m 58s)
  • 15:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 15:39 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 15:39 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 15:38 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:18 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove now unused wgEventServiceStreamConfig - T229863 (duration: 00m 58s)
  • 15:18 moritzm: installing jackson-databind security updates
  • 15:08 moritzm: installing qemu security updates on cloudvirt* Stretch hosts
  • 14:54 cmjohnson1: swapping kubernetes1010 network cable T257542
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 14:41 cmjohnson1: powercycling analytics1050 T258370
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for MCR', diff saved to https://phabricator.wikimedia.org/P12161 and previous config saved to /var/cache/conftool/dbconfig/20200804-143524-marostegui.json
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12160 and previous config saved to /var/cache/conftool/dbconfig/20200804-142710-marostegui.json
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12159 and previous config saved to /var/cache/conftool/dbconfig/20200804-142220-marostegui.json
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12158 and previous config saved to /var/cache/conftool/dbconfig/20200804-141556-marostegui.json
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12157 and previous config saved to /var/cache/conftool/dbconfig/20200804-141004-marostegui.json
  • 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 13:51 hashar: Install newer openjdk on contint2001 and restarting CI Jenkins
  • 12:00 jayme: helm was updated: 2.16.7-2 -> 2.16.9-1 on chartmuseum*, contint*, deploy*
  • 11:43 Lucas_WMDE: EU backport window done
  • 11:41 marostegui: Deploy schema change on s3 codfw master, lag might show up on codfw s3 T259238
  • 11:37 moritzm: installing openjdk-11 security updates
  • 11:36 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Load WikibaseRepo using extension registration in production (T257433) (duration: 00m 58s)
  • 11:12 Lucas_WMDE: Deployed patch for T86738 / T259565
  • 11:03 moritzm: installing e2fsprogs security updates for stretch
  • 10:47 moritzm: installing tomcat8 security updates
  • 10:47 vgutierrez: upgrade acme-chief to version 0.28
  • 10:33 vgutierrez: upload acme-chief 0.28 to apt.wm.o (buster) - T259338
  • 10:18 moritzm: installing imagemagick security updates on stretch
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for MCR and PK change T259524', diff saved to https://phabricator.wikimedia.org/P12156 and previous config saved to /var/cache/conftool/dbconfig/20200804-100035-marostegui.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12155 and previous config saved to /var/cache/conftool/dbconfig/20200804-095608-marostegui.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12154 and previous config saved to /var/cache/conftool/dbconfig/20200804-094909-marostegui.json
  • 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:58 moritzm: installing python3.5 security updates
  • 08:15 moritzm: installing remaining cups security updates
  • 08:13 XioNoX: cleaning up a bunch of prefix limit reached issues
  • 08:00 marostegui: Failover m2 from db1132 to db1107 - T257540
  • 07:54 moritzm: installing poppler security updates on stretch
  • 07:43 jayme: imported helm_2.16.9-1 to jessie-wikimedia
  • 07:43 jayme: imported helm_2.16.9-1 to stretch-wikimedia
  • 07:38 jayme: imported helm_2.16.9-1 to buster-wikimedia
  • 07:34 elukey: upgrade druid analytics (backend for Turnilo/Superset/etc..) to 0.19
  • 07:32 XioNoX: remove nonstop-bridging from fasw-c-eqiad switches - T191667
  • 07:29 XioNoX: remove nonstop-bridging from eqiad asw2 switches - T191667
  • 07:28 XioNoX: remove nonstop-bridging from asw2-esams - T191667
  • 07:27 marostegui: Start topology changes on m2 - T257540
  • 07:25 moritzm: installing rails security updates
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P12153 and previous config saved to /var/cache/conftool/dbconfig/20200804-064223-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12152 and previous config saved to /var/cache/conftool/dbconfig/20200804-063026-marostegui.json
  • 06:27 _joe_: restarting docker daemon on kubestage1002, seems like a case of https://github.com/moby/moby/issues/29635
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Restore original weight to db1089 on main traffic', diff saved to https://phabricator.wikimedia.org/P12151 and previous config saved to /var/cache/conftool/dbconfig/20200804-062358-marostegui.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12150 and previous config saved to /var/cache/conftool/dbconfig/20200804-062256-marostegui.json
  • 06:19 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 06:13 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: re-enabling lilypond execution in safe mode 3rd attempt (duration: 00m 58s)
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1089 on main traffic', diff saved to https://phabricator.wikimedia.org/P12149 and previous config saved to /var/cache/conftool/dbconfig/20200804-061255-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12148 and previous config saved to /var/cache/conftool/dbconfig/20200804-061209-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for MCR', diff saved to https://phabricator.wikimedia.org/P12147 and previous config saved to /var/cache/conftool/dbconfig/20200804-061003-marostegui.json
  • 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for reimage', diff saved to https://phabricator.wikimedia.org/P12146 and previous config saved to /var/cache/conftool/dbconfig/20200804-051843-marostegui.json
  • 05:04 marostegui: Reboot db1107 to pick up the latest kernel
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into API', diff saved to https://phabricator.wikimedia.org/P12145 and previous config saved to /var/cache/conftool/dbconfig/20200804-050150-marostegui.json
  • 03:56 legoktm: added Arlo to wmf-deployment Gerrit group
  • 03:53 legoktm: added subbu to wmf-deployment Gerrit group

2020-08-03

  • 23:43 mutante: mwdebug1001 - temp installing apt-file for debugging an issue on mwmaint
  • 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on fawiki (T253291) (duration: 00m 59s)
  • 21:35 sbassett: Deployed mitigations for T115888
  • 21:14 sbassett@deploy1001: Synchronized php-1.36.0-wmf.2/resources/src/mediawiki.jqueryMsg/mediawiki.jqueryMsg.js: (no justification provided) (duration: 01m 00s)
  • 18:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:09 dcausse@deploy1001: Finished deploy [wdqs/wdqs@20dcff3]: deploy 0.3.43 and gui update (duration: 15m 53s)
  • 17:53 dcausse@deploy1001: Started deploy [wdqs/wdqs@20dcff3]: deploy 0.3.43 and gui update
  • 17:33 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
  • 17:28 dcausse@deploy1001: Finished deploy [wdqs/wdqs@20dcff3]: (no justification provided) (duration: 00m 35s)
  • 17:28 dcausse@deploy1001: Started deploy [wdqs/wdqs@20dcff3]: (no justification provided)
  • 16:58 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.36.0-wmf.1"
  • 16:21 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 16:16 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 15:55 _joe_: regenerating the TLS certs for blubberoid
  • 15:33 XioNoX: standardize all routers routing-options config
  • 15:27 marostegui: Change PK on frwiktionary.revision on db2087:3317, db2129, db2121, db2086:3317 T259524
  • 15:16 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P12143 and previous config saved to /var/cache/conftool/dbconfig/20200803-145111-marostegui.json
  • 14:40 moritzm: update Buster netboot images to Buster 10.5 T259519
  • 14:33 XioNoX: disable all ALGs from pfw3-codfw
  • 14:28 XioNoX: remove IGMP and PIM from pfw3-codfw security zones
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into dump and depool db1106', diff saved to https://phabricator.wikimedia.org/P12142 and previous config saved to /var/cache/conftool/dbconfig/20200803-142749-marostegui.json
  • 14:27 XioNoX: remove nonstop-bridging from fasw-c-codfw - T191667
  • 14:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:04 filippo@deploy1001: Finished deploy [librenms/librenms@413e006]: Upgrade LibreNMS to 1.66 - T257017 (duration: 00m 23s)
  • 14:03 filippo@deploy1001: Started deploy [librenms/librenms@413e006]: Upgrade LibreNMS to 1.66 - T257017
  • 14:00 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin A:puppetmaster 'enable-puppet "cdanis deploying I92e9a05"'
  • 13:56 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin A:puppetmaster 'disable-puppet "cdanis deploying I92e9a05"'
  • 13:27 moritzm: installing libopenmpt security updates
  • 13:15 XioNoX: remove nonstop-bridging from asw-d-codfw - T191667
  • 13:14 XioNoX: remove nonstop-bridging from asw-c-codfw - T191667
  • 13:12 XioNoX: remove nonstop-bridging from asw-b-codfw - T191667
  • 13:11 XioNoX: remove nonstop-bridging from asw-a-codfw - T191667
  • 13:05 moritzm: installing json-c security updates
  • 12:53 XioNoX: move VRRP master to cr3-eqsin
  • 12:32 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
  • 12:26 moritzm: installing apache-log4j1.2 security updates
  • 12:20 moritzm: restarting nginx on francium to pick up luajit update
  • 12:13 kormat: disabling puppet on cumin hosts T259021
  • 11:55 moritzm: installing luajit security updates
  • 11:20 moritzm: installing ruby-rack security updates
  • 11:19 Urbanecm: EU B&C done
  • 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 346138d: Add extra namespaces for yuewiktionary (T258913) (duration: 01m 06s)
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 8c2a2b2: Add gpophotoeng.gov.il to the wgCopyUploadsDomains allowlist for commonswiki (T258857) (duration: 01m 07s)
  • 11:03 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: ead6b9e: New throttle rule for Czech editathon (T259352) (duration: 01m 06s)
  • 11:03 moritzm: installing ruby2.5 security updates
  • 11:01 moritzm: removing cloudcephmon100[1-3].wikimedia.org from debmonitor (these eventually got re-installed as cloudcephmon100[1-3].eqiad.wmnet)
  • 10:51 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 06s)
  • 10:50 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 08s)
  • 10:29 moritzm: installing NSS security updates on buster
  • 10:26 moritzm: restarting Apache on puppetboard to pick up curl security updates
  • 10:19 moritzm: restarting wtp1025 (parsoid canary) to pick up curl security updates
  • 09:46 moritzm: restarting mw1261-mw1265 to pick up curl security updates
  • 09:42 moritzm: installing curl security updates on stretch
  • 08:59 moritzm: installing ffmpeg security updates on jobrunners/video scalers (3.2.15 rebuilt with VP9/row-mt patches)
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into API', diff saved to https://phabricator.wikimedia.org/P12141 and previous config saved to /var/cache/conftool/dbconfig/20200803-082641-marostegui.json
  • 08:25 moritzm: installing qemu security updates on stretch
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12140 and previous config saved to /var/cache/conftool/dbconfig/20200803-082533-marostegui.json
  • 08:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify s5 wikis T259437 (duration: 01m 05s)
  • 08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify s5 wikis T259437 (duration: 01m 40s)
  • 08:07 elukey: roll restart aqs on aqs* to pick up new druid settings
  • 07:10 marostegui: Remove revision triggers from db2095:3317 for MCR changes T238966
  • 07:09 marostegui: Deploy MCR change on s7 codfw, lag will appear on codfw T238966
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12139 and previous config saved to /var/cache/conftool/dbconfig/20200803-070702-marostegui.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12138 and previous config saved to /var/cache/conftool/dbconfig/20200803-052715-marostegui.json
  • 05:04 marostegui: Remove db1108:3321 and db1108:3322 from tendril and add db1108:3351 and db1108:3352 T254462
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12137 and previous config saved to /var/cache/conftool/dbconfig/20200803-050148-marostegui.json

2020-08-01

  • 16:30 Amir1: wikiadmin@10.64.32.197(avkwiki)> delete from site_identifiers; (T259122)
  • 16:27 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T259122)

Archives

See Server Admin Log/Archives.