Server Admin Log/Archive 42

From Wikitech

2020-11-30

  • 23:12 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 23:08 mutante: parse2001 - sudo -i /usr/local/sbin/restart-php7.2-fpm
  • 23:08 mutante: sudo -i /usr/local/sbin/restart-php7.2-fpm
  • 22:45 razzi@cumin1001: START - Cookbook sre.hosts.decommission
  • 22:42 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove 1.34 from $wgExtDistSnapshotRefs T268931 (duration: 00m 57s)
  • 22:34 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 22:21 cdanis@deploy1001: Synchronized docroot/thankyou: Also serve apple-app-site-assoc file from /.well-known/ T259312 bc52d1481 (duration: 00m 57s)
  • 22:15 razzi@cumin1001: START - Cookbook sre.hosts.decommission
  • 22:14 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:14 mutante: parse2001 - systemctl restart ferm - had to restart ferm after reimaging (though there weren't any alerts about that) but it fixed running httpbb tests on it (T268524)
  • 22:13 ejegg: extended and re-synchronized timing of thank you mail sender and donation queue consumer
  • 21:51 mutante: parse2001 - scap pull
  • 21:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
  • 21:45 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 21:38 razzi@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:47 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 20:47 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:43 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:42 mutante: reimaging deploy2002 with buster (not active, deploy1001/2001 are) T265963
  • 20:39 mutante: reimaging parse2001 (parsoid canary) with buster (T268524)
  • 20:36 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2001.codfw.wmnet
  • 20:33 mutante: depooling parse2001 to prepare for reimage T268524
  • 20:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
  • 20:28 mutante: reimaging deploy1002 with buster - not the active deployment server, deploy1001 still is (T265963)
  • 20:10 ariel@deploy1001: Finished deploy [dumps/dumps@2f4d931]: per job batches for page content. step one. (duration: 00m 04s)
  • 20:10 ariel@deploy1001: Started deploy [dumps/dumps@2f4d931]: per job batches for page content. step one.
  • 19:52 papaul: power down ms-be2059 for RAID re-configuration
  • 19:47 mutante: added Sukhbir to Ops vendor maintenance calendar permissions to make changes and share like all of SRE (T229860)
  • 19:23 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:644236 Decrease OAuth token expiration (duration: 00m 56s)
  • 19:17 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:644243 group2: switch ParserCache to JSON (duration: 00m 58s)
  • 19:14 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:09 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:47 joal@deploy1001: Finished deploy [analytics/refinery@9db742d] (thin): Analytics special deploy before first of month - Hotfix -- THIN [analytics/refinery@9db742d] (duration: 00m 08s)
  • 17:47 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:47 joal@deploy1001: Started deploy [analytics/refinery@9db742d] (thin): Analytics special deploy before first of month - Hotfix -- THIN [analytics/refinery@9db742d]
  • 17:43 joal@deploy1001: Finished deploy [analytics/refinery@9db742d]: Analytics special deploy before first of month - Hotfix [analytics/refinery@9db742d] (duration: 11m 32s)
  • 17:37 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:31 joal@deploy1001: Started deploy [analytics/refinery@9db742d]: Analytics special deploy before first of month - Hotfix [analytics/refinery@9db742d]
  • 17:07 moritzm: reset failed (now obsolete) idp-u2f-sync/stunnel4 services on idp1001
  • 16:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1008.eqiad.wmnet
  • 16:24 volans: uploaded spicerack_0.0.45 to apt.wikimedia.org buster-wikimedia
  • 16:09 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@b46380d]: oozie: Repoint hive to analytics-hive.eqiad.wmnet (duration: 01m 15s)
  • 16:07 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@b46380d]: oozie: Repoint hive to analytics-hive.eqiad.wmnet
  • 15:43 moritzm: installing tomcat8 security updates
  • 15:43 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1007.eqiad.wmnet
  • 15:34 ema: cp3054: upgrade varnish to 6.0.7-1wm1 T268736 T264398
  • 15:28 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate 2 Anti-Harassment schemas to EventGate on all wikis - T268517 (duration: 00m 56s)
  • 15:15 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate 2 Anti-Harassment schemas to EventGate on testwiki - T268517 (duration: 01m 16s)
  • 14:55 joal@deploy1001: Finished deploy [analytics/refinery@72ac883] (thin): Analytics special deploy before first of month -- THIN [analytics/refinery@72ac883] (duration: 00m 08s)
  • 14:55 joal@deploy1001: Started deploy [analytics/refinery@72ac883] (thin): Analytics special deploy before first of month -- THIN [analytics/refinery@72ac883]
  • 14:55 joal@deploy1001: Finished deploy [analytics/refinery@72ac883]: Analytics special deploy before first of month [analytics/refinery@72ac883] (duration: 09m 26s)
  • 14:45 joal@deploy1001: Started deploy [analytics/refinery@72ac883]: Analytics special deploy before first of month [analytics/refinery@72ac883]
  • 14:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1006.eqiad.wmnet
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 100%: After cloning clouddb1016:3318', diff saved to https://phabricator.wikimedia.org/P13481 and previous config saved to /var/cache/conftool/dbconfig/20201130-143232-root.json
  • 14:23 marostegui: Deploy schema change on s3 codfw, lag will show up on s3 codfw T268004
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1092', diff saved to https://phabricator.wikimedia.org/P13480 and previous config saved to /var/cache/conftool/dbconfig/20201130-141953-marostegui.json
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 75%: After cloning clouddb1016:3318', diff saved to https://phabricator.wikimedia.org/P13479 and previous config saved to /var/cache/conftool/dbconfig/20201130-141729-root.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P13478 and previous config saved to /var/cache/conftool/dbconfig/20201130-141146-marostegui.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 50%: After cloning clouddb1016:3318', diff saved to https://phabricator.wikimedia.org/P13477 and previous config saved to /var/cache/conftool/dbconfig/20201130-140226-root.json
  • 13:58 ema: varnish 6.0.7-1wm1 uploaded to apt.wikimedia.org component/varnish6 T268736
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P13475 and previous config saved to /var/cache/conftool/dbconfig/20201130-134841-marostegui.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 25%: After cloning clouddb1016:3318', diff saved to https://phabricator.wikimedia.org/P13474 and previous config saved to /var/cache/conftool/dbconfig/20201130-134722-root.json
  • 13:23 jbond42: update zeromq on jessie hosts
  • 13:21 dcausse: depooling wdqs1004 (lag)
  • 13:18 moritzm: CAS enabled for racktables
  • 13:16 gilles@deploy1001: Synchronized debug.json: T268167 Add mwdebug1003 to list of debug servers (duration: 00m 56s)
  • 12:50 Urbanecm: EU B&C window done
  • 12:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3476644: Grant enwikibooks reviewers suppressredirect and raise move rate limit to 100/60 (T268849; 2nd attempt) (duration: 00m 56s)
  • 12:43 hnowlan@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Redeploy to fix gelf traffic (duration: 00m 24s)
  • 12:43 hnowlan@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Redeploy to fix gelf traffic
  • 12:41 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
  • 12:40 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5585fd7: Enable RelatedArticles on ptwikinews (T268945) (duration: 00m 57s)
  • 12:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ba6d0f8: Grant enwikibooks reviewers suppressredirect and raise move rate limit to 100/60 (T268849) (duration: 00m 57s)
  • 12:37 hnowlan@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Newer codfw maps hosts (duration: 02m 05s)
  • 12:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9942d68: Assign patrolmarks right to autoconfirmed users on itwiki (T268734) (duration: 00m 57s)
  • 12:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1005.eqiad.wmnet
  • 12:35 hnowlan@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Newer codfw maps hosts
  • 12:34 hnowlan@deploy1001: Finished deploy [tilerator/deploy@97575e4]: Newer codfw maps hosts (duration: 00m 24s)
  • 12:34 hnowlan@deploy1001: Started deploy [tilerator/deploy@97575e4]: Newer codfw maps hosts
  • 12:34 hnowlan@deploy1001: Finished deploy [tilerator/deploy@97575e4]: Newer codfw maps hosts (duration: 00m 51s)
  • 12:33 hnowlan@deploy1001: Started deploy [tilerator/deploy@97575e4]: Newer codfw maps hosts
  • 12:32 Lucas_WMDE: Deployed patch for T260349
  • 12:27 hnowlan@deploy1001: Finished deploy [tilerator/deploy@97575e4]: New eqiad maps hosts (duration: 00m 03s)
  • 12:27 hnowlan@deploy1001: Started deploy [tilerator/deploy@97575e4]: New eqiad maps hosts
  • 12:24 hnowlan@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: New eqiad maps hosts (duration: 00m 03s)
  • 12:24 hnowlan@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: New eqiad maps hosts
  • 12:21 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=0; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1005.eqiad.wmnet
  • 12:12 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 2922abe: Remove wgContentTranslationRESTBase config (T266213) (duration: 00m 57s)
  • 11:43 marostegui: Sanitize clouddb1016:3318 - T267090
  • 11:38 ema: A:cp upgrade fifo-log-demux to 0.6.2 T268883
  • 11:36 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 11:35 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 01s)
  • 11:32 ariel@deploy1001: Finished deploy [dumps/dumps@e8c6267]: allow page content fixup script to write output files to arbitrary dir (duration: 00m 04s)
  • 11:32 ariel@deploy1001: Started deploy [dumps/dumps@e8c6267]: allow page content fixup script to write output files to arbitrary dir
  • 11:28 ema: upload fifo-log-demux 0.6.2 to buster-wikimedia T268883
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13473 and previous config saved to /var/cache/conftool/dbconfig/20201130-111321-root.json
  • 11:00 hnowlan: bootstrapping maps1005 cassandra
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13472 and previous config saved to /var/cache/conftool/dbconfig/20201130-105818-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13471 and previous config saved to /var/cache/conftool/dbconfig/20201130-104314-root.json
  • 10:29 ema@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:29 marostegui: Compare data between clouddb1014:3312 clouddb1018:3312 labsdb1012 T267090
  • 10:29 marostegui: Compare data between clouddb1012:3312 clouddb1018:3312 labsdb1012 T267090
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13470 and previous config saved to /var/cache/conftool/dbconfig/20201130-102811-root.json
  • 10:24 akosiaris: applying https://gerrit.wikimedia.org/r/q/topic:%22k8s_config%22 series of patches
  • 10:18 ema@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:18 ema: cp4031: reboot to test atsmtail/fifo-log-demux service dependencies -- https://gerrit.wikimedia.org/r/c/operations/puppet/+/643922 T256467
  • 10:11 ema: cp4032: upgrade varnish to 6.0.7-1wm1 T268736
  • 10:06 moritzm: installing NSS security updates
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for schema change', diff saved to https://phabricator.wikimedia.org/P13469 and previous config saved to /var/cache/conftool/dbconfig/20201130-095729-marostegui.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1089 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13468 and previous config saved to /var/cache/conftool/dbconfig/20201130-095621-root.json
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1089 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13467 and previous config saved to /var/cache/conftool/dbconfig/20201130-094117-root.json
  • 09:40 marostegui: Stop MySQL on db1087 to clone clouddb1016:3318 T267090
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 from s8 and pool db1092 instead temporarily on vslow T267090', diff saved to https://phabricator.wikimedia.org/P13466 and previous config saved to /var/cache/conftool/dbconfig/20201130-093909-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1089 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13465 and previous config saved to /var/cache/conftool/dbconfig/20201130-092614-root.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1089+ (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13464 and previous config saved to /var/cache/conftool/dbconfig/20201130-092154-root.json
  • 08:51 marostegui: Deploy schema change on db1089
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089', diff saved to https://phabricator.wikimedia.org/P13463 and previous config saved to /var/cache/conftool/dbconfig/20201130-085101-marostegui.json
  • 08:41 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
  • 08:36 marostegui: Compare data between clouddb1016:3315 labsdb1012 T267090
  • 07:45 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 07:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:25 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:11 marostegui: Deploy schema change on s1 codfw - T268004
  • 07:05 marostegui: Stop mysql on db1124:3318 to clone clouddb1016:3318, lag will show up on wikireplicas on s8 T267090
  • 06:47 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:26 kart_: Updated cxserver to 2020-11-23-050106-production (T262253, T268410)
  • 04:18 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 04:14 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 04:11 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .

2020-11-27

  • 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:50 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 15:06 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 14:56 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 14:50 elukey: roll restart zookeeper on druid* nodes for openjdk upgrades
  • 14:50 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 10:52 jayme: updated helmfile to 0.135.0-1 on deploy*,contint*
  • 10:51 jayme: updated helm-diff to 3.1.3-1 on contint*
  • 10:49 jayme: updated helm to 2.17.0-1 on deploy*,contint*,chartmuseum*
  • 10:06 jayme: updated helm and helmfile on deploy2001
  • 10:04 jayme@deploy2001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 10:00 jayme: imported helm 2.17.0 into buster-wikimedia and stretch-wikimedia
  • 08:55 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:05 elukey: roll restart druid public cluster for openjdk upgrades
  • 08:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 06:39 marostegui: Stop mysql on es1015 T268810
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1015 from dbctl', diff saved to https://phabricator.wikimedia.org/P13454 and previous config saved to /var/cache/conftool/dbconfig/20201127-063846-marostegui.json
  • 06:30 marostegui: Remove es1016 from tendril and zarcillo T268812
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1015 for decommissioning T268810', diff saved to https://phabricator.wikimedia.org/P13453 and previous config saved to /var/cache/conftool/dbconfig/20201127-061929-marostegui.json

2020-11-26

  • 17:18 jayme: downgrade helmfile to 0.125.2-1 on deploy*
  • 17:05 jayme: updated helm-diff and helmfile on deploy100* and deploy200*
  • 16:34 jayme: imported helm-diff 3.1.3-1 into buster-wikimedia and stretch-wikimedia
  • 15:01 moritzm: installing libonig security updates
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13452 and previous config saved to /var/cache/conftool/dbconfig/20201126-144446-root.json
  • 14:38 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 14:36 moritzm: installing zeromq3 security updates for stretch
  • 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2001.codfw.wmnet
  • 14:35 jbond42: failing idp back to idp2001
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13451 and previous config saved to /var/cache/conftool/dbconfig/20201126-142942-root.json
  • 14:24 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=0; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2001.codfw.wmnet
  • 14:24 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
  • 14:23 moritzm: remove labtestpuppetmaster2001 from debmonitor T258103
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13450 and previous config saved to /var/cache/conftool/dbconfig/20201126-141439-root.json
  • 13:52 elukey: roll restart druid daemons on druid analytics to pick up new openjdk upgrades
  • 13:52 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:52 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:52 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 13:50 moritzm: installing python3.5 security updates
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P13449 and previous config saved to /var/cache/conftool/dbconfig/20201126-133204-marostegui.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13448 and previous config saved to /var/cache/conftool/dbconfig/20201126-132918-root.json
  • 13:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=0; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13447 and previous config saved to /var/cache/conftool/dbconfig/20201126-131414-root.json
  • 13:07 hnowlan: testing depooling kartotherian on maps2004 to reduce load
  • 13:07 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
  • 13:01 jbond42: update puppet_compiler on compiler1003
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13446 and previous config saved to /var/cache/conftool/dbconfig/20201126-125911-root.json
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 for schema change', diff saved to https://phabricator.wikimedia.org/P13445 and previous config saved to /var/cache/conftool/dbconfig/20201126-124253-marostegui.json
  • 12:31 jbond42: fail over idp.wikimedia.org
  • 11:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:53 moritzm: rebooting seaborgium for kernel update
  • 11:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:40 marostegui: Deploy schema change on s8 codfw - there will be lag on s8 codfw - T268004
  • 11:16 moritzm: restarting archiva to pick up Java security update
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13442 and previous config saved to /var/cache/conftool/dbconfig/20201126-104324-root.json
  • 10:41 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13441 and previous config saved to /var/cache/conftool/dbconfig/20201126-102820-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13440 and previous config saved to /var/cache/conftool/dbconfig/20201126-101317-root.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13439 and previous config saved to /var/cache/conftool/dbconfig/20201126-095813-root.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P13438 and previous config saved to /var/cache/conftool/dbconfig/20201126-094729-marostegui.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094 after schema change', diff saved to https://phabricator.wikimedia.org/P13437 and previous config saved to /var/cache/conftool/dbconfig/20201126-094702-marostegui.json
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P13436 and previous config saved to /var/cache/conftool/dbconfig/20201126-094639-marostegui.json
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13435 and previous config saved to /var/cache/conftool/dbconfig/20201126-094538-root.json
  • 09:38 marostegui: Stop mysql on es1016 for decommission
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13434 and previous config saved to /var/cache/conftool/dbconfig/20201126-093035-root.json
  • 09:26 ema: deployment-cache-text06: upgrade Varnish to 6.0.7-1wm1 T268736
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13433 and previous config saved to /var/cache/conftool/dbconfig/20201126-091532-root.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13432 and previous config saved to /var/cache/conftool/dbconfig/20201126-090028-root.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P13431 and previous config saved to /var/cache/conftool/dbconfig/20201126-084903-marostegui.json
  • 08:40 elukey: roll restart cassandra on aqs10* for openjdk upgrades
  • 08:40 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 08:09 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
  • 08:08 marostegui: Deploy schema change on s7 codfw - there will be lag on s7 codfw - T268004
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13430 and previous config saved to /var/cache/conftool/dbconfig/20201126-072506-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13429 and previous config saved to /var/cache/conftool/dbconfig/20201126-071514-root.json
  • 07:12 marostegui: Enable GTID on clouddb1018:3317 clouddb1014:3317 T267090
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13428 and previous config saved to /var/cache/conftool/dbconfig/20201126-071003-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13427 and previous config saved to /var/cache/conftool/dbconfig/20201126-070010-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13426 and previous config saved to /var/cache/conftool/dbconfig/20201126-065500-root.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13425 and previous config saved to /var/cache/conftool/dbconfig/20201126-064507-root.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13424 and previous config saved to /var/cache/conftool/dbconfig/20201126-063956-root.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1016 from dbctl', diff saved to https://phabricator.wikimedia.org/P13423 and previous config saved to /var/cache/conftool/dbconfig/20201126-063234-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13422 and previous config saved to /var/cache/conftool/dbconfig/20201126-063003-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1016 for decommissioning', diff saved to https://phabricator.wikimedia.org/P13421 and previous config saved to /var/cache/conftool/dbconfig/20201126-062811-marostegui.json
  • 06:17 marostegui: Stop mysql on db1124:3315 to clone clouddb1016:3315 T267090
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 for schema change', diff saved to https://phabricator.wikimedia.org/P13420 and previous config saved to /var/cache/conftool/dbconfig/20201126-061552-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143', diff saved to https://phabricator.wikimedia.org/P13419 and previous config saved to /var/cache/conftool/dbconfig/20201126-061459-marostegui.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change', diff saved to https://phabricator.wikimedia.org/P13418 and previous config saved to /var/cache/conftool/dbconfig/20201126-061432-marostegui.json
  • 06:08 ryankemper: T268770 [eqiad] Finished rolling restart of cirrus eqiad. All cirrus elasticsearch restarts are now complete (cloudelastic, relforge, eqiad, codfw)
  • 06:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 04:24 ryankemper: T268770 [eqiad] Begin rolling restart of cirrus eqiad, 3 nodes at a time
  • 04:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 03:07 krinkle@deploy1001: Synchronized wmf-config/mc.php: I805699ecfa (duration: 00m 58s)

2020-11-25

  • 23:28 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:55 mutante: mwdebug1003 - scap pull - which rsyncs from deploy1001 and runs php-fpm restart check script (T245757)
  • 22:47 ejegg: increased Ingenico API call timeout
  • 22:34 shdubsh: beginning rolling restart of logstash cluster - eqiad
  • 22:23 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 21:19 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 20:49 krinkle@deploy1001: Synchronized php-1.36.0-wmf.18/includes/libs/CSSMin.php: I26ed3e5e9a - fix T268308 (duration: 00m 59s)
  • 20:43 mutante: LDAP added user duminasi to group wmf (T266791)
  • 20:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 18:44 elukey: upload new hive* packages 2.2.3-2 to stretch-wikimedia - thirdparty/bigtop14 component
  • 18:42 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 18:38 mutante: LDAP adding swagoel to NDA T267314#6625628
  • 18:31 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
  • 18:05 ryankemper: T268770 [cloudelastic] Thawed writes to cloudelastic cluster following restarts: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic --thaw` on `mwmaint1002`
  • 18:01 ryankemper: [cloudelastic] (forgot to mention this) Thawed writes to cloudelastic cluster following restarts: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic --thaw` on `mwmaint1002`
  • 17:58 ryankemper: T268770 [cloudelastic] restarts complete, service is healthy. This is done.
  • 17:55 ryankemper: T268770 [cloudelastic] restarts on `cloudelastic1006` complete and all 3 elasticsearch clusters are green, all cloudelastic instances are now complete
  • 17:49 ryankemper: T268770 [cloudelastic] restarts on `cloudelastic1005` complete and all 3 elasticsearch clusters are green, proceeding to next instance
  • 17:44 shdubsh: beginning rolling restart of logstash cluster - codfw
  • 17:44 ryankemper: T268770 [cloudelastic] restarts on `cloudelastic1004` complete and all 3 elasticsearch clusters are green, proceeding to next instance
  • 17:39 ryankemper: T268770 [cloudelastic] restarts on `cloudelastic1003` complete and all 3 elasticsearch clusters are green, proceeding to next instance
  • 17:39 ryankemper: T268770 [cloudelastic] restarts on `cloudelastic1002` complete and all 3 elasticsearch clusters are green, proceeding to next instance
  • 17:28 ryankemper: T268770 [cloudelastic] restarts on `cloudelastic1001` complete and all 3 elasticsearch clusters are green, proceeding to next instance
  • 17:22 ryankemper: T268770 Freezing writes to cloudelastic in preparation for restarts: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic` on `mwmaint1002`
  • 17:09 ryankemper: T268770 [cloudelastic] Downtimed `cloudelastic100[1-6]` in icinga in preparation for cloudelastic search elasticsearch cluster restart
  • 17:05 ryankemper: T268770 Begin rolling restart of eqiad cirrus elasticsearch, 3 nodes at a time
  • 17:04 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 17:00 godog: fail sdk on ms-be2031
  • 16:49 godog: clean up sdk1 on / on ms-be2031
  • 16:46 elukey: move analytics1066 to C3 - T267065
  • 16:44 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 16:21 mutante: puppetmaster - revoking old and signing new cert for mwdebug1003
  • 16:11 elukey: move analytics1065 to C3 - T267065
  • 16:10 mutante: shutting down mwdebug1003 - reimaging for T245757
  • 16:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:02 moritzm: installing golang-1.7 updates for stretch
  • 15:57 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:57 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:57 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:38 elukey: move stat1004 to A5 - T267065
  • 15:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 15:34 moritzm: removing maps2002 from debmonitor
  • 15:10 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:04 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:04 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 14:56 moritzm: installing krb5 security updates for Buster
  • 14:55 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 14:55 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 14:55 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 14:26 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:00 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 13:56 akosiaris@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:44 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 13:43 akosiaris: assign IPs to kubestage200{1,2,3}.codfw.wmnet, kubestagemaster2001.codfw.wmnet in netbox T268747
  • 13:14 marostegui: Deploy schema change on commonswiki.watchlist on s4 codfw - there will be lag on s4 codfw - T268004
  • 13:08 akosiaris: assign IPs to kubestage200{1,2,3}.codfw.wmnet, kubestagemaster2001.codfw.wmnet in netbox
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13414 and previous config saved to /var/cache/conftool/dbconfig/20201125-124202-root.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13413 and previous config saved to /var/cache/conftool/dbconfig/20201125-122659-root.json
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13412 and previous config saved to /var/cache/conftool/dbconfig/20201125-121155-root.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13411 and previous config saved to /var/cache/conftool/dbconfig/20201125-115652-root.json
  • 11:49 gilles@deploy1001: Finished deploy [performance/coal@be167b2]: T268724 (duration: 00m 06s)
  • 11:48 gilles@deploy1001: Started deploy [performance/coal@be167b2]: T268724
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P13408 and previous config saved to /var/cache/conftool/dbconfig/20201125-114717-marostegui.json
  • 11:27 gilles@deploy1001: Finished deploy [performance/coal@468bc50]: T268724 (duration: 00m 06s)
  • 11:27 gilles@deploy1001: Started deploy [performance/coal@468bc50]: T268724
  • 11:27 jbond42: install krb5 updates to jessie hosts
  • 10:52 jbond42: failover idp primary to idp2001
  • 10:51 kormat: deployed wmfmariadbpy 0.6.1 to `C:wmfmariadbpy`
  • 10:43 kormat: uploaded wmfmariadbpy 0.6.1 to stretch+buster apt repos
  • 10:21 jynus: upgrade wmfbackup-check package on alert* hosts
  • 10:11 kormat: uploaded wmfmariadbpy 0.6 to stretch+buster apt repos
  • 09:54 moritzm: uploaded krb5 1.12.1+dfsg-19+deb8u5+wmf1 to apt.wikimedia.org
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13405 and previous config saved to /var/cache/conftool/dbconfig/20201125-095239-root.json
  • 09:45 marostegui: Manually install bsd-mailx (apt-get install bsd-mailx) on clouddb1015, labsdb1012 and labsdb1011 - T268725
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13404 and previous config saved to /var/cache/conftool/dbconfig/20201125-093736-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13403 and previous config saved to /var/cache/conftool/dbconfig/20201125-092232-root.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13402 and previous config saved to /var/cache/conftool/dbconfig/20201125-090729-root.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P13401 and previous config saved to /var/cache/conftool/dbconfig/20201125-085216-marostegui.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13400 and previous config saved to /var/cache/conftool/dbconfig/20201125-084603-root.json
  • 08:43 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Re-enable writes to es5 T268469 (duration: 00m 59s)
  • 08:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13399 and previous config saved to /var/cache/conftool/dbconfig/20201125-083059-root.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13398 and previous config saved to /var/cache/conftool/dbconfig/20201125-081556-root.json
  • 08:14 kormat: rebooting es1024 T268469
  • 08:08 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
  • 08:07 kormat: stopping mariadb on es1024 T268469
  • 08:04 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Disable writes to es5 T268469 (duration: 00m 58s)
  • 08:02 marostegui: Upgrade db2108
  • 08:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13397 and previous config saved to /var/cache/conftool/dbconfig/20201125-080053-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P13396 and previous config saved to /var/cache/conftool/dbconfig/20201125-071951-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P13395 and previous config saved to /var/cache/conftool/dbconfig/20201125-071450-marostegui.json
  • 06:38 marostegui: Stop mysql on db1125:3317 to clone clouddb1014:3317 clouddb1018:3317 T267090
  • 06:33 marostegui: Restart clouddb1019:3314, clouddb1019:3316
  • 06:32 marostegui: Restart clouddb1015:3314, clouddb1015:3316
  • 06:28 marostegui: Check private data on clouddb1014:3312 and clouddb1018:3312 T267090
  • 05:48 marostegui: Sanitize clouddb1014:3312 and clouddb1018:3312 T267090
  • 01:10 tgr_: Evening deploys done
  • 01:07 tgr@deploy1001: Finished scap: Backport: GrowthExperiments: Add Russian aliases (T268519) (duration: 32m 09s)
  • 00:35 tgr@deploy1001: Started scap: Backport: GrowthExperiments: Add Russian aliases (T268519)

2020-11-24

  • 23:50 crusnov@deploy1001: Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 p2 (duration: 00m 05s)
  • 23:50 crusnov@deploy1001: Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 p2
  • 23:50 crusnov@deploy1001: Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 (duration: 01m 51s)
  • 23:48 crusnov@deploy1001: Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488
  • 21:27 andrewbogott: restarting slapd on serpens
  • 21:20 cdanis: ✔️ cdanis@seaborgium.wikimedia.org ~ 🕟🍵 sudo systemctl restart prometheus-openldap-exporter.service
  • 21:17 andrewbogott: restarting slapd on seaborgium
  • 20:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:42 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Remove no longer needed EventLoggingSchemas override for NavigationTiming and ResourceTiming - T254606 (duration: 01m 01s)
  • 19:49 ryankemper: [elasticsearch] Restarted all elasticsearch systemd-managed services on `relforge100[1,2]`: `elasticsearch_6@relforge-eqiad.service` and `elasticsearch_6@relforge-eqiad-small-alpha.service`
  • 19:30 gilles@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/NavigationTiming/extension.json: (no justification provided) (duration: 00m 57s)
  • 19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 331a129: Remove temporary feature flags (T258116) (duration: 00m 57s)
  • 19:20 mutante: LDAP - added derick to group nda (T268150)
  • 19:17 moritzm: installing Java security updates on elastic* and relforge*
  • 19:09 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:643260 group1: Switch ParserCache to JSON (duration: 00m 57s)
  • 18:59 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 18:56 elukey@deploy1001: Finished deploy [analytics/refinery@1ff0868]: Regular analytics weekly train (duration: 09m 50s)
  • 18:56 volans: migrating anycast zonefiles to the Netbox-generated ones - T258729
  • 18:55 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 18:52 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:51 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 18:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:46 elukey@deploy1001: Started deploy [analytics/refinery@1ff0868]: Regular analytics weekly train
  • 18:46 crusnov@deploy1001: Finished deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next T266488 p2 (duration: 00m 05s)
  • 18:45 crusnov@deploy1001: Started deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next T266488 p2
  • 18:45 crusnov@deploy1001: Finished deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next T266488 (duration: 01m 09s)
  • 18:45 elukey: restart memcached on mw2339 to pick up the correct port (was bound on 11211 rather than 11210)
  • 18:44 crusnov@deploy1001: Started deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next T266488
  • 18:19 ejegg: updated Fundraising CiviCRM from 28464df973 to fb0ad7f39b
  • 18:07 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 18:06 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 18:04 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 17:51 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:44 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:10 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:08 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:29 elukey: move analytics1064 from C2 to C3 eqiad - T267065
  • 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:06 hnowlan: finished removing restbase2009 from cassandra cluster
  • 16:01 cmjohnson1: replacing the sfp at cr1-eqiad xe-3/2/1 T267672
  • 15:42 marostegui: Drop kraken user from s4 - T268636
  • 15:38 elukey: move druid1005 from rack B7 to B6 - T267065
  • 15:35 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:33 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:29 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 15:29 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 15:28 jayme: pushed docker-registry.discovery.wmnet/calico/kube-controllers:v3.17.0 docker-registry.discovery.wmnet/calico/node:v3.17.0 docker-registry.discovery.wmnet/calico/typha:v3.17.0
  • 15:23 jayme: imported calico 3.17.0 into component/calico-future for stretch-wikimedia
  • 15:07 godog: swift eqiad-prod: decom ms-be1022 ssd from swift - T267870
  • 15:01 marostegui: Enable GTID on clouddb1013:3311 clouddb1015:3314 clouddb1017:3311 clouddb1019:3314 T267090
  • 14:58 elukey: move analytics1072 from rack B2 to B3 - T267065
  • 14:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:53 jayme: imported helmfile 0.135.0-1 into buster-wikimedia and stretch-wikimedia
  • 14:47 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:43 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change', diff saved to https://phabricator.wikimedia.org/P13392 and previous config saved to /var/cache/conftool/dbconfig/20201124-144219-marostegui.json
  • 14:34 liw: finished testing Scap on Beta cluster in prep for https://phabricator.wikimedia.org/T268634
  • 14:31 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:27 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13391 and previous config saved to /var/cache/conftool/dbconfig/20201124-141912-root.json
  • 14:09 moritzm: reset-failed idp-u2f.service after Hiera change (one time issue, will soon be obsolete)
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13390 and previous config saved to /var/cache/conftool/dbconfig/20201124-140409-root.json
  • 13:52 elukey@deploy1001: Finished deploy [statsv/statsv@b25b6ff]: Deploy https://gerrit.wikimedia.org/r/c/analytics/statsv/+/643252 (duration: 00m 05s)
  • 13:52 elukey@deploy1001: Started deploy [statsv/statsv@b25b6ff]: Deploy https://gerrit.wikimedia.org/r/c/analytics/statsv/+/643252
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13389 and previous config saved to /var/cache/conftool/dbconfig/20201124-134905-root.json
  • 13:40 marostegui: Stop MySQL on db1074 to clone clouddb1018 and clouddb1014 T267090
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to clone clouddb1018 and clouddb1014 T267090', diff saved to https://phabricator.wikimedia.org/P13388 and previous config saved to /var/cache/conftool/dbconfig/20201124-133709-marostegui.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13387 and previous config saved to /var/cache/conftool/dbconfig/20201124-133402-root.json
  • 13:13 jgleeson: civicrm revision is 28464df973, config revision is 928918a9b6
  • 13:01 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.18
  • 13:01 liw: done testing Scap release candidate on beta (failed: disk full on deploy01)
  • 12:49 hnowlan: disabled cassandra service on restbase2009, starting drain
  • 12:30 liw: testing upcoming Scap release on beta
  • 12:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:59 jayme: imported helm3 3.4.1-1 into buster-wikimedia and stretch-wikimedia
  • 11:56 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:52 XioNoX: push CR641949 and CR641949
  • 11:38 effie: rolling depool and pool app and api clusters - T244340
  • 11:25 _joe_: rebuild docker images for T268612
  • 11:20 effie: disable puppet on api and app servers to rollout onhost memcached - T244340
  • 11:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:15 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:14 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:12 marostegui: Stop mysql on db1125:3312 to clone clouddb1014:3312 and clouddb1018:3312 - T267090
  • 10:45 moritzm: upgrading seaborgium to Buster
  • 10:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:31 jbond42: upload new cas package to wikimedia-buster
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2073', diff saved to https://phabricator.wikimedia.org/P13384 and previous config saved to /var/cache/conftool/dbconfig/20201124-100139-marostegui.json
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es2026', diff saved to https://phabricator.wikimedia.org/P13383 and previous config saved to /var/cache/conftool/dbconfig/20201124-100020-marostegui.json
  • 09:48 volans: Migrating codfw private/public primary DNS records to the auto-generated ones from Netbox - T258729
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P13382 and previous config saved to /var/cache/conftool/dbconfig/20201124-094449-marostegui.json
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P13381 and previous config saved to /var/cache/conftool/dbconfig/20201124-094159-marostegui.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P13380 and previous config saved to /var/cache/conftool/dbconfig/20201124-094052-marostegui.json
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P13379 and previous config saved to /var/cache/conftool/dbconfig/20201124-093517-marostegui.json
  • 09:23 marostegui: Deploy schema change on db2114 and db1096:3316 - T268004
  • 09:13 ema: cp4032: switch back to varnish 6.0.6-1wm2 after T264398 experiment, fix T268243
  • 09:09 elukey: drop principals and keytabs for analytics10[42-57] - T267932
  • 09:03 gilles@deploy1001: Finished deploy [performance/navtiming@ba6cd0d]: T260580 Parse user agents in navtiming instead of relying on eventlogging to do it (duration: 00m 05s)
  • 09:03 gilles@deploy1001: Started deploy [performance/navtiming@ba6cd0d]: T260580 Parse user agents in navtiming instead of relying on eventlogging to do it
  • 08:49 _joe_: uploading the base production docker images for MediaWiki, T265324
  • 08:48 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:43 _joe_: refreshing debian buster base image
  • 08:42 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:42 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:31 marostegui: Deploy user for pki database for dbproxy1012, dbproxy1014, dbproxy2001 - T268329
  • 08:28 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 08:27 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:58 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112 after schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13378 and previous config saved to /var/cache/conftool/dbconfig/20201124-074342-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13377 and previous config saved to /var/cache/conftool/dbconfig/20201124-073202-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 after schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13376 and previous config saved to /var/cache/conftool/dbconfig/20201124-073125-marostegui.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13375 and previous config saved to /var/cache/conftool/dbconfig/20201124-072755-marostegui.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 after schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13374 and previous config saved to /var/cache/conftool/dbconfig/20201124-072715-marostegui.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13373 and previous config saved to /var/cache/conftool/dbconfig/20201124-072249-marostegui.json
  • 07:00 _joe_: changing the mtail recipe for mediawiki/apache to use an actual histogram
  • 06:31 marostegui: Sanitize clouddb1019:3314 T267090
  • 06:28 marostegui: Sanitize clouddb1015:3314 T267090
  • 03:43 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 03:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 03:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 03:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 03:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 03:31 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:42 reedy@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: Unbreak gpg encrypted polls T268583 (duration: 01m 05s)
  • 00:29 reedy@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: Unbreak gpg encrypted polls T268583 (duration: 01m 06s)

2020-11-23

  • 22:56 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:52 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 22:35 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:54 mutante: mwdebug1003 - removing php packages and letting puppet reinstall them after it has the correct APT config T267248
  • 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:26 mutante: mwdebug1003 - scap pull because <+icinga-wm> PROBLEM - Ensure local MW versions match expected deployment on mwdebug1003 is CRITICAL
  • 20:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:09 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.18 (duration: 01m 04s)
  • 20:08 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.18
  • 20:00 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert a110db0: group1: switch ParserCache to JSON (T263579) (duration: 00m 42s)
  • 19:22 Urbanecm: Morning B&C done
  • 19:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a110db0: group1: switch ParserCache to JSON (T263579) (duration: 01m 05s)
  • 19:15 Urbanecm: Synced security patch for T120883 (wmf.18)
  • 19:12 Urbanecm: Synced security patch for T120883 (wmf.16)
  • 19:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7561926: GrowthExperiments: Enable help panel top-posting on svwiki, ruwiki (T268227) (duration: 01m 06s)
  • 17:48 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:46 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:41 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2010.codfw.wmnet
  • 17:36 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:29 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 05s)
  • 17:22 mutante: DNS - new project language 'skr' added - Saraiki ( سرائیکی Sarā'īkī, also spelt Siraiki, or Seraiki) is an Indo-Aryan language of the Lahnda group, spoken in the south-western half of the province of Punjab in Pakistan.
  • 17:12 elukey: move aqs1004 from rack A4 to A3 - T267065
  • 17:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:58 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:37 elukey: move analytics1070 from rack A7 to rack A5 - T267065
  • 15:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 15:13 godog: add ipv6 forward/reverse records for grafana1002 / grafana2001
  • 15:05 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:57 filippo@cumin1001: START - Cookbook sre.dns.netbox
  • 14:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2009.codfw.wmnet
  • 14:10 kormat: cleaning up heartbeat.heartbeat on pc3 T268336
  • 14:09 kormat: cleaning up heartbeat.heartbeat on pc2 T268336
  • 14:04 kormat: cleaning up heartbeat.heartbeat on pc1 T268336
  • 14:01 moritzm: imported prometheus-php-fpm-exporter 0.4.1+git20181018.d0d1837-2 to buster-wikimedia T245757
  • 13:56 XioNoX: push CR641960
  • 13:56 godog: add ms-be106[0-3] to eqiad-prod with minimal weight - T268435
  • 13:17 moritzm: imported ploticus 2.42-4.2~wmf1 to buster-wikimedia T245757
  • 13:11 Lucas_WMDE: EU backport+config window done
  • 13:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/Wikibase: Backport: Calculate page props on-the-fly during RDF dump (T145712) (duration: 01m 14s)
  • 13:01 hnowlan: started cassandra pooling maps2009
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143 after schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13370 and previous config saved to /var/cache/conftool/dbconfig/20201123-125815-marostegui.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13369 and previous config saved to /var/cache/conftool/dbconfig/20201123-125759-marostegui.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1141 after schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13368 and previous config saved to /var/cache/conftool/dbconfig/20201123-125417-marostegui.json
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13367 and previous config saved to /var/cache/conftool/dbconfig/20201123-125345-marostegui.json
  • 12:34 Lucas_WMDE: Undeployed patch for T260349
  • 12:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2008.codfw.wmnet
  • 12:32 Urbanecm: Run scap pull at mwdebug1003
  • 12:28 marostegui: Stop mysql on db1121 to clone clouddb1017:3314 clouddb1019:3314
  • 12:27 Lucas_WMDE: Deployed patch for T260349
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 to clone clouddb1017:3314 clouddb1019:3314 T267090', diff saved to https://phabricator.wikimedia.org/P13366 and previous config saved to /var/cache/conftool/dbconfig/20201123-122549-marostegui.json
  • 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c00d7e8: Move ContentTranslation out of Beta for br, ka, ast, si and ig WPs (T267212, T266217, T266218, T266219, T266220) (duration: 01m 06s)
  • 12:01 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=zhwiki; T246539)
  • 11:49 XioNoX: eqiad row A, split LVS, Ganeti, Cloud, interface-ranges to individual terms
  • 11:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 05s)
  • 11:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 21s)
  • 11:25 hnowlan: starting cassandra bootstrap of maps2008
  • 11:20 effie: enable puppet on cp* hosts
  • 11:16 moritzm: installing poppler security updates on stretch
  • 11:13 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 11:13 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:05 XioNoX: eqiad row A, standardize interfaces descriptions and ranges order
  • 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:26 effie: disable puppet on cp* hosts to merge 641730
  • 10:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:26 moritzm: rebooting serpens
  • 10:21 XioNoX: eqiad row B, split LVS, Ganeti, Cloud, interface-ranges to individual terms
  • 09:48 XioNoX: eqiad row B, standardize interfaces descriptions and ranges order
  • 08:46 elukey: drop kerberos keytabs for analytics10[28-41] from krb1001:/srv/kerberos/keytabs, decommed nodes (old hadoop test cluster)
  • 08:43 godog: start stress testing on ms-be106* - T268435
  • 08:41 elukey: drop kerberos principals from krb1001 for analytics10[29-41], decommed nodes (old hadoop test cluster)
  • 08:36 elukey: drop analytics1028's krb principals from krb1001 - old decommed node
  • 08:35 moritzm: installing remaining krb5 security updates for Stretch
  • 07:27 marostegui: Stop MySQL on db1125:3314 to clone clouddb1015 and clouddb1019 - lag will appear on Commonswiki on wikireplicas - T267090
  • 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:00 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:46 marostegui: Restart clouddb1013 clouddb1015 clouddb1017 clouddb1019 for testing T267090

2020-11-21

  • 09:18 joal: Drop historical logs of 'Wikidata Concepts Monitor ETL' on HDFS keeping one example - freeing 60Tb
  • 09:17 joal: Drop historical logs of '
  • 08:28 ariel@deploy1001: Finished deploy [dumps/dumps@1a76a9a]: revinfo updates (duration: 00m 05s)
  • 08:28 ariel@deploy1001: Started deploy [dumps/dumps@1a76a9a]: revinfo updates
  • 08:10 elukey: remove big stderrlog file in /var/lib/hadoop/data/d/yarn/logs/application_1605880843685_1450 on an-worker1110
  • 08:05 elukey: remove big stderrlog file in /var/lib/hadoop/data/e/yarn/logs/application_1605880843685_1450 on an-worker1105

2020-11-20

  • 23:38 mutante: synced puppet-compiler facts - new hosts should be usable in compiler
  • 22:30 mutante: cumin1001 - sudo systemctl start cumin-check-aliases -> <+icinga-wm> RECOVERY - Check systemd state on cumin1001 is OK T268369
  • 21:30 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 20:26 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 20:09 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 19:52 mutante: releases2002 - systemctl disable wmf_auto_restart_rsync; rm /usr/lib/systemd/system/wmf_auto_restart_rsync.* ; systemctl daemon-reload ; systemctl reset-failed - clear up systemd unit that was not absented and fix Icinga alerts
  • 19:45 mutante: releases2002 systemctl reset-failed (wmf_auto_restart_rsync.service failed but hopefully fixed)
  • 19:39 mutante: Icinga: ACKing all the "unhandled CRIT" alerts on clouddb* and an-coord* that have disabled notifications to remove monitoring noise. from 72 to 25 active alerts
  • 19:14 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:47 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:42 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:36 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:31 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:18 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:14 dwisehaupt: shifting 100% of thank_you mail through frmxs ahead of tomorrow's banner test - T267259
  • 17:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:32 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 17:24 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 16:48 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 16:40 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:29 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:29 razzi@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 16:28 razzi: removed canceled ip address records for kafka-test1002 from netbox
  • 16:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:01 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:01 razzi@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:42 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:01 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:59 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 14:58 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:30 elukey: force umount/mount for /mnt/hdfs on all stat1* nodes to pick up new openjdk settings
  • 14:28 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
  • 14:00 elukey: restart hadoop daemons on an-master[1001-1002] (Hadoop masters) to pick up new rack settings and openjdk upgrades
  • 13:59 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 13:34 liw: finished trying to test scap on beta cluster
  • 13:24 bblack: cp*: remove remnants of expiring globalsign-2019 unified cert, including ocsp config+outputs
  • 13:12 liw: testing upcoming Scap release on beta
  • 13:00 bblack: dns*: upgrade remainder of fleet to gdnsd 3.4.1
  • 12:54 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 12:29 moritzm: uploaded wmf-sre-laptop 0.3 to buster-wikimedia/component/wmf-sre-laptop
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Set original weight to db1089', diff saved to https://phabricator.wikimedia.org/P13351 and previous config saved to /var/cache/conftool/dbconfig/20201120-121645-marostegui.json
  • 12:14 marostegui: Run check private data on clouddb1013:3311 clouddb1013:3313 clouddb1015:3316 clouddb1017:3311 clouddb1017:3313 clouddb1019:3316 T267090
  • 12:11 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=fawiki; T246539)
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1089', diff saved to https://phabricator.wikimedia.org/P13350 and previous config saved to /var/cache/conftool/dbconfig/20201120-115057-marostegui.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1089', diff saved to https://phabricator.wikimedia.org/P13349 and previous config saved to /var/cache/conftool/dbconfig/20201120-114758-marostegui.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089', diff saved to https://phabricator.wikimedia.org/P13348 and previous config saved to /var/cache/conftool/dbconfig/20201120-114614-marostegui.json
  • 11:15 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:11 volans@cumin2001: START - Cookbook sre.dns.netbox
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13347 and previous config saved to /var/cache/conftool/dbconfig/20201120-104459-root.json
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13345 and previous config saved to /var/cache/conftool/dbconfig/20201120-102955-root.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13344 and previous config saved to /var/cache/conftool/dbconfig/20201120-101452-root.json
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13342 and previous config saved to /var/cache/conftool/dbconfig/20201120-095949-root.json
  • 09:56 elukey: update analytics filters on cr1/cr2 eqiad (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/642346)
  • 09:21 marostegui: Move pc2010 right under pc1007 to investigate lag issues (using orchestrator for this move)
  • 09:07 moritzm: updating krb5 on krb*
  • 08:57 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 08:50 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 08:32 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 08:31 elukey: roll restart kafka daemons on kafka-jumbo100* to pick up openjdk upgrades
  • 08:13 marostegui: Enable GTID on clouddb1015:3316 clouddb1019:3316 - T267090
  • 08:10 elukey: update analytics filters on cr1/cr2 eqiad (ref: https://gerrit.wikimedia.org/r/642268)
  • 08:04 marostegui: Stop db1124:3313 to clone clouddb1013:3313, clouddb1017:3313
  • 08:00 XioNoX: update cloud-in4 filter in codfw
  • 04:57 bblack: dns3001: upgrade gdnsd to 3.4.1
  • 04:55 bblack: authdns1001: upgrade gdnsd to 3.4.1
  • 04:49 bblack: authdns2001: upgrade gdnsd to 3.4.1
  • 04:45 bblack: dns3002: upgrade gdnsd to 3.4.1
  • 04:41 bblack: reprepro: uploaded gdnsd-3.4.1-1~wmf1 to buster-wikimedia

2020-11-19

  • 23:59 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:23 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:21 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:18 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:18 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:17 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:52 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:23 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:07 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:06 krinkle@deploy1001: Synchronized php-1.36.0-wmf.16/includes/filerepo/: T267668 - I1115135ee, and Ic239bb9807 (duration: 01m 07s)
  • 20:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:12 herron: upgraded logstash-next to kibana 7.10
  • 19:23 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:23 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:20 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:20 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:48 mutante: gerrit1001 - re-enabling puppet after merging gerrit:642086 for T268260 (upstream bug 13701)
  • 18:41 mutante: gerrit1001 - added RequestHeader set "X-Forwarded-Proto" expr=%{REQUEST_SCHEME} in apache config, reloaded apache to fix redirect issue
  • 18:37 mutante: gerrit1001 - disabled puppet
  • 18:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:07 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:03 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 17:59 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 17:47 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 17:33 hashar@deploy1001: Finished deploy [gerrit/gerrit@9d27055]: Upgrade gerrit1001 (primary) to Gerrit 3.2.5 (duration: 00m 09s)
  • 17:33 hashar@deploy1001: Started deploy [gerrit/gerrit@9d27055]: Upgrade gerrit1001 (primary) to Gerrit 3.2.5
  • 17:32 hashar: Upgrading Gerrit to 3.2.5 and restarting it
  • 17:05 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 06s)
  • 17:04 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
  • 16:59 ryankemper: T246345 [wdqs] Data-transfer of new wdqs node `wdqs1012` is complete, beginning transfer of `wdqs1004`->`wdqs1013` (public) and `wdqs1003`->`wdqs1011` (internal). Once these transfers are done `wdqs1012` and `wdqs1013` will need to be pooled and have their weights set to 10 after verifying they're healthy
  • 16:58 kormat: started mariadb on pc2010, now with more 🤞
  • 16:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:54 kormat: stopping mariadb on pc2010
  • 16:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:43 hashar: Restarting Gerrit replica instance on gerrit2001
  • 16:42 hashar@deploy1001: Finished deploy [gerrit/gerrit@9d27055]: Upgrade gerrit2001 to Gerrit 3.2.5 (take 2 after rebasing deploy server) (duration: 00m 10s)
  • 16:42 hashar@deploy1001: Started deploy [gerrit/gerrit@9d27055]: Upgrade gerrit2001 to Gerrit 3.2.5 (take 2 after rebasing deploy server)
  • 16:41 kormat: stopped and started replication on pc2010 to see if that would help it recover
  • 16:40 hashar@deploy1001: Finished deploy [gerrit/gerrit@5a41181]: Upgrade gerrit2001 to Gerrit 3.2.5 (duration: 00m 05s)
  • 16:40 hashar@deploy1001: Started deploy [gerrit/gerrit@5a41181]: Upgrade gerrit2001 to Gerrit 3.2.5
  • 16:35 elukey: roll restart hadoop workers for openjdk upgrades
  • 16:35 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 16:06 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
  • 15:58 moritzm: installing jupyter-notebook security updates on an-coord*
  • 15:56 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
  • 15:52 bblack: dns*: upgrade to gdnsd-3.4.0 on remainder of the dns fleet
  • 15:44 bblack: dns3001: upgrade gdnsd to 3.4.0
  • 15:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:41 bblack: dns1001: upgrade gdnsd to 3.4.0
  • 15:40 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:36 bblack: dns3002: upgrade gdnsd to 3.4.0
  • 15:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:31 bblack: authdns1001: upgrade gdnsd to 3.4.0
  • 15:30 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:29 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:26 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:25 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:23 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:22 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:18 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:57 moritzm: installing openldap security updates on buster (client side tools/libs, slapd already updated)
  • 14:54 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:53 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:50 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:49 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:47 marostegui: Sanitize enwiki on clouddb1017 T267090
  • 14:45 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:44 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:43 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:41 marostegui: Sanitize enwiki on clouddb1013 T267090
  • 14:39 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:29 moritzm: rolling restart of app server canaries to pick up latest sec updates
  • 14:21 moritzm: installing krb5 security updates on stretch
  • 14:02 bblack: authdns2001: upgrade gdnsd to 3.4.0
  • 13:45 XioNoX: push current state of audited cloud-in4 filter - T264993
  • 13:42 moritzm: removing stray wireshark 2.2.6 wireshark libs on Stretch
  • 13:32 moritzm: installing wireshark security updates
  • 13:30 bblack: dns4002: upgrade gdnsd to 3.4.0
  • 13:28 bblack: reprepro: updated buster-wikimedia gdnsd package to 3.4.0-1~wmf1
  • 12:43 moritzm: installing libproxy security updates on stretch
  • 12:38 marostegui: Stop mysql on db1106 to clone clouddb1013 and clouddb1017 T267090
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 T267090', diff saved to https://phabricator.wikimedia.org/P13334 and previous config saved to /var/cache/conftool/dbconfig/20201119-122459-marostegui.json
  • 12:00 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:46 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 11:44 moritzm: installing Java security updates on Hadoop/Kafka Jumbo hosts
  • 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 11:33 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:00 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=ruwiki; T246539)
  • 10:28 marostegui: Restart mysql on db1115, tendril and dbtree will be down for a few minutes
  • 09:40 marostegui: Stop mysql on db1124:3311 to clone clouddb1013 and clouddb1017, there will be lag on s1 on wikireplicas - T267090
  • 09:29 moritzm: upgrading serpens to Buster
  • 09:26 XioNoX: eqiad row C: move Ganeti/LVS interfaces to individual terms
  • 09:07 elukey: restart kafka daemons on kafka-jumbo1001 for openjdk upgrades (canary)
  • 08:56 effie: disable puppet on mw canaries to merge 641816
  • 08:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
  • 08:49 elukey: restart hadoop daemons on analytics1058 for openjdk upgrades (canary)
  • 08:25 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 08:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
  • 08:19 XioNoX: eqiad row C: standardize interfaces config
  • 07:55 XioNoX: eqiad row D: move Ganeti/LVS interfaces to individual terms
  • 07:47 XioNoX: eqiad row D: standardize interfaces config
  • 07:22 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 07:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 07:05 elukey: roll restart java daemons on Hadoop test for openjdk upgrades
  • 07:05 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 06:22 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 06:21 marostegui: Remove es1014 from tendril and zarcillo T268102
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:08 marostegui: Stop mysql on db1125:3316 to clone clouddb1015 and clouddb1019, there will be lag on s6 on wikireplicas - T267090
  • 02:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 01:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer

2020-11-18

  • 23:34 mutante: disabling puppet on memcache::mediawiki - deploying gerrit:637742
  • 22:56 dpifke@deploy1001: Finished deploy [performance/arc-lamp@6bbac6d]: Fix for bytes/str issue after T267269 (duration: 00m 04s)
  • 22:56 dpifke@deploy1001: Started deploy [performance/arc-lamp@6bbac6d]: Fix for bytes/str issue after T267269
  • 22:24 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:22 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:19 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploy GlobalWatchlist to beta (noop; T268181) (duration: 01m 04s)
  • 22:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy GlobalWatchlist extension: Prepare IS.php to know relevant variables (noop; T268181) (duration: 01m 06s)
  • 22:05 urbanecm@deploy1001: Synchronized wmf-config/extension-list: Deploy GlobalWatchlist extension to beta: add it to extension-list (T268181) (duration: 01m 05s)
  • 21:53 mutante: mwdebug1003 - restarting ferm because config was generated but service not restarted due to puppet dependency errors, breaking NRPE monitoring T267248
  • 21:47 mutante: mwdebug1003 - scap pull - T267248
  • 21:40 mutante: mw1317,mw1318 - back in action and all monitoring activated again
  • 21:17 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1318.eqiad.wmnet,cluster=videoscaler
  • 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1317.eqiad.wmnet
  • 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1318.eqiad.wmnet
  • 21:02 mutante: mw1317,mw1318 - repooled=no after physical move to rack B
  • 20:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet
  • 20:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1317.eqiad.wmnet
  • 20:27 mutante: mw1317, mw1318 shutting down for physical move
  • 20:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1318.eqiad.wmnet
  • 20:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1317.eqiad.wmnet
  • 20:15 mutante: mw1317,mw1318 - downtimed and depooled - they are physically moving from B7 to B5 (T266164)
  • 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1317.eqiad.wmnet
  • 20:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet
  • 20:10 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.18 (duration: 01m 03s)
  • 20:09 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.18
  • 20:03 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
  • 20:03 akosiaris@cumin1001: conftool action : set/weight=0; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
  • 19:53 akosiaris@cumin1001: conftool action : set/pooled=no; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
  • 19:48 otto@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/EventLogging/modules/ext.eventLogging/core.js: EventLogging legacy events should use dt as server side receive time - T240460 (duration: 01m 06s)
  • 19:45 otto@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/EventLogging/modules/ext.eventLogging/core.js: EventLogging legacy events should use dt as server side receive time - T240460 (duration: 01m 07s)
  • 19:26 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:23 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:635607 - Switch ParserCache to JSON for group0 wikis (duration: 01m 05s)
  • 19:19 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:635086 - Enable parsoid on api_appserver (duration: 01m 04s)
  • 19:19 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 19:13 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:641527 - Set to 0 (duration: 01m 04s)
  • 18:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:44 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:38 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 17:18 elukey: shutdown an-presto1004 for hw maintenance
  • 17:13 akosiaris: T241230 pool codfw kubernetes for recommendation-api at a very low weight
  • 17:12 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
  • 17:12 akosiaris@cumin1001: conftool action : set/weight=1; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
  • 16:52 jbond42: drop os_version/requires_os functions from wmflib
  • 16:50 elukey: update /etc/krb5.keytab on krb1001/krb2001 to match the most up to date key version for host/krb2001.codfw.wmnet
  • 16:49 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:49 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:44 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:43 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:38 reedy@deploy1001: Synchronized wmf-config/logging.php: T268141 (duration: 01m 06s)
  • 16:36 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:32 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:27 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 15:59 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 15:56 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 15:51 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 15:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:16 Urbanecm: mwscript deleteEqualMessages.php --wiki=cswiki --delete
  • 15:14 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 15:12 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 15:12 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 15:12 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 15:11 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 15:09 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 15:09 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 15:05 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 15:03 Urbanecm: Purge https://2030.wikimedia.org/ via purgeList.php (T264797)
  • 14:34 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 14:30 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 14:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 14:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 14:13 Urbanecm: Purge https://2030.wikimedia.org/ via purgeList.php (T264797)
  • 14:09 elukey: copied /etc/krb5.keytab from krb1001 to krb2001 (the last one contained only one principal for 2001, the first one both for 1001 and 2001)
  • 14:05 moritzm: installing openldap security updates on ro replicas
  • 14:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 14:02 elukey: restart krb5-kpropd.service on krb2001 to force the pick up of new client configs
  • 13:35 bblack: cache_text: Executing "varnishadm -n frontend param.set nuke_limit 1000" - T266373
  • 13:34 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 13:30 moritzm: installing openldap security updates on corp replicas
  • 13:08 Urbanecm: EU B&C done (~15 minutes ago)
  • 12:43 akosiaris: sync staging cluster's helmfile.d/admin state. Aside from calico, the rest is a noop
  • 12:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 12:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/NewcomerTasksCacheRefreshJob.php: 5488f56: Fix NewcomerTasksCacheRefreshJob (T268008) (duration: 01m 05s)
  • 12:25 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/NewcomerTasksCacheRefreshJob.php: 45d71a3: Fix NewcomerTasksCacheRefreshJob (T268008) (duration: 01m 05s)
  • 12:13 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/{bnwiki,bnwiki-1.5x,bnwiki-2x}.png (T265553)
  • 12:13 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=releases
  • 12:11 urbanecm@deploy1001: Synchronized static/images/project-logos/: 70aabf7: Regenerate Bengali Wikipedia logo (T265553) (duration: 01m 06s)
  • 12:06 akosiaris@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=wikifeeds
  • 12:01 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
  • 12:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 in pc3 after restarting mysql T266483 (duration: 01m 06s)
  • 12:00 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=blubberoid,name=eqiad
  • 11:56 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=frwiki; T246539)
  • 11:56 marostegui: Restart mysql on pc1009 T266483
  • 11:56 Urbanecm: End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=nlwiki; T246539)
  • 11:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1009 and place pc1010 instead of it T266483 (duration: 01m 18s)
  • 11:40 XioNoX: eqiad row D: remove un-needed "enable" keywords
  • 10:59 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99)
  • 10:59 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert
  • 10:58 jbond42: renew sretest1002 ssl cert to test cookbook
  • 10:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:25 godog: ms-be1022 - disable failed sdb
  • 10:01 XioNoX: eqiad row D: Standardize interfaces descriptions
  • 09:56 moritzm: uploaded libexif 0.6.21-2+deb8u4+wmf1 to jessie-wikimedia
  • 09:22 elukey: set dns_canonicalize_hostname = false to all kerberos clients
  • 09:13 jbond42: renew puppet certificate of seaborgium
  • 08:34 marostegui: Stop MySQL on es1011, es1012, es1014 T268100 T268101 T268102
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1012 from dbctl T268101', diff saved to https://phabricator.wikimedia.org/P13326 and previous config saved to /var/cache/conftool/dbconfig/20201118-082942-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1012 before decommissioning it', diff saved to https://phabricator.wikimedia.org/P13325 and previous config saved to /var/cache/conftool/dbconfig/20201118-082636-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13324 and previous config saved to /var/cache/conftool/dbconfig/20201118-082618-root.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 80%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13323 and previous config saved to /var/cache/conftool/dbconfig/20201118-081115-root.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13322 and previous config saved to /var/cache/conftool/dbconfig/20201118-075612-root.json
  • 07:45 marostegui: Deploy schema change on db1098:3316 T267335 T267399
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 60%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13321 and previous config saved to /var/cache/conftool/dbconfig/20201118-074108-root.json
  • 07:28 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=nlwiki; T246539)
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13320 and previous config saved to /var/cache/conftool/dbconfig/20201118-072605-root.json
  • 07:16 marostegui: Run check table on s6 on db1125:3316 T267090
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 30%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13319 and previous config saved to /var/cache/conftool/dbconfig/20201118-071101-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13318 and previous config saved to /var/cache/conftool/dbconfig/20201118-065558-root.json
  • 06:53 elukey: restart also mirror maker on kafka-main1001/1003 (seems not related but just to clear old errors and a possible weird state)
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 100%: Slowly pool es1018 after cloning es1032 T261717', diff saved to https://phabricator.wikimedia.org/P13317 and previous config saved to /var/cache/conftool/dbconfig/20201118-064556-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 20%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13316 and previous config saved to /var/cache/conftool/dbconfig/20201118-064054-root.json
  • 06:37 elukey: restart kafka-mirror-main-codfw_to_main-eqiad@0.service on kafka-main1002 - consumer msg rate low since kafka-main2003 went down for codfw c7 failure
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 75%: Slowly pool es1018 after cloning es1032 T261717', diff saved to https://phabricator.wikimedia.org/P13315 and previous config saved to /var/cache/conftool/dbconfig/20201118-063052-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13314 and previous config saved to /var/cache/conftool/dbconfig/20201118-062551-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1014 from dbctl', diff saved to https://phabricator.wikimedia.org/P13313 and previous config saved to /var/cache/conftool/dbconfig/20201118-062547-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 50%: Slowly pool es1018 after cloning es1032 T261717', diff saved to https://phabricator.wikimedia.org/P13312 and previous config saved to /var/cache/conftool/dbconfig/20201118-061549-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1014 before decommissioning it', diff saved to https://phabricator.wikimedia.org/P13311 and previous config saved to /var/cache/conftool/dbconfig/20201118-061340-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1027 as new es1 master', diff saved to https://phabricator.wikimedia.org/P13310 and previous config saved to /var/cache/conftool/dbconfig/20201118-061218-marostegui.json
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1011 from dbctl', diff saved to https://phabricator.wikimedia.org/P13309 and previous config saved to /var/cache/conftool/dbconfig/20201118-061112-marostegui.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1032 with minimum weight on es1 T261717', diff saved to https://phabricator.wikimedia.org/P13308 and previous config saved to /var/cache/conftool/dbconfig/20201118-060641-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 25%: Slowly pool es1018 after cloning es1032 T261717', diff saved to https://phabricator.wikimedia.org/P13307 and previous config saved to /var/cache/conftool/dbconfig/20201118-060045-root.json
  • 05:47 marostegui: Run check table on enwiki on db1124:3311 T267090
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 10%: Slowly pool es1018 after cloning es1032 T261717', diff saved to https://phabricator.wikimedia.org/P13306 and previous config saved to /var/cache/conftool/dbconfig/20201118-054542-root.json
  • 00:53 tgr_: also deployed Suggested Edits: Guard against task type not existing (T268012)
  • 00:52 tgr@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: Suggested edits: Guard against empty topic data (T268015) (duration: 01m 07s)
  • 00:27 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable watchlist expiry feature on Wikidata & Commons (T266874) (duration: 01m 03s)

2020-11-17

  • 22:54 mforns@deploy1001: Finished deploy [analytics/refinery@f19d20c] (thin): Regular analytics weekly train THIN [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13] (duration: 00m 07s)
  • 22:54 mforns@deploy1001: Started deploy [analytics/refinery@f19d20c] (thin): Regular analytics weekly train THIN [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13]
  • 22:53 mforns@deploy1001: Finished deploy [analytics/refinery@f19d20c]: Regular analytics weekly train [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13] (duration: 12m 51s)
  • 22:45 clarakosi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 22:40 mforns@deploy1001: Started deploy [analytics/refinery@f19d20c]: Regular analytics weekly train [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13]
  • 22:39 clarakosi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 22:29 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 22:10 mutante: otrs1001 - systemctl start otrs-cache-cleanup
  • 22:08 ppchelko@deploy1001: Finished deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, everywhere (duration: 11m 07s)
  • 22:07 mutante: otrs1001 - removing otrs-cache-cleanup cron from otrs's crontab - adding same command as systemd timer. gerrit:637038 T265138
  • 21:57 ppchelko@deploy1001: Started deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, everywhere
  • 21:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, codfw (duration: 07m 11s)
  • 21:24 ppchelko@deploy1001: Started deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, codfw
  • 20:56 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.18
  • 20:43 Urbanecm: End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=itwiki; T246539)
  • 20:31 dancy@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.18 (duration: 39m 37s)
  • 19:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:52 dancy@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.18
  • 19:50 ppchelko@deploy1001: Finished deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, canary on 2010 (duration: 02m 03s)
  • 19:48 ppchelko@deploy1001: Started deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, canary on 2010
  • 19:46 dancy@deploy1001: Pruned MediaWiki: 1.36.0-wmf.11 (duration: 13m 05s)
  • 19:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 19:21 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:18 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 19:12 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: wgEventStreamsDefaultSettings in beta should only set eqiad as topic prefix - T253069 (duration: 02m 26s)
  • 19:12 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 18:38 ejegg: updated standalone SmashPig deployment from 09f29c1da5 to 63dffcb11f
  • 18:36 ejegg: updated fundraising python tools from 68e054c9ad to 41cab089da
  • 18:09 jynus: stopping db1139 for hw maintenance T261405
  • 17:59 dpifke@deploy1001: Finished deploy [performance/navtiming@8eaf7db]: (no justification provided) (duration: 00m 05s)
  • 17:58 dpifke@deploy1001: Started deploy [performance/navtiming@8eaf7db]: (no justification provided)
  • 17:37 dpifke@deploy1001: Finished deploy [performance/coal@43b91df]: (no justification provided) (duration: 00m 06s)
  • 17:37 dpifke@deploy1001: Started deploy [performance/coal@43b91df]: (no justification provided)
  • 17:34 dpifke@deploy1001: Finished deploy [statsv/statsv@249d073]: (no justification provided) (duration: 00m 05s)
  • 17:34 dpifke@deploy1001: Started deploy [statsv/statsv@249d073]: (no justification provided)
  • 17:27 dpifke@deploy1001: Finished deploy [statsv/statsv@873ea90]: (no justification provided) (duration: 00m 05s)
  • 17:27 dpifke@deploy1001: Started deploy [statsv/statsv@873ea90]: (no justification provided)
  • 17:19 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 17:16 dpifke@deploy1001: Finished deploy [performance/arc-lamp@55d4d41]: (no justification provided) (duration: 00m 04s)
  • 17:16 dpifke@deploy1001: Started deploy [performance/arc-lamp@55d4d41]: (no justification provided)
  • 17:15 dpifke@deploy1001: Finished deploy [performance/arc-lamp@55fccc6]: (no justification provided) (duration: 00m 04s)
  • 17:15 dpifke@deploy1001: Started deploy [performance/arc-lamp@55fccc6]: (no justification provided)
  • 17:08 dpifke@deploy1001: Finished deploy [performance/coal@5a32eb2]: (no justification provided) (duration: 00m 04s)
  • 17:08 dpifke@deploy1001: Started deploy [performance/coal@5a32eb2]: (no justification provided)
  • 16:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:46 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:46 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:42 jbond42: re-enable puppet fleet wide
  • 16:36 clarakosi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:33 clarakosi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:29 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 16:22 moritzm: uploaded zeromq3 4.0.5+dfsg-2+deb8u2+wmf1 to jessie-wikimedia
  • 16:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:13 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:04 volans: powercycle ms-be1030.eqiad.wmnet, unresponsive to ping/ssh, no prompt in console, nothing in hw logs
  • 15:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:27 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:16 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:16 jbond42: disable puppet fleet wide
  • 15:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:09 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 15:09 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 15:01 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 15:01 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:59 cdanis@deploy1001: Synchronized docroot/thankyou: Special docroot for thankyouwiki T259312 d2a20ec57 (duration: 00m 55s)
  • 14:58 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:58 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:57 elukey: shutdown stat1008 for RAM expansion
  • 14:55 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:55 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:49 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:47 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:47 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 14:43 XioNoX: codfw row A: move ganeti and LVS from interface-range to individual term
  • 14:41 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 14:37 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=itwiki; T246539)
  • 14:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:03 XioNoX: codfw row A: standardize interfaces
  • 13:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 13:36 XioNoX: codfw row B: move ganeti, Cloud and LVS from interface-range to individual term
  • 13:29 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:22 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 13:22 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 13:21 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:09 XioNoX: codfw row B: remove extra "enable"
  • 12:59 Lucas_WMDE: EU backport&config window done (again ☺)
  • 12:58 moritzm: updating idp-test* to 6.2.4-2
  • 12:57 XioNoX: codfw row B: Standardize interfaces descriptions
  • 12:55 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: Suggested Edits: Guard against task type not existing (T268012) (duration: 00m 58s)
  • 12:53 bblack: cpNNNN: removing old (30d+) failure reports from /var/cache/ocsp
  • 12:42 moritzm: IDP updated to 6.2.4
  • 12:33 Lucas_WMDE: reopen EU backport&config window
  • 12:23 XioNoX: codfw row C: move ganeti and LVS from interface-range to individual term
  • 12:15 XioNoX: codfw row C: remove extra "enable"
  • 12:15 Lucas_WMDE: EU backport&config window done
  • 12:13 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2006.codfw.wmnet
  • 12:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove migration settings in InitialiseSettings.php (T264286), 2/2 (labs) (duration: 00m 56s)
  • 12:12 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove migration settings in InitialiseSettings.php (T264286), 1/2 (prod) (duration: 00m 56s)
  • 12:05 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Remove migration settings in Wikibase.php (T264286) (duration: 00m 57s)
  • 11:51 XioNoX: codfw row C: Standardize interfaces descriptions
  • 10:46 marostegui: Run a test on check_private_data on clouddb1013 for s1 and s3 - T267090
  • 10:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1008 in pc2 after restarting mysql T266483 (duration: 00m 56s)
  • 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:19 marostegui: Restart mysql on pc1008 T266483
  • 10:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1008 and place pc1010 instead of it T266483 (duration: 00m 57s)
  • 09:29 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 09:17 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:14 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 09:10 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:08 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 09:02 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:01 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:56 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:56 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1028 as new es3 master', diff saved to https://phabricator.wikimedia.org/P13301 and previous config saved to /var/cache/conftool/dbconfig/20201117-085542-marostegui.json
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1011 before decommissioning it and pool es1026 as new es2 master', diff saved to https://phabricator.wikimedia.org/P13300 and previous config saved to /var/cache/conftool/dbconfig/20201117-085432-marostegui.json
  • 08:52 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13299 and previous config saved to /var/cache/conftool/dbconfig/20201117-084744-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13298 and previous config saved to /var/cache/conftool/dbconfig/20201117-084733-root.json
  • 08:43 marostegui: Truncate tendril.global_status_log - T231185
  • 08:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:33 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 80%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13297 and previous config saved to /var/cache/conftool/dbconfig/20201117-083241-root.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 80%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13296 and previous config saved to /var/cache/conftool/dbconfig/20201117-083229-root.json
  • 08:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:24 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:22 volans: restart netbox on netbox1001 to test new logging configuration
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13295 and previous config saved to /var/cache/conftool/dbconfig/20201117-081737-root.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13294 and previous config saved to /var/cache/conftool/dbconfig/20201117-081726-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 60%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13293 and previous config saved to /var/cache/conftool/dbconfig/20201117-080234-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 60%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13292 and previous config saved to /var/cache/conftool/dbconfig/20201117-080222-root.json
  • 07:58 XioNoX: codfw row D: Convert LVS ranges to individual interfaces
  • 07:54 XioNoX: codfw row D: explicitly set access ports to "interface-mode access"
  • 07:49 XioNoX: split codfw row D ganeti switch ports out of the interface group
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13291 and previous config saved to /var/cache/conftool/dbconfig/20201117-074730-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13290 and previous config saved to /var/cache/conftool/dbconfig/20201117-074719-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 30%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13289 and previous config saved to /var/cache/conftool/dbconfig/20201117-073227-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 30%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13288 and previous config saved to /var/cache/conftool/dbconfig/20201117-073216-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 100%: Slowly pool es1019 after cloning es1034 T261717', diff saved to https://phabricator.wikimedia.org/P13287 and previous config saved to /var/cache/conftool/dbconfig/20201117-073057-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 100%: Slowly pool es1015 after cloning es1033 T261717', diff saved to https://phabricator.wikimedia.org/P13286 and previous config saved to /var/cache/conftool/dbconfig/20201117-073032-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13285 and previous config saved to /var/cache/conftool/dbconfig/20201117-071723-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13284 and previous config saved to /var/cache/conftool/dbconfig/20201117-071712-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 75%: Slowly pool es1019 after cloning es1034 T261717', diff saved to https://phabricator.wikimedia.org/P13283 and previous config saved to /var/cache/conftool/dbconfig/20201117-071553-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 75%: Slowly pool es1015 after cloning es1033 T261717', diff saved to https://phabricator.wikimedia.org/P13282 and previous config saved to /var/cache/conftool/dbconfig/20201117-071529-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 20%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13281 and previous config saved to /var/cache/conftool/dbconfig/20201117-070220-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 20%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13280 and previous config saved to /var/cache/conftool/dbconfig/20201117-070209-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 50%: Slowly pool es1019 after cloning es1034 T261717', diff saved to https://phabricator.wikimedia.org/P13278 and previous config saved to /var/cache/conftool/dbconfig/20201117-070050-root.json
  • 07:00 marostegui: Stop mysql on db1124: s1 and s3, this will generate lag on enwiki and s3 on labsdb - T267090
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 50%: Slowly pool es1015 after cloning es1033 T261717', diff saved to https://phabricator.wikimedia.org/P13277 and previous config saved to /var/cache/conftool/dbconfig/20201117-070025-root.json
  • 06:51 marostegui: Upgrade db1077 and pc2010 to 10.4.17
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13276 and previous config saved to /var/cache/conftool/dbconfig/20201117-064716-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13275 and previous config saved to /var/cache/conftool/dbconfig/20201117-064705-root.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 25%: Slowly pool es1019 after cloning es1034 T261717', diff saved to https://phabricator.wikimedia.org/P13274 and previous config saved to /var/cache/conftool/dbconfig/20201117-064546-root.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 25%: Slowly pool es1015 after cloning es1033 T261717', diff saved to https://phabricator.wikimedia.org/P13273 and previous config saved to /var/cache/conftool/dbconfig/20201117-064522-root.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1034 with minimum weight on es3 T261717', diff saved to https://phabricator.wikimedia.org/P13272 and previous config saved to /var/cache/conftool/dbconfig/20201117-063933-marostegui.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1033 with minimum weight on es2 T261717', diff saved to https://phabricator.wikimedia.org/P13271 and previous config saved to /var/cache/conftool/dbconfig/20201117-063805-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 10%: Slowly pool es1019 after cloning es1034 T261717', diff saved to https://phabricator.wikimedia.org/P13270 and previous config saved to /var/cache/conftool/dbconfig/20201117-063043-root.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 10%: Slowly pool es1015 after cloning es1033 T261717', diff saved to https://phabricator.wikimedia.org/P13269 and previous config saved to /var/cache/conftool/dbconfig/20201117-063019-root.json
  • 02:37 dwisehaupt: shifted portion of thank you emails flowing through frmx's to 60% of the total volume
  • 01:59 eileen_: civicrm revision is b6fe8bd791, config revision is 61e2000391

2020-11-16

  • 23:28 mutante: cumin1001 - sudo systemctl start cumin-check-aliases (to confirm switching cron to timer worked) T265138
  • 22:22 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 22:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 22:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 22:09 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 22:06 mutante: planet - fixed updates of uk.planet which failed due to non-ASCII chars in a URL - since updates are systemd timers now, a failure like that affects the entire systemd state monitoring
  • 21:40 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
  • 21:40 rzl@cumin1001: conftool action : set/weight=1; selector: name=mw2250.codfw.wmnet,cluster=videoscaler,service=canary
  • 21:38 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet,cluster=jobrunner
  • 21:30 mutante: peek2001 - mv /var/lib/peek/git to git.old ; run puppet ; let it fix git checkout
  • 21:07 rzl: disable puppet on jobrunners T264991
  • 20:40 mutante: planet1002/planet2002 - delete entire crontab of user planet, drop update cronjobs after switching to systemd timers with gerrit:636105 (T265138)
  • 20:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:06 mutante: releases2002 systemctl reset-failed should clear Icinga systemd alert after gerrit:641228
  • 20:05 dwisehaupt: disabling process-control jobs and moving to maintenance mode for maint window
  • 19:57 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 19:53 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4a953ca]: query_clicks_hourly: handle wmf.webrequest page_id change from int to bigint (duration: 02m 27s)
  • 19:51 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4a953ca]: query_clicks_hourly: handle wmf.webrequest page_id change from int to bigint
  • 19:48 effie: disable puppet on parsoid servers - T264991
  • 19:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 18:59 mutante: mw2255 - is pooled and puppet works on next run, after it removed php 7.2 config files
  • 18:56 mutante: running puppet on mw2313 and mw2255 which were listed in puppetboard as failed puppet runs
  • 18:15 rzl: disable puppet on 'A:mw-api and not A:mw-api-canary' T264991
  • 18:05 effie: disable puppet on all appservers
  • 17:48 elukey: enable and run puppet on kafka-main2003 (it will start kafka services) - T267865
  • 17:42 dwisehaupt: frmon1001 upgraded to buster
  • 17:36 volans: moved interfaces in Netbox from old to new switch - T267865
  • 17:24 vgutierrez: switching back from lvs2010 to lvs2007 - T267865
  • 17:21 vgutierrez: repooling cp2037 and cp2038 - T267865
  • 16:46 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 16:40 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:16 XioNoX: update c7 serial in row C VC config - T267865
  • 16:16 rzl: disable puppet on A:mw-api-canary T264991
  • 16:14 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 16:08 effie: disable puppet in appservers canaries to install ICU 63 - T264991
  • 16:07 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet
  • 16:07 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2037.codfw.wmnet
  • 16:06 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
  • 16:03 hnowlan: joined maps2006 to maps codfw cassandra cluster
  • 16:01 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:57 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 15:57 hnowlan: roll-restarting eqiad restbase for java security updates
  • 15:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 15:50 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:40 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 15:40 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 14:16 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 14:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 in pc1 after restarting mysql T266483 (duration: 00m 59s)
  • 14:06 marostegui: Restart pc1007's mysql T266483
  • 14:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1007 and place pc1010 instead of it T266483 (duration: 01m 00s)
  • 13:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
  • 13:00 kormat: running schema change against s1 in codfw T259831
  • 12:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:43 moritzm: installing tcpdump security updates
  • 12:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:25 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 12:25 hnowlan: roll-restarting restbase-codfw
  • 12:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 12:10 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:49 hnowlan: roll restarting sessionstore for java updates
  • 11:49 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 11:13 moritzm: installing poppler security updates
  • 10:46 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:46 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:45 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:45 dcaro@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:44 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:44 dcaro@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:31 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=99)
  • 09:31 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 08:39 godog: centrallog1001 move invalid config /etc/logrotate.d/logrotate-debug to /etc
  • 08:35 moritzm: installing codemirror-js security updates
  • 08:32 XioNoX: asw-c-codfw> request system power-off member 7 - T267865
  • 08:24 joal@deploy1001: Finished deploy [analytics/refinery@3df51cb] (thin): Analytics special train for webrequest table update THIN [analytics/refinery@3df51cb] (duration: 00m 07s)
  • 08:23 joal@deploy1001: Started deploy [analytics/refinery@3df51cb] (thin): Analytics special train for webrequest table update THIN [analytics/refinery@3df51cb]
  • 08:23 joal@deploy1001: Finished deploy [analytics/refinery@3df51cb]: Analytics special train for webrequest table update [analytics/refinery@3df51cb] (duration: 10m 09s)
  • 08:13 joal@deploy1001: Started deploy [analytics/refinery@3df51cb]: Analytics special train for webrequest table update [analytics/refinery@3df51cb]
  • 08:08 XioNoX: asw-c-codfw> request system power-off member 7 - T267865
  • 06:35 marostegui: Stop replication on s3 codfw master (db2105) for MCR schema change deployment T238966
  • 06:14 marostegui: Stop MySQL on es1018, es1015, es1019 to clone es1032, es1033, es1034 - T261717
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1018, es1015, es1019 - T261717', diff saved to https://phabricator.wikimedia.org/P13262 and previous config saved to /var/cache/conftool/dbconfig/20201116-060624-marostegui.json
  • 06:02 marostegui: Restart mysql on db1115 (tendril/dbtree) due to memory usage
  • 00:55 shdubsh: re-applied mask to kafka and kafka-mirror-main-eqiad_to_main-codfw@0 on kafka-main2003 and disabled puppet to prevent restart - T267865
  • 00:19 elukey: run 'systemctl mask kafka' and 'systemctl mask kafka-mirror-main-eqiad_to_main-codfw@0' on kafka-main2003 (for the brief moment when it was up) to avoid purged issues - T267865
  • 00:09 elukey: sudo cumin 'cp2028* or cp2036* or cp2039* or cp4022* or cp4025* or cp4028* or cp4031*' 'systemctl restart purged' -b 3 - T267865

2020-11-15

  • 22:10 cdanis: restart some purgeds in ulsfo as well T267865 T267867
  • 22:03 cdanis: T267867 T267865 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕔🍺 sudo cumin -b2 -s10 'A:cp and A:codfw' 'systemctl restart purged'
  • 14:00 cdanis: powercycling ms-be1022 via mgmt
  • 11:21 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:21 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:12 vgutierrez: depooling lvs2007, lvs2010 taking over text traffic on codfw - T267865
  • 10:00 elukey: cumin 'cp2042* or cp2036* or cp2039*' 'systemctl restart purged' -b 1
  • 09:57 elukey: restart purged on cp4028 (consumer stuck due to kafka-main2003 down)
  • 09:55 elukey: restart purged on cp4025 (consumer stuck due to kafka-main2003 down)
  • 09:53 elukey: restart purged on cp4031 (consumer stuck due to kafka-main2003 down)
  • 09:50 elukey: restart purged on cp4022 (consumer stuck due to kafka-main2003 down)
  • 09:42 elukey: restart purged on cp2028 (kafka-main2003 is down and there are connect timeout errors)
  • 09:07 Urbanecm: Change email for SUL user Botopol via resetUserEmail.php (T267866)
  • 08:27 elukey: truncate -s 10g /var/lib/hadoop/data/n/yarn/logs/application_1601916545561_173219/container_e25_1601916545561_173219_01_000177/stderr on an-worker1100
  • 08:24 elukey: sudo truncate -s 10g /var/lib/hadoop/data/c/yarn/logs/application_1601916545561_173219/container_e25_1601916545561_173219_01_000019/stderr on an-worker1098

2020-11-13

  • 22:06 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=myvwiki autopatrolled # T105570
  • 22:04 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki editor # T105570
  • 21:42 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=enwikinews reviewer # T105570
  • 21:40 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=bnwiki editor # T105570
  • 21:39 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki flood # T105570
  • 21:38 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=test2wiki upwizcampeditors # T105570
  • 21:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=aawiki communityapplica # T105570
  • 21:28 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=enwiki epadmin # T105570
  • 16:50 _joe_: manually rotate user.log on centrallog1001 and moved it to /srv/user.log.manual-rotation
  • away: updated fundraising CiviCRM from f7954c6659 to 74d795408f
  • 08:15 vgutierrez: restart acme-chief on acmechief1001
  • 01:30 TimStarling: on mwmaint1002 running fixT260485.php unmerged fixup script from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMaintenance/+/640348

2020-11-12

  • 19:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0f0f839: Enable "Cite" button in toolbar for enwiktionary (T267504) (duration: 00m 58s)
  • 19:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3ce18e6: Add artsdatabanken.no to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T267784) (duration: 01m 00s)
  • 16:12 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux at mwmaint1002 (wiki=jawiki; T246539)
  • 16:11 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=cswiki; T246539)
  • 13:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=cswiki; T246539)
  • 11:40 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 11:35 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 11:30 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 11:12 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 11:08 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 11:02 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 09:19 hashar@deploy1001: Synchronized php-1.36.0-wmf.16/includes/filerepo: Revert "filerepo: clean up shared cache keys to avoid key metrics clutter" - T267668 (duration: 01m 01s)
  • 09:12 hashar: Pulled https://gerrit.wikimedia.org/r/640746 on deployment server for # T267668
  • 03:46 ejegg: updated python fundraising tools from 7853f426ee to 68e054c9ad

2020-11-11

  • 16:44 XioNoX: Revert "temporarily route Italy to codfw"
  • 16:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 16:38 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 16:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 16:30 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 15:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 15:52 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 14:29 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet
  • 13:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=cp3054.esams.wmnet
  • 12:25 Lucas_WMDE: EU backport&config window done
  • 12:23 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Remove propagateChangeVisibility repo setting (duration: 00m 58s)
  • 12:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable propagatePageDeletion on Wikidata (duration: 00m 59s)
  • 12:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/DiscussionTools/includes/CommentParser.php: Backport: Fix getHeadlineNodeAndOffset() returning text nodes (T267284) (duration: 01m 01s)
  • 10:34 XioNoX: delete unused interfaces from asw-d-codfw
  • 09:53 XioNoX: prioritized DE-CIX IXP - T262681
  • 02:18 ryankemper: (WDQS deploy completed)
  • 00:48 ryankemper: Restarting `wdqs-categories` one host at a time across all wdqs production instances: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
  • 00:47 ryankemper: Restarted `wdqs-categories` across wdqs test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 00:47 ryankemper: Restarted `wdqs-updater` simultaneously across all wdqs hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 00:47 ryankemper: [wdqs deploy] following deploy, example query succeeds on `query.wikidata.org`, proceeding to post deploy steps
  • 00:46 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@03219df]: 0.3.55 (duration: 11m 24s)
  • 00:46 ryankemper: T222669 [Elasticsearch reindex] Began long-running reindex of cirrus elasticsearch for `codfw`, `eqiad`, and `cloudelastic`. 3 tmux sessions on `ryankemper@mwmaint1002`: `reindex_eqiad`, `reindex_codfw`, `reindex_cloudelastic`
  • 00:38 ryankemper: Following deploy to canary `wdqs1003`, automated tests are passing as is a manual test of an example query. Proceeding...
  • 00:34 ryankemper@deploy1001: Started deploy [wdqs/wdqs@03219df]: 0.3.55
  • 00:32 ryankemper: About to begin wdqs deploy; before-deploy tests on canary `wdqs1003` are passing
  • 00:09 eileen: civicrm revision changed from d0cd7f6dbb to e5d12cc46c, config revision is e2d133eff4

2020-11-10

  • 22:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:08 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:08 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:05 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 21:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:58 jgleeson: civicrm revision changed from c36a5cc1b1 to d0cd7f6dbb
  • 21:57 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:55 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 21:47 ebernhardson: unban elastic1050 from eqiad search psi cluster
  • 21:28 cstone: civicrm revision changed from b1342c4129 to c36a5cc1b1
  • 21:24 brennen@deploy1001: sync-file aborted: Testing: README.md sync-file with ssh -n for T223287 (duration: 00m 37s)
  • 21:23 brennen: testing some scap operations, modified to use ssh -n for debugging T223287
  • 21:11 ebernhardson: ban elastic1050 from eqiad psi cluster due to excessive load
  • 21:02 brennen@deploy1001: Finished scap: Backport: language: Honor $wgTranslateNumerals, even if PHP does digit translation (T267614) and Downgrade the severity of the non-numeric argument to formatNum warnings (T267370, T267587) (duration: 34m 46s)
  • 20:27 brennen@deploy1001: Started scap: Backport: language: Honor $wgTranslateNumerals, even if PHP does digit translation (T267614) and Downgrade the severity of the non-numeric argument to formatNum warnings (T267370, T267587)
  • 20:10 brennen@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Turn on formatnum logging (T267587, T267370) (duration: 01m 02s)
  • 19:06 hknust: holger mwmaint1002 Stop T219279
  • 18:31 hknust: holger mwmaint1002 Start T219279
  • 17:57 effie: pool mw1263 mw1264
  • 17:31 effie: briefly depool mw1263 and mw1264
  • 17:30 jynus: about to shutdown db1139 for hw maintenance T261405
  • 17:13 dwisehaupt: upping thank you mail flow through frmx's to 30% of the total runs
  • 16:32 XioNoX: add cloud-storage1-b-codfw to, well, codfw switches - T267378
  • 16:20 effie: pool mw1263
  • 16:17 hashar: Restarting Gerrit on gerrit1001
  • 16:12 hashar: Restarted Gerrit on gerrit2001 for config change
  • 15:53 zpapierski@deploy1001: Finished deploy [wikimedia/discovery/analytics@1ab89ed]: Deploying venv workaround for Debian 9 (duration: 01m 06s)
  • 15:52 zpapierski@deploy1001: Started deploy [wikimedia/discovery/analytics@1ab89ed]: Deploying venv workaround for Debian 9
  • 15:38 moritzm: installing 4.19.152 kernel packages on buster hosts (only installing the package, reboots will happen separately)
  • 15:28 effie: depool mw1263 - T244340
  • 15:09 ejegg: updated fundraising python tools from 087a596d3a to 7853f426ee
  • 14:21 effie: pooling mw1276 - T244340
  • 13:51 moritzm: imported php-memcached 3.0.1+2.2.0-1~wmf3+buster1 to component/php72 for buster-wikimedia
  • 13:29 marostegui: Restart db2093 to pick up report_host - T266483
  • 13:17 marostegui: Restart db1117* to pick up report_host - T266483
  • 12:46 effie: depool mw1276 to install onhost memcached - T244340
  • 12:33 Lucas_WMDE: EU backport&config window done
  • 12:33 moritzm: installing wireshark security updates
  • 12:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: Switch parser cache to using "mcrouter-with-onhost-tier" (T264604) (duration: 00m 57s)
  • 12:23 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/mc.php: Config: Add "mcrouter-with-onhost-tier" entry to $wgObjectCaches (T264604) (duration: 00m 57s)
  • 12:04 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/Wikibase: Backport: Revert JS parser commits (T266671) (duration: 01m 04s)
  • 08:59 hashar: Restarted Gerrit for plugins deployment
  • 08:06 hashar: Restarting Gerrit on gerrit2001 / gerrit-replica
  • 08:04 hashar@deploy1001: Finished deploy [gerrit/gerrit@5a41181]: jmx and prometheus metrics reporters - T184086 (duration: 00m 10s)
  • 08:04 hashar@deploy1001: Started deploy [gerrit/gerrit@5a41181]: jmx and prometheus metrics reporters - T184086
  • 07:40 elukey: import hue_4.8.0-2 to buster-wikimedia
  • 06:53 marostegui: Restart dbstore* to pick up report_host - T266483
  • 06:44 marostegui: Restart pc1010 to pick up report_host - T266483

2020-11-09

  • 22:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 21:14 mbsantos@deploy1001: Finished deploy [tilerator/deploy@97575e4]: Add new target for beta environment and clean-up old envs (T222377) (duration: 02m 23s)
  • 21:11 mbsantos@deploy1001: Started deploy [tilerator/deploy@97575e4]: Add new target for beta environment and clean-up old envs (T222377)
  • 20:53 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=maps2002.*
  • 20:36 cdanis: depool maps2002
  • 20:26 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932) (duration: 01m 09s)
  • 20:25 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932)
  • 20:24 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932) (duration: 11m 36s)
  • 20:13 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932)
  • 20:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.16
  • 20:04 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 20:01 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 19:58 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 18:32 mepps: updated payments-wiki from 388490e86d to 8612ed1002, config revision is 987e839869
  • 17:53 XioNoX: re-order asw-d-codfw interfaces-ranges
  • 17:51 XioNoX: standardize asw-d-codfw interfaces descriptions
  • 17:33 effie: updating mwdebug2002 to ICU 63 - T264991
  • 17:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 05s)
  • 16:57 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 16:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
  • 16:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:45 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 16:40 moritzm: imported 2.0.2+0.5.7-1~wmf3+php72+buster1 to component/php72 for buster-wikimedia
  • 16:34 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=trwiki; T246539)
  • 16:34 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=kowiki; T246539)
  • 16:20 XioNoX: Netbox prod: mass import from PuppetDB (cables, etc) - T262899
  • 16:04 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:55 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:12 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 62c2e02: abusefilter.php: Enable wgAbuseFilterNotificationsPrivate by default for WMF wikis (T266298) (duration: 01m 07s)
  • 14:34 hashar: Restarting Gerrit
  • 14:07 hashar@deploy1001: Finished deploy [gerrit/gerrit@0a803e2]: Upgrade javamelody to 1.86.0 # T232678 (duration: 00m 18s)
  • 14:07 hashar@deploy1001: Started deploy [gerrit/gerrit@0a803e2]: Upgrade javamelody to 1.86.0 # T232678
  • 14:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:03 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 14:03 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=kowiki; T246539)
  • 14:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:59 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:55 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:40 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 12:13 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=zhwikinews --fix --add-prefix=BROKEN # T266925
  • 12:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 11b8f62: Add wgNamespaceAliases for zhwikinews (T266925) (duration: 01m 06s)
  • 12:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 87b3eed: Enable DiscussionTools as a beta feature on fiwiki (T265446) (duration: 01m 06s)
  • 11:58 moritzm: installing remaining openldap updates on stretch
  • 11:57 jynus: restart dbstore1004 mariadb instances
  • 10:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 10:36 XioNoX: add 185.15.56.240/29 IPs to relevant cloudsw interfaces - T265288
  • 10:35 effie: merging 638109 and roll restart ms-fe* hosts to pick up the change
  • 10:11 XioNoX: renumber cloud-xlink1-eqiad
  • 09:56 Urbanecm: Purge https://vote.wikimedia.org/wiki/Main_Page (T262689)
  • 09:54 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=svwiki; T246539)
  • 09:52 hashar: Restarting Gerrit on gerrit1001 and gerrit2001 in order to have the JVM exit after OutOfMemory # T267517
  • 09:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b0a81f: Revert "Change votewiki language temporarily to fa for fawiki elections" (T262689) (duration: 01m 08s)
  • 09:37 moritzm: installing libexif security updates
  • 09:06 godog: enable thanos query-frontend on thanos-fe hosts - T261281
  • 08:24 XioNoX: configure traceoptions on pfw3-eqiad - T263833
  • 08:11 hashar: Restarting Gerrit on gerrit1001 and gerrit2001
  • 07:58 hashar: Restarted CI Jenkins on contint2001 for Java upgrade
  • 07:17 elukey: restart gerrit on gerrit2001 (OOM registered two days ago, but systemctl shows it up since a month ago, so it is probably in a weird state)
  • 01:35 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/tests/phpunit/maintenance/categoryChangesAsRdfTest.php: this was cherry-picked to make CI pass, pushing it out just for a clean staging dir (duration: 01m 06s)
  • 01:32 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/resources/src/mediawiki.api/upload.js: fixing UBN T266903 (duration: 01m 06s)
  • 01:30 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/resources/src/mediawiki.Upload.js: fixing UBN T266903 (duration: 01m 07s)
  • 01:29 tstarling@deploy1001: sync-file aborted: fixing UBN T266903 (duration: 00m 01s)

2020-11-08

  • 23:08 tstarling@deploy1001: Synchronized php-1.36.0-wmf.16/resources/src/mediawiki.api/upload.js: fixing UBN T266903 (duration: 01m 06s)
  • 23:06 tstarling@deploy1001: Synchronized php-1.36.0-wmf.16/resources/src/mediawiki.Upload.js: fixing UBN T266903 (duration: 01m 35s)
  • 20:34 cdanis: repool esams
  • 19:48 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 19:48 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 19:16 cdanis: depool esams
  • 18:35 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 18:35 cdanis@cumin1001: START - Cookbook sre.network.cf

2020-11-06

  • 23:38 dwisehaupt: frdata1001 upgraded to buster
  • 22:40 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@bfaac0f]: Update to master, primarily updates for ores weekly predictions handling (duration: 01m 08s)
  • 22:39 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@bfaac0f]: Update to master, primarily updates for ores weekly predictions handling
  • 22:29 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@dc63e7e]: Update to master, primarily updates for ores weekly predictions handling (duration: 00m 26s)
  • 22:29 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@dc63e7e]: Update to master, primarily updates for ores weekly predictions handling
  • 20:57 reedy@deploy1001: Synchronized php-1.36.0-wmf.16/skins/CologneBlue/: T267278 (duration: 01m 05s)
  • 20:56 reedy@deploy1001: Synchronized php-1.36.0-wmf.14/skins/CologneBlue/: T267278 (duration: 01m 10s)
  • 20:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:05 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:54 cwhite@cumin1001: conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
  • 17:02 dwisehaupt: rolled out new thank_you_mail_send process_control scripts to utilize frmx hosts
  • 16:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2005.codfw.wmnet
  • 14:46 moritzm: installing wireshark security updates
  • 14:36 hnowlan: resyncing database on maps1001
  • 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:24 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:05 hnowlan: started cassandra bootstrap of maps2005
  • 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:47 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:30 hnowlan: joining maps2005 to cassandra cluster
  • 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:09 moritzm: uploaded openjdk-8 8u272-b10-1~deb10u1 to buster-wikimedia/component/jdk
  • 10:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:06 dcausse: restarted elastic on elastic1063 (T265113)
  • 09:57 moritzm: installing spice security updates
  • 09:32 moritzm: installing libsndfile security updates
  • 09:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:12 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 moritzm: installing openldap security updates on stretch/buster (client-side tools/libs only, slapd updates already deployed)
  • 04:38 ryankemper: [Deploy finished] WDQS deploy is complete; the service is healthy per https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=1604633917530&to=1604637475930
  • 04:36 ryankemper: Finished restarting wdqs categories one host at a time across all wdqs production instances
  • 04:02 ryankemper: Restarting wdqs categories one host at a time across all wdqs production instances: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'` (in progress)
  • 04:01 ryankemper: Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 04:01 ryankemper: Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 04:00 ryankemper: `query.wikidata.org` looks good following deploy, proceeding to post-deploy steps
  • 03:59 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@27a5c54]: 0.3.54 (duration: 11m 22s)
  • 03:51 ryankemper: Tests passing on canary `wdqs1003` following initial deployment, proceeding with deploy to rest of fleet
  • 03:48 ryankemper@deploy1001: Started deploy [wdqs/wdqs@27a5c54]: 0.3.54
  • 03:48 ryankemper: About to begin wdqs deploy, tests passing on canary `wdqs1003`
  • 00:53 brennen@deploy1001: Finished scap: Synchronizing to pick up i18n for gerrit:639505. Will resume moving train to group1 on Monday morning (US) (T263182) (duration: 69m 02s)

2020-11-05

  • 23:44 brennen@deploy1001: Started scap: Synchronizing to pick up i18n for gerrit:639505. Will resume moving train to group1 on Monday morning (US) (T263182)
  • 23:38 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/includes/media/FormatMetadata.php: Backport: media: Support GPSAltitudeRef exif tag - FormatMetadata.php (T267370) (duration: 07m 22s)
  • 23:29 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/languages/i18n/exif: Backport: media: Support GPSAltitudeRef exif tag - i18n/exif files (T267370) (duration: 01m 08s)
  • 23:09 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/vendor: Backport: Bump wikimedia/parsoid to 0.13.0-a16 (T267146) (duration: 01m 14s)
  • 20:54 hnowlan: reenabled tilerator in eqiad
  • 20:47 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.14
  • 20:44 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 39s)
  • 20:42 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
  • 20:39 hnowlan: finished removenode of maps2002 cassandra
  • 20:22 brennen: train: waiting ~15 minutes before rolling forward to group1.
  • 20:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.16
  • 20:15 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/CentralAuth/includes/specials/SpecialCentralAuth.php: Backport: Dont double-format numeric edit count (T267362) (duration: 01m 06s)
  • 19:44 Urbanecm: Morning B&C window done
  • 19:44 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/modules/homepage/: 81cb1c7: Suggested edits: Export task count from start editing dialog (T266868; T263040) (duration: 01m 07s)
  • 19:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 453b9c6: Fix DiscussionTools wikis config for thwiki/tgwiki (T266303) (duration: 01m 08s)
  • 18:32 razzi: shutting down kafka-jumbo1005 to allow dcops to upgrade NIC
  • 17:52 akosiaris: restart uwsgi-ores in all ores1* nodes per complaint on IRC that max redis clients have been reached T263910
  • 17:51 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.36.0-wmf.14
  • 17:48 razzi: shutting down kafka-jumbo1004 to allow dcops to upgrade NIC
  • 17:46 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.16
  • 17:41 brennen: train is currently unblocked; rolling to group0 (T263182)
  • 17:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:26 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/languages: Backport: language: Clean up $separatorTransformTable in km/la/my (T267091) (duration: 01m 12s)
  • 17:21 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/resources/Resources.php: Backport: mediawiki.action.edit.preview: Add versionCallback to improve startup perf (T266311) (duration: 01m 10s)
  • 17:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2002.codfw.wmnet
  • 17:14 hnowlan: rebuilding cassandra on maps2002
  • 17:14 jayme: imported kubernetes 1.16.15 to component/kubernetes-future stretch-wikimedia
  • 17:05 hnowlan: restarting maps2004 postgres for config change
  • 17:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:57 razzi: shutting down kafka-jumbo1003 to allow dcops to upgrade NIC
  • 16:26 razzi: shutting down kafka-jumbo1002 to allow dcops to upgrade NIC
  • 15:53 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 15:50 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 15:41 moritzm: installing junit4 security updates
  • 14:55 elukey: shutdown kafka-jumbo1001 to swap NICs (1g -> 10g)
  • 14:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:10 jbond42: re-enable puppet fleet-wide after restarting puppetdb
  • 14:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:57 jbond42: disable puppet fleet wide to restart puppetdb
  • 13:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:52 jbond42: upgrade freetype on jessie
  • 12:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:34 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:34 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:09 marostegui: Upgrade mysql on pc2010
  • 11:58 jynus: shutting down db1139 in preparation of maintenance T261405
  • 11:55 marostegui: Upgrade mysql on db1077
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1012 to es1 master, es1011 to es2 master, es1014 to es3 (this is a noop) T261717', diff saved to https://phabricator.wikimedia.org/P13230 and previous config saved to /var/cache/conftool/dbconfig/20201105-114223-marostegui.json
  • 11:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:05 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=dewiki; T246539)
  • 10:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:55 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:16 godog: grafana-rw.wikimedia.org active and sso-enabled - T262512
  • 09:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Slowly pool es1031 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13227 and previous config saved to /var/cache/conftool/dbconfig/20201105-094356-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Slowly pool es1030 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13226 and previous config saved to /var/cache/conftool/dbconfig/20201105-094348-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Slowly pool es1029 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13225 and previous config saved to /var/cache/conftool/dbconfig/20201105-094336-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Slowly pool es1031 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13224 and previous config saved to /var/cache/conftool/dbconfig/20201105-092853-root.json
  • 09:28 moritzm: enabling CAS on grafana1002, editing dashboards will be interrupted for a bit
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Slowly pool es1030 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13223 and previous config saved to /var/cache/conftool/dbconfig/20201105-092845-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Slowly pool es1029 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13222 and previous config saved to /var/cache/conftool/dbconfig/20201105-092833-root.json
  • 09:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Slowly pool es1031 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13219 and previous config saved to /var/cache/conftool/dbconfig/20201105-091350-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Slowly pool es1030 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13218 and previous config saved to /var/cache/conftool/dbconfig/20201105-091341-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Slowly pool es1029 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13217 and previous config saved to /var/cache/conftool/dbconfig/20201105-091329-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Slowly pool es1031 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13216 and previous config saved to /var/cache/conftool/dbconfig/20201105-085846-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Slowly pool es1030 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13215 and previous config saved to /var/cache/conftool/dbconfig/20201105-085838-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Slowly pool es1029 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13214 and previous config saved to /var/cache/conftool/dbconfig/20201105-085826-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Slowly pool es1031 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13213 and previous config saved to /var/cache/conftool/dbconfig/20201105-084343-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Slowly pool es1030 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13212 and previous config saved to /var/cache/conftool/dbconfig/20201105-084334-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Slowly pool es1029 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13211 and previous config saved to /var/cache/conftool/dbconfig/20201105-084323-root.json
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312', diff saved to https://phabricator.wikimedia.org/P13210 and previous config saved to /var/cache/conftool/dbconfig/20201105-084250-marostegui.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312', diff saved to https://phabricator.wikimedia.org/P13209 and previous config saved to /var/cache/conftool/dbconfig/20201105-083304-marostegui.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13208 and previous config saved to /var/cache/conftool/dbconfig/20201105-083142-root.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13207 and previous config saved to /var/cache/conftool/dbconfig/20201105-081638-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13206 and previous config saved to /var/cache/conftool/dbconfig/20201105-080135-root.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1031 on es3 with minimum weight after being cloned from es1017 T261717', diff saved to https://phabricator.wikimedia.org/P13205 and previous config saved to /var/cache/conftool/dbconfig/20201105-075625-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1030 on es2 with minimum weight after being cloned from es1013 T261717', diff saved to https://phabricator.wikimedia.org/P13204 and previous config saved to /var/cache/conftool/dbconfig/20201105-075507-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1029 on es1 with minimum weight after being cloned from es1016 T261717', diff saved to https://phabricator.wikimedia.org/P13203 and previous config saved to /var/cache/conftool/dbconfig/20201105-075358-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13202 and previous config saved to /var/cache/conftool/dbconfig/20201105-074631-root.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 T267216', diff saved to https://phabricator.wikimedia.org/P13201 and previous config saved to /var/cache/conftool/dbconfig/20201105-072352-marostegui.json
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 100%: After cloning es1029 T261717', diff saved to https://phabricator.wikimedia.org/P13200 and previous config saved to /var/cache/conftool/dbconfig/20201105-071017-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 100%: After cloning es1030 T261717', diff saved to https://phabricator.wikimedia.org/P13199 and previous config saved to /var/cache/conftool/dbconfig/20201105-070616-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 100%: After cloning es1031 T261717', diff saved to https://phabricator.wikimedia.org/P13198 and previous config saved to /var/cache/conftool/dbconfig/20201105-070610-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 75%: After cloning es1029 T261717', diff saved to https://phabricator.wikimedia.org/P13197 and previous config saved to /var/cache/conftool/dbconfig/20201105-065514-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 75%: After cloning es1030 T261717', diff saved to https://phabricator.wikimedia.org/P13196 and previous config saved to /var/cache/conftool/dbconfig/20201105-065113-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 75%: After cloning es1031 T261717', diff saved to https://phabricator.wikimedia.org/P13195 and previous config saved to /var/cache/conftool/dbconfig/20201105-065107-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 50%: After cloning es1029 T261717', diff saved to https://phabricator.wikimedia.org/P13193 and previous config saved to /var/cache/conftool/dbconfig/20201105-064010-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 50%: After cloning es1030 T261717', diff saved to https://phabricator.wikimedia.org/P13192 and previous config saved to /var/cache/conftool/dbconfig/20201105-063610-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 50%: After cloning es1031 T261717', diff saved to https://phabricator.wikimedia.org/P13191 and previous config saved to /var/cache/conftool/dbconfig/20201105-063603-root.json
  • 06:34 elukey: truncate application_1601916545561_129457's taskmanager.log (~600G) on an-worker1113 due to partition 'e' full
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 25%: After cloning es1029 T261717', diff saved to https://phabricator.wikimedia.org/P13190 and previous config saved to /var/cache/conftool/dbconfig/20201105-062507-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 25%: After cloning es1027 T261717', diff saved to https://phabricator.wikimedia.org/P13189 and previous config saved to /var/cache/conftool/dbconfig/20201105-062454-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 25%: After cloning es1028 T261717', diff saved to https://phabricator.wikimedia.org/P13188 and previous config saved to /var/cache/conftool/dbconfig/20201105-062446-root.json
  • 01:57 milimetric@deploy1001: Finished deploy [analytics/refinery@6913407] (thin): Regular analytics weekly train THIN [analytics/refinery@6913407] (duration: 00m 08s)
  • 01:56 milimetric@deploy1001: Started deploy [analytics/refinery@6913407] (thin): Regular analytics weekly train THIN [analytics/refinery@6913407]
  • 01:56 milimetric@deploy1001: Finished deploy [analytics/refinery@6913407]: Regular analytics weekly train [analytics/refinery@6913407] (duration: 08m 34s)
  • 01:47 milimetric@deploy1001: Started deploy [analytics/refinery@6913407]: Regular analytics weekly train [analytics/refinery@6913407]
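
Note on the staged (re)pooling entries for es1029, es1030 and es1031 above: they step through 10%, 25%, 50%, 75% and 100% at roughly 15-minute intervals (08:43 to 09:43). The sketch below only reproduces that timing pattern as read from the log. It is not the script that produced these entries and it does not call dbctl; the function name `repool_schedule`, the `STEPS` list and the 15-minute default are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Percentages as they appear in the es1029/es1030/es1031 log entries.
STEPS = [10, 25, 50, 75, 100]


def repool_schedule(start, interval_minutes=15, steps=STEPS):
    """Return (time, percent) pairs for a staged repool starting at `start`.

    Hypothetical helper: it models the timing seen in the log only and
    performs no pooling action.
    """
    return [
        (start + timedelta(minutes=i * interval_minutes), pct)
        for i, pct in enumerate(steps)
    ]


if __name__ == "__main__":
    # 08:43 is the time of the first "@ 10%" entry on 2020-11-05.
    for when, pct in repool_schedule(datetime(2020, 11, 5, 8, 43)):
        print(f"{when:%H:%M} pool at {pct}%")
```

Started at 08:43, the schedule lands on 08:58, 09:13, 09:28 and 09:43, which matches the timestamps of the 25%, 50%, 75% and 100% commits logged for these hosts.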

2020-11-04

  • 20:36 Urbanecm: Late B&C Morning window completed, deployment host is clear
  • 20:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ee0ba54: Disable the search in header A/B test (T265333) (duration: 01m 06s)
  • 20:33 ejegg: updated payments-wiki from 1ad4ba9639 to 388490e86d
  • 20:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate NewcomerTask event stream to EventGate on testwiki - T259163 (duration: 01m 07s)
  • 20:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 82579bf: Enable wgImagePreconnect on remaining wikis (T123582) (duration: 01m 06s)
  • 20:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d2a5772: Enable DiscussionTools as a beta feature on almost all Wikipedias (T266303) (duration: 01m 07s)
  • 20:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fb5c032: Enable wgCheckUserLogLogins at all wikis but loginwiki (T253802) (duration: 01m 08s)
  • 19:59 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.16 (duration: 62m 44s)
  • 18:57 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.16
  • 18:52 brennen@deploy1001: Pruned MediaWiki: 1.36.0-wmf.10 (duration: 27m 38s)
  • 18:51 Urbanecm: Strip 2FA for Mark83 at SUL (T267257)
  • 18:20 elukey: restart memcached on mc1036 to pick up new settings (see https://gerrit.wikimedia.org/r/639099)
  • 18:15 hknust: holger@mwmaint1002 END - Run updateRestrictions.php
  • 17:44 hknust: holger@mwmaint1002 START - Run updateRestrictions.php
  • 17:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:15 zpapierski@deploy1001: Finished deploy [wikimedia/discovery/analytics@8e8d2d4]: Deploying dc switch (duration: 01m 15s)
  • 17:13 zpapierski@deploy1001: Started deploy [wikimedia/discovery/analytics@8e8d2d4]: Deploying dc switch
  • 17:07 effie: Reimage mc1036 for real this time
  • 16:40 brennen: 1.36.0-wmf.16 was branched at f51ccd2 for T263182
  • 16:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:10 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:39 effie: Reimage mc1036 to buster - T252391
  • 15:25 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate ContentTranslationAbuseFilter event stream to EventGate on all wikis - T259163 (duration: 00m 58s)
  • 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:09 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate ContentTranslationAbuseFilter event stream to EventGate on testwiki - T259163 (duration: 00m 59s)
  • 14:37 jynus: restart mysql at db1133 T266483
  • 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:17 elukey: upload hue 4.8.0-1+deb10u1 to buster-wikimedia
  • 14:15 jynus: restart mysqls at db209[789], db210[01], db2139, db2141 T266483
  • 14:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:59 jynus: restart mysqls at db1150 T266483
  • 13:54 jynus: restart mysqls at db1145 T266483
  • 13:51 jynus: restart mysqls at db1140 T266483
  • 13:47 jynus: restart mysqls at db1139 T266483
  • 13:43 jynus: restart mysqls at db1116 T266483
  • 13:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 jynus: restart mysqls at db1102 T266483
  • 13:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:35 jynus: restart mysqls at db1095 T266483
  • 13:24 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:24 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:50 Lucas_WMDE: EU backport&config done
  • 12:11 Urbanecm: Run scap pull at snapshot1010 manually
  • 12:09 Urbanecm: scap-sync file returned `snapshot1010.eqiad.wmnet returned [255]: Host key verification failed.`
  • 12:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ed3c43d: Add www.irishstatutebook.ie to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T267193) (duration: 01m 02s)
  • 11:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:23 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P13185 and previous config saved to /var/cache/conftool/dbconfig/20201104-102341-kormat.json
  • 10:23 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=fiwiki; T246539)
  • 10:17 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P13184 and previous config saved to /var/cache/conftool/dbconfig/20201104-101729-kormat.json
  • 10:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:08 _joe_: restarting envoyproxy on all of restbase codfw, sending the command in parallel via cumin, to test poolcounter usage by the safe restart scripts
  • 10:05 _joe_: restarting envoyproxy on restbase20{09,10} to test poolcounter usage by the safe restart scripts
  • 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:24 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:24 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 09:19 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 09:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:44 moritzm: uploaded freetype 2.5.2+deb8u4+wmf1 to apt.wikimedia.org/jessie-wikimedia
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: Slowly pool es1028 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13182 and previous config saved to /var/cache/conftool/dbconfig/20201104-080033-root.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Slowly pool es1027 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13181 and previous config saved to /var/cache/conftool/dbconfig/20201104-080024-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: Slowly pool es1026 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13180 and previous config saved to /var/cache/conftool/dbconfig/20201104-075953-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: Slowly pool es1028 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13179 and previous config saved to /var/cache/conftool/dbconfig/20201104-074530-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: Slowly pool es1027 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13178 and previous config saved to /var/cache/conftool/dbconfig/20201104-074520-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: Slowly pool es1026 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13177 and previous config saved to /var/cache/conftool/dbconfig/20201104-074449-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: Slowly pool es1028 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13176 and previous config saved to /var/cache/conftool/dbconfig/20201104-073026-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: Slowly pool es1027 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13175 and previous config saved to /var/cache/conftool/dbconfig/20201104-073017-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: Slowly pool es1026 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13174 and previous config saved to /var/cache/conftool/dbconfig/20201104-072946-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: Slowly pool es1028 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13173 and previous config saved to /var/cache/conftool/dbconfig/20201104-071523-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: Slowly pool es1027 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13172 and previous config saved to /var/cache/conftool/dbconfig/20201104-071513-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: Slowly pool es1026 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13171 and previous config saved to /var/cache/conftool/dbconfig/20201104-071443-root.json
  • 07:09 elukey: manual cleanup of mcelog and its wmf-auto-restart (failing) on mw1381 (kernel 4.19, doesn't support mcelog)
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1016 es1013 es1017 T261717', diff saved to https://phabricator.wikimedia.org/P13170 and previous config saved to /var/cache/conftool/dbconfig/20201104-070121-marostegui.json
  • 07:00 marostegui: Stop mysql on es1016, es1013, es1017 to clone es1029, es1030, es1031 T261717
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: Slowly pool es1028 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13169 and previous config saved to /var/cache/conftool/dbconfig/20201104-070020-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: Slowly pool es1027 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13168 and previous config saved to /var/cache/conftool/dbconfig/20201104-070010-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: Slowly pool es1026 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13167 and previous config saved to /var/cache/conftool/dbconfig/20201104-065939-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 100%: After cloning es1028 T261717', diff saved to https://phabricator.wikimedia.org/P13166 and previous config saved to /var/cache/conftool/dbconfig/20201104-065926-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 100%: After cloning es1027 T261717', diff saved to https://phabricator.wikimedia.org/P13165 and previous config saved to /var/cache/conftool/dbconfig/20201104-065905-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 100%: After cloning es1026 T261717', diff saved to https://phabricator.wikimedia.org/P13164 and previous config saved to /var/cache/conftool/dbconfig/20201104-065849-root.json
  • 06:52 elukey: force start of rasdaemon.service on dumpsdata1002 (its auto-restart unit was failing for it)
  • 06:47 elukey: set an-presto1004's netbox status as "active" (was: failed) after hw maintenance - T253438
  • 06:44 elukey: force restart of uwsgi-ores on ores1005 - daemon down after reload, max client reached error messages in the logs
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 75%: After cloning es1028 T261717', diff saved to https://phabricator.wikimedia.org/P13163 and previous config saved to /var/cache/conftool/dbconfig/20201104-064422-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 75%: After cloning es1027 T261717', diff saved to https://phabricator.wikimedia.org/P13162 and previous config saved to /var/cache/conftool/dbconfig/20201104-064402-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 75%: After cloning es1026 T261717', diff saved to https://phabricator.wikimedia.org/P13161 and previous config saved to /var/cache/conftool/dbconfig/20201104-064345-root.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1028 with minimum weight after recloning T261717', diff saved to https://phabricator.wikimedia.org/P13160 and previous config saved to /var/cache/conftool/dbconfig/20201104-063028-marostegui.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 50%: After cloning es1028 T261717', diff saved to https://phabricator.wikimedia.org/P13159 and previous config saved to /var/cache/conftool/dbconfig/20201104-062919-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 50%: After cloning es1027 T261717', diff saved to https://phabricator.wikimedia.org/P13158 and previous config saved to /var/cache/conftool/dbconfig/20201104-062858-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 50%: After cloning es1026 T261717', diff saved to https://phabricator.wikimedia.org/P13157 and previous config saved to /var/cache/conftool/dbconfig/20201104-062842-root.json
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1027 with minimum weight after recloning T261717', diff saved to https://phabricator.wikimedia.org/P13156 and previous config saved to /var/cache/conftool/dbconfig/20201104-061829-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1026 with minimum weight after recloning T261717', diff saved to https://phabricator.wikimedia.org/P13155 and previous config saved to /var/cache/conftool/dbconfig/20201104-061549-marostegui.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 25%: After cloning es1028 T261717', diff saved to https://phabricator.wikimedia.org/P13154 and previous config saved to /var/cache/conftool/dbconfig/20201104-061416-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 25%: After cloning es1027 T261717', diff saved to https://phabricator.wikimedia.org/P13153 and previous config saved to /var/cache/conftool/dbconfig/20201104-061355-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 25%: After cloning es1026 T261717', diff saved to https://phabricator.wikimedia.org/P13152 and previous config saved to /var/cache/conftool/dbconfig/20201104-061339-root.json

2020-11-03

  • 22:56 _joe_: repooling mw1346
  • 22:55 _joe_: depooling mw1346
  • 22:49 cdanis: mw1342 restart-php7.2-fpm
  • 22:37 cdanis: repool mw1278 and mw1279
  • 22:35 cdanis: ✔️ cdanis@mw1290.eqiad.wmnet ~ 🕠🍺 sudo restart-php7.2-fpm
  • 22:34 cdanis: restart-php7.2-fpm and pool on mw1276
  • 22:31 cdanis: depool mw1276 and mw1279 also
  • 22:25 cdanis: ✔️ cdanis@mw1278.eqiad.wmnet ~ 🕠🍺 sudo depool
  • 21:16 hashar: Gerrit: triggering java garbage collection # T263008
  • 19:32 gehel: restarting blazegraph on wdqs1007 to reset ban list
  • 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:45 cmjohnson1: shutting elastic1063 down to reseat DIMM T265113
  • 17:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:13 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 16:13 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 16:04 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:03 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:59 elukey: shutdown kafka-jumbo1006 to replace 1G with 10G nic
  • 15:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:08 moritzm: imported php-redis/xdebug to component/php72 for buster-wikimedia
  • 14:37 moritzm: imported php-apcu-bc/php-igbinary/tideways-xhprof to component/php72 for buster-wikimedia
  • 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:33 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:53 moritzm: imported php-mongodb/php-wmerrors/wikidiff2 to component/php72 for buster-wikimedia
  • 13:43 sobanski: Removing db1091 from tendril and zarcillo T267088
  • 13:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:33 lsobanski@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:24 lsobanski@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 moritzm: imported php-apcu/php-geoip/php-imagick/php-mailparse to component/php72 for buster-wikimedia
  • 11:57 moritzm: running "reprepro clearvanished" to prune thirdparty/orchestrator
  • 11:51 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: T266985 (duration: 00m 03s)
  • 11:51 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: T266985
  • 11:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 11:23 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 11:23 hnowlan: resyncing postgres replica maps1001
  • 11:03 Amir1: rolling restart of ores
  • 10:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:45 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: T266985 (duration: 00m 07s)
  • 10:45 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: T266985
  • 10:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:22 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: T266985 (duration: 00m 26s)
  • 10:21 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: T266985
  • 10:16 elukey@deploy1001: Finished deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided) (duration: 02m 15s)
  • 10:14 elukey@deploy1001: Started deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided)
  • 10:13 elukey@deploy1001: Finished deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided) (duration: 01m 45s)
  • 10:11 elukey@deploy1001: Started deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided)
  • 10:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:57 kormat: uploaded orchestrator 3.2.3-2 to apt
  • 09:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:05 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P13139 and previous config saved to /var/cache/conftool/dbconfig/20201103-090523-kormat.json
  • 09:00 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P13138 and previous config saved to /var/cache/conftool/dbconfig/20201103-090013-kormat.json
  • 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:32 godog: Prometheus re-enable compactions - T261281
  • 06:59 marostegui: Remove db1091 from tendril and zarcillo T267088
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1091 from dbctl T267088', diff saved to https://phabricator.wikimedia.org/P13137 and previous config saved to /var/cache/conftool/dbconfig/20201103-065756-marostegui.json
  • 06:46 marostegui: Deploy schema change on s1 codfw master: T265349
  • 06:16 marostegui: Stop MySQL on es1014 to clone es1028 T261717
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1014 to reclone es1028 T261717', diff saved to https://phabricator.wikimedia.org/P13136 and previous config saved to /var/cache/conftool/dbconfig/20201103-061423-marostegui.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1019 to es3 master (this is a noop) T261717', diff saved to https://phabricator.wikimedia.org/P13135 and previous config saved to /var/cache/conftool/dbconfig/20201103-061403-marostegui.json
  • 06:11 marostegui: Stop MySQL on es1012 to clone es1027 T261717
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1012 to reclone es1027 T261717', diff saved to https://phabricator.wikimedia.org/P13134 and previous config saved to /var/cache/conftool/dbconfig/20201103-060727-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1018 to es1 master (this is a noop) T261717', diff saved to https://phabricator.wikimedia.org/P13133 and previous config saved to /var/cache/conftool/dbconfig/20201103-060705-marostegui.json
  • 06:04 marostegui: Stop MySQL on es1011 to clone es1026 T261717
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1011 to reclone es1026 T261717', diff saved to https://phabricator.wikimedia.org/P13132 and previous config saved to /var/cache/conftool/dbconfig/20201103-060054-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1015 to es2 master (this is a noop) T261717', diff saved to https://phabricator.wikimedia.org/P13131 and previous config saved to /var/cache/conftool/dbconfig/20201103-060038-marostegui.json
  • 04:39 cstone: civicrm revision changed from cd13d9e30f to b1342c4129
  • 02:13 shdubsh: restart ES on logstash1009 - oom killed
  • 01:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:59 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:40 robh@cumin1001: START - Cookbook sre.hosts.downtime

2020-11-02

  • 22:19 twentyafterfour: restart php7.3-fpm on phab1001
  • 22:03 twentyafterfour: applied 113a244a66 on phab1001 to hotfix T240862
  • 20:22 eileen: process-control config revision is 313a36312f re-enable thank you
  • 19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:48 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 19:47 eileen: civicrm revision changed from 3317d30356 to cd13d9e30f, config revision is db912e3bba
  • 19:45 eileen: process-control config revision is db912e3bba - thankyou job off for testing
  • 19:07 Urbanecm: Deployed security fix for T205908
  • 19:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:59 andrewbogott: added dcaro to ops and wmf ldap groups
  • 18:59 mutante: decom'ing testvm1001
  • 18:58 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:14 XioNoX: push new pfw policies - T267051
  • 16:39 ejegg: updated payments-wiki from adc3369cb3 to 1ad4ba9639
  • 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:36 moritzm: imported php-excimer/php-luasandbox to component/php72 for buster-wikimedia
  • 14:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:34 moritzm: rolling restart of cassandra in restbase-dev to pick up Java security updates
  • 14:17 kormat: uploaded orchestrator 3.2.3-1 to apt
  • 14:01 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove $wgExtDistListFile, unused - T266024 (duration: 00m 58s)
  • 13:46 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 13:40 elukey: roll restart zookeeper on an-conf* to pick up new openjdk upgrades
  • 13:40 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 13:03 Lucas_WMDE: EU backport&config window done
  • 13:02 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/Wikibase: Backport: Revert JS parser commits (T266671) (duration: 01m 09s)
  • 12:52 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Response namespace at otrs_wikiwiki to namespaces searched by default (T266917) (duration: 00m 58s)
  • 12:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon, 2/2 (Beta) (duration: 00m 57s)
  • 12:20 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon, 1/2 (production) (duration: 01m 02s)
  • 12:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: Stop reading wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon (duration: 00m 58s)
  • 12:15 volans: upgraded python3-wmflib to 0.0.4 on cumin[12]001
  • 12:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Fix array depth for properties array (T266835), Beta part (prod no-op) (duration: 00m 58s)
  • 12:07 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Fix array depth for properties array (T266835) (duration: 00m 59s)
  • 12:02 volans: uploaded python3-wmflib_0.0.4 to apt.wikimedia.org buster-wikimedia
  • 11:51 effie: disable puppet on thumbor1001 and thumbor1002 to test 636024
  • 11:51 effie: disable thumbor on thumbor1001 and thumbor1002 to test 636024
  • 11:34 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 11:33 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 11:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:06 godog: upgrade thanos to 0.16.0 on prometheus hosts - T261281
  • 10:59 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 10:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:50 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 10:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 10:28 oblivian@cumin1001: START - Cookbook sre.network.cf
  • 10:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 10:28 oblivian@cumin1001: START - Cookbook sre.network.cf
  • 10:23 moritzm: installing openldap security updates on corp LDAP replicas
  • 08:46 XioNoX: add uRPF strict to ulsfo office links - T266561
  • 08:41 moritzm: installing openldap security updates on LDAP replicas
  • 08:40 godog: upgrade thanos to 0.16 in codfw/eqiad - T261281
  • 06:09 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 06:09 oblivian@cumin1001: START - Cookbook sre.network.cf
  • 06:09 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 06:09 oblivian@cumin1001: START - Cookbook sre.network.cf

2020-11-01

  • 22:41 Urbanecm: mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=metawiki Turkmen # T266976
  • 09:52 ariel@deploy1001: Finished deploy [dumps/dumps@de4c823]: actually allow per run dir to be made early in the run (duration: 00m 04s)
  • 09:52 ariel@deploy1001: Started deploy [dumps/dumps@de4c823]: actually allow per run dir to be made early in the run
  • 09:16 ariel@deploy1001: Finished deploy [dumps/dumps@6c7d811]: create empty dir for tableinfo if needed (duration: 00m 04s)
  • 09:16 ariel@deploy1001: Started deploy [dumps/dumps@6c7d811]: create empty dir for tableinfo if needed
  • 01:26 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:26 rzl@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:16 rzl@cumin1001: dbctl commit (dc=all): 'Depool db1091', diff saved to https://phabricator.wikimedia.org/P13124 and previous config saved to /var/cache/conftool/dbconfig/20201101-011600-rzl.json

2020-10-31

  • 00:12 mutante: removed Nuria from wmf group, she is already in nda group (T266086)

2020-10-30

  • 23:35 foks: removing two files for legal compliance
  • 23:32 mutante: adding query.wikidata.org to TLS cert for webserver-misc-apps.discovery.wmnet T266702
  • 23:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:04 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:04 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:02 jiji@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:00 jiji@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:59 mutante: mw1267,mw1268 - scap pull and repool - back to prod - T266164
  • 20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet
  • 20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
  • 20:56 mutante: mw1267,mw1268 - scap pull
  • 20:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:06 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:04 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:48 cdanis: the above scap began (and mostly finished) several minutes ago but is hanging on a couple hosts down for maintenance
  • 18:48 cdanis@deploy1001: Synchronized wmf-config/InitialiseSettings.php: lower frwiki featured feeds limit 1a41ef634 T266865 (duration: 05m 14s)
  • 18:48 cdanis: ✔️ cdanis@deploy1001.eqiad.wmnet /srv/mediawiki-staging 🕝☕ scap sync-file wmf-config/InitialiseSettings.php 'lower frwiki featured feeds limit 1a41ef634 T266865'
  • 18:27 hashar@deploy1001: Finished deploy [integration/docroot@c35e5e9]: Add ECS to doc.wikimedia.org index (duration: 00m 06s)
  • 18:27 hashar@deploy1001: Started deploy [integration/docroot@c35e5e9]: Add ECS to doc.wikimedia.org index
  • 17:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:19 effie: disable puppet on mc1036 and mc2036 - T252391
  • 17:18 effie: enable puppet on all mediawiki and mc* hosts
  • 16:19 elukey: kafka-jumbo1006 still running with 1g NIC
  • 15:36 effie: stopping puppet on mediawiki and mc* hosts
  • 15:11 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:11 rzl@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:09 rzl: downtiming mc2036 for buster reimage
  • 14:42 elukey: stop kafka-jumbo1006 to swap NICs (1g -> 10g, d1 -> d4 rack)
  • 14:14 cmjohnson1: moving mw1267 and mw1268 to rack A8 eqiad T266164
  • 12:29 XioNoX: set normal VRRP balancing on cr2-eqiad
  • 10:08 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:08 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:02 ladsgroup@deploy1001: Synchronized static/images/project-logos: Revert: Changing logo of Wikidata for the birthday (duration: 01m 12s)
  • 09:13 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:07 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 08:58 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:54 elukey: decom an-tool1006 (old analytics test vm) - T255139
  • 08:53 elukey@cumin1001: START - Cookbook sre.hosts.decommission

2020-10-29

  • 23:59 eileen: process-control config revision is 6891d35bce
  • 23:39 Urbanecm: Evening B&C window done
  • 23:38 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikiquote --add-prefix=BROKEN --fix # T266605 # P13112
  • 23:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ddb7e08: Add namespace aliases to Turkish Wikiquote (T266605) (duration: 00m 57s)
  • 23:36 eileen: process-control config revision is 1114512f90
  • 23:29 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikisource --add-prefix=BROKEN --fix # T266606 # P13111
  • 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c3a8555: Add namespace aliases to Turkish Wikisource (T266606) (duration: 00m 56s)
  • 23:23 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikibooks --fix # T266608
  • 23:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1800d11: Add namespace aliases to Turkish Wikibooks (T266608) (duration: 00m 57s)
  • 23:22 eileen: civicrm revision changed from e1d65b0f3a to 3317d30356, config revision is d70fe02cb9
  • 23:18 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwiktionary --fix # T266609
  • 23:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 090f757: Add namespace aliases to Turkish Wiktionary (T266609) (duration: 00m 58s)
  • 22:35 mutante: mw1268 - depooled for T266164
  • 22:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
  • 22:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:32 mutante: mw1269 rsyncd/ferm for scap proxy was enabled - mw1268 rsyncd/ferm for scap proxy was removed - deploy1001 scap-proxies dsh group was adjusted
  • 22:21 mutante: replacing scap proxy for rack A7 eqiad because mw1268 needs to move physically (T266164)
  • 22:21 bstorm: updated packages for thirdparty/kubeadm-k8s-1-17 to prepare for install T263284
  • 22:10 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:08 razzi@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:06 mutante: depooled mw1267 (T266164)
  • 22:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
  • 22:04 mutante: scandium - puppet disabled again (but only until tomorrow), downtimed in Icinga, for ongoing parsoid tests from testreduce1001
  • 22:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:23 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:17 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 20:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:08 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:06 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:31 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 19:31 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 19:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:22 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session on mwmaint1002 (wiki=ukwiki; T246539)
  • 19:13 Amir1: rolling restart of ores uwsgi
  • 19:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:58 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:16 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 18:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikiLove on hewikiquote (T266744) (duration: 00m 57s)
  • 18:09 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:07 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 18:07 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:06 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 18:06 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:06 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master * u=)]$ sudo /usr/local/sbin/fix-staging-perms
  • 18:05 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewikiquote wikilove # T266744
  • 18:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b7eaaab: [cswiki] Set wgGEHomepageManualAssignmentMentorsList to Wikipedie:Potřebuji pomoc/Mentoři/Manuální (T245639) (duration: 00m 57s)
  • 17:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 17:48 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 17:29 hashar: Restarted CI Jenkins a bit ago
  • 17:15 hashar: CI: killed all java agents (java upgrade)
  • 17:12 hashar: Stopping CI Jenkins
  • 16:59 XioNoX: Delete cr1-eqiad:ae2.1120 and related static routes - T265288
  • 16:46 _joe_: restarted kartotherian on all servers in eqiad at the same time
  • 16:38 XioNoX: Move cr2-eqiad:ae2.1120 to cloudsw1-d5:irb.1120 - T265288
  • 16:34 XioNoX: force VRRP master on cr1-eqiad - T265288
  • 16:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
  • 16:25 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet
  • 15:34 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Revert: switch restbase to use envoy, https (duration: 00m 57s)
  • 15:22 moritzm: installing bacula updates from Buster point release
  • 15:22 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/intersection/: 483c3bc: Attempt to add a query cache to DPL (T263220) (duration: 00m 58s)
  • 15:16 papaul: poweroff mc2029 for relocation
  • 15:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 19c5aff: Set wgDLPQueryCacheTime to 120 at all wikis (T263220) (duration: 00m 59s)
  • 15:09 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Switch restbase to use envoy, https (duration: 00m 57s)
  • 15:06 vgutierrez: rolling restart of ATS to upgrade to trafficserver 8.0.8-1wm3 - T265911
  • 14:59 papaul: poweroff sessionstore2002 for relocation
  • 14:36 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:35 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:33 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:29 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:26 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:24 elukey: restart zookeeper on an-conf1001 for openjdk upgrades
  • 14:20 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:08 godog: bump FS for prometheus codfw global instance
  • 13:54 elukey: roll out profile::java on all zookeeper instances
  • 13:53 moritzm: installing Java 11 security updates
  • 13:52 bblack: authdns1001 - restart gdnsd - T266746
  • 13:46 bblack: authdns2001 - restart gdnsd - T266746
  • 13:38 bblack: staggered restart of gdnsd on dns[12345]001 (1/2 recursors in each DC) - T266746
  • 13:29 bblack: staggered restart of gdnsd on dns[12345]002 (1/2 recursors in each DC) - T266746
  • 13:25 Urbanecm: Correction: Obviously 1002 (T246539)
  • 13:23 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint2001 (wiki=idwiki; T246539)
  • 13:21 moritzm: installing bluez security updates on stretch
  • 12:56 marostegui: Make orchestrator discover pc2 T266485
  • 12:55 marostegui: Deploy orchestrator grants on pc2 T266485
  • 12:44 marostegui: Deploy grants for cluster alias on pc1 T266485
  • 12:35 moritzm: upgrade idp-test* hosts to latest Java security updates
  • 12:35 moritzm: restart idp-test
  • 12:34 ariel@deploy1001: Finished deploy [dumps/dumps@4ed2cb9]: revinfo for page content jobs, tableinfo for list of known tables (duration: 00m 05s)
  • 12:33 ariel@deploy1001: Started deploy [dumps/dumps@4ed2cb9]: revinfo for page content jobs, tableinfo for list of known tables
  • 12:01 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 11:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 11:14 Urbanecm: EU B&C window done
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 28152b7: Add another SDC property to search for matching media statements (T264925) (duration: 00m 58s)
  • 11:11 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:07 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:07 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:06 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:06 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
  • 10:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:15 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:12 elukey: restart tilerator on maps100[1,4] - redis errors in the logs
  • 10:11 elukey: restart tilerator on maps1002 - redis errors in the logs
  • 10:03 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 10:03 elukey: drop 10.64.21.6/24 and 2620:0:861:105:10:64:21:6/64 from netbox (an-tool-ui1001 related records)
  • 09:59 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Fix cxserver's configuration to use envoy (duration: 00m 59s)
  • 09:52 elukey: add gdnsd.service to all gdnsd hosts (with LimitNOFILE=infinity as override) - no daemon restart done - T266746
  • 09:41 marostegui: Deploy schema change on s8 wikidata codfw master (db2079) T264109
  • 09:33 elukey: clean up 10.64.21.7/24 and 2620:0:861:105:10:64:21:7/64 from netbox (an-test-ui1001 already has IPs previously allocated by makevm)
  • 09:32 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 09:23 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 08:54 vgutierrez: turn off ECDHE-ECDSA-AES128-SHA support on the main caching cluster - T258405
  • 08:54 moritzm: fixing up stray jenkins auto restart timers on secondary releases server
  • 08:53 vgutierrez: A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 T266567 T264398
  • 08:48 moritzm: fixing up stray mcelog auto restart timers on kubestage*
  • 08:38 moritzm: fixing up stray cas auto restart timers on secondary IDP servers
  • 08:19 moritzm: fixing up stray pmacctd auto restart timers on netflow*
  • 08:19 moritzm: fixing up stray pcacctd auto restart timers on netflow*
  • 08:02 marostegui: Disconnect replication codfw -> eqiad on s1 T266663
  • 07:56 vgutierrez: set LimitNOFILE=500000 for gdnsd on authdns1001
  • 07:54 marostegui: Disconnect replication codfw -> eqiad on s4 T266663
  • 07:50 vgutierrez: restart haproxy on authdns2001
  • 07:49 marostegui: Disconnect replication codfw -> eqiad on s8 T266663
  • 07:48 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 07:46 marostegui: Disconnect replication codfw -> eqiad on s3 T266663
  • 07:43 vgutierrez: restart anycast-healthchecker on authdns2001
  • 07:34 vgutierrez: set LimitNOFILE=500000 for gdnsd on authdns2001
  • 07:27 elukey: "sudo truncate -s 10g /var/log/daemon.log" on authdns2001
  • 06:52 marostegui: Disconnect replication codfw -> eqiad on s2 T266663
  • 06:38 marostegui: Disconnect replication codfw -> eqiad on s7 T266663
  • 06:36 marostegui: Disconnect replication codfw -> eqiad on s6 T266663
  • 06:25 elukey: execute 'truncate -s 10g /var/log/syslog.1' on authdns2001 - root partition full
  • 06:23 marostegui: Disconnect replication codfw -> eqiad on s5 T266663
  • 06:10 marostegui: Disconnect replication codfw -> eqiad on es4 and es5 T266663
  • 06:07 marostegui: Disconnect replication codfw -> eqiad on x1 T266663
  • 05:58 marostegui: Disconnect replication codfw -> eqiad on pc1, pc2 and pc3 T266663
  • 04:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 01:41 mutante: scandium reimaged a second time after making puppet changes to ensure nodejs/npm is NOT installed anymore (T257906)
  • 01:17 ryankemper: T266492 Beginning rolling restart of eqiad cirrus cluster, 3 nodes at a time, on `ryankemper@cumin1001` tmux session `elasticsearch_restart_eqiad`
  • 01:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 00:51 ryankemper: Finished restart of wdqs categories across production hosts; wdqs deploy is complete and the service is healthy
  • 00:14 Amir1: rolling restart of ores
  • 00:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:04 ryankemper: Beginning restart of wdqs categories across production hosts, one at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
  • 00:03 ryankemper: Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 00:03 ryankemper: Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 00:02 ryankemper: Following wdqs deploy, https://query.wikidata.org successfully responds to an example query
  • 00:01 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@8c97b17]: 0.3.53 (duration: 09m 29s)

2020-10-28

  • 23:54 ryankemper: Canary `wdqs1003` tests pass, proceeding with wdqs deploy to rest of fleet
  • 23:52 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8c97b17]: 0.3.53
  • 23:52 ryankemper@deploy1001: deploy aborted: 0.3.53 (duration: 00m 00s)
  • 23:52 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8c97b17]: 0.3.53
  • 22:54 mutante: scandium - scap pull after reinstalling OS
  • 22:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:41 ryankemper: Disabled elasticsearch "saneitizer" systemd timer in eqiad due to checker jobs falling behind: `sudo systemctl disable mediawiki_job_cirrus_sanitize_jobs.timer` on `mwmaint1002`
  • 21:22 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 21:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:50 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 20:22 ladsgroup@deploy1001: Synchronized static/images/project-logos: Changing logo of Wikidata for the birthday (duration: 00m 58s)
  • 19:56 jgleeson: updated Smashpig from 2246685626 to 09f29c1da5
  • 19:53 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 19:53 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:50 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 19:36 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:36 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 19:36 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 tgr_: Morning deploys done
  • 18:55 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Temporarily enable 'editpage' warn logging (T251023) (duration: 00m 57s)
  • 18:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:46 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "cirrus: Hardcode more_like to codfw cirrus cluster" (duration: 00m 56s)
  • 18:45 tgr@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: Config: Revert "Revert "Increase cirrus morelike pool counter by 20%"" () (duration: 00m 57s)
  • 18:43 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:40 tgr@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: Suggested edits: Include page ID with task preview data (T266600) (duration: 00m 59s)
  • 18:19 tgr@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: Removing obsolete license definition (duration: 01m 00s)
  • 18:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:02 elukey@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 17:46 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 17:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:30 hnowlan: reimporting OSM data for eqiad
  • 17:24 hnowlan: removing OSM database on maps1004
  • 16:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet
  • 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
  • 16:18 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kartotherian,service=kartotherian,name=maps1004.eqiad.wmnet
  • 16:16 hnowlan: Disabling tilerator in eqiad
  • 16:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:06 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:05 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:03 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:51 Amir1: restarting uwsgi on ores in eqiad
  • 15:49 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 15:24 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 15:24 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 15:23 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 15:10 godog: roll restart logstash5 in codfw
  • 14:50 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 14:05 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 13:54 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 12:39 moritzm: installing libdatetime-timezone-perl updates
  • 11:46 XioNoX: configure urpf strict log-only on cr3-ulsfo:et-0/0/1.501 - T266561
  • 10:39 ema: due to T266651, cancel the entry above: A:cp upgrade libvmod-netmapper to 1.9-1 T266567 T264398
  • 10:38 elukey: clean up 10.64.5.7 and 2620:0:861:104:10:64:5:7 from Netbox (records mistakenly allocated via the makevm cookbook) - T266648
  • 10:35 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 10:25 ema: A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 T266567 T264398
  • 10:20 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:26 jayme: imported kubeyaml 0.0.3~20201027+git5f5556c-1 to buster-wikimedia
  • 09:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:37 jynus: updated dump grants on db2093
  • 07:53 volans: upgraded python3-wmflib to 0.0.3 on the cumin hosts - T257905
  • 07:40 godog: update thanos-fe1002 to thanos 0.16.0 - T261281
  • 07:22 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 04:43 ryankemper: T266492 Finished rolling restart of codfw cirrus cluster
  • 04:43 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 02:58 ryankemper: T266492 Beginning rolling restart of codfw cirrus cluster, 3 nodes at a time, on `ryankemper@cumin2001` tmux session `elasticsearch_restart_codfw`
  • 02:57 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-restart
  • 02:12 eileen: tools revision changed from a2a91d6c6a to 087a596d3a
  • 00:40 eileen: civicrm revision changed from 4fdfb8408b to e1d65b0f3a, config revision is f16003ab62

2020-10-27

  • 22:20 mutante: systemctl reset-failed on various servers to see which are coming back later from failed auto_restart and which don't
  • 21:40 mutante: mwmaint2001 - systemctl reset-failed - mediawiki_job_parser_cache_purging.service
  • 20:56 mutante: ms-be1057 is network down but running, NO-CARRIER on NIC, cable disconnected?
  • 20:43 mutante: releases2002 - systemctl reset-failed .. after removing wmf_auto_restart_rsync
  • 20:13 mutante: gerrit1001/gerrit2001: manually deleting list_mediawiki_extensions cron job (T266024)
  • 19:40 eileen: civicrm revision changed from bb7c08bf6d to 4fdfb8408b, config revision is f16003ab62
  • 18:35 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 17:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:46 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:46 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:22 mutante: gerrit1001/2001 - sudo rm /var/www/mediawiki-extensions.txt
  • 17:18 ejegg: updated payments-wiki from 4c1503ad91 to adc3369cb3
  • 16:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 16:34 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 16:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 16:05 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 16:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 16:05 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 16:05 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:59 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:42 mepps: updated payments-wiki-staging from 5fdd29bc16 to 4c1503ad91
  • 15:25 ema: cp4032: downgrade varnish to 6.0.4 T264398
  • 15:13 ema: cp4032: varnish-frontend-restart with libvmod-netmapper 1.9-1 T266567
  • 14:55 ema: upload libvmod-netmapper 1.9-1 to buster-wikimedia component/varnish6 T266567
  • 14:49 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 14:48 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:40 _joe_: restarting envoyproxy on the jobrunners in codfw
  • 14:36 akosiaris: rolling restart of all pods in codfw changeprop-jobqueue
  • 14:27 _joe_: restart php-fpm on jobrunners in codfw
  • 14:17 cdanis: ran puppet on alert1001
  • 14:16 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 14:15 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 14:15 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
  • 14:11 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
  • 14:11 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:09 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:09 rzl@cumin1001: MediaWiki read-only period ends at: 2020-10-27 14:09:02.873019
  • 14:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:06 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:06 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:05 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:04 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
  • 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 14:01 rzl@cumin1001: MediaWiki read-only period starts at: 2020-10-27 14:01:54.999830
  • 14:01 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 13:56 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 13:56 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:55 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:55 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:53 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:50 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:49 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:47 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:46 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 13:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 13:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 13:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 13:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 13:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 13:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 13:04 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 13:01 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 12:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 12:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 12:51 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:35 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 11:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:25 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 11:21 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 11:14 ema: A:cp remove libvarnishapi1, replaced by libvarnishapi2 a while ago T261487
  • 11:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:12 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 11:06 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 10:54 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 10:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 10:46 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 10:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 10:40 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 10:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 10:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 10:21 XioNoX: update policies from-zone production to-zone junos-host on mr1-eqiad - T265589
  • 10:20 XioNoX: update policies from-zone production to-zone junos-host on mr1-eqsin - T265589
  • 10:19 XioNoX: update policies from-zone production to-zone junos-host on mr1-ulsfo - T265589
  • 10:15 XioNoX: update policies from-zone production to-zone junos-host on mr1-esams - T265589
  • 10:06 XioNoX: update policies from-zone production to-zone junos-host on mr1-codfw - T265589
  • 08:58 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=97)
  • 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 08:39 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=97)
  • 08:32 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 08:30 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 08:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 08:15 godog: update thanos-fe2002 to thanos 0.16.0 - T261281
  • 07:35 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 06:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 06:50 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-4
  • 06:42 ryankemper: T263970 Set number of replicas to 2 (from previous value of 1) for all codfw indices matching `apifeatureusage*`, new shards have been assigned without issue

2020-10-26

  • 23:12 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Fix JS error when no topics set (T266501) (duration: 01m 00s)
  • 22:30 mutante: netflow5001 - systemctl reset-failed
  • 21:44 rzl: live test of sre.switchdc.mediawiki complete, the foregoing logging noise had no actual production impact
  • 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 21:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 21:41 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 21:40 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
  • 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
  • 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 21:37 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-10-26 21:37:17.809596
  • 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 21:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 21:35 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-10-26 21:35:20.837214
  • 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 21:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 21:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 21:32 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 21:32 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 21:31 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 21:31 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 21:31 rzl: starting a live test of sre.switchdc.mediawiki, which will create some logging noise but no actual production impact
  • 20:54 mutante: scandium rm /usr/local/bin/update_parsoid.sh (gerrit:636494)
  • 20:15 ladsgroup@deploy1001: Finished deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata (T261326) (duration: 06m 53s)
  • 20:08 ladsgroup@deploy1001: Started deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata (T261326)
  • 19:31 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:29 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:26 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:59 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Remove variant setting override (no-op) (T265556) (duration: 00m 57s)
  • 18:55 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure $wgBabelCategoryNames on ndswiki (T264990) (duration: 00m 58s)
  • 18:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add www.legislation.gov.uk to $wgCopyUploadsDomains on commonswiki (T265690) (duration: 00m 58s)
  • 18:47 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Make variant D the default, remove variant A (T265372, T265556) (duration: 00m 58s)
  • 18:46 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/vendor/wikimedia/parsoid/: Bump wikimedia/parsoid to v0.13.0-a13, enabling 6-element DSRs (T266285) (duration: 00m 58s)
  • 18:43 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/skins/Vector/: Fix logic in collapsibleTabs code (T71729) (duration: 00m 58s)
  • 18:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wtp2001-wtp2020 from LinterSubmitterWhitelist (T265558) (duration: 00m 59s)
  • 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Make variant D the default on all wikis (T265556) (duration: 00m 58s)
  • 17:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 17:48 mutante: an-worker109* - systemctl reset-failed to clear Icinga alerts related to wmf_auto_restart changes
  • 17:45 mutante: releases2002, netmon2001, various other hosts - systemctl reset-failed to clear Icinga alerts related to wmf_auto_restart changes
  • 17:39 krinkle@deploy1001: Synchronized php-1.36.0-wmf.13/resources/src/mediawiki.util/: T265809, I1011f6 (duration: 01m 00s)
  • 16:41 XioNoX: bounce security log on pfw3-eqiad - T263833
  • 16:29 XioNoX: set security-log traceoptions on pfw3-eqiad - T263833
  • 16:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:00 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:51 rzl@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=apertium|api-gateway|citoid|cxserver|echostore|eventgate-analytics|eventgate-analytics-external|eventgate-logging-external|eventgate-main|eventstreams|graphoid|kartotherian|mathoid|mobileapps|ores|parsoid|proton|push-notifications|recommendation-api|restbase|restbase-async|schema|search|sessionstore|termbox|wdqs|wdqs-internal|wikifeeds|zotero,name=eqiad
  • 15:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=eqiad
  • 15:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
  • 15:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=eqiad
  • 15:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
  • 15:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=termbox,name=eqiad
  • 15:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
  • 15:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=search,name=eqiad
  • 15:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=schema,name=eqiad
  • 15:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=eqiad
  • 15:08 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase,name=eqiad
  • 15:05 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=eqiad
  • 15:02 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=eqiad
  • 14:59 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=proton,name=eqiad
  • 14:56 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=parsoid,name=eqiad
  • 14:53 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
  • 14:50 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mobileapps,name=eqiad
  • 14:47 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mathoid,name=eqiad
  • 14:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@a1a1bd7]: Add api-portal and smnwiki (duration: 16m 43s)
  • 14:44 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 14:41 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=graphoid,name=eqiad
  • 14:38 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams,name=eqiad
  • 14:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main,name=eqiad
  • 14:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external,name=eqiad
  • 14:30 ppchelko@deploy1001: Started deploy [restbase/deploy@a1a1bd7]: Add api-portal and smnwiki
  • 14:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics-external,name=eqiad
  • 14:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=eqiad
  • 14:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=echostore,name=eqiad
  • 14:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=cxserver,name=eqiad
  • 14:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=citoid,name=eqiad
  • 14:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=api-gateway,name=eqiad
  • 14:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=apertium,name=eqiad
  • 14:06 rzl@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=apertium|api-gateway|citoid|cxserver|echostore|eventgate-analytics|eventgate-analytics-external|eventgate-logging-external|eventgate-main|eventstreams|graphoid|kartotherian|mathoid|mobileapps|ores|parsoid|proton|push-notifications|recommendation-api|restbase|restbase-async|schema|search|sessionstore|termbox|wdqs|wdqs-internal|wikifeeds|zotero,name=eqiad
  • 13:48 moritzm: imported cas 6.2.4-1 to apt.wikimedia.org T265857
  • 13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bff6b37: Add foto.digitalarkivet.no to wgCopyUploadsDomains whitelist of Wikimedia Commons (T266390) (duration: 01m 14s)
  • 11:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 11:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 11:26 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:11 vgutierrez: upgrade trafficserver to 8.0.8-1wm3 on cp4032 - T265911
  • 11:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 11:02 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 10:51 vgutierrez: manually reloading nginx on cloudelastic[1005-1006]
  • 10:29 vgutierrez: upload trafficserver 8.0.8-1wm3 to apt.wm.org (buster) - T265911
  • 10:18 godog: roll restart pybal to apply latest configuration
  • 09:51 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-3
  • 09:31 moritzm: restarting PHP FPM on mw canaries to pick up freetype update
  • 09:04 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 08:58 moritzm: installing freetype security updates for stretch
  • 08:57 XioNoX: remove down sessions to AS38758
  • 08:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:43 XioNoX: remove down sessions to AS8560
  • 08:41 XioNoX: remove down sessions to AS31334
  • 08:28 XioNoX: remove down sessions to AS6327
  • 08:27 XioNoX: remove down sessions to AS8674
  • 08:25 XioNoX: remove down sessions to AS24429
  • 08:21 XioNoX: remove down sessions to AS16509
  • 06:59 _joe_: rolling restart of php7.2-fpm on the codfw jobrunners, to reduce the number of dangling transcodes after restarting cp-jobqueue for a deploy
  • 06:59 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 06:16 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=jobrunner,dc=codfw,name=mw224.*
  • 06:15 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=videoscaler,dc=codfw,name=mw228.*
  • 06:10 marostegui: Warm up tables T261914

2020-10-25

  • 15:53 dwisehaupt: kernel upgrade and reboot for frdb1003
  • 15:50 dwisehaupt: kernel upgrade and reboot for fran1001

2020-10-23

  • 22:56 mutante: added Nuria to "nda" LDAP group - leaving her in "wmf" until the actual last day - shell account remains so no puppet change needed in ldap_only_admins (T266086)
  • 15:42 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:04 ema: rolling thumbor-instances restart to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/636012/ T266155
  • 12:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 10:57 kormat: uploaded orchestrator v3.2.3 to apt.wikimedia.org buster-wikimedia - T266023 (forgot to log this earlier)
  • 10:56 volans: uploaded python3-wmflib_0.0.3 to apt.wikimedia.org buster-wikimedia - T257905
  • 10:09 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-2
  • 09:51 moritzm: masking slapd on the old Stretch replicas to uncover potential direct access outside of the LVSes T264388
  • 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:32 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 09:31 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-1
  • 09:26 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 09:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 09:09 volans: upgrading spicerack to 0.0.44 on cumin hosts - T257905

2020-10-22

  • 22:42 mutante: ganeti1001 - adding 2 more vcpus to VM testreduce1001 - T257940
  • 22:03 mutante: deploy1002 - armed keyholder, all deployment keys loaded T265963
  • 21:56 mutante: deploy1002 - scap pull and added to mediawiki-installation "dsh" group - will be part of scap trains but just like any appserver (T265963)
  • 20:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:13 mutante: deploy1002 currently cloning ALL the deployment repos - new setup
  • 18:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:54 mutante: applying deployment_server role to new server deploy1002 - might show up in monitoring but is not prod yet, deploy1001 still is
  • 18:34 mutante: adding mcrouter cert for deploy1002.eqiad.wmnet T265963
  • 18:12 dpifke@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Expand to group1 (T123582) (duration: 00m 56s)
  • 18:12 volans: cumin 'A:dns-rec' 'rec_control wipe-cache wikimedia.org$' - T258729
  • 18:07 chaomodus: Updating eqiad public network DNS to automation
  • 17:50 volans: cumin 'A:dns-rec' 'rec_control wipe-cache eqiad.wmnet$' - T258729
  • 17:49 elukey: add thirdparty/bigtop14 to buster-wikimedia
  • 17:46 chaomodus: Updating eqiad private network DNS to automation
  • 17:21 bd808@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 17:21 bd808@cumin1001: Added views for new wiki: smnwiki T264900
  • 17:07 bd808@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 16:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 14:56 moritzm: installing remaining mariadb-10.3 updates for buster (as packaged in Debian, not the wmf-mariadb package)
  • 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:33 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 14:13 andrewbogott: upgrading mariadb on cloudcontrol1003, 1004, 1005
  • 14:05 ottomata: bump camus version to wmf12 for all camus jobs. should be no-op now. - T251609
  • 14:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Enable canary events for all eventgate-analytics-external bound streams - T251609 (duration: 01m 02s)
  • 13:55 moritzm: depooling ldap-eqiad-replica01/ldap-eqiad-replica02 T264388
  • 13:41 moritzm: pooling ldap-replica1001/1002 T264388
  • 13:10 moritzm: depooling ldap-replica2001/2002 T264388
  • 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.14
  • 13:01 moritzm: pooling ldap-replica2004 T264388
  • 12:24 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Enable canary events for 3 eventgate-analytics bound streams - T251609 (duration: 01m 05s)
  • 12:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 52ad2d4: Do not log logins at loginwiki via CU (T253802) (duration: 01m 06s)
  • 12:03 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master * u=)]$ sudo /usr/local/sbin/fix-staging-perms
  • 11:59 Lucas_WMDE: EU backport&config window done
  • 11:58 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Enable propagatePageDeletion on Test Wikidata, 2/2 (duration: 01m 04s)
  • 11:57 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable propagatePageDeletion on Test Wikidata, 1/2 (duration: 01m 02s)
  • 11:54 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint2001 (wiki=huwiki; T246539)
  • 11:39 moritzm: restarting nginx on acmechief*, debmonitor*, schema*, puppetdb* to pick up freetype update
  • 11:38 marostegui: Compare s1-s8 tables - T261914
  • 11:33 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InterwikiSortOrders.php: Config: Add ary, avk, awa, lld, shy and smn to InterwikiSortOrders.php (duration: 01m 08s)
  • 11:31 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 moritzm: restarting apache and smokeping* on netmon* to pick up freetype update
  • 11:21 moritzm: correction: installing freetype security updates for buster (stretch TBD)
  • 10:43 moritzm: installing freetype security updates for stretch/buster
  • 10:33 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:27 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:38 arturo: merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/634050 change to network data yaml
  • 08:31 kormat: enabling replication from eqiad to codfw T261914
  • 08:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:23 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:52 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 03:37 eileen: civicrm revision changed from 4dce7bf535 to bb7c08bf6d, config revision is 9a522d03dd
  • 03:13 eileen: civicrm revision changed from 3c3dcf80ae to 4dce7bf535, config revision is 9a522d03dd
  • 01:12 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@870829c]: 0.3.52 (duration: 09m 07s)
  • 01:04 ryankemper: Tests passing on canary `wdqs1003`, proceeding with wdqs deploy for rest of fleet
  • 01:03 ryankemper@deploy1001: Started deploy [wdqs/wdqs@870829c]: 0.3.52

2020-10-21

  • 23:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: T266033 (duration: 01m 05s)
  • 23:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: T265751 T265754 (duration: 01m 08s)
  • 21:38 mutante: testreduce1001 assigned 2 more GB of RAM - rebooting (T257940, T257906)
  • 19:44 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T264963)
  • 19:15 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T264963)
  • 18:13 Urbanecm: Morning B&C window done
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 45312d3: [WikibaseMediaInfo] Fix concept chips array nesting structure (T256431) (duration: 01m 05s)
  • 18:12 mepps: updated payments-wiki-staging from db03677b2d to 5fdd29bc16
  • 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d94e33f: cirrus: Hardcode more_like to codfw cirrus cluster (duration: 01m 05s)
  • 17:56 XioNoX: configure FB PNI in eqdfw
  • 17:43 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.14/skins/WikimediaApiPortal: Backport gerrit:635329, T266021 (duration: 01m 06s)
  • 17:34 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch ParserCache to JSON on testwiki gerrit:635382 (duration: 01m 05s)
  • 17:24 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 08s)
  • 17:21 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 06s)
  • 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:57 mutante: scandium - disabling puppet so that Parsoid team can make some tests on testreduce1001 today
  • 16:46 effie: restart php-fpm and pool mw2252 and mw2328
  • 15:58 Lucas_WMDE: Deployed patch for T260349
  • 15:34 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:31 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:28 moritzm: updating prometheus-openldap-exporter to 0+git20171128-3 to buster-wikimedia
  • 15:23 jbond42: upgrade puppetlabs-stdlib to 6.5.0 https://gerrit.wikimedia.org/r/c/operations/puppet/+/634278
  • 15:08 moritzm: imported prometheus-openldap-exporter 0+git20171128-3 to buster-wikimedia T264388
  • 15:02 otto@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster (duration: 02m 56s)
  • 15:01 crusnov@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:00 otto@deploy1001: Started deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster
  • 14:56 crusnov@cumin1001: START - Cookbook sre.dns.netbox
  • 14:44 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Set CURLOPT_RETURNTRANSFER true in gerrit handler T242554 (duration: 01m 07s)
  • 14:34 dcausse: restarting blazegraph on codfw servers (T263952)
  • 13:21 moritzm: pooling ldap-replica2003 T264388
  • 13:04 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.14 (duration: 01m 04s)
  • 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.14
  • 11:40 matthiasmullie: EU B&C done
  • 11:33 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [WikibaseMediaInfo] Add config for related terms API (duration: 01m 04s)
  • 11:17 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 785404f: Disable registrations stat on Special:TranslationStats (T264158) (duration: 01m 05s)
  • 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1156742: Enable ContentTranslation in 5 Wikipedias as a default tool (T264737; T264738; T264739; T264740; T264741) (duration: 01m 30s)
  • 11:00 marostegui: Upgrade db2093's mariadb version T266003
  • 10:58 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=rowiki; T246539)
  • 10:37 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; T246539)
  • 10:01 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; T246539)
  • 10:00 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; T246539)
  • 09:59 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 100% - T258405
  • 09:42 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; T246539)
  • 09:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; T246539)
  • 09:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; T246539)
  • 09:37 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=warwiki; T246539
  • 09:30 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; T246539)
  • 09:23 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:22 root@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 09:21 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:52 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; T246539)
  • 08:50 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=cebwiki; T246539
  • 08:46 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium/output]$ mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=apiportalwiki # T246539
  • 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
  • 08:38 root@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
  • 08:33 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 08:10 XioNoX: Upgrade Routinator 3000 to 0.8.0 on rpki1001 - T266001
  • 08:09 XioNoX: add Routinator 3000 0.8.0 to apt - T266001
  • 07:58 elukey: update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/635319
  • 04:35 ryankemper: re-enabled icinga notifications on all wdqs hosts now that `wdqs-updater` is healthy

2020-10-20

  • 22:10 dwisehaupt: frmon2001 upgraded to buster with grafana 7.2.1
  • 21:19 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 21:18 cdanis: ✔️ cdanis@mw2252.codfw.wmnet ~ 🕠🍺 sudo depool
  • 20:57 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 00m 08s)
  • 20:56 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
  • 20:39 cdanis: doing some manual testing on mw2221, depooled and puppet disabled
  • 20:33 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 08m 10s)
  • 20:31 ryankemper: [Temporarily] disabled notifications for all wdqs hosts while we figure out how to unstick the updater process. Impact: new updates will be delayed, but queries will keep being served as normal. Fixing this is a priority, but there is no availability outage
  • 20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:25 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=canary
  • 19:24 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:56 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:48 effie: depooling mw2328 - T266052
  • 17:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args (duration: 01m 31s)
  • 15:52 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args
  • 15:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:58 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: fee2d3b: Prevent uncaught warnings/exception on Special:AbuseFilter (T265994) (duration: 01m 03s)
  • 14:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: 00ef00f: Prevent uncaught warnings/exception on Special:AbuseFilter (T265994) (duration: 01m 01s)
  • 14:48 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/FileImporter/: 5eee9b7: Set originalRequest (incl. X-Forwarded-For) for remote edits (T265810) (duration: 01m 06s)
  • 14:16 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/FileImporter/: 5f8d3de: Set originalRequest (incl. X-Forwarded-For) for remote edits (T265810) (duration: 01m 09s)
  • 14:15 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master u=)]$ sudo /usr/local/sbin/fix-staging-perms
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13033 and previous config saved to /var/cache/conftool/dbconfig/20201020-135436-root.json
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 80%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13032 and previous config saved to /var/cache/conftool/dbconfig/20201020-133933-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 60%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13031 and previous config saved to /var/cache/conftool/dbconfig/20201020-132430-root.json
  • 13:19 XioNoX: install routinator 3000 0.8.0 on rpki2001 - T266001
  • 13:16 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.14
  • 13:11 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.14 (duration: 58m 03s)
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13030 and previous config saved to /var/cache/conftool/dbconfig/20201020-130926-root.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13029 and previous config saved to /var/cache/conftool/dbconfig/20201020-125423-root.json
  • 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 12:13 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.14
  • 11:37 liw: 1.36.0-wmf.14 was branched at 1b7b5f7 for T263180
  • 11:35 Lucas_WMDE: EU backport/config window done
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Backport: SearchSatisfaction: Set isAnon field (T259250) (duration: 00m 57s)
  • 11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Set Wikidata MF to collapse sections by default (T239195) (duration: 00m 56s)
  • 11:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove noratelimit from Wikidata bot group (T258354) (duration: 00m 56s)
  • 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 10:04 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 09:59 dcausse: T255399: resuming wdqs-data-reload manually from chunk no 776 on wdqs1009
  • 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 09:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 09:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .

2020-10-19

  • 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 23:56 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:11 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation (duration: 04m 33s)
  • 23:07 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation
  • 23:02 mutante: etherpad got restarted with new config options related to rate limiting - hopefully this fixed T265490
  • 21:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:19 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions (duration: 04m 48s)
  • 21:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions
  • 21:01 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:41 eileen: drush vset match_on_import 1
  • 20:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp2020.codfw.wmnet
  • 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item (duration: 01m 03s)
  • 20:16 mutante: decom'ing wtp201[0-9].codfw.wmnet (pooled=inactive) T265558
  • 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:15 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp201[0-9].codfw.wmnet
  • 20:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item
  • 20:09 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=parsoid,service=canary
  • 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:01 mutante: decom'ing wtp200[1-9].codfw.wmnet (pooled=inactive) T265558
  • 20:00 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp200[1-9].codfw.wmnet
  • 19:57 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:45 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads (duration: 03m 35s)
  • 19:41 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads
  • 19:35 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 19:34 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 19:33 mutante: wtp2001 - sudo confctl decommission
  • 19:29 dzahn@cumin1001: conftool action : set/weight=0; selector: dc=codfw,cluster=parsoid,service=canary
  • 19:01 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Set default variant to D on trwiki (T243445, T265556) (duration: 00m 56s)
  • 18:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 18902aa: Change votewiki language temporarily to fa for fawiki elections (T262689) (duration: 00m 56s)
  • 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on trwiki (T243445) (duration: 00m 57s)
  • 18:29 tzatziki: removing 10 files for legal compliance
  • 18:24 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/MobileFrontend/: Fix mobile diff redirect when curid parameter is present (T265654) (duration: 00m 58s)
  • 18:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable variant C/D for new users (T265556) (duration: 00m 56s)
  • 18:10 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop wgHiddenPrefs hack for VE beta feature (T254349) (duration: 00m 56s)
  • 17:53 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:44 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:16 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:59 Urbanecm: mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=smnwiki --cluster=all
  • 15:31 elukey: update puppet compilers' facts
  • 14:36 bpirkle@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:634841 Add api.wikimedia.org to the list of allowed CORS origins (duration: 00m 57s)
  • 14:32 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 55s)
  • 14:30 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 56s)
  • 14:15 moritzm: installing llvm-toolchain-7 bugfix updates from Buster point release
  • 13:34 Urbanecm: Start of `[urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > output/$wiki.log; done < wikis.dblist` (T246539; wikis.dblist is medium wikis from group2.dblist)
  • 13:33 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:31 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:26 moritzm: import prometheus-openldap-exporter 0+git20171128-2+deb10u1 for buster-wikimedia T264388
  • 12:48 moritzm: installing httpcomponents-client security updates on Buster
  • 12:26 Urbanecm: Creation of smnwiki is done (T264859)
  • 12:25 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 56s)
  • 12:22 urbanecm@deploy1001: Synchronized langlist: Creating smnwiki (T264859) (duration: 00m 56s)
  • 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating smnwiki (T264859) (duration: 00m 55s)
  • 12:16 marostegui: Sanitize smnwiki on db1124:3315 and db2094:3315 - T264900
  • 12:15 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating smnwiki (T264859) (duration: 00m 56s)
  • 12:15 marostegui: Deploy schema change on smnwiki T265321 T264900
  • 12:14 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating smnwiki (T264859)
  • 12:12 urbanecm@deploy1001: Synchronized dblists: Creating smnwiki (T264859) (duration: 00m 55s)
  • 12:11 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating smnwiki (T264859) (duration: 00m 55s)
  • 12:10 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating smnwiki (T264859) (duration: 00m 56s)
  • 11:51 moritzm: updating idp-test1001 to CAS 6.2.4
  • 11:46 moritzm: updating idp-test2001 to CAS 6.2.4
  • 11:43 Urbanecm: End of `[urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist` # T246539 # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
  • 11:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` (T246539)
  • 11:40 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist # T246539 # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
  • 11:31 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:24 Urbanecm: EU B&C window done
  • 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ce92c98: Restore bureaucrat abilities at uzwiki (T265746) (duration: 00m 56s)
  • 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 26b9726: Disable EditorJourney (UnderstandingFirstDay) (T252391) (duration: 01m 10s)
  • 11:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:13 Urbanecm: Manually run `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` for several small group2 wikis (T246539)
  • 10:57 Urbanecm: Start `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` in a tmux session named updateVarDumps at mwmaint2001 (T246539)
  • 10:53 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=jawikivoyage --print-orphaned-records-to=- --progress-markers # T246539
  • 09:09 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 08:40 jayme: updated helm to 2.16.12-1 on deploy*,chartmuseum*,contint*
  • 08:37 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog2001 - T259780
  • 08:31 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 08:26 jayme: updated helm to 2.16.12-1 on deploy2001
  • 08:24 jayme: imported helm 2.16.12-1 to buster-wikimedia stretch-wikimedia jessie-wikimedia - T263616
  • 08:01 godog: re-enable compaction for prometheus[12]003 - T261281
  • 07:53 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 07:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 07:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 ', diff saved to https://phabricator.wikimedia.org/P13022 and previous config saved to /var/cache/conftool/dbconfig/20201019-071614-marostegui.json
  • 06:46 elukey@deploy1001: Finished deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27 (duration: 00m 10s)
  • 06:45 elukey@deploy1001: Started deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27

2020-10-17

  • 13:22 Urbanecm: [urbanecm@mwmaint2001 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Fæ . # T264529

2020-10-16

  • 21:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:43 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:27 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:25 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:39 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:37 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:43 thcipriani: restarting gerrit due to gc thrashing
  • 16:25 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors (duration: 04m 08s)
  • 16:21 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors
  • 15:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 15:36 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 15:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:01 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:41 effie: pooling mw2279.codfw.wmnet T264698
  • 12:11 jiji@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:09 jiji@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:35 reedy@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/ProofreadPage/: Revert excessive escaping T265571 (duration: 01m 12s)
  • 09:23 ema: text@esams (except for cp3050/cp3052): upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 09:19 ema: upload@esams: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 09:08 ema: upload@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 09:03 XioNoX: eqsin, push CR 634473
  • 09:01 ema: text@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 08:53 ema: upload@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 08:52 XioNoX: add BGP_IXP_RS_in to eqsin RS BGP sessions
  • 08:48 ema: text@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 08:29 ema: upload@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 08:24 ema: text@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 08:09 elukey: reboot stat1005/stat1008 to pick up correct GPU settings
  • 08:09 ema: upload@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 07:59 ema: text@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 07:19 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table (duration: 04m 22s)
  • 07:15 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table
  • 06:57 XioNoX: enable cr2-eqdfw:xe-0/1/2
  • 02:14 eileen: civicrm revision changed from 585eb835d8 to 3c3dcf80ae, config revision is f76d7849bc
  • 01:01 ryankemper: Cleaning up a dangling no-longer-puppet-managed udev elasticsearch-readahead rule across all cirrus instances: `sudo cumin -b 36 C:profile::elasticsearch::cirrus 'sudo rm -fv /etc/udev/rules.d/elasticsearch-readahead.rules && sudo /sbin/udevadm control --reload && sudo /sbin/udevadm trigger'`
  • 00:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 00:56 cdanis@cumin1001: START - Cookbook sre.network.cf

2020-10-15

  • 23:49 ryankemper: Began in-place reindex of `eqiad`, `codfw`, and `cloudelastic`. Running on `ryankemper@mwmaint2001` under tmux sessions `inplace_reindex_[eqiad, codfw, cloudelastic]`
  • 23:00 krinkle@deploy1001: Synchronized wmf-config/env.php: I245e84e0b8c (duration: 01m 10s)
  • 22:09 cdanis: previous sre.network.cf invocation was a no-op; just checking status
  • 22:08 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 22:08 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 22:06 mutante: depooled remaining wtp* servers in codfw. old parsoid servers, new servers are parse2* (T265558)
  • 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp2020.codfw.wmnet
  • 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[6-9].codfw.wmnet
  • 21:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[0-5].codfw.wmnet
  • 20:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 20:27 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 19:46 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources (duration: 06m 22s)
  • 19:43 marxarelli: all wikis promoted to 1.36.0-wmf.13 (T263179)
  • 19:39 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources
  • 19:33 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.13
  • 19:30 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:23 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:20 catrope@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing (T265500) (duration: 01m 29s)
  • 19:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing (T265500) (duration: 01m 51s)
  • 19:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/Echo/: Drop text indent in modern Vector (T264339) (duration: 01m 51s)
  • 19:09 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/: Vertically align personal tools (T264339) (duration: 01m 43s)
  • 19:07 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Revert "clientError: Adds is_logged_in tag to aid filtering" (T256173) (duration: 01m 58s)
  • 19:04 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/UploadWizard/: Work around LESS calculating calc() values wrong (T265560) (duration: 02m 07s)
  • 18:32 mutante: depooling wtp2005 through wtp2009 (parsoid, old server generation) T265558
  • 18:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[6-9].codfw.wmnet
  • 18:07 mutante: mx1001/mx2001: made previous live hack official and added benefactors@wikipedia alias, re-enabling puppet
  • 17:51 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:19 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:17 jbond42: deleting old pcc reports in compiler1002 to free disk space
  • 17:12 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 17:05 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 17:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 16:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 16:57 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 16:56 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 16:54 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 16:51 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 16:48 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 16:46 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 16:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 16:14 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 16:11 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:53 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:53 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/CheckUser/includes/specials/: fd94002: Revert "Validate username input before constructing subpage links" (T265606) (duration: 02m 48s)
  • 15:50 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 15:47 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:35 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:19 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:09 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 15:07 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs (duration: 00m 59s)
  • 15:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs
  • 14:51 elukey: roll restart druid-historical daemons on druid1004-1008 to pick up new conn pooling changes
  • 14:51 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 14:45 jbond42: enable puppet post deploy puppetdb change blacklisting dynamic facts
  • 14:41 ema: varnish 6.0.6-1wm2 uploaded to apt.wikimedia.org component/varnish6 T264074
  • 14:38 jbond42: disable puppet to deploy puppetdb change blacklisting dynamic facts
  • 14:21 ema: cp3050: systemctl reload varnishkafka-webrequest.service T264074
  • 14:21 jayme: imported doxygen_1.8.19-1~deb10+wmf1 to component/ci buster-wikimedia - T265579
  • 14:12 ema: cp3050: restart varnishkafka-webrequest w/ libvarnishapi2 6.0.6-1wm2 T264074
  • 14:11 ema: cp3050: upgrade varnish to 6.0.6-1wm2 T264074
  • 14:10 ema: cp3050: upgrade varnish to 6.0.6-1wm2 T26407
  • 12:58 gilles@deploy1001: Finished deploy [performance/navtiming@dff55f8]: (no justification provided) (duration: 00m 05s)
  • 12:58 gilles@deploy1001: Started deploy [performance/navtiming@dff55f8]: (no justification provided)
  • 12:12 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 10:47 vgutierrez: restart ats-backend on cp3050
  • 10:00 akosiaris: T264209. Initiate a docker pull of docker-registry.discovery.wmnet/mwcachedir:0.0.1 from all kubernetes and kubernetes staging nodes.
  • 08:17 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 04:27 ryankemper: Rolling upgrade for cirrus `codfw` complete
  • 04:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 02:18 ryankemper: Rolling upgrade for cirrussearch `codfw` beginning
  • 02:18 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 02:14 ryankemper: Rolling upgrade for cirrussearch `eqiad` is complete
  • 02:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 00:36 ryankemper: Beginning rolling upgrade for cirrussearch `eqiad`. Cookbook will restart elasticsearch on 36 nodes total, 3 nodes at a time
  • 00:36 eileen: tools revision changed from d4e08c52de to a2a91d6c6a
  • 00:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 00:24 twentyafterfour: phabricator update was uneventful
  • 00:13 twentyafterfour: updating phabricator
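
The 00:36 entry above mentions restarting 36 nodes, 3 at a time. The following is only a minimal sketch of that batching arithmetic, not the cookbook's actual implementation. The host names and the `batches` helper are hypothetical and exist purely for illustration.

```python
from typing import Iterator, List


def batches(hosts: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


# Hypothetical host names; the real cluster members are not listed in this log.
hosts = [f"node{n:02d}" for n in range(1, 37)]
plan = list(batches(hosts, 3))

print(len(plan))  # 12 batches of 3 nodes each
print(plan[0])
```

With 36 nodes and a batch size of 3, a plan of this shape has 12 restart rounds, which is consistent with the roughly 100 minutes logged between START and END for each datacenter.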

2020-10-14

  • 23:35 foks: Removing one further file for legal compliance
  • 23:28 foks: Removing nine files for legal compliance
  • 23:11 ebernhardson: Synchronized wmf-config/InitialiseSettings.php to sync reduction of cirrus morelike query cache from 3 back to 1 day
  • 23:08 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 01m 04s)
  • 23:00 dwisehaupt: all payments hosts in eqiad are now running the REL1_35 code.
  • 22:41 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression (duration: 02m 25s)
  • 22:38 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression
  • 22:13 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
  • 22:12 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
  • 22:08 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive (duration: 03m 44s)
  • 22:04 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive
  • 22:01 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/NavigationTiming: BACON: Make attribution source logic more defensive T263599 (duration: 01m 05s)
  • 21:51 dpifke@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling image preconnect in group0 (T123582) (duration: 01m 03s)
  • 21:33 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/resources/skins.vector.styles/Menu.less: BACON: Stylesheet needs to be compatible with cached HTML T265543 (duration: 01m 07s)
  • 20:39 marxarelli: group1 rolled back to 1.36.0-wmf.11 due to malformed html in nav. task incoming (cc: T263179)
  • 20:37 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.11
  • 20:32 marxarelli: rolling back group1 due to malformed html in nav menu
  • 19:46 marxarelli: 1.36.0-wmf.13 promoted to group1. no new or concerning errors or changes in error rates (T263179)
  • 19:39 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
  • 19:38 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
  • 19:33 mutante: mx1001/mx2001 - temp. disabled puppet, live hacking urgent alias change since private repo needs to be fixed
  • 19:14 mutante: depooling 5 of the older parsoid servers in codfw
  • 19:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[1-5].codfw.wmnet
  • 18:28 Urbanecm: wikiadmin@10.192.0.6(wikidatawiki)> DELETE FROM watchlist WHERE wl_user=104889; # T265347
  • 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d6a56bb: Add rollbacker right on uzwiki (T265509) (duration: 01m 04s)
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 0da8999: Add spamblacklistlog as a default right for the CU log user (T239288) (duration: 01m 05s)
  • 16:12 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 15:59 elukey: drain + reboot an-worker1100 to pick up GPU settings - T255138
  • 15:58 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 15:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 15:29 elukey: drain + reboot an-worker110[1,2] to pick up GPU settings - T255138
  • 15:28 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 15:24 jayme: enabled and ran puppet on deploy1001 - T260917
  • 14:56 elukey: drain + reboot an-worker109[8,9] to pick up GPU settings - T255138
  • 14:55 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 14:12 jayme: disable-puppet on deploy1001 to test a change in helmfile puppet on deploy2001 only - T260917
  • 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. T264209
  • 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. T265183
  • 13:53 jbond42: enable puppet fleet wide post - convert puppetdb stockpile queue to tmpfs
  • 13:48 jbond42: disable puppet fleet wide to convert puppetdb stockpile queue to tmpfs
  • 12:46 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 10% - T258405
  • 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:43 moritzm: imported php-memcached, php-redis to component/icu63 T264991
  • 11:25 Urbanecm: EU B&C window completed
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c63632d: Enable DiscussionTools as a beta feature on 30 more wikis (T264693) (duration: 01m 15s)
  • 11:16 moritzm: imported php-igbinary, php-apcu-bc to component/icu63 T264991
  • 09:59 moritzm: imported php-wmerrors, tideways, tideways-xhprof, wikidiff2, xdebug to component/icu63 T264991
  • 08:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:28 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 08:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:09 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12988 and previous config saved to /var/cache/conftool/dbconfig/20201014-071440-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12987 and previous config saved to /var/cache/conftool/dbconfig/20201014-065936-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12986 and previous config saved to /var/cache/conftool/dbconfig/20201014-064433-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12985 and previous config saved to /var/cache/conftool/dbconfig/20201014-062930-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12984 and previous config saved to /var/cache/conftool/dbconfig/20201014-061426-root.json
  • 06:12 marostegui: Change UNIQUE into KEY on enwikivoyage.imagelinks T265445
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 30%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12983 and previous config saved to /var/cache/conftool/dbconfig/20201014-055923-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 10%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12982 and previous config saved to /var/cache/conftool/dbconfig/20201014-054420-root.json
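
The `dbctl commit` entries above record db2125 being repooled in percentage steps. As a rough sketch for anyone post-processing this log, the snippet below pulls the host and percentage out of such a message. The regular expression and the `repool_step` helper are assumptions based only on the message format visible in these entries and are not part of dbctl.

```python
import re
from typing import Optional, Tuple

# Matches the quoted commit message, e.g. "'db2125 (re)pooling @ 75%: ...".
REPOOL = re.compile(r"'(?P<host>\S+) \(re\)pooling @ (?P<pct>\d+)%")


def repool_step(line: str) -> Optional[Tuple[str, int]]:
    """Return (host, percentage) for a repool entry, or None if the line is something else."""
    match = REPOOL.search(line)
    if match is None:
        return None
    return match.group("host"), int(match.group("pct"))


sample = (
    "07:14 marostegui@cumin1001: dbctl commit (dc=all): "
    "'db2125 (re)pooling @ 100%: Slowly repool db2125 after on-site maintenance T260670 '"
)
print(repool_step(sample))
```

Lines that are not repool commits, such as the 06:12 schema change entry, yield `None` and can be skipped.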

2020-10-13

  • 23:22 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: Revert removal of variant A (T265372) (duration: 01m 04s)
  • 23:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Rename GrowthExperiments help desk on ptwiki (T265214) (duration: 01m 04s)
  • 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable event logging in MediaViewer (T260582) (duration: 01m 04s)
  • 23:07 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry on frwiki, fawiki, dewiki, cswiki (T264780) (duration: 01m 04s)
  • 21:16 mutante: icinga had a gerrit health alert, but I did not notice an issue myself and it was gone on the next check
  • 21:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:44 mutante: bast1002 - apt-get autoremove - cleans up golang and ruby packages
  • 20:44 mutante: bast1002 - apt-get remove nmap (it can be used on netmon hosts and was not consistent with other bast hosts)
  • 20:15 ebernhardson: unban elastic2029 from production-search-psi-codfw
  • 20:14 ebernhardson: restart production-search-psi-codfw on elastic2029 to reset any wonkiness from gc hell
  • 20:06 marxarelli: 1.36.0-wmf.13 promoted to group0. no new or concerning errors or changes in error rates (T263179)
  • 20:03 ebernhardson: add elastic2029-production-search-psi-codfw to cluster.routing.allocation.exclude._name to drain active shards, instance currently in gc hell
  • 19:54 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.13
  • 19:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:40 dduvall@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.13 (duration: 40m 51s)
  • 19:00 dduvall@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.13
  • 18:58 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.9 (duration: 01m 56s)
  • 18:56 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.8 (duration: 02m 10s)
  • 18:53 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.6 (duration: 13m 00s)
  • 18:23 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.11
  • 18:21 marxarelli: 1.36.0-wmf.11 promoted to group1. no new errors (T263177). promoting to all wikis
  • 18:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:09 robh: scs-c1-codfw mgmt firmware updated, updating scs-a1-codfw T238036
  • 18:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:01 robh: scs-c1-codfw firmware update via T238036
  • 17:47 marxarelli: 1.36.0-wmf.13 branched at a6be801 for T263179
  • 17:35 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 07s)
  • 17:34 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
  • 17:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:30 marxarelli: 1.36.0-wmf.11 promoted to group0. no new errors (T263177). preparing to promote to group1
  • 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 17:17 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 17:16 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 16:39 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
  • 16:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc (duration: 05m 29s)
  • 16:26 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc
  • 15:56 papaul: power down ms-be2036 for maintenance
  • 15:02 godog: bounce logstash on logstash1007, GC death
  • 14:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:18 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 5b28fd6: Add setmentor to wgAvailableRights (duration: 00m 59s)
  • 13:42 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 13:40 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 13:15 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=BROKEN --fix # T265336
  • 13:08 moritzm: imported php-mailparse, php-mongodb, php-msgpack to component/icu63 T264991
  • 12:50 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=FIXME --fix # T265336
  • 12:49 Urbanecm: End of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix` # T265336
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2026 for on-site maintenance T263837 ', diff saved to https://phabricator.wikimedia.org/P12975 and previous config saved to /var/cache/conftool/dbconfig/20201013-124940-marostegui.json
  • 12:20 moritzm: imported dh-php, php-apcu, php-imagick to component/icu63 T264991
  • 11:22 moritzm: imported php-defaults, php-excimer, php-luasandbox, php-geoip to component/icu63 T264991
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 90028b4: Add suppressredirect right to reviewers on bnwiki (T265169) (duration: 00m 58s)
  • 11:14 Urbanecm: Start of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix # T265336`
  • 11:13 volans: installed spicerack_0.0.43-1+deb10u1_amd64.deb on cumin2001, need to wait for a long-running cookbook to end to upgrade both hosts
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e61fceb: Add namespace aliases for Turkish Wikipedia (T265336) (duration: 00m 59s)
  • 10:47 jayme: no-change rolling restart of push-notifications in codfw - T265258
  • 10:29 volans: upgrading spicerack on cumin2001 to 0.0.44
  • 10:19 ema: cp3050: clear varnishkafka-webrequest's vut->sighup via stap T264074
  • 10:09 ema: cp3050: *reload* varnishkafka-webrequest T264074
  • 10:04 volans: uploaded spicerack_0.0.44 to apt.wikimedia.org buster-wikimedia
  • 09:55 ema: cp3054: systemctl restart varnishkafka-webrequest.service T264074
  • 09:51 ema: cp3052: systemctl restart varnishkafka-webrequest.service T264074
  • 09:39 kormat: running schema change against s1 in eqiad T259831
  • 09:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:32 ema: cp3050: set grouping by request (vut->g_arg = 2) on varnishkafka-webrequest T264074
  • 08:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:11 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:55 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:55 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:43 kormat: running schema change against s3 in eqiad T259831
  • 07:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:37 moritzm: installing ruby security updates on stretch
  • 07:02 moritzm: installing PHP 7.0 security updates
  • 06:39 moritzm: Installing httpcomponents-client security updates for Stretch
  • 05:35 marostegui: Set global innodb_change_buffering = inserts; on pc2009 T263443

2020-10-12

  • 17:03 jayme: fixed /var/lock/ permission (1777) on ms-be2036 - T265208
  • 15:41 godog: roll-restart logstash5 in codfw
  • 14:44 _joe_: freed 1.5 GB of space on ms-be2036 by running "apt-get clean"
  • 14:05 moritzm: uploaded php7.2 7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1+icu63 to component/icu63 T264991
  • 12:39 moritzm: installing rails security updates on Stretch
  • 12:26 moritzm: installing spice security updates on Buster
  • 11:38 Urbanecm: EU B&C done
  • 11:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fff2532: [testwiki, test2wiki] Allow bureaucrats to grant import rights (duration: 00m 58s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 4966e8a: Enable wgCheckUserLogLogins at all wikis but few large wikis (T253802) (duration: 00m 58s)
  • 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Require autoconfirmed status to edit Wikidata Properties (T254280) (duration: 01m 00s)
  • 10:26 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 10:26 hnowlan: roll-restarting restbase201[345678] for cert refresh
  • 08:50 moritzm: uploaded libxml2 2.9.4+dfsg1-2.2+deb9u3+wmf1 to component/icu63 T264991
  • 07:54 godog: reboot ms-be2036 - T265208
  • 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:53 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime

2020-10-10

2020-10-09

  • 23:44 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on Wikidata (T264799) (duration: 00m 59s)
  • 23:25 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on Commons (T264799) (duration: 00m 59s)
  • 23:13 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL and only related ticket says resolved - powercycling it - boots normally but doesn't have a prod role (T260271)
  • 23:07 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL or tickets
  • 23:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:52 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on group1, except Commons/Wikidata (T264799) (duration: 00m 57s)
  • 22:23 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/includes/: Backport: Log IP/device changes within the same session (T264799) & SessionManager: Always log IP/UA in session-ip (duration: 01m 04s)
  • 22:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on group0 (T264799) (duration: 00m 59s)
  • 22:09 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/: Backport: Log IP/device changes within the same session (T264799) & SessionManager: Always log IP/UA in session-ip (duration: 01m 06s)
  • 22:01 tgr_: rolling out T264799#6533622
  • 21:53 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=dewiki --userlist users.txt # users.txt contains Almeida # T263935
  • 20:41 dwisehaupt: upgrading pay-lvs1001 to buster
  • 20:31 dwisehaupt: upgrading pay-lvs1002 to buster
  • 20:04 dwisehaupt: upgrading payments1001 to buster
  • 19:14 dwisehaupt: upgrading payments1002 to buster
  • 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:44 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:30 dwisehaupt: upgrading payments1003 to buster
  • 17:53 dwisehaupt: upgrading payments1004 to buster
  • 17:52 cstone: civicrm revision changed from b86a15a430 to 585eb835d8, config revision is 57843925bb
  • 16:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:40 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:41 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 14:32 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 14:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 13:48 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:45 jayme: helm rollback push-notification in eqiad to revision 8
  • 13:31 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 13:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:12 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 12:55 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 12:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:16 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 12:13 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:38 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 11:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 11:13 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:13 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 10:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 10:41 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 10:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 09:55 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 09:53 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 09:47 elukey: roll restart of hadoop-yarn-nodemanager on all hadoop workers to pick up new settings
  • 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 09:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:07 XioNoX: remove user from all network devices
  • 08:22 marostegui: Restart dbstore1005 mysql to pick up new buffer pool sizes
  • 08:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:11 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:36 moritzm: installing xen security updates for buster (libs only)
  • 07:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:34 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.decommission

2020-10-08

  • 23:42 ryankemper: `cloudelastic1006` done. Writes thawed, maintenance window lifted; restarts are done for `cloudelastic`
  • 23:37 ryankemper: `cloudelastic1005` done
  • 23:31 ryankemper: `cloudelastic1004` done
  • 23:27 ryankemper: `cloudelastic1003` done
  • 23:23 ryankemper: `cloudelastic1002` done
  • 23:16 tgr_: Evening deploys done
  • 23:16 ryankemper: `cloudelastic1001` is done restarting and cluster is green again. Proceeding to `cloudelastic1002`
  • 23:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes everywhere (T264793) (duration: 01m 01s)
  • 23:04 ryankemper: Beginning cluster restarts one server at a time. For each server, the process is depool->restart elasticsearch services->wait for services to restart and then pool->wait for cluster to return to green status before starting next server
  • 23:01 ryankemper: Writes are frozen for `cloudelastic`: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic` on `mwmaint2001` => `Applied cluster-wide freeze`
  • 22:56 ryankemper: `sudo apt policy wmf-elasticsearch-search-plugins` shows correct state: `Installed: 6.5.4-4~stretch`
  • 22:56 ryankemper: `sudo -E cumin -b 6 C:role::elasticsearch::cloudelastic 'DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install wmf-elasticsearch-search-plugins'`
  • 22:54 ryankemper: About to start plugin upgrade followed by restarts of `cloudelastic`. Maintenance window set for the next 2 hours on `cloudelastic100[1-6]`
  • 21:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data (duration: 01m 04s)
  • 21:53 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data
  • 21:52 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session/SessionBackend.php: Deduplicate SessionBackend::logPersistenceChange calls - T264793 (duration: 01m 01s)
  • 21:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 21:00 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 20:50 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:45 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 20:43 volans: deploying Netbox DNS zone consolidation - T264273
  • 20:11 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name (duration: 01m 09s)
  • 19:23 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name
  • 18:57 volker-e@deploy1001: Finished deploy [design/style-guide@b1166af]: Deploy design/style-guide: (duration: 00m 06s)
  • 18:57 volker-e@deploy1001: Started deploy [design/style-guide@b1166af]: Deploy design/style-guide:
  • 18:17 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Special:Investigate by default on production (T264357) (duration: 01m 06s)
  • 17:50 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data (duration: 11m 55s)
  • 17:44 root@cumin1001: START - Cookbook sre.dns.netbox
  • 17:37 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data
  • 17:31 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:30 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:23 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:16 shdubsh: install prometheus-rsyslog-exporter_0.0.0+git20201008 on centrallog1001 - T210137
  • 16:25 mutante: rebooting cloudvirt1023 - trying PXE boot
  • 16:19 hashar: Restarting CI Jenkins
  • 16:15 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:09 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:08 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 16:08 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:21 marostegui: Set global innodb_change_buffering = all; on pc2009 T263443
  • 14:17 moritzm: importing icu 63.1-6+deb10u1~wmf5 to component/icu63 T264991
  • 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:29 kart_: Updated cxserver to 2020-10-08-053343-production (T264407, T264859)
  • 12:26 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:24 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:21 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:54 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:52 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1030.eqiad.wmnet
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1030.eqiad.wmnet
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1030.eqiad.wmnet
  • 10:37 moritzm: installing Postgres security updates on netboxdb1001
  • 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1029.eqiad.wmnet
  • 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1029.eqiad.wmnet
  • 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1029.eqiad.wmnet
  • 10:32 moritzm: installing Postgres security updates on netboxdb2001
  • 10:29 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
  • 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
  • 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan: pooling restbase1028,restbase1029,restbase1030
  • 10:22 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:14 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:40 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 09:10 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:09 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:38 godog: roll-restart swift-object-replicator on ms-be2* - T261633
  • 08:19 kormat: running schema change against s8 in eqiad T259831
  • 08:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:06 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:04 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:02 gehel: repooling wdqs2002
  • 07:55 marostegui: Rebuild db2125 from snapshots - T260670
  • 07:45 marostegui: Stop MySQL on db1077 to build it from s1 snapshot
  • 07:40 gehel: depooled wdqs2002 to catch up on lag
  • 07:29 jayme: updated envoyproxy to 1.15.1-2 on all codfw hosts
  • 07:23 moritzm: installing pyzmq updates from Buster point release
  • 07:00 dcausse: depooling wdqs2002 (catching-up lag)
  • 06:57 dcausse: restart blazegraph on wdqs2002 (stuck) T242453
  • 06:51 _joe_: enable notifications for wdqs-ssl-codfw
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:05 ejegg: updated fundraising python tools from 5515923ef7 to d4e08c52de
  • 00:31 tgr_: evening deploys done
  • 00:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes in group1 (T264793) (again, forgot to rebase the previous time) (duration: 00m 59s)
  • 00:15 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes in group1 (T264793) (duration: 00m 57s)
  • 00:03 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes in group0 (T264793) (duration: 00m 58s)

2020-10-07

  • 23:58 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session: Backport: Log when SessionManager is emitting cookies (T264793) (duration: 01m 00s)
  • 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 23:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 21:55 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 21:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 21:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 20:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 20:09 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset (duration: 03m 23s)
  • 20:05 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset
  • 19:36 mutante: blog post: The latest addition to our family of Wikimedia languages is "Inari Sami" with language code "smn". It is a Sami language spoken by the Inari Sami of Finland and has about 400 native speakers. It's in the Uralic language family. Wikipedia will be created in T264859. https://en.wikipedia.org/wiki/Inari_Sami | https://iso639-3.sil.org/code/smn |
  • 18:30 ryankemper: search team's backport deploy is complete
  • 18:30 ryankemper@deploy1001: Synchronized wmf-config/ProductionServices.php: Config: cloudelastic: envoy sits in front now (T263073) (duration: 00m 58s)
  • 18:29 ryankemper: Above tests are as expected, syncing changes everywhere: `scap sync-file wmf-config/ProductionServices.php 'Config: cloudelastic: envoy sits in front now (T263073)'`
  • 18:27 ryankemper: `scap pull`ed onto `mwdebug2001`; talking to cloudelastic via mediawiki from codfw has the expected decrease in latency due to the tls connection pooling
  • 18:24 ryankemper: `scap pull`ed onto `mwdebug1002`. Talking to cloudelastic on localhost (which routes thru envoy), 6105 is `cloudelastic-chi-eqiad`, 6106 is `cloudelastic-omega-eqiad`, and 6107 is `cloudelastic-psi-eqiad` as expected
  • 18:20 ryankemper: (backport) HEAD set to 834b457 as expected
  • 18:12 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/HeaderCallback.php: Preload class used in HeaderCallback - T261260 (duration: 01m 01s)
  • 17:58 hashar: Pulled https://gerrit.wikimedia.org/r/c/mediawiki/core/+/632680 on deployment staging area and mw2001
  • 17:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:39 jgleeson: updated civicrm from 39b4f954ed to b86a15a430
  • 16:35 mutante: switching webproxy service names to the new local install servers in esams/eqsin/ulsfo T242602
  • 15:12 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog1001 - T259780
  • 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:33 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:22 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 14:04 hoo: Ran "mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P1820 --new-data-type external-id" on mwmaint2001 (T263986)
  • 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 14:03 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 14:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 13:42 jayme: updated envoyproxy to 1.15.1-2 on all eqiad hosts
  • 13:39 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 13:18 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 04s)
  • 13:18 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
  • 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:24 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:22 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 11:55 _joe_: rolling restart of restbase due to running puppet with changed config-vars (a noop for the actual configuration)
  • 11:22 Urbanecm: EU B&C window done
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: f85bc30: Enable bot passwords at all fishbowl and private wikis (T258356) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 5729736: Fix OAuthRateLimiter rate limit configuration (duration: 00m 59s)
  • 11:14 urbanecm@deploy1001: sync-file aborted: 5729736: Fix OAuthRateLimiter rate limit configuration (duration: 00m 02s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 6cdeea2: Set CXMTThresholdForPublish to 95% for Vietnamese Wikipedia (T264161) (duration: 00m 59s)
  • 10:58 marostegui: Set innodb_change_buffering = inserts on pc2009 T263443
  • 09:53 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from mw load groups T259831', diff saved to https://phabricator.wikimedia.org/P12945 and previous config saved to /var/cache/conftool/dbconfig/20201007-095355-kormat.json
  • 09:44 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 100%: 75', diff saved to https://phabricator.wikimedia.org/P12944 and previous config saved to /var/cache/conftool/dbconfig/20201007-094412-kormat.json
  • 09:21 moritzm: imported icu63 63.1-6+deb10u1~wmf1 to component/icu63 for stretch-wikimedia
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 T264755 ', diff saved to https://phabricator.wikimedia.org/P12943 and previous config saved to /var/cache/conftool/dbconfig/20201007-090943-marostegui.json
  • 08:39 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12942 and previous config saved to /var/cache/conftool/dbconfig/20201007-083903-kormat.json
  • 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:32 godog: roll-restart statsd-exporter across ms-be* after puppet run - T264588
  • 08:09 jayme: updated envoyproxy to 1.15.1-2 on all non mw and restbase hosts
  • 08:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:58 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2015 from dbctl T264700', diff saved to https://phabricator.wikimedia.org/P12941 and previous config saved to /var/cache/conftool/dbconfig/20201007-074951-marostegui.json
  • 07:14 marostegui: Stop MySQL es2015 for decommissioning T264700
  • 05:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 02:37 eileen: civicrm revision changed from a30da7f92a to 39b4f954ed, config revision is 0ca9a3a055
  • 01:00 cdanis: repool esams; cr2-esams router upgrade complete
  • 00:43 cdanis: T259621 cdanis@re1.cr2-esams> request chassis routing-engine master switch
  • 00:40 cdanis: T259621 cdanis@re1.cr2-esams> request system reboot other-routing-engine
  • 00:36 cdanis: T259621 cdanis@re1.cr2-esams> request system software add /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz re0 no-validate
  • 00:26 cdanis: T259621 cdanis@re0.cr2-esams> request chassis routing-engine master switch
  • 00:22 cdanis: T259621 cdanis@re0.cr2-esams> request system reboot other-routing-engine
  • 00:15 cdanis: T259621 cdanis@re0.cr2-esams> request system software add re1 no-validate /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz
  • 00:01 mutante: reinstalling testvm[345]001 to confirm OS installs work as normal after switching DHCP servers in POPs (T252526)

2020-10-06

  • 23:55 mutante: 🖧 switched DHCP server for eqsin from install2003 to install5001 - homer deployed to cr*eqsin* (T252526) 🖧
  • 23:53 mutante: 🖧 switched DHCP server for ulsfo from install2003 to install4001 - homer deployed to cr*ulsfo* (T252526) 🖧
  • 23:52 mutante: 🖧 switched DHCP server for esams from install1003 to install3001 - homer deployed to cr*esams* (T252526) 🖧
  • 23:43 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 23:11 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 23:07 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 22:32 ryankemper: Restart of `wdqs-categories` done. WDQS deploy is complete
  • 21:57 ryankemper: Restarting `wdqs-categories` across production instances one-at-a-time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
  • 21:57 ryankemper: Restarting `wdqs-categories` across all test instances (not public facing): `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 21:56 ryankemper: Restarting `wdqs-updater` across the fleet: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 21:55 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@e56a20e]: 0.3.51 (duration: 13m 09s)
  • 21:43 ryankemper: All tests passing on canary `wdqs1003`, proceeding to rest of fleet
  • 21:42 ryankemper@deploy1001: Started deploy [wdqs/wdqs@e56a20e]: 0.3.51
  • 21:14 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:632535 (duration: 01m 00s)
  • 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:40 Urbanecm: Morning B&C done
  • 18:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/skins/MinervaNeue/: 2118d26: Hot fix: Use display for hiding/showing sidebar on OS 14_0 (T264376) (duration: 01m 00s)
  • 18:37 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/skins/MinervaNeue/: d428ccb: Hot fix: Use display for hiding/showing sidebar on OS 14_0 (T264376) (duration: 01m 03s)
  • 18:25 ppchelko@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase.php gerrit:631775 T263493 T259622 (duration: 00m 58s)
  • 18:23 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: IS.php gerrit:631775 T263493 T259622 (duration: 00m 59s)
  • 18:19 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632516 T264043 (duration: 00m 59s)
  • 18:15 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632323 T264637 (duration: 00m 58s)
  • 18:12 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632484 T264637 (duration: 00m 58s)
  • 15:41 godog: centrallog* delete archived logs from the old single-file organization
  • 15:23 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:23 jayme: updated envoyproxy to 1.15.1-2 on mw-canary and restbase-canary
  • 14:57 sukhe: upload dnsdist_1.5.0-1wm1 to apt.wm.o (buster) - T263789
  • 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12936 and previous config saved to /var/cache/conftool/dbconfig/20201006-144701-kormat.json
  • 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:45 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:45 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 5% - T262946
  • 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:40 jayme: updated envoyproxy to 1.15.1-2 on mw2295.codfw.wmnet,restbase2017.codfw.wmnet
  • 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-backend,name=restbase2009.codfw.wmnet
  • 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase2009.codfw.wmnet
  • 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2009.codfw.wmnet
  • 14:36 hnowlan: repooling restbase2009
  • 14:31 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12935 and previous config saved to /var/cache/conftool/dbconfig/20201006-143157-kormat.json
  • 14:19 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 05s)
  • 14:19 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
  • 14:15 jayme: installed envoyproxy 1.15.1-2 on mwdebug1001
  • 14:08 marostegui: Reboot db1076 for kernel upgrade T264755
  • 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 14:03 marostegui: Power cycle db1076 T264755
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 ', diff saved to https://phabricator.wikimedia.org/P12934 and previous config saved to /var/cache/conftool/dbconfig/20201006-135810-marostegui.json
  • 13:41 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12932 and previous config saved to /var/cache/conftool/dbconfig/20201006-134149-kormat.json
  • 13:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from dump/vslow, add to all other contributions/logpager/recentchanges*/watchlist temporarily T259831', diff saved to https://phabricator.wikimedia.org/P12931 and previous config saved to /var/cache/conftool/dbconfig/20201006-134020-kormat.json
  • 13:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:14 jayme: pushed docker-registry.discovery.wmnet/envoy:1.15.1-2 - T264157
  • 13:04 marostegui: Change innodb_change_buffering = inserts on db2075 db2089 db2099 db2111 db2128 T263443
  • 12:55 godog: swift codfw-prod: bump weight for ms-be2057 - T261633
  • 12:20 elukey: update HDFS Namenode GC/Heap settings on an-master100[1,2]
  • 12:13 jayme: imported envoyproxy_1.15.1-2 to buster-wikimedia and stretch-wikimedia
  • 12:08 jbond42: deploy puppetlabs-stdlib 5.2
  • 11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:34 Urbanecm: EU B&C window done
  • 11:34 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=arbcom_ruwiki --fix # T264430 # P12930
  • 11:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 07c19f9: arbcom_ruwiki: Set AK as alias for NS_PROJECT (T264430) (duration: 00m 58s)
  • 11:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7e4e811: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons (T264430) (duration: 00m 58s)
  • 11:30 urbanecm@deploy1001: Synchronized static/favicon/arbcom_ruwiki.ico: 7e4e811: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons (T264430) (duration: 00m 58s)
  • 11:20 XioNoX: push L3 prep work to cloudsw1-c8-eqiad
  • 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b1a4fa: ruewiki: Add rollbacker, grantable and revokable by sysops (T264147) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5cc7027: Allow bureaucrats to remove sysop permissions on Commons (T261481) (duration: 00m 58s)
  • 11:07 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 03m 14s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5f9721b: GrowthExperiments: Change Help Page URL for kowiki (T254364) (duration: 01m 00s)
  • 11:04 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
  • 11:02 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 00m 12s)
  • 11:02 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
  • 11:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:48 effie: set mw2279.codfw.wmnet as inactive T264698
  • 10:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2279.codfw.wmnet
  • 10:45 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
  • 10:44 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
  • 10:43 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
  • 10:41 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
  • 10:37 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009 (duration: 00m 15s)
  • 10:37 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009
  • 10:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:33 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: (no justification provided) (duration: 03m 01s)
  • 10:31 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:30 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: (no justification provided)
  • 10:01 marostegui: Restart mysql on dbstore1004 to pick up new buffer pool sizes
  • 09:59 effie: enable puppet on mc20*
  • 09:41 effie: enable puppet on mc10*
  • 09:38 effie: disable puppet on mc*
  • 09:27 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:26 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 08:33 jayme: imported envoyproxy_1.15.1-1+deb9u1 to stretch-wikimedia
  • 08:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:02 volans: removing unused ms-fe and ms-fe-thumbs svc records from DNS (gerrit/628086)
  • 07:53 marostegui: Change innodb_change_buffering = inserts on db2087:3316 db2089:3316 db2076 db2097:3316 db2114 T263443
  • 07:39 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 07:35 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 07:31 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 07:17 marostegui: Remove es2015 and es2017 from tendril and zarcillo T264700 T264386
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 T264700 ', diff saved to https://phabricator.wikimedia.org/P12926 and previous config saved to /var/cache/conftool/dbconfig/20201006-071451-marostegui.json
  • 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2017 from dbctl T264386', diff saved to https://phabricator.wikimedia.org/P12925 and previous config saved to /var/cache/conftool/dbconfig/20201006-052849-marostegui.json

2020-10-05

  • 23:11 ejegg: updated payments staging from 52704ffe24 to db03677b2d
  • 22:27 mutante: removing shinken puppet module and role
  • 22:01 ebernhardson: restore wikidatawiki_content enwiki_content enwiki_general and commonswiki_file to default index.merge.policy.deletes_pct_allowed on eqiad cirrus cluster T264053
  • 21:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:26 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (32 sector, 16kB) readahead settings T264053
  • 20:13 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (64 sector, 32kB) readahead settings T264053
  • 19:56 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2050 to take reduced (128kB) readahead settings T264053
  • 19:31 mutante: ran sre.dns.netbox to push addition of an-worker1113 which was committed in prod repo but not in netbox data
  • 19:30 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:27 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 18:59 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 00m 08s)
  • 18:59 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
  • 18:58 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 12m 08s)
  • 18:46 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
  • 18:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 18:17 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 18:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 18:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 18:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 18:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 17:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 15:15 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:56 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:41 elukey: shutdown stat1005 and stat1008 for ram expansion (1005 again)
  • 14:36 ppchelko@deploy1001: Finished deploy [restbase/deploy@366a543]: T263133 T264035 (duration: 22m 23s)
  • 14:25 elukey: shutdown an-master1001 for ram expansion
  • 14:13 ppchelko@deploy1001: Started deploy [restbase/deploy@366a543]: T263133 T264035
  • 14:01 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:58 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:55 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:54 elukey: shutdown stat1005 for ram upgrade
  • 13:31 elukey: shutdown an-master1002 for ram expansion (64 -> 128G)
  • 12:39 moritzm: installing curl security updates on remaining hosts
  • 11:34 hoo@deploy1001: Synchronized wmf-config/: Revert "Remove $wgExtraLanguageNames from Wikidata and Commons" (T264295) (duration: 00m 59s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: be73f15: Move changetags right from users to sysop [trwiki] (T264508) (duration: 00m 59s)
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cd30b62: wgSkipSkins: Exclude contenttranslation skin from skin options for users (T263093) (duration: 00m 59s)
  • 11:05 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 11:04 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:34 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 10:32 ema: cp3052: pool with varnish 5.1.3-1wm15 T264398
  • 10:28 ema: cp3052: depool and downgrade varnish to 5.1.3-1wm15 T264398
  • 10:08 moritzm: installing ldap-replica1002 T264390
  • 09:52 moritzm: installing ldap-replica1001 T264390
  • 09:22 moritzm: installing ldap-replica2003 T264390
  • 09:02 hnowlan: bootstrapping restbase1030-b
  • 08:57 moritzm: installing ldap-replica2004 T264390
  • 08:40 kormat@cumin1001: dbctl commit (dc=all): 'db2073 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12918 and previous config saved to /var/cache/conftool/dbconfig/20201005-084022-kormat.json
  • 08:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:38 kormat@cumin1001: dbctl commit (dc=all): 'Add db2119 to s4 dump/vslow temporarily T259831', diff saved to https://phabricator.wikimedia.org/P12917 and previous config saved to /var/cache/conftool/dbconfig/20201005-083822-kormat.json
  • 08:23 godog: prometheus codfw/ops, add 100G to the LV
  • 08:06 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 07:46 marostegui: Stop mysql on es2017 T264386
  • 07:30 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 06:52 XioNoX: add static NAT to pfw3-eqiad - T264356
  • 06:33 elukey: reboot stat1005 to resolve weird GPU state (scheduled last week)
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2017 T264386 ', diff saved to https://phabricator.wikimedia.org/P12916 and previous config saved to /var/cache/conftool/dbconfig/20201005-050636-marostegui.json

2020-10-03

  • 15:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: emergency: 840545f: Restrict flow-hide right to autoconfirmed users on zhwiki (T264489) (duration: 01m 17s)
  • 00:08 ejegg: updated fundraising CiviCRM from 256adda03c to a30da7f92a

2020-10-02

  • 22:00 mutante: depooling mw2271 because Icinga alerts about memcached and SAL shows there were ongoing tests of some kind on it
  • 21:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=mw2271.codfw.wmnet
  • 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 21:26 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 19:14 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:35 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:27 effie: enable puppet on mw2271
  • 18:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events (duration: 02m 01s)
  • 18:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events
  • 17:15 mutante: submitted puppet refactoring change on maps servers
  • 16:49 effie: disable puppet on mw2271 and briefly depool it
  • 15:39 _joe_: restarting redis on rdb2003, instance 6380
  • 15:28 hnowlan: bootstrapping restbase1030-a
  • 15:25 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
  • 14:45 cdanis@deploy1001: Synchronized docroot/wikimediafoundation.org: Separate foundation.wikimedia.org docroot & add .well-known/matrix/server T261531 4573776bd 2fb4c20ae (duration: 01m 01s)
  • 14:19 moritzm: installing LLVM 7 bugfix updates from Buster point release
  • 14:08 effie: enable puppet on mwdebug1001
  • 14:08 moritzm: purging some unused kernels on ping* (these only have 3GB "disks")
  • 13:37 Urbanecm: Create bot_passwords table at fishbowl wikis (T258356)
  • 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12905 and previous config saved to /var/cache/conftool/dbconfig/20201002-133545-kormat.json
  • 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12904 and previous config saved to /var/cache/conftool/dbconfig/20201002-132042-kormat.json
  • 13:00 moritzm: installing Linux 4.19.146 updates on Buster (from the latest Buster point release; at this point only installing the updates, no reboots yet)
  • 12:26 effie: disable puppet on mwdebug1001
  • 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db2140 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12903 and previous config saved to /var/cache/conftool/dbconfig/20201002-121830-kormat.json
  • 12:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:08 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12902 and previous config saved to /var/cache/conftool/dbconfig/20201002-120825-kormat.json
  • 12:05 hnowlan: bootstrapping restbase1029-c
  • 11:53 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12901 and previous config saved to /var/cache/conftool/dbconfig/20201002-115322-kormat.json
  • 11:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:59 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 10:57 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 10:47 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 10:47 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 10:44 kormat@cumin1001: dbctl commit (dc=all): 'db2110 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12900 and previous config saved to /var/cache/conftool/dbconfig/20201002-104453-kormat.json
  • 10:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:43 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12899 and previous config saved to /var/cache/conftool/dbconfig/20201002-104320-kormat.json
  • 10:40 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 10:36 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:28 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 67%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12898 and previous config saved to /var/cache/conftool/dbconfig/20201002-102817-kormat.json
  • 10:13 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 33%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12897 and previous config saved to /var/cache/conftool/dbconfig/20201002-101313-kormat.json
  • 10:06 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 09:56 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 09:48 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 09:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:27 kormat@cumin1001: dbctl commit (dc=all): 'db2106 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12896 and previous config saved to /var/cache/conftool/dbconfig/20201002-092715-kormat.json
  • 09:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:19 jayme: running ipvsadm -D -t 10.2.1.20:10042; ipvsadm -D -t 10.2.1.16:1969 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255875 T255869
  • 09:18 jayme: running ipvsadm -D -t 10.2.2.20:10042; ipvsadm -D -t 10.2.2.16:1969 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255875 T255869
  • 09:17 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255875 T255869
  • 09:14 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255875 T255869
  • 09:12 jayme: running puppet on lvs servers - T255875 T255869
  • 09:11 arturo: added helm3 package to buster-wikimedia/thirdparty/kubeadm-k8s-1-17 (T264221)
  • 09:09 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:08 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 09:08 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:07 hnowlan: bootstrapping restbase1029-b cassandra
  • 09:05 hashar: gerrit: running garbage collector
  • 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:59 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:54 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 03s)
  • 08:54 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
  • 08:42 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 34s)
  • 08:41 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
  • 08:30 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 00m 33s)
  • 08:30 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
  • 08:29 moritzm: installing pyzmq bugfix update from buster point release
  • 08:24 moritzm: installing nginx security updates on puppetdb*
  • 08:17 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 01m 35s)
  • 08:16 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
  • 07:42 moritzm: installing libcommons-compress-java security updates
  • 07:35 godog: swift codfw-prod bump weight for ms-be2057 - T261633
  • 07:29 godog: prometheus codfw/k8s, add 50G to the LV
  • 07:23 moritzm: installing libx11 security updates on buster
  • 06:51 _joe_: restarting php-fpm on all appservers in eqiad, in batches of 10%, for testing the procedure suggested at T264362
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2011 from dbctl T264261', diff saved to https://phabricator.wikimedia.org/P12893 and previous config saved to /var/cache/conftool/dbconfig/20201002-053020-marostegui.json

2020-10-01

  • 23:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 34s)
  • 23:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
  • 23:33 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 23:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 24s)
  • 23:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
  • 23:07 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:36 James_F: Manually created mediawiki/extensions.git REL1_35 at 7ab9a74 for T264365
  • 22:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 22:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 22:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:29 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 as well T264363
  • 21:29 James_F: Manually created mediawiki/skins.git REL1_35 at 796693c for T264365
  • 21:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group1
  • 20:48 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 refs T263177 (duration: 01m 06s)
  • 20:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11 refs T263177
  • 20:19 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
  • 20:08 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.11/includes/parser/: sync ParserCache patches to unblock the train T264257 T263177 (duration: 00m 59s)
  • 18:40 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: increase more_like recommendation cache from one to three days T264053 (duration: 00m 59s)
  • 17:49 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339 (duration: 13m 42s)
  • 17:35 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339
  • 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:24 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339 (duration: 01m 34s)
  • 17:24 mutante: etherpad1002 - attempted to upgrade Etherpad to a newer version, but it wasn't working; reverted to the previous one
  • 17:22 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339
  • 17:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:46 volans: migrating esams DNS records to the autogenerated ones from Netbox - T258729
  • 16:19 bblack: rebooting lvs1016 to a fresh state for interface config and error counters, etc - T264227
  • 15:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:53 bblack: lvs1016: re-disabled puppet with ticket ref in comment, downed interface enp5s0f0 since it's flapping furiously - T264227
  • 15:53 bblack: lvs1016: re-disabled puppet with ticket ref in comment, downed interface enp5s0f0 since it's flapping furiously
  • 14:55 jayme: running ipvsadm -D -t 10.2.2.10:8081; ipvsadm -D -t 10.2.2.47:8889 on lvs1015.eqiad.wmnet - T244843 T255878
  • 14:55 moritzm: installing npm security updates on buster
  • 14:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:53 jayme: running ipvsadm -D -t 10.2.1.10:8081; ipvsadm -D -t 10.2.1.47:8889 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T244843 T255878
  • 14:52 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:50 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T244843 T255878
  • 14:48 jayme: restarting pybal on lvs2010.codfw.wmnet - T244843 T255878
  • 14:42 jayme: running puppet on lvs servers - T244843 T255878
  • 14:35 Urbanecm: Create bot_passwords table at all private wikis (T258356)
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:21 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12886 and previous config saved to /var/cache/conftool/dbconfig/20201001-142156-kormat.json
  • 14:14 andrewbogott: reimaging cloudvirt-wdqs1001 to buster
  • 14:12 effie: enable puppet on mw2271
  • 14:08 moritzm: installing pillow security updates
  • 14:06 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 67%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12885 and previous config saved to /var/cache/conftool/dbconfig/20201001-140653-kormat.json
  • 13:59 moritzm: installing nginx security updates on schema*
  • 13:51 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 33%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12884 and previous config saved to /var/cache/conftool/dbconfig/20201001-135149-kormat.json
  • 13:50 klausman: rebooting an-worker1096 for cluster maintenance
  • 13:49 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:49 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:43 vgutierrez: use synthetic warning for 2% of ECDHE-ECDSA-AES128-SHA pageviews - T258405
  • 13:29 moritzm: restarting mw canaries to pick up curl update
  • 13:22 moritzm: installing curl security updates on stretch
  • 12:57 kormat@cumin1001: dbctl commit (dc=all): 'db2136 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12883 and previous config saved to /var/cache/conftool/dbconfig/20201001-125707-kormat.json
  • 12:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12882 and previous config saved to /var/cache/conftool/dbconfig/20201001-123925-kormat.json
  • 12:24 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12881 and previous config saved to /var/cache/conftool/dbconfig/20201001-122422-kormat.json
  • 12:15 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: 500d0c7: Prevent returning the full templatelinks table in TemplateFilter (T264029) (duration: 00m 59s)
  • 12:12 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: 500d0c7: Prevent returning the full templatelinks table in TemplateFilter (T264029) (duration: 01m 00s)
  • 12:09 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12880 and previous config saved to /var/cache/conftool/dbconfig/20201001-120919-kormat.json
  • 11:54 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12879 and previous config saved to /var/cache/conftool/dbconfig/20201001-115415-kormat.json
  • 11:14 arturo: pulling packages into reprepro for buster-wikimedia/thirdparty/kubeadm-k8s-1-17 (T263284)
  • 11:09 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=kuwiktionary --fix # T262046
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 58a8c82: kuwiktionary: Create Jinûvesazî namespace (T262046) (duration: 01m 01s)
  • 10:47 kormat@cumin1001: dbctl commit (dc=all): 'db2119 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12878 and previous config saved to /var/cache/conftool/dbconfig/20201001-104716-kormat.json
  • 10:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:55 hnowlan: adding buster host restbase1028-b to cassandra
  • 08:53 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:38 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2109', diff saved to https://phabricator.wikimedia.org/P12877 and previous config saved to /var/cache/conftool/dbconfig/20201001-083321-marostegui.json
  • 08:28 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:27 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:25 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 08:22 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:16 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109 ', diff saved to https://phabricator.wikimedia.org/P12875 and previous config saved to /var/cache/conftool/dbconfig/20201001-081308-marostegui.json
  • 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091', diff saved to https://phabricator.wikimedia.org/P12874 and previous config saved to /var/cache/conftool/dbconfig/20201001-071442-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091 ', diff saved to https://phabricator.wikimedia.org/P12873 and previous config saved to /var/cache/conftool/dbconfig/20201001-071413-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12872 and previous config saved to /var/cache/conftool/dbconfig/20201001-071347-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12871 and previous config saved to /var/cache/conftool/dbconfig/20201001-071321-marostegui.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2083', diff saved to https://phabricator.wikimedia.org/P12870 and previous config saved to /var/cache/conftool/dbconfig/20201001-071241-marostegui.json
  • 07:12 elukey: restart hdfs namenodes on an-worker100[1,2] to pick up new hadoop workers settings
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2083', diff saved to https://phabricator.wikimedia.org/P12869 and previous config saved to /var/cache/conftool/dbconfig/20201001-071155-marostegui.json
  • 06:42 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 06:40 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Make es2033 master of es2 T261717', diff saved to https://phabricator.wikimedia.org/P12867 and previous config saved to /var/cache/conftool/dbconfig/20201001-063104-marostegui.json
  • 06:18 jayme: imported envoyproxy 1.15.1 to buster-wikimedia, stretch-wikimedia - T264157
  • 05:45 marostegui: Stop MySQL on es2011 T264261
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2011 T264261', diff saved to https://phabricator.wikimedia.org/P12866 and previous config saved to /var/cache/conftool/dbconfig/20201001-054335-marostegui.json
  • 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:29 marostegui: Deploy schema change on s3 (testwikidatawiki) T264109
  • 05:19 marostegui: Repool labsdb1011
  • 04:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:27 krinkle@deploy1001: Synchronized php-1.36.0-wmf.10/includes/parser/: Ia3357b2f593c (duration: 00m 58s)
  • 01:12 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: 1721d2aa0 - Reject ParserCache entries from the last wmf.11 deployment (duration: 05m 13s)

2020-09-30

  • 22:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:10 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:46 cdanis: depool mw2356 and mw2319
  • 21:45 eileen: civicrm revision changed from 5a53bfe6ed to 256adda03c, config revision is 646817a2c0
  • 21:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 also
  • 21:19 ejegg: updated fundraising CiviCRM from 6e843649ac to 5a53bfe6ed
  • 21:04 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback
  • 21:00 twentyafterfour@deploy1001: scap failed: average error rate on 5/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
  • 20:58 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 20s)
  • 20:56 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
  • 20:47 mutante: temp disabling puppet on C:profile::swift::stats_reporter hosts, applying gerrit:631158 refactoring change
  • 20:36 mutante: temp disabling puppet on swift::storage (swift-be) hosts, applying gerrit:631157 refactoring change
  • 19:21 mutante: activating DHCP and squid on install[345]001.wikimedia.org
  • 19:12 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
  • 19:01 effie: disable puppet on mw2271 and use onhost memcached - T263958
  • 19:00 hoo@deploy1001: Synchronized wmf-config/: Revert "labs: Turn on termbox v2 on wikidatawiki" (T264066) (duration: 00m 58s)
  • 18:58 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Revert "labs: Turn on termbox v2 on wikidatawiki" (T264066) (duration: 00m 58s)
  • 18:38 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on svwiki (T257220) (duration: 00m 58s)
  • 18:36 bblack: lvs1016 pybal diff alerts downtimed in icinga for ~48h to reduce annoying flappy alert spam, with reference to https://phabricator.wikimedia.org/T264227
  • 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments for newcomers on ptwiki (T225027) (duration: 00m 58s)
  • 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Put search in header for anons on all wikis, not just desktop-improvements wikis (T263032) (duration: 00m 59s)
  • 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable clientError on Wikidata and all Wikipedias except enwiki (T255585) (duration: 00m 58s)
  • 18:08 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move search in header for anons (T263032) (duration: 00m 59s)
  • 17:52 bblack: lvs1016: restart pybal
  • 17:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:01 hnowlan: finished adding restbase2018-a to the cassandra cluster
  • 16:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:33 cicalese@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Add beta config for API Portal/OAuth communications (duration: 00m 58s)
  • 16:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:21 mutante: re-enabled puppet on install2003
  • 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:28 moritzm: removed librsvg 2.40.20-3+wmf1+stretch1 from component/thumbor, superseded by 2.40.21-0+deb9u1 released via stretch-security
  • 14:23 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:10 cmjohnson1: powering down ores100[3-9] to upgrade memory in each T259909
  • 14:05 elukey: create thirdparty/amd-rocm33 for stretch-wikimedia
  • 14:03 cmjohnson1: powering down ores1002 to upgrade memory T259909
  • 13:55 cmjohnson1: powering down ores1001 to upgrade memory T259909
  • 13:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:12 hnowlan: started bootstrapping restbase1028-a, first buster restbase host
  • 12:39 marostegui: Deploy schema change on db2080, db2081 T264109
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2081', diff saved to https://phabricator.wikimedia.org/P12858 and previous config saved to /var/cache/conftool/dbconfig/20200930-123851-marostegui.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P12857 and previous config saved to /var/cache/conftool/dbconfig/20200930-123824-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080', diff saved to https://phabricator.wikimedia.org/P12856 and previous config saved to /var/cache/conftool/dbconfig/20200930-123753-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080', diff saved to https://phabricator.wikimedia.org/P12855 and previous config saved to /var/cache/conftool/dbconfig/20200930-123659-marostegui.json
  • 11:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:33 effie: enable puppet P:mediawiki::mcrouter_wancache for 630845 - T244340
  • 11:21 nikerabbit@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: Enable Special:TranslationStats (T263004) (duration: 00m 59s)
  • 11:06 effie: disable puppet on P:mediawiki::mcrouter_wancache for 630845 - T244340
  • 10:57 moritzm: installing librsvg security updates
  • 10:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:21 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:07 kormat: deploying schema change to s4/eqiad T259831
  • 10:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:50 jayme: imported envoyproxy 1.15.1 to buster-wikimedia component/envoy-future - T264157
  • 09:12 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:10 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:45 kormat: deploying schema change to s7/eqiad T259831
  • 08:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2016 from dbctl T264156', diff saved to https://phabricator.wikimedia.org/P12853 and previous config saved to /var/cache/conftool/dbconfig/20200930-080817-marostegui.json
  • 08:06 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 08:00 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 07:56 akosiaris: upgrade termbox to latest chart, fixing various prometheus-statsd-export configuration minor issues.
  • 07:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 07:55 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1131 on s6 eqiad master T263227, also give weight to db1093 as new API host', diff saved to https://phabricator.wikimedia.org/P12852 and previous config saved to /var/cache/conftool/dbconfig/20200930-074417-marostegui.json
  • 07:41 marostegui: Starting s6 eqiad failover from db1093 to db1131 - T263227
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 T263227', diff saved to https://phabricator.wikimedia.org/P12851 and previous config saved to /var/cache/conftool/dbconfig/20200930-071841-marostegui.json
  • 07:05 marostegui: Stop mysql on es2016 before decommissioning T264156
  • 07:01 elukey@deploy1001: Finished deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2 (duration: 00m 49s)
  • 07:00 elukey@deploy1001: Started deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2016 T264156', diff saved to https://phabricator.wikimedia.org/P12850 and previous config saved to /var/cache/conftool/dbconfig/20200930-065838-marostegui.json
  • 06:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 06:19 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2082', diff saved to https://phabricator.wikimedia.org/P12849 and previous config saved to /var/cache/conftool/dbconfig/20200930-061036-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2082', diff saved to https://phabricator.wikimedia.org/P12848 and previous config saved to /var/cache/conftool/dbconfig/20200930-061005-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318', diff saved to https://phabricator.wikimedia.org/P12847 and previous config saved to /var/cache/conftool/dbconfig/20200930-060754-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318', diff saved to https://phabricator.wikimedia.org/P12846 and previous config saved to /var/cache/conftool/dbconfig/20200930-060705-marostegui.json
  • 05:43 marostegui: Remove es2019 from tendril and zarcillo T264063
  • 05:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:29 marostegui: Reduce busy-time from 3600 to 1800 on labsdb1010
  • 02:30 eileen: process-control config revision is 646817a2c0
  • 00:41 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/GrowthExperiments/: Backport: Ensure variant A homepage sidebar is always at least 300px (T263905) (duration: 01m 01s)

2020-09-29

  • 23:35 mutante: created testvm3001.esams.wmnet to test install3001
  • 23:31 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 23:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Echo app push on all Wikipedias (T262936) (duration: 00m 59s)
  • 23:20 Urbanecm: Evening B&C window completed
  • 23:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 68d7af9: Enable watchlist expiry feature (wikisource; T260461) (duration: 00m 58s)
  • 23:18 eileen: process-control config revision is 8b39770e93
  • 23:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bc6dda2: Enable watchlist expiry feature (T260461) (duration: 00m 58s)
  • 23:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:52 eileen: process-control config revision is 16a6dcafd6
  • 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:48 eileen: civicrm revision changed from 035ad1c351 to 06a5289d1a, config revision is 2622fd2c09
  • 22:45 eileen: process-control config revision is 2622fd2c09 jobs disabled
  • 22:33 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:26 mutante: phab1001 - re-enabled puppet and running it
  • 22:24 ejegg: CiviCRM rolled back from 4aa0aeccd1 to 035ad1c351
  • 22:16 eileen: civicrm revision changed from 035ad1c351 to 4aa0aeccd1, config revision is b9120969bf
  • 21:59 mutante: temp. disabled puppet on phab1001
  • 21:49 mutante: restarted aphlict service on aphlict1001
  • 21:47 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.10 (duration: 13m 45s)
  • 21:34 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.10
  • 21:30 mutante: started DHCP service on install2003 again
  • 21:22 mutante: temp stopping DHCP service on install2003 for a test
  • 21:09 mutante: rebooting testvm5001 for install test after switching DHCP/TFTP in eqsin to new dedicated VM
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:00 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:54 cdanis@cumin1001: dbctl commit (dc=all): 'depool db2125', diff saved to https://phabricator.wikimedia.org/P12843 and previous config saved to /var/cache/conftool/dbconfig/20200929-205453-cdanis.json
  • 20:51 mutante: DHCP server for EQSIN switched from bast5001 to install5001 (T252526)
  • 20:45 twentyafterfour@deploy1001: Finished scap: testwikis to 1.36.0-wmf.11 refs T263177 (duration: 69m 57s)
  • 19:44 andrewbogott: apt-get update && apt-get upgrade on wikitech-static
  • 19:40 mutante: temp. disabling puppet on ms-fe (swift-proxy) hosts, applying puppet refactoring change carefully
  • 19:35 twentyafterfour@deploy1001: Started scap: testwikis to 1.36.0-wmf.11 refs T263177
  • 19:29 twentyafterfour: Checked out mediawiki 1.36.0-wmf.11 on deploy1001 see T263177
  • 17:30 hnowlan: ported cassandra-tools-wmf to wikimedia-buster
  • 17:12 jbond42: update libdbi-perl on dbmonitor1001 and helium
  • 17:02 jbond42: re-enable puppet after deploying puppetdb change
  • 16:57 jbond42: disable puppet to deploy puppetdb change
  • 16:34 chaomodus: deploying eqsin automated DNS
  • 15:51 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:47 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:39 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:23 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:02 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:00 vgutierrez: restarting acme-chief on acmechief1001
  • 14:48 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:41 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:32 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 14:30 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 14:30 bblack: switching eqsin and esams public-facing unified certs to letsencrypt - https://gerrit.wikimedia.org/r/c/operations/puppet/+/630847
  • 14:06 moritzm: installing facter updates from Buster 10.6 point release
  • 13:57 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:49 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2126 from dump/vslow T259831', diff saved to https://phabricator.wikimedia.org/P12841 and previous config saved to /var/cache/conftool/dbconfig/20200929-134926-kormat.json
  • 13:47 ema: text@esams: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 13:40 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12840 and previous config saved to /var/cache/conftool/dbconfig/20200929-134018-kormat.json
  • 13:36 ema: upload@esams: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 13:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:28 moritzm: installing lua5.3 security updates
  • 13:25 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12839 and previous config saved to /var/cache/conftool/dbconfig/20200929-132515-kormat.json
  • 13:10 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12838 and previous config saved to /var/cache/conftool/dbconfig/20200929-131011-kormat.json
  • 12:56 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 12:55 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12837 and previous config saved to /var/cache/conftool/dbconfig/20200929-125508-kormat.json
  • 12:53 moritzm: installing QT security updates
  • 12:29 kormat@cumin1001: dbctl commit (dc=all): 'db2108 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12836 and previous config saved to /var/cache/conftool/dbconfig/20200929-122914-kormat.json
  • 12:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:28 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db2126 to dump/vslow T259831', diff saved to https://phabricator.wikimedia.org/P12835 and previous config saved to /var/cache/conftool/dbconfig/20200929-122811-kormat.json
  • 12:05 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 11:54 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 11:28 vgutierrez: disabling DHE-RSA-AES128-SHA support - T258405
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12834 and previous config saved to /var/cache/conftool/dbconfig/20200929-111804-root.json
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12833 and previous config saved to /var/cache/conftool/dbconfig/20200929-110300-root.json
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12832 and previous config saved to /var/cache/conftool/dbconfig/20200929-104757-root.json
  • 10:42 XioNoX: re-enable TFTP ALGs on all mr
  • 10:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:40 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:39 moritzm: installing libdbi-perl security updates for stretch/buster
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12831 and previous config saved to /var/cache/conftool/dbconfig/20200929-103253-root.json
  • 10:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:07 kormat@cumin1001: dbctl commit (dc=all): 'Promote db1104 on s8 eqiad master T239238', diff saved to https://phabricator.wikimedia.org/P12830 and previous config saved to /var/cache/conftool/dbconfig/20200929-100723-kormat.json
  • 10:05 kormat: Starting s8 eqiad failover from db1109 to db1104 - T239238
  • 10:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:59 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:59 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:51 kormat@cumin1001: dbctl commit (dc=all): 'Set db1104 with weight 0 T239238', diff saved to https://phabricator.wikimedia.org/P12829 and previous config saved to /var/cache/conftool/dbconfig/20200929-095135-kormat.json
  • 09:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:17 marostegui: Depool labsdb1010 from web role
  • 09:08 jbond42: update rails on puppetmasters
  • 08:21 jayme: switching esams pybal back to conf1006 - T196487
  • 08:01 ema: cp3050: varnish upgrade to 6.0.6-1wm1 T263557
  • 07:55 gehel: badblocks check on wdqs1009 - T263125
  • 07:46 marostegui: Stop MySQL on es2019 before decommissioning T264063
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2019 from dbctl T264063', diff saved to https://phabricator.wikimedia.org/P12825 and previous config saved to /var/cache/conftool/dbconfig/20200929-074602-marostegui.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2019 T264063', diff saved to https://phabricator.wikimedia.org/P12824 and previous config saved to /var/cache/conftool/dbconfig/20200929-060538-marostegui.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2034 as es3 master in codfw T261717', diff saved to https://phabricator.wikimedia.org/P12823 and previous config saved to /var/cache/conftool/dbconfig/20200929-060253-marostegui.json
  • 05:13 marostegui: Stop mysql and reboot es2026 - T263837
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2026 T263837', diff saved to https://phabricator.wikimedia.org/P12822 and previous config saved to /var/cache/conftool/dbconfig/20200929-051236-marostegui.json
  • 05:10 marostegui: Remove es2013 from tendril and zarcillo T263740
  • 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 03:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 03:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 03:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:32 tgr_: B&C done
  • 00:31 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/CacheDecorator.php: Backport: Add (and increment) CacheDecorator cache version ([PHABRICATOR-TASK]) (duration: 00m 58s)
  • 00:09 mutante: TFTP/install server for eqsin switched from bast5001 to install5001 - T252526

2020-09-28

  • 23:56 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T264053: Remove commonswiki from sidebar search (duration: 01m 09s)
  • 23:42 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/ConfigurationLoader/PageConfigurationLoader.php: Backport: Properly handle namespaces in tasktype template configuration (T264029) (duration: 01m 03s)
  • 22:27 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:25 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:24 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 22:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:58 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:25 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:22 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:51 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:46 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:45 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:17 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:17 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:13 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:13 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:10 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 19:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:16 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:12 ejegg: updated staging payments-wiki from 43470629cc to 885d87a905
  • 18:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:15 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:15 Urbanecm: Morning B&C done
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c7e08bc: Enable search in header A/B test for logged in users (T263032) (duration: 00m 58s)
  • 17:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:15 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:58 ejegg: updated payment-wiki from b2eb456ed1 to 2083498811
  • 16:34 cdanis@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 16:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:24 cdanis@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:20 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 16:20 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 16:20 cdanis@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 16:20 cdanis@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:08 hnowlan: reimaging new restbase hosts - restbase1028, restbase1029, restbase1030
  • 16:08 XioNoX: push pfw policies - T264013
  • 15:51 papaul: poweroff elastic2037 for DIMM replacement
  • 15:26 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1114 T196487', diff saved to https://phabricator.wikimedia.org/P12818 and previous config saved to /var/cache/conftool/dbconfig/20200928-152635-kormat.json
  • 15:25 hashar: Restarting CI Jenkins for plugins uninstallation T260565
  • 15:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:13 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:13 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:12 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:12 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:59 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:49 moritzm: installing glib-networking security updates
  • 14:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:40 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1006.eqiad.wmnet
  • 14:33 XioNoX: repool eqiad
  • 14:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:05 moritzm: uploaded libdbi-perl 1.631-3+wmf1 for jessie-wikimedia T259102
  • 13:58 XioNoX: asw2-d-eqiad# run request system power-off member 4
  • 13:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:46 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1006.eqiad.wmnet
  • 13:45 XioNoX: downtiming all eqiad row D hosts - T196487
  • 13:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:38 godog: roll restart object-replicator on ms-be2* for higher concurrency - T261633
  • 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:20 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:19 moritzm: reimaging sretest1001 to validate puppetised sources.list with a new installation T158562
  • 13:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 12:57 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:37 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 12:31 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 12:29 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript resetUserEmail.php --wiki=arbcom_ruwiki 'Adamant.pwn' 'adamant.pwn@hotmail.com' # T262812
  • 12:28 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript createAndPromote.php --wiki=arbcom_ruwiki --bureaucrat --sysop 'Adamant.pwn' <PASSWORD REDACTED> # T262812
  • 12:26 Urbanecm: arbcom_ruwiki is created (T262812)
  • 12:26 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 48s)
  • 12:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating arbcom_ruwiki (T262812) (duration: 00m 56s)
  • 12:23 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating arbcom_ruwiki (T262812) (duration: 00m 56s)
  • 12:21 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating arbcom_ruwiki (T262812)
  • 12:20 urbanecm@deploy1001: Synchronized dblists: Creating arbcom_ruwiki (T262812) (duration: 00m 57s)
  • 12:19 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating arbcom_ruwiki (T262812) (duration: 00m 57s)
  • 12:17 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating arbcom_ruwiki (T262812) (duration: 00m 56s)
  • 12:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:59 kormat@cumin1001: dbctl commit (dc=all): 'db1114 depooling: prep for rack switch upgrade T196487', diff saved to https://phabricator.wikimedia.org/P12815 and previous config saved to /var/cache/conftool/dbconfig/20200928-115904-kormat.json
  • 11:43 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 483beb2: ContentTranslation: Do not use wikishared DB for testwiki (T263417; follow-up af09303 also included in this sync) (duration: 00m 56s)
  • 11:34 Urbanecm: EU B&C window done
  • 11:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 61eac95: Creation of patroller group on arz.wikipedia (T262218) (duration: 00m 57s)
  • 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 483beb2: ContentTranslation: Do not use wikishared DB for testwiki (T263417; follow-up af09303 also included in this sync) (duration: 00m 57s)
  • 10:45 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:37 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:35 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:35 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:33 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:32 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:25 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:48 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 09:48 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 09:48 ema: upload@codfw: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 09:29 ema: text@codfw: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 09:17 _joe_: changing the restbase public TLS certs to include restbase-async.discovery.wmnet
  • 09:17 XioNoX: restart bird on dns2001 - T262372
  • 09:15 jynus: restart db1077 for upgrade and cleanup T187984
  • 09:06 XioNoX: restart bird on centrallog2001 - T262372
  • 09:02 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:00 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:56 dcausse: T263970: recovering lost apifeature indices (copying eqiad indices -> codfw)
  • 08:55 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:53 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:46 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 08:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:37 elukey: decommission the hadoop test cluster (analytics1028->41)
  • 08:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:36 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 08:35 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:34 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:34 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:32 ema: text@eqiad: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 08:28 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12813 and previous config saved to /var/cache/conftool/dbconfig/20200928-082825-kormat.json
  • 08:21 ema: upload@eqiad: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 08:21 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2113 from contributions/logpager/recentchanges*/watchlist T263842', diff saved to https://phabricator.wikimedia.org/P12812 and previous config saved to /var/cache/conftool/dbconfig/20200928-082114-kormat.json
  • 08:13 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12811 and previous config saved to /var/cache/conftool/dbconfig/20200928-081321-kormat.json
  • 08:07 jayme: restarting pybal on lvs3005 for switching to conf1005 - T196487
  • 08:06 jayme: restarting pybal on lvs3006 for switching to conf1005 - T196487
  • 08:02 jayme: restarting pybal on lvs3007 for switching to conf1005 - T196487
  • 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 07:58 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12810 and previous config saved to /var/cache/conftool/dbconfig/20200928-075817-kormat.json
  • 07:54 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 07:43 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12809 and previous config saved to /var/cache/conftool/dbconfig/20200928-074313-kormat.json
  • 07:29 _joe_: restarting pybal on the LVS primaries
  • 07:24 dcausse: T263970: forcing allocation of enwiki_general_1587198756 (chi@eqiad)
  • 07:18 _joe_: restarting pybal on the backup LVS in eqiad, codfw to pick up the new wikifeeds endpoint
  • 07:17 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
  • 07:09 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2028 as es1 master in codfw T261717', diff saved to https://phabricator.wikimedia.org/P12806 and previous config saved to /var/cache/conftool/dbconfig/20200928-065938-marostegui.json
  • 06:15 marostegui: Set innodb_change_buffering = inserts; on db2089 (s5), db2106 (s4), db2108 (s2), db2085 (s1), db2085 (s8), db2087 (s7), db2087 (s6), db2109 (s3) T263443
  • 05:55 marostegui: Stop MySQL on es2013 before decommissioning it T263740
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2013 from dbctl T263740', diff saved to https://phabricator.wikimedia.org/P12805 and previous config saved to /var/cache/conftool/dbconfig/20200928-055410-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2013 T263740', diff saved to https://phabricator.wikimedia.org/P12804 and previous config saved to /var/cache/conftool/dbconfig/20200928-054846-marostegui.json
  • 05:22 marostegui: Decrease labsdb1011 weight
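
The db2125 repool entries above step through 25%, 50%, 75% and 100% at 07:43, 07:58, 08:13 and 08:28, so roughly 15 minutes apart. As a minimal sketch of that cadence (not how dbctl works internally), the schedule can be reproduced as below. The function name `repool_schedule` and its parameters are hypothetical, and the start time and steps are read off the log lines above.

```python
from datetime import datetime, timedelta


def repool_schedule(start, steps=(25, 50, 75, 100), interval_minutes=15):
    """Return (timestamp, percentage) pairs for a staged repool.

    Hypothetical helper: the step sizes and the 15-minute interval are
    assumptions inferred from the db2125 log entries, not dbctl defaults.
    """
    return [
        (start + timedelta(minutes=i * interval_minutes), pct)
        for i, pct in enumerate(steps)
    ]


# Start time taken from the first "(re)pooling @ 25%" entry on 2020-09-28.
schedule = repool_schedule(datetime(2020, 9, 28, 7, 43))
for when, pct in schedule:
    print(f"{when:%H:%M} repool at {pct}%")
```

The same helper called with `steps=(33, 66, 100)` would describe a three-step repool at the same interval.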

2020-09-27

  • 06:36 elukey: powercycle analytics1048

2020-09-26

  • 19:20 chrisalbon: sudo service uwsgi-ores restart
  • 02:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 02:04 cdanis@cumin2001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=eqiad
  • 02:04 cdanis@cumin2001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=codfw
  • 01:56 cdanis: ❌cdanis@cumin2001.codfw.wmnet ~ 🕙🍺 sudo cumin 'A:ores and A:codfw' 'systemctl restart celery-ores-worker.service uwsgi-ores.service '
  • 01:48 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=codfw
  • 01:48 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
  • 01:17 cdanis: ❌cdanis@ores2001.codfw.wmnet ~ 🕤🍺 sudo systemctl restart uwsgi-ores.service
  • 01:11 cdanis: ✔️ cdanis@ores2001.codfw.wmnet ~ 🕘🍺 sudo systemctl restart celery-ores-worker.service
  • 00:56 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:46 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm

2020-09-25

  • 23:03 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a135388]: correct scap variable reference in airflow_variables (duration: 26m 57s)
  • 22:36 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a135388]: correct scap variable reference in airflow_variables
  • 22:17 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity (duration: 10m 42s)
  • food: updated fundraising CiviCRM from eb90dbcfd3 to 035ad1c351
  • 22:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity
  • 21:23 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment (duration: 11m 33s)
  • 21:11 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment
  • 20:26 effie: installing memcached 1.4.33-1+deb9u1 on mwdebug1001
  • 19:34 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1 (duration: 53m 58s)
  • 18:40 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1
  • 17:47 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/MobileFrontend/: Backport: Make all section `collapsible` during server side rendering (T263832) (duration: 00m 59s)
  • 17:37 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3 (duration: 02m 01s)
  • 17:35 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3
  • 16:35 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import (duration: 01m 10s)
  • 16:34 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import
  • 16:33 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Promote 1.35.0 to stable in extensiondistributor (duration: 00m 57s)
  • 16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:23 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:23 jynus: fixing enwikivoyage ipblocks inconsistency cluster-wide T263842
  • 14:54 elukey: install linux-image-4.19-amd64 on an-worker1096 + reboot
  • 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:13 kormat@cumin1001: dbctl commit (dc=all): 'Add db2113 to various groups T263842', diff saved to https://phabricator.wikimedia.org/P12797 and previous config saved to /var/cache/conftool/dbconfig/20200925-121332-kormat.json
  • 11:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:23 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:10 moritzm: reimaging sretest1001 to validate puppetised sources.list with a new installation T158562
  • 10:42 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:28 moritzm: reimaging sretest1002 to validate puppetised sources.list with a new installation T158562
  • 09:58 moritzm: restarting archiva to pick up Java security update
  • 09:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:22 ema: upload@eqsin: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 09:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:02 ema: text@eqsin: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 06:50 elukey: shutdown ganeti5002 (mistakenly powercycled it without seeing T261130)
  • 06:40 elukey: powercycle ganeti5002 (no instances running on it, mgmt console shows no tty usable)
  • 06:34 elukey: reboot stat1004 to pick up kernel settings
  • 03:10 ejegg: updated payments-wiki from f89c594e12 to b2eb456ed1
  • 02:29 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: new codfw, T263798 (duration: 09m 05s)
  • 02:27 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 00m 07s)
  • 02:27 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
  • 02:20 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: new codfw, T263798
  • 02:20 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: eqiad-only, T263798 (duration: 06m 09s)
  • 02:14 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: eqiad-only, T263798

2020-09-24

  • 23:39 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 01m 58s)
  • 23:37 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
  • 21:40 mutante: mw1349 - systemctl reset-failed
  • 21:03 cdanis: reprepro: add backported ipvsadm 1:1.31-1+deb10u1 to buster-wikimedia
  • 21:00 andrew@deploy1001: Finished deploy [horizon/deploy@404e205]: (no justification provided) (duration: 01m 05s)
  • 20:59 andrew@deploy1001: Started deploy [horizon/deploy@404e205]: (no justification provided)
  • 20:41 andrew@deploy1001: Finished deploy [horizon/deploy@24368a5]: (no justification provided) (duration: 02m 10s)
  • 20:39 andrew@deploy1001: Started deploy [horizon/deploy@24368a5]: (no justification provided)
  • 20:35 andrew@deploy1001: Finished deploy [horizon/deploy@85125d1]: (no justification provided) (duration: 00m 52s)
  • 20:34 andrew@deploy1001: Started deploy [horizon/deploy@85125d1]: (no justification provided)
  • 19:57 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 19:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 19:54 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 19:47 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: Revert: cloudelastic: envoy sits in front now (duration: 00m 59s)
  • 19:41 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: (no justification provided) (duration: 00m 36s)
  • 19:41 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: (no justification provided)
  • 19:39 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: (no justification provided) (duration: 01m 08s)
  • 19:38 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: (no justification provided)
  • 19:30 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: dev (duration: 00m 44s)
  • 19:29 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: dev
  • 19:08 dancy@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.10
  • 19:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bcf9fcb: Enable mobile block notice tracking in MobileFrontend (T260218) (duration: 01m 04s)
  • 18:58 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Special:Investigate on itwiki and svwiki (T262436) (duration: 01m 05s)
  • 18:01 mutante: temp. disabled puppet on install4001/install5001 - applying install_server role to new servers, starting with install3001
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:24 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:21 jbond42: enable puppet fleet wide post update puppetdb postgres logging
  • 17:19 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:17 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 17:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 17:15 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:15 jbond42: disable puppet fleet wide to update puppetdb postgres logging
  • 17:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 17:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:11 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 17:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 17:09 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:04 mutante: syncing facts to puppet compiler hosts
  • 17:01 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 17:00 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:56 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:26 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 16:26 robh: properly pooled mw1360 this time T262151
  • 16:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 16:04 XioNoX: pfw3-eqiad> restart security-log gracefully
  • 15:58 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/AbuseFilter/includes/Hooks/AbuseFilterHookRunner.php: 5e88c36: HookRunner: onAbuseFilterGenerateUserVars should run generateUserVars (T263750) (duration: 01m 06s)
  • 15:46 Urbanecm: Run `mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=simplewiki --username="Oversight~simplewiki"` (T263760)
  • 15:44 Urbanecm: Run `mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=enwiki --username=Oversight` (T263760)
  • 15:43 Urbanecm: Rename all local Oversight accounts but enwiki to Oversight~dbname, see task for full list (T263760)
  • 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12794 and previous config saved to /var/cache/conftool/dbconfig/20200924-152626-root.json
  • 15:15 robh: mw1360 scap and repooled post work via T262151
  • 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 66%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12793 and previous config saved to /var/cache/conftool/dbconfig/20200924-151120-root.json
  • 15:10 jayme: switched zotero service-proxy listener to use TLS - T255869
  • 15:00 XioNoX: repool eqiad - T256112
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 33%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12792 and previous config saved to /var/cache/conftool/dbconfig/20200924-145617-root.json
  • 14:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:52 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:28 XioNoX: [Netops] In window: turn VC-ports on/off for proper cabling: - T256112
  • 14:19 XioNoX: remove damping on anycast group for cr2-codfw
  • 14:18 jayme: restart pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255869
  • 14:16 jayme: restart pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255869
  • 14:16 XioNoX: [Netops] Disable unused VC ports to not risk them going online at connect: - T256112
  • 14:09 jayme: running puppet on lvs servers - T255869
  • 14:09 cmjohnson1: removing the cable connected to FPC1:1/0 (DAC 3m) FPC8:1/0 (DAC 3m)
  • 13:58 moritzm: upgrading mariadb on cloudcontrol-2001/2003/2004
  • 13:52 XioNoX: depool eqiad for row D recabling - T256112
  • 13:32 ottomata: Increased retention time for *.mediawiki.job.processMediaModeration topics in kafka main-eqiad and main-codfw to 31 days (as per request from Pchelolo )
  • 13:22 elukey: moved the hadoop cluster to puppet TLS certificates - T253957
  • 13:17 XioNoX: add damping to anycast BGP - T262372
  • 12:58 jayme: switched mathoid service-proxy listener to use TLS - T255875
  • 12:50 moritzm: upgrading bird on centrallog1001
  • 12:43 gehel: restarting wdqs-categories on wdqs1009
  • 12:43 moritzm: installing netty-3.9 security updates
  • 12:42 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 12:30 ema: upload@ulsfo: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 12:29 godog: swift codfw-prod: rebalance only, no weight change
  • 12:27 kormat: powering off db2125 for maintenance T260670
  • 12:25 moritzm: installing xorg-server security updates
  • 12:09 ema: text@ulsfo: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 12:02 ema: cp4022: upgrade varnish to 6.0.6-1wm1 T263557
  • 11:40 Urbanecm: EU B&C window done
  • 11:34 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/Translate/tag/TPSection.php: fa4900e: Fix validation of translation unit section names (T263546) (duration: 01m 07s)
  • 11:25 jbond42: re-enable puppet fleet wide
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fdab74c: Enable ContentTranslation in Bashkir, Urdu and Welsh WPs as a default tool (T258504; T260022; T260024) (duration: 01m 05s)
  • 11:21 jbond42: disable puppet fleet wide to reduce log level on puppetdb
  • 11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 90c7291: Move DiscussionTools out of beta on arwiki, cswiki, huwiki (T249394); d8553f3: Simplify DiscussionTools config (duration: 01m 11s)
  • 11:06 moritzm: installing imagemagick security updates on stretch
  • 11:02 jbond42: re-enable puppet fleet wide
  • 10:51 jbond42: disable puppet fleet wide to deploy a puppetmaster change
  • 10:49 moritzm: installing libproxy security updates
  • 10:23 volans: uploaded python3-wmflib_0.0.2 to apt.wikimedia.org buster-wikimedia
  • 10:20 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12789 and previous config saved to /var/cache/conftool/dbconfig/20200924-102025-kormat.json
  • 10:05 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12788 and previous config saved to /var/cache/conftool/dbconfig/20200924-100521-kormat.json
  • 10:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:50 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12787 and previous config saved to /var/cache/conftool/dbconfig/20200924-095018-kormat.json
  • 09:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 09:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 09:48 jayme: restart pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255875
  • 09:46 jayme: restart pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255875
  • 09:43 jayme: running puppet on lvs servers - T255875
  • 09:35 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12786 and previous config saved to /var/cache/conftool/dbconfig/20200924-093514-kormat.json
  • 09:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 09:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 09:20 ema: cp4021: repool with varnish 6.0.6-1wm1 T263557
  • 09:19 ema: cp4021: redepool with varnish to 6.0.6-1wm1 T263557
  • 09:14 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12785 and previous config saved to /var/cache/conftool/dbconfig/20200924-091445-kormat.json
  • 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:14 ema: cp4021: depool and upgrade varnish to 6.0.6-1wm1 T263557
  • 09:05 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 09:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 08:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 08:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2127 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12784 and previous config saved to /var/cache/conftool/dbconfig/20200924-082443-marostegui.json
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 100%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12783 and previous config saved to /var/cache/conftool/dbconfig/20200924-082319-root.json
  • 08:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 08:15 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:15 XioNoX: configure vrrp_master_pinning in codfw - T263212
  • 08:10 moritzm: installing mariadb-10.1/mariadb-10.3 updates (packaged version from Debian, not the wmf-mariadb variants we used for mysqld)
  • 08:09 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:08 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 66%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12782 and previous config saved to /var/cache/conftool/dbconfig/20200924-080816-root.json
  • 07:58 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:57 marostegui: Remove es2018 from tendril and zarcillo T263613
  • 07:57 XioNoX: configure vrrp_master_pinning in eqiad - T263212
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 33%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12781 and previous config saved to /var/cache/conftool/dbconfig/20200924-075312-root.json
  • 07:52 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:49 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:49 godog: roll restart logstash codfw, gc death
  • 07:25 XioNoX: push pfw policies - T263674
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Place db2073 into vslow, not api in s4', diff saved to https://phabricator.wikimedia.org/P12780 and previous config saved to /var/cache/conftool/dbconfig/20200924-064018-marostegui.json
  • 06:22 elukey: powercycle elastic2037 (host stuck, no mgmt serial console working, DIMM errors in racadm getsel)
  • 05:57 marostegui: Remove es2012 from tendril and zarcillo T263613
  • 05:41 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2012 and es2018 from dbctl - T263615 T263613', diff saved to https://phabricator.wikimedia.org/P12778 and previous config saved to /var/cache/conftool/dbconfig/20200924-053001-marostegui.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12777 and previous config saved to /var/cache/conftool/dbconfig/20200924-052207-marostegui.json
  • 01:25 ryankemper: Root cause of sigkill of `elasticsearch_5@production-logstash-eqiad.service` appears to be OOMKill of the java process: `Killed process 1775 (java) total-vm:8016136kB, anon-rss:4888232kB, file-rss:0kB, shmem-rss:0kB`. Service appears to have restarted itself and is healthy again
  • 01:21 ryankemper: Observed that `elasticsearch_5@production-logstash-eqiad.service` is in a `failed` state since `Thu 2020-09-24 00:53:53 UTC`; appears the process received a SIGKILL - not sure why
  • 01:19 ryankemper: Getting `connection refused` when trying to `curl -X GET 'http://localhost:9200/_cluster/health'` on `logstash1009`
  • 01:16 ryankemper: (after) `{"cluster_name":"production-elk7-codfw","status":"green","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":868,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
  • 01:16 ryankemper: Ran `curl -X POST 'http://localhost:9200/_cluster/reroute?retry_failed=true'`, cluster status is green again
  • 01:15 ryankemper: (before) `{"cluster_name":"production-elk7-codfw","status":"yellow","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":866,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":2,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
  • 01:14 ryankemper: (before) `{"cluster_name":"production-elk7-codfw","status":"yellow","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":866,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":2,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
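
The 01:14 to 01:16 entries above record a yellow cluster with 2 unassigned shards, a `_cluster/reroute?retry_failed=true` call, and a green cluster afterwards. As a rough sketch of the check an operator makes between those snapshots, the comparison can be written as below. The function name `needs_reroute` is hypothetical, and the two dictionaries hold only the fields used here, with values copied from the (before) and (after) entries.

```python
import json


def needs_reroute(health: dict) -> bool:
    """Return True when the cluster is not green and has unassigned shards.

    Hypothetical helper: this is one plausible reading of when a
    retry_failed reroute is worth trying, not an Elasticsearch rule.
    """
    return health.get("status") != "green" and health.get("unassigned_shards", 0) > 0


# Subset of the logged cluster health fields (values from the entries above).
before = json.loads('{"status": "yellow", "active_shards": 866, "unassigned_shards": 2}')
after = json.loads('{"status": "green", "active_shards": 868, "unassigned_shards": 0}')

print(needs_reroute(before))
print(needs_reroute(after))
print(after["active_shards"] - before["active_shards"])
```

The difference in `active_shards` matches the two shards that were unassigned before the reroute.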

2020-09-23

  • 23:52 mutante: alert1001 - systemctl restart ircecho because icinga-wm left the chat
  • 23:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cbd77e3: Add new Racine namespace to frwiktionary (T263525) (duration: 01m 05s)
  • 23:44 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:40 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:37 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 22382a9: remove wtp2005 from wgLinterSubmitterWhitelist (T257903) (duration: 01m 04s)
  • 23:14 eileen: civicrm revision changed from 32a82aa1b7 to eb90dbcfd3, config revision is 2a55766237
  • 23:13 eileen: civicrm revision is 32a82aa1b7, config revision is 2a55766237
  • 23:10 mutante: ganeti5003 - rebooting install5001 - OS install on 3001/4001/5001 T263684
  • 23:04 mutante: ganeti4003 - rebooting install4001
  • 22:51 mutante: ganeti5003 - rebooting install5001
  • 22:27 mutante: ganeti5003 - gnt-instance start install5001
  • 21:40 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:38 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:30 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.10 (duration: 01m 04s)
  • 21:29 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.10
  • 21:24 dancy@deploy1001: Finished scap: (no justification provided) (duration: 42m 52s)
  • 21:12 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:06 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 20:57 mepps: updated payments-wiki from 7bb99ce03a to f89c594e12
  • 20:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 20:42 dancy: dancy@deploy1001 Started scap: Deploying fixes for T263601 and T263675 to 1.36.0-wmf.10
  • 20:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:41 dancy@deploy1001: Started scap: (no justification provided)
  • 20:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:36 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:36 eileen: civicrm revision changed from a789afd79b to 32a82aa1b7, config revision is 2a55766237
  • 20:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:30 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 20:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:28 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 20:27 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 20:22 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 20:18 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 20:15 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 20:08 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 20:06 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 20:02 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 19:42 robh: ganeti5002 firmware update before hw testing via T261130
  • 18:57 ryankemper: (Above deploy complete)
  • 18:54 ryankemper: `scap sync-file wmf-config/ProductionServices.php 'Config: cloudelastic: envoy sits in front now (T263073)'` from `ryankemper@deploy1001:/srv/mediawiki-staging`
  • 18:47 ryankemper: Above deploy appears successful, test requests seem to be taking 40ms instead of the previous 140ms
  • 18:31 ryankemper: HEAD of `/srv/mediawiki-staging` is now at 7a96d63 as expected
  • 18:13 Urbanecm: End of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # T263628
  • 18:13 Urbanecm: Start of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # T263628
  • 18:12 Urbanecm: urbanecm@deploy1001: scap sync-file wmf-config/InitialiseSettings.php 'b1554f36be68106c9364f4aa2fd70d759ad74356: Set $wgCategoryCollation = uca-tr on trwikiquote (T263628)'
  • 18:11 Urbanecm: Logmsgbot seems to be down
  • 17:29 robh: migrating ganeti instances off ganeti5002 for troubleshooting per T261130
  • 16:37 sukhe: upload dnsdist_1.4.0-1~deb10u2 to apt.wm.o (buster) - T252132
  • 16:00 herron: switching icinga over from icinga1001 to alert1001 T247966
  • 16:00 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2088:3312 from api now that db2104/db2126 are done T259831', diff saved to https://phabricator.wikimedia.org/P12775 and previous config saved to /var/cache/conftool/dbconfig/20200923-160010-kormat.json
  • 15:58 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12774 and previous config saved to /var/cache/conftool/dbconfig/20200923-155819-kormat.json
  • 15:57 robh: updating firmware on mw1360, troubleshooting nic failure issue T262151
  • 15:57 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/includes/specials/SpecialBlock.php: 3234fad: SpecialUnblock: Allow getTargetAndType to accept null $par (T263642) (duration: 01m 07s)
  • 15:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/includes/specials/SpecialUnblock.php: 3234fad: SpecialUnblock: Allow getTargetAndType to accept null $par (T263642) (duration: 01m 08s)
  • 15:53 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:51 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:48 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:48 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:45 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:45 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:44 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 15:44 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 15:43 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:43 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12773 and previous config saved to /var/cache/conftool/dbconfig/20200923-154315-kormat.json
  • 15:40 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:37 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:33 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:30 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:28 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12772 and previous config saved to /var/cache/conftool/dbconfig/20200923-152812-kormat.json
  • 15:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 15:21 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:13 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12771 and previous config saved to /var/cache/conftool/dbconfig/20200923-151308-kormat.json
  • 14:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:44 kormat@cumin1001: dbctl commit (dc=all): 'db2126 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12770 and previous config saved to /var/cache/conftool/dbconfig/20200923-144441-kormat.json
  • 14:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:37 herron: grew prometheus1004 prometheus-ops filesystem to 1.6T
  • 14:35 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable repo config propagateChangeVisibility everywhere, 2/2 (duration: 01m 06s)
  • 14:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Enable repo config propagateChangeVisibility everywhere, 1/2 (duration: 01m 06s)
  • 13:50 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12769 and previous config saved to /var/cache/conftool/dbconfig/20200923-135028-kormat.json
  • 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12768 and previous config saved to /var/cache/conftool/dbconfig/20200923-133525-kormat.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 100%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12766 and previous config saved to /var/cache/conftool/dbconfig/20200923-132918-root.json
  • 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12765 and previous config saved to /var/cache/conftool/dbconfig/20200923-132022-kormat.json
  • 13:20 moritzm: installing ruby-json security updates
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 75%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12764 and previous config saved to /var/cache/conftool/dbconfig/20200923-131414-root.json
  • 13:05 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12763 and previous config saved to /var/cache/conftool/dbconfig/20200923-130518-kormat.json
  • 13:04 moritzm: installing multipath-tools bugfix updates from buster 10.5 point release
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12762 and previous config saved to /var/cache/conftool/dbconfig/20200923-125911-root.json
  • 12:49 moritzm: installing libunwind bugfix updates from buster 10.5 point release
  • 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2104 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12761 and previous config saved to /var/cache/conftool/dbconfig/20200923-123922-kormat.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074', diff saved to https://phabricator.wikimedia.org/P12760 and previous config saved to /var/cache/conftool/dbconfig/20200923-123806-marostegui.json
  • 12:37 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:37 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:36 kormat@cumin1001: dbctl commit (dc=all): 'Add db2088:3312 to api while db2104 gets depooled T259831', diff saved to https://phabricator.wikimedia.org/P12759 and previous config saved to /var/cache/conftool/dbconfig/20200923-123649-kormat.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly db2074 ', diff saved to https://phabricator.wikimedia.org/P12758 and previous config saved to /var/cache/conftool/dbconfig/20200923-123528-root.json
  • 12:22 ema: cp4027: repool with varnish 6.0.6-1wm1 T263557
  • 12:09 ema: cp4027: depool and upgrade varnish to 6.0.6-1wm1 T263557
  • 11:52 moritzm: installing GNUTLS bugfix updates from buster 10.5 point release
  • 11:51 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.Homepage.GrowthTasksApi.js: 73b5ce8: Fix GrowthTasksApi lazy-loading flags for pages with no views (T263611) (duration: 01m 05s)
  • 11:49 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/help/ext.growthExperiments.PostEdit.js: 1ab31a9: Mark pageviews as not used in the mobile postedit notice (T263611) (duration: 01m 06s)
  • 11:38 Urbanecm: Revert https://gerrit.wikimedia.org/r/c/mediawiki/core/+/629188 and fetch to deploy1001 to unblock EU B&C deployment (T237467; cc twentyafterfour)
  • 11:27 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12756 and previous config saved to /var/cache/conftool/dbconfig/20200923-112712-kormat.json
  • 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12755 and previous config saved to /var/cache/conftool/dbconfig/20200923-111209-kormat.json
  • 11:08 Urbanecm: Create ContentTranslation tables at testwiki using SQL files from `/srv/mediawiki/php-1.36.0-wmf.10/extensions/ContentTranslation/sql` (T263417)
  • 10:57 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12754 and previous config saved to /var/cache/conftool/dbconfig/20200923-105705-kormat.json
  • 10:42 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12753 and previous config saved to /var/cache/conftool/dbconfig/20200923-104202-kormat.json
  • 10:21 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12752 and previous config saved to /var/cache/conftool/dbconfig/20200923-102120-kormat.json
  • 10:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12751 and previous config saved to /var/cache/conftool/dbconfig/20200923-100156-marostegui.json
  • 10:01 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Configure entityDataCachePaths for Wikibase (duration: 01m 05s)
  • 09:59 elukey: update puppet compiler's facts
  • 09:57 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wgExtraLanguageNames from Wikidata and Commons (T260118), part 2/2 (production no-op) (duration: 01m 04s)
  • 09:55 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wgExtraLanguageNames from Wikidata and Commons (T260118), part 1/2 (duration: 01m 16s)
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12750 and previous config saved to /var/cache/conftool/dbconfig/20200923-094511-marostegui.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12748 and previous config saved to /var/cache/conftool/dbconfig/20200923-083200-marostegui.json
  • 08:29 moritzm: installing dbus security updates on buster
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12747 and previous config saved to /var/cache/conftool/dbconfig/20200923-080651-marostegui.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12746 and previous config saved to /var/cache/conftool/dbconfig/20200923-071129-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 to re-add change_revision_id index T262856', diff saved to https://phabricator.wikimedia.org/P12745 and previous config saved to /var/cache/conftool/dbconfig/20200923-070926-marostegui.json
  • 06:34 marostegui: Stop MySQL on es2012 and es2018 T263613 T263615
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2018 T263615', diff saved to https://phabricator.wikimedia.org/P12744 and previous config saved to /var/cache/conftool/dbconfig/20200923-063140-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2012 for decommissioning', diff saved to https://phabricator.wikimedia.org/P12743 and previous config saved to /var/cache/conftool/dbconfig/20200923-060812-marostegui.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index removal T262856', diff saved to https://phabricator.wikimedia.org/P12742 and previous config saved to /var/cache/conftool/dbconfig/20200923-055850-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 T262856', diff saved to https://phabricator.wikimedia.org/P12741 and previous config saved to /var/cache/conftool/dbconfig/20200923-055531-marostegui.json
  • 05:37 marostegui: Purge global_status_log table on tendril - T252331
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:03 marostegui: Remove triggers from db2094:3313 for MCR schema change T238966
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12739 and previous config saved to /var/cache/conftool/dbconfig/20200923-050234-marostegui.json
  • 04:25 eileen: civicrm revision changed from 8f32b6301f to a789afd79b, config revision is 9933605187

2020-09-22

  • 23:27 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: clientError: enable on ja,es,de,ru,it,zh,pt wikipedias (T255585) (duration: 01m 04s)
  • 23:24 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry feature (T261249) (duration: 01m 06s)
  • 21:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:46 ebernhardson: T259539 enabled adaptive replica selection on elasticsearch at search.svc.eqiad.wmnet:9[246]43
  • 20:44 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:43 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.10
  • 20:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:31 dancy@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.10 (duration: 42m 21s)
  • 20:30 mutante: gerrit2001 (gerrit-replica) restarting gerrit service
  • 19:49 dancy@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.10
  • 19:44 dancy@deploy1001: Pruned MediaWiki: 1.36.0-wmf.5 (duration: 17m 59s)
  • 19:31 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:29 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 16:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:00 robh: running dell epsa test on down host mw1360 per T262151
  • 14:34 moritzm: installing nginx security updates on buster
  • 14:33 shdubsh: restart apache on prometheus nodes to pick up new ext endpoint
  • 14:24 ema: upload libvmod-re2 1.5.3-1 to buster-wikimedia component/varnish6 T261632
  • 14:24 papaul: rebooting ms-be2019
  • 14:15 XioNoX: upgrade FNM on netflow2001 - T257035
  • 14:12 jayme: running ipvsadm -D -t 10.2.1.19:1970; ipvsadm -D -t 10.2.1.21:24766 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255868 T255877
  • 14:12 jayme: running ipvsadm -D -t 10.2.2.19:1970; ipvsadm -D -t 10.2.2.21:24766 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255868 T255877
  • 14:11 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255868 T255877
  • 14:10 XioNoX: upgrade FNM on netflow5001 - T257035
  • 14:09 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255868 T255877
  • 14:09 shdubsh: restart statsv on webperf[1-2]001 to route metrics through statsd-exporter
  • 14:09 XioNoX: upgrade FNM on netflow1001 - T257035
  • 14:06 XioNoX: upgrade FNM on netflow3001 - T257035
  • 14:05 jayme: running puppet on lvs servers - T255868 T255877
  • 14:03 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 14:02 hnowlan: roll-restarting restbase codfw for java updates
  • 13:59 XioNoX: add fastnetmon_1.1.7 to buster-wikimedia repo - T257035
  • 13:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:55 ema: upload varnish-modules 0.15.0-1+wmf1 to buster-wikimedia component/varnish6 T261632
  • 13:49 marostegui: Deploy MCR change on db2098:3313 - T238966
  • 13:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:39 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:35 ema: upload libvmod-netmapper 1.8-1 to buster-wikimedia component/varnish6 T261632
  • 12:54 ema: upload varnishkafka 1.1.0-1 to buster-wikimedia component/varnish6 T261632
  • 12:11 moritzm: installing python3.7 security updates on Buster
  • 12:09 moritzm: installing bundler updates on buster
  • 11:59 Urbanecm: Reset password for SUL User:Freibo
  • 11:58 Lucas_WMDE: EU backport&config window done
  • 11:56 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2001:~$ mwscript namespaceDupes.php trwikisource --fix | tee T263358.fix # 1350 to fix, 1350 resolvable, 0 deleted
  • 11:55 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2001:~$ mwscript namespaceDupes.php trwikisource | tee T263358.dryrun # 1350 to fix, 1350 resolvable, 0 deleted
  • 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Create Portal and Portal_talk namespaces on trwikisource, and fix an incorrect alias (T263358) (duration: 00m 57s)
  • 11:47 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Removing Wikipedia store link from enwiki (T262329) (duration: 00m 57s)
  • 11:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Set timezone for wikis of the CWIRP to Europe/Rome (T263123) (duration: 00m 59s)
  • 11:35 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 11:35 hnowlan: roll-restarting restbase eqiad for java updates
  • 11:25 ema: upload varnish 6.0.6-1wm1 to buster-wikimedia component/varnish6 T261632
  • 11:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:13 moritzm: installing intel-microcode 3.20200616.1 on Buster baremetal servers (compared to the currently installed packages, this reverts microcode changes for some Skylake CPUs we don't use)
  • 11:00 moritzm: installing intel-microcode 3.20200616.1 on Stretch baremetal servers (compared to the currently installed packages, this reverts microcode changes for some Skylake CPUs we don't use)
  • 10:51 XioNoX: Add policy-options for primary IXPs to all routers - T262517
  • 10:48 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 10:48 hnowlan: roll-restarting sessionstore for java security updates
  • 10:44 moritzm: installing bacula security updates on stretch
  • 10:33 moritzm: installing remaining libx11 security updates
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12733 and previous config saved to /var/cache/conftool/dbconfig/20200922-101342-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12732 and previous config saved to /var/cache/conftool/dbconfig/20200922-101324-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12731 and previous config saved to /var/cache/conftool/dbconfig/20200922-101308-root.json
  • 10:00 kormat: deploying schema change to s2 in eqiad. labsdb will have s2 lag until this finishes. T259831
  • 09:59 jayme: running ipvsadm -D -t 10.2.1.45:34192; ipvsadm -D -t 10.2.1.42:35192 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255873 T255870
  • 09:59 jayme: running ipvsadm -D -t 10.2.2.45:34192; ipvsadm -D -t 10.2.2.42:35192 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255873 T255870
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12730 and previous config saved to /var/cache/conftool/dbconfig/20200922-095839-root.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 75%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12729 and previous config saved to /var/cache/conftool/dbconfig/20200922-095821-root.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12728 and previous config saved to /var/cache/conftool/dbconfig/20200922-095805-root.json
  • 09:57 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255873 T255870
  • 09:55 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255873 T255870
  • 09:51 jayme: running puppet on lvs servers - T255873 T255870
  • 09:46 jbond@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-password (exit_code=99)
  • 09:46 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 50%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12727 and previous config saved to /var/cache/conftool/dbconfig/20200922-094336-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 50%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12726 and previous config saved to /var/cache/conftool/dbconfig/20200922-094317-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 50%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12725 and previous config saved to /var/cache/conftool/dbconfig/20200922-094302-root.json
  • 09:30 volans: repooling ulsfo after merging DNS migration to Netbox zonefiles - T258729
  • 09:30 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.uptime (exit_code=0)
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 25%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12724 and previous config saved to /var/cache/conftool/dbconfig/20200922-092832-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 25%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12723 and previous config saved to /var/cache/conftool/dbconfig/20200922-092814-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12722 and previous config saved to /var/cache/conftool/dbconfig/20200922-092758-root.json
  • 09:26 jbond@cumin1001: START - Cookbook sre.pdus.uptime
  • 09:24 XioNoX: replace BGP_IXP_in with BGP_IXP_PRIMARY_in on cr3-ulsfo IX BGP group - T262517
  • 09:22 XioNoX: add BGP_IXP_PRIMARY_in to cr3-ulsfo - T262517
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12721 and previous config saved to /var/cache/conftool/dbconfig/20200922-091329-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12720 and previous config saved to /var/cache/conftool/dbconfig/20200922-091310-root.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12719 and previous config saved to /var/cache/conftool/dbconfig/20200922-091255-root.json
  • 09:11 jbond42: update snmp string on ps1-a8-codfw
  • 09:05 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12718 and previous config saved to /var/cache/conftool/dbconfig/20200922-090520-kormat.json
  • 08:58 _joe_: restart pybal on lvs2009
  • 08:56 _joe_: restarting pybal on lvs2010
  • 08:54 _joe_: restarted pybal on lvs1015
  • 08:50 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12717 and previous config saved to /var/cache/conftool/dbconfig/20200922-085017-kormat.json
  • 08:36 _joe_: restarting pybal low-traffic in eqiad to pick up lvs changes
  • 08:35 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12715 and previous config saved to /var/cache/conftool/dbconfig/20200922-083514-kormat.json
  • 08:22 volans: migrating ulsfo public DNS records to the Netbox-generated ones - T258729
  • 08:20 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12714 and previous config saved to /var/cache/conftool/dbconfig/20200922-082010-kormat.json
  • 08:13 kormat: uploaded wmfmariadbpy v0.5 to apt. deploying now to fleet
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2032, es2033 and es2034 for the first time with minimal weight T261717', diff saved to https://phabricator.wikimedia.org/P12713 and previous config saved to /var/cache/conftool/dbconfig/20200922-081154-marostegui.json
  • 07:57 volans: migrating ulsfo private DNS records to the Netbox-generated ones - T258729
  • 07:54 kormat@cumin1001: dbctl commit (dc=all): 'db2076 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12712 and previous config saved to /var/cache/conftool/dbconfig/20200922-075429-kormat.json
  • 07:51 jayme: running ipvsadm -D -t 10.2.1.18:8080; ipvsadm -D -t 10.2.1.46:3030 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255879 T254581
  • 07:49 jayme: running ipvsadm -D -t 10.2.2.18:8080; ipvsadm -D -t 10.2.2.46:3030 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255879 T254581
  • 07:46 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255879 T254581
  • 07:42 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255879 T254581
  • 07:39 jayme: running puppet on lvs servers - T255879 T254581
  • 07:34 volans: depooling ulsfo to merge DNS migration to Netbox zonefiles - T258729
  • 07:24 marostegui: Stop MySQL on es2014 - host will be decommissioned T262889
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2014 from dbctl T262889', diff saved to https://phabricator.wikimedia.org/P12711 and previous config saved to /var/cache/conftool/dbconfig/20200922-071435-marostegui.json
  • 07:11 XioNoX: cr1-codfw# run clear bfd session address fe80::f27c:c7ff:fe11:2c1b
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2014 for decommissioning T262889', diff saved to https://phabricator.wikimedia.org/P12710 and previous config saved to /var/cache/conftool/dbconfig/20200922-061815-marostegui.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 100%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12709 and previous config saved to /var/cache/conftool/dbconfig/20200922-054455-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 100%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12708 and previous config saved to /var/cache/conftool/dbconfig/20200922-054438-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 100%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12707 and previous config saved to /var/cache/conftool/dbconfig/20200922-054430-root.json
  • 05:41 marostegui: Log remove triggers on revision table on db1124:3313 T238966
  • 05:39 marostegui: Deploy MCR schema change on s3 eqiad, this will generate lag on s3 on labsdb T238966
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2032, es2033 and es2034 into dbctl T261717', diff saved to https://phabricator.wikimedia.org/P12706 and previous config saved to /var/cache/conftool/dbconfig/20200922-053346-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 75%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12705 and previous config saved to /var/cache/conftool/dbconfig/20200922-052951-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 75%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12704 and previous config saved to /var/cache/conftool/dbconfig/20200922-052935-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 75%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12703 and previous config saved to /var/cache/conftool/dbconfig/20200922-052926-root.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 50%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12702 and previous config saved to /var/cache/conftool/dbconfig/20200922-051448-root.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 50%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12701 and previous config saved to /var/cache/conftool/dbconfig/20200922-051431-root.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 50%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12700 and previous config saved to /var/cache/conftool/dbconfig/20200922-051423-root.json
  • 05:00 marostegui: Add es2032 es2033 and es2034 to tendril and zarcillo T261717
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 25%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12699 and previous config saved to /var/cache/conftool/dbconfig/20200922-045944-root.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 25%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12698 and previous config saved to /var/cache/conftool/dbconfig/20200922-045928-root.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 25%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12697 and previous config saved to /var/cache/conftool/dbconfig/20200922-045919-root.json
  • 01:35 ryankemper: `sudo cumin C:profile::services_proxy::envoy 'enable-puppet "adding cloudelastic to the service proxy --rkemper"'` done
  • 01:35 ryankemper: woot! `curl -X GET -s 'http://localhost:6105/_cluster/health'` gives a response as expected. (As do 6106 and 6107). Re-enabling puppet across the fleet...
  • 01:32 ryankemper: `sudo run-puppet-agent -e "adding cloudelastic to the service proxy --rkemper"` on `mwdebug1002.eqiad.wmnet`
  • 01:28 ryankemper: `sudo puppet-merge` done, now will run puppet on a single eqiad appserver and verify we can curl `localhost:610{5,6,7}`
  • 01:17 ryankemper: Disabling puppet on affected nodes via `sudo cumin C:profile::services_proxy::envoy 'disable-puppet "adding cloudelastic to the service proxy --rkemper"'`
  • 01:17 ryankemper: Going to test patch to stick envoy in front of `cloudelastic`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/628243

2020-09-21

  • 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:39 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:37 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:36 mutante: debmonitor2002 - systemctl reset-failed
  • 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 22:57 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 22:55 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 22:20 mutante: releases.wikimedia.org has been converted to an active-active service with geodns/ backends in both DCs
  • 21:56 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 21:54 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 21:51 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 21:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:18 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:49 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: adjust enwiktionary completion search ranking (duration: 00m 57s)
  • 20:47 ebernhardson@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/CirrusSearch/: Remove pages from completion search by page id (duration: 01m 00s)
  • 20:04 herron: moving prometheus instance from bast3004 to prometheus3001 T243057
  • 19:46 herron: moving prometheus instance from bast4002 to prometheus4001 T243057
  • 19:38 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Push notifications deployment (4/5) (duration: 00m 57s)
  • 19:34 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Push notifications deployment (3/5) (duration: 00m 57s)
  • 19:28 mholloway-shell@deploy1001: Synchronized wmf-config/ProductionServices.php: Push notifications deployment (2/5) (duration: 00m 57s)
  • 19:26 mholloway-shell@deploy1001: Synchronized wmf-config/LabsServices.php: Push notifications deployment (1/5) (duration: 00m 57s)
  • 19:19 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 19:18 mepps: updated crm to 8f32b6301f
  • 19:15 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 19:14 ejegg: updated fundraising CiviCRM from e5ebf9d18a to 8f32b6301f
  • 19:13 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 18:59 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622863 T249745 (duration: 00m 56s)
  • 18:57 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update I336365 (duration: 06m 54s)
  • 18:53 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on plwiki (T254239) and ptwiki (T255027) (duration: 00m 56s)
  • 18:50 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update I336365
  • 18:33 mepps: updated crm from cc1f7e6d13 to e5ebf9d18a
  • 18:26 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Define Chinese logo variants for Modern Vector (no-op) (part 2) (T261153) (duration: 00m 56s)
  • 18:25 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Define Chinese logo variants for Modern Vector (no-op) (T261153) (duration: 00m 57s)
  • 18:21 catrope@deploy1001: Synchronized static/images/mobile/copyright/: Update Chinese logo variants for Modern Vector (T261153) (duration: 00m 56s)
  • 18:08 XioNoX: add NAT rule to pfw3-codfw - T263488
  • 17:42 papaul: rebooting ps1-a8-codfw for firmware upgrade
  • 16:46 papaul: shutting down ms-be2019 for BBU replacement
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12696 and previous config saved to /var/cache/conftool/dbconfig/20200921-162433-root.json
  • 16:17 papaul: replacing msw-c8-codfw
  • 16:16 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12695 and previous config saved to /var/cache/conftool/dbconfig/20200921-160929-root.json
  • 16:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 50%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12694 and previous config saved to /var/cache/conftool/dbconfig/20200921-155426-root.json
  • 15:51 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/: Introduce and use StatsdMonitoring trait in term store (T262923), Part I (duration: 00m 56s)
  • 15:50 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/Util/StatsdMonitoring.php: Introduce and use StatsdMonitoring trait in term store (T262923), Part I (duration: 00m 59s)
  • 15:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12693 and previous config saved to /var/cache/conftool/dbconfig/20200921-153923-root.json
  • 15:24 hnowlan: roll-restarting restbase-dev for java security updates
  • 15:24 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Take db2124 back out of dump/vslow T259831', diff saved to https://phabricator.wikimedia.org/P12692 and previous config saved to /var/cache/conftool/dbconfig/20200921-151210-kormat.json
  • 15:10 moritzm: rolling restart of mw canaries in codfw to pick up libx11 update
  • 15:07 moritzm: installing libx11 security updates on stretch
  • 15:02 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12691 and previous config saved to /var/cache/conftool/dbconfig/20200921-150233-kormat.json
  • 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12690 and previous config saved to /var/cache/conftool/dbconfig/20200921-144729-kormat.json
  • 14:40 moritzm: installing qemu security updates on ganeti* stretch nodes
  • 14:37 papaul: firmware upgrade on db2127
  • 14:36 moritzm: installing qemu security updates on ganeti2011 and gnt-instance reboot debmonitor2001
  • 14:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:32 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12689 and previous config saved to /var/cache/conftool/dbconfig/20200921-143226-kormat.json
  • 14:30 herron: moving prometheus from bast5001 to prometheus5001 T243057
  • 14:24 papaul: disconnecting mgmt on msw-c1-codfw to re-do cable end T263138
  • 14:21 marostegui: Set innodb_change_buffering = inserts; on db2125 (s2 slave) for performance testing T263443
  • 14:17 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12688 and previous config saved to /var/cache/conftool/dbconfig/20200921-141722-kormat.json
  • 14:11 papaul: disconnecting mgmt on msw-d6-codfw to re-do cable end T263138
  • 14:00 moritzm: installing Java security updates on restbase/sessionstore*
  • 13:58 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2117 for schema change, add db2124 to dump/vslow in the interim T259831', diff saved to https://phabricator.wikimedia.org/P12687 and previous config saved to /var/cache/conftool/dbconfig/20200921-135821-kormat.json
  • 13:21 moritzm: installing glib-networking security updates for Stretch
  • 13:21 marostegui: Set innodb_change_buffering = inserts; on db2081 (s8 slave) for performance testing T263443
  • 12:59 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=codfw
  • 12:38 XioNoX: set same OSPF metric on both eqiad/codfw links - T263230
  • 12:26 marostegui: Set innodb_change_buffering = all; on db2071 (s1 slave) for performance testing T263443
  • 12:26 marostegui: Set innodb_change_buffering = all; on db2129 (s6 master) for performance testing T263443
  • 11:38 effie: restart pybal on lvs2009 and lvs1015 - T256973
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - crashed', diff saved to https://phabricator.wikimedia.org/P12684 and previous config saved to /var/cache/conftool/dbconfig/20200921-113708-marostegui.json
  • 11:35 Urbanecm: EU B&C done
  • 11:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend/includes/Transforms/MoveLeadParagraphTransform.php: 3fab588: Simplify lead paragraph check (duration: 00m 59s)
  • 11:22 effie: restart pybal on lvs2010 and lvs1016 - T256973
  • 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a62212a: Allow local steward group members to bigdelete (duration: 00m 57s)
  • 11:12 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=shnwiktionary --fix # T256348 # P12683
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1cf4664: Set WT namespace alias to NS_PROJECT in shn.wiktionary (T256348) (duration: 00m 57s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 01ba828: Add archive.wul.waseda.ac.jp to the wgCopyUploadDomains (T261037) (duration: 00m 57s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bd51f47: Add *.70yearsindonesiaaustralia.com to the wgCopyUploadsDomains allowlist of commonswiki (T262238) (duration: 00m 57s)
  • 11:02 effie: restart pybal on lvs2010 and lvs1016 - T256973
  • 10:36 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:35 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 12s)
  • 09:03 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12682 and previous config saved to /var/cache/conftool/dbconfig/20200921-090343-kormat.json
  • 08:48 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12681 and previous config saved to /var/cache/conftool/dbconfig/20200921-084840-kormat.json
  • 08:48 marostegui: Stop MySQL on db2127 for on-site maintenance - T262247
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2127 T262247', diff saved to https://phabricator.wikimedia.org/P12680 and previous config saved to /var/cache/conftool/dbconfig/20200921-084730-marostegui.json
  • 08:33 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12679 and previous config saved to /var/cache/conftool/dbconfig/20200921-083337-kormat.json
  • 08:21 godog: swift codfw-prod: bump weight for ms-be2057 - T261633
  • 08:18 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12678 and previous config saved to /var/cache/conftool/dbconfig/20200921-081833-kormat.json
  • 08:15 godog: roll-restart swift-object-replicator in codfw and eqiad for increased concurrency
  • 07:53 hashar: Upgrading all CI Jenkins jobs to Quibble 0.0.45
  • 07:05 XioNoX: upgrade FNM to 1.1.7 in ulsfo - T257035
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12677 and previous config saved to /var/cache/conftool/dbconfig/20200921-060053-marostegui.json
  • 05:48 marostegui: Set innodb_change_buffering = inserts; on db2129 (s6 master) for performance testing
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12676 and previous config saved to /var/cache/conftool/dbconfig/20200921-054730-marostegui.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12675 and previous config saved to /var/cache/conftool/dbconfig/20200921-052704-marostegui.json
  • 05:18 marostegui: Stop mysql on: es2013 es2016 es2019 to clone es2032 es2033 es2034 - T261717
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12674 and previous config saved to /var/cache/conftool/dbconfig/20200921-050632-marostegui.json
  • 05:06 marostegui: Deploy MCR schema change on s8 eqiad master, lag will appear on s8 (wikidata) on labsdb hosts T238966
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2013,es2016 and es2019 to clone new hosts T261717', diff saved to https://phabricator.wikimedia.org/P12673 and previous config saved to /var/cache/conftool/dbconfig/20200921-050305-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2015 as es2 codfw master T261717', diff saved to https://phabricator.wikimedia.org/P12672 and previous config saved to /var/cache/conftool/dbconfig/20200921-050228-marostegui.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12671 and previous config saved to /var/cache/conftool/dbconfig/20200921-045919-marostegui.json
  • 04:37 marostegui: Set innodb_change_buffering = inserts; on db2116 for performance testing
  • 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 for the first time with minimal weight T261717', diff saved to https://phabricator.wikimedia.org/P12670 and previous config saved to /var/cache/conftool/dbconfig/20200921-043154-marostegui.json
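
The staged "(re)pooling @ N%" entries above (for example db2117 at 25%, 50%, 75% and 100%) can be pulled out of the log and checked for their spacing. The following is a minimal Python sketch, not part of dbctl or conftool: the regular expression, the function names and the abbreviated sample lines are assumptions made for illustration, and it assumes all entries fall on the same day.

```python
import re
from datetime import datetime

# Hypothetical pattern for the "(re)pooling @ N%" entries in this log.
# It is an illustration only and is not provided by dbctl.
REPOOL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<host>[\w:]+) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_steps(lines):
    """Return (time, host, percent) tuples for repool entries, oldest first."""
    steps = []
    for line in lines:
        m = REPOOL_RE.search(line)
        if m:
            when = datetime.strptime(m["time"], "%H:%M")
            steps.append((when, m["host"], int(m["pct"])))
    return sorted(steps)


def step_gaps_minutes(steps):
    """Minutes between consecutive steps (assumes one host, one day)."""
    return [
        int((later[0] - earlier[0]).total_seconds() // 60)
        for earlier, later in zip(steps, steps[1:])
    ]


# Log lines from 2020-09-21, abbreviated after the commit message.
sample = [
    "15:02 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: schema change T259831'",
    "14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: schema change T259831'",
    "14:40 moritzm: installing qemu security updates on ganeti* stretch nodes",
    "14:32 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: schema change T259831'",
    "14:17 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: schema change T259831'",
]

steps = repool_steps(sample)
gaps = step_gaps_minutes(steps)
print(gaps)  # [15, 15, 15]
```

For the db2117 entries the steps come out 15 minutes apart, which matches the timestamps logged above.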

2020-09-20

  • 08:46 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Tepig10102020' 'Davidfromtheworld' # T263317
  • 07:42 gehel: depooling wdqs2002 to catch up on lag
  • 07:36 gehel: restarting blazegraph + updater on wdqs2002

2020-09-19

  • 19:03 ariel@deploy1001: Finished deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed (duration: 00m 04s)
  • 19:02 ariel@deploy1001: Started deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed
  • 16:49 ejegg: reverted PayPal failmail diversion - IPN verification is working again
  • 16:27 ejegg: Diverted SmashPig PayPal failmail to eeggleston only

2020-09-18

  • 21:48 tzatziki: changed password for Millennium bug@ptwiki
  • 19:28 eileen: process-control config revision is 739ea754ca
  • 18:52 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:46 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 18:44 ryankemper: `sudo kill 254017 254018 254028 254029` to kill some dangling serdi / gzip processes, all the wikidata cleanup should be complete
  • 18:38 ryankemper: `sudo kill 126121 126122 126124 126128 249520 249521 254016 254027` on `snapshot1008` to terminate wikidata dump jobs that are in a bad state
  • 18:10 ryankemper: Removed stale `wikidatardf-dumps` crontab entry from `dumpsgen@snapshot1008`, stored backup of previous state of crontab in the (admittedly verbose) `/tmp/dumpsgen_crontab_before_removing_stale_wikidata_dump_entry_see_gerrit_puppet_patch_622342`
  • 17:15 mutante: lists1001 - apt-get install pwgen to generate passwords (this was installed on previous list server but apparently not puppetized, puppet patch coming up)
  • 16:23 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:21 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:09 mutante: restarting gerrit service to apply gerrit::628338 to make it dump heap if out of memory (T263008)
  • 14:15 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync (T261488) (duration: 00m 56s)
  • 14:13 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync (T261488) (duration: 01m 00s)
  • 13:02 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:00 kormat@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:48 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
  • 12:41 kormat: reimaging db2125 T263244
  • 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12665 and previous config saved to /var/cache/conftool/dbconfig/20200918-123947-kormat.json
  • 12:24 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12664 and previous config saved to /var/cache/conftool/dbconfig/20200918-122444-kormat.json
  • 12:09 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12663 and previous config saved to /var/cache/conftool/dbconfig/20200918-120940-kormat.json
  • 11:54 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12662 and previous config saved to /var/cache/conftool/dbconfig/20200918-115437-kormat.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125', diff saved to https://phabricator.wikimedia.org/P12661 and previous config saved to /var/cache/conftool/dbconfig/20200918-113509-marostegui.json
  • 11:15 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12660 and previous config saved to /var/cache/conftool/dbconfig/20200918-111529-kormat.json
  • 10:56 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12659 and previous config saved to /var/cache/conftool/dbconfig/20200918-105645-kormat.json
  • 10:45 jiji@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:41 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12658 and previous config saved to /var/cache/conftool/dbconfig/20200918-104141-kormat.json
  • 10:35 jiji@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:34 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:28 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:26 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12657 and previous config saved to /var/cache/conftool/dbconfig/20200918-102638-kormat.json
  • 10:11 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12656 and previous config saved to /var/cache/conftool/dbconfig/20200918-101135-kormat.json
  • 09:55 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12655 and previous config saved to /var/cache/conftool/dbconfig/20200918-095554-kormat.json
  • 09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:47 twentyafterfour: deployed hotfix for T263063 to phab1001
  • 09:47 jayme: deleting some random pods in kubernetes staging to rebalance load back on kubestage1001 - T262527
  • 09:46 jayme: uncordoned kubestage1001 - T262527
  • 09:46 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12654 and previous config saved to /var/cache/conftool/dbconfig/20200918-094608-kormat.json
  • 09:31 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 80%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12653 and previous config saved to /var/cache/conftool/dbconfig/20200918-093105-kormat.json
  • 09:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:22 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:16 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 60%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12652 and previous config saved to /var/cache/conftool/dbconfig/20200918-091601-kormat.json
  • 09:00 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 40%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12651 and previous config saved to /var/cache/conftool/dbconfig/20200918-090058-kormat.json
  • 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:56 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:56 jayme: reboot kubestage1001 for clean state - T262527
  • 08:54 elukey: change analytics-in4/in6 filters on cr1/cr2 after https://gerrit.wikimedia.org/r/628300
  • 08:47 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:45 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 20%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12650 and previous config saved to /var/cache/conftool/dbconfig/20200918-084554-kormat.json
  • 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:43 jayme: reboot kubestage1001 for kernel upgrade - T262527
  • 08:30 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:25 jayme: reboot kubestage1001 for clean state testing - T262527
  • 08:22 kormat@cumin1001: dbctl commit (dc=all): 'db2124 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12648 and previous config saved to /var/cache/conftool/dbconfig/20200918-082223-kormat.json
  • 08:16 klausman: reinstalling stat1004 with Buster
  • 07:17 moritzm: installing xdg-utils security updates
  • 07:14 XioNoX: push pfw policies - T263168
  • 07:12 jayme: draining kubestage1001 for kernel upgrade - T262527
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12647 and previous config saved to /var/cache/conftool/dbconfig/20200918-062127-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12646 and previous config saved to /var/cache/conftool/dbconfig/20200918-060815-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1131 after rack move', diff saved to https://phabricator.wikimedia.org/P12645 and previous config saved to /var/cache/conftool/dbconfig/20200918-060724-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12644 and previous config saved to /var/cache/conftool/dbconfig/20200918-060103-marostegui.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12643 and previous config saved to /var/cache/conftool/dbconfig/20200918-053758-marostegui.json
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2029 and es2030 to dbctl depooled - T261717', diff saved to https://phabricator.wikimedia.org/P12642 and previous config saved to /var/cache/conftool/dbconfig/20200918-053604-marostegui.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12641 and previous config saved to /var/cache/conftool/dbconfig/20200918-052608-marostegui.json
  • 05:15 marostegui: Restart wikibugs
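
Cookbook runs in this log end with a line of the form "END (STATUS) - Cookbook NAME (exit_code=N)". As a rough Python sketch of how the non-zero exits could be listed from such lines: the regular expression and the function name `failed_cookbooks` are hypothetical helpers written for this illustration, not part of the cookbook tooling.

```python
import re

# Hypothetical pattern for the "END (...) - Cookbook ..." lines in this log.
COOKBOOK_END_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<who>\S+): END \((?P<status>\w+)\) - "
    r"Cookbook (?P<name>[\w.\-]+) \(exit_code=(?P<code>\d+)\)"
)


def failed_cookbooks(lines):
    """Return (time, cookbook, status, exit_code) for runs with a non-zero exit code."""
    failures = []
    for line in lines:
        m = COOKBOOK_END_RE.search(line)
        if m and int(m["code"]) != 0:
            failures.append((m["time"], m["name"], m["status"], int(m["code"])))
    return failures


# Entries copied from 2020-09-18.
sample = [
    "16:23 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)",
    "16:21 pt1979@cumin2001: START - Cookbook sre.hosts.downtime",
    "13:02 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)",
]

failures = failed_cookbooks(sample)
print(failures)
```

START lines and passing runs are skipped; only the 16:23 run with exit_code=99 is reported for this sample.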

2020-09-17

  • 23:41 ejegg: updated payments-wiki from 86c997fdb2 to 7bb99ce03a
  • 23:01 ejegg: updated payments-wiki from 1e5a52ed26 to 86c997fdb2
  • 20:47 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: 19b9b98: Fix APCOND_FR_NEVERBLOCKED handling (part 3; T262970) (duration: 00m 57s)
  • 19:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:02 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=wikidatawiki --logwiki=metawiki 'Filomena ciavarella' 'Filomena Ciavarella' #T262657
  • 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:29 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 18:11 Urbanecm: Morning B&C done
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 40591d3: Enable DiscussionTools beta on jawiki & viwiki (T261654; T262109) (duration: 00m 56s)
  • 18:06 Urbanecm: Moved /srv/mediawiki-staging/grep (owned by tstarling) to /home/urbanecm to make the working directory clean (cc TimStarling)
  • 17:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 17:20 rzl: repooled eqiad at 17:11
  • 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:12 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:03 papaul: restarting ps1-d8-codfw
  • 16:45 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 01m 12s)
  • 16:44 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
  • 16:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 02m 50s)
  • 16:41 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
  • 16:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 07m 26s)
  • 16:33 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
  • 16:33 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema (duration: 06m 14s)
  • 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:27 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema
  • 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:21 marostegui: Restart wikibugs
  • 16:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:15 papaul: replacing msw-d8-codfw
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1131 IP after moving it to a different rack T262901', diff saved to https://phabricator.wikimedia.org/P12639 and previous config saved to /var/cache/conftool/dbconfig/20200917-160540-marostegui.json
  • 16:03 marostegui: Recreate db1131 on tendril T262901
  • 15:59 marostegui: Update rack location on zarcillo for db1131 T262901
  • 15:57 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 100% T259831', diff saved to https://phabricator.wikimedia.org/P12638 and previous config saved to /var/cache/conftool/dbconfig/20200917-155708-kormat.json
  • 15:44 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 75% T259831', diff saved to https://phabricator.wikimedia.org/P12637 and previous config saved to /var/cache/conftool/dbconfig/20200917-154431-kormat.json
  • 15:43 mepps: updated payments-wiki from 3c073a6a56 to 1e5a52ed26
  • 15:35 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 50% T259831', diff saved to https://phabricator.wikimedia.org/P12636 and previous config saved to /var/cache/conftool/dbconfig/20200917-153514-kormat.json
  • 15:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 25% T259831', diff saved to https://phabricator.wikimedia.org/P12635 and previous config saved to /var/cache/conftool/dbconfig/20200917-152019-kormat.json
  • 15:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12634 and previous config saved to /var/cache/conftool/dbconfig/20200917-151347-marostegui.json
  • 15:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12633 and previous config saved to /var/cache/conftool/dbconfig/20200917-150234-marostegui.json
  • 15:02 jynus: deploying extended grants for admin account on sys/p_s at s8@codfw T195578
  • 15:00 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:00 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:54 kormat@cumin1001: dbctl commit (dc=all): 'db2114: depool for schema change T259831', diff saved to https://phabricator.wikimedia.org/P12632 and previous config saved to /var/cache/conftool/dbconfig/20200917-145451-kormat.json
  • 14:49 cmjohnson1: ending pdu maintenance in eqiad
  • 14:40 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12631 and previous config saved to /var/cache/conftool/dbconfig/20200917-143914-marostegui.json
  • 14:32 papaul: replacing msw-d1,d2,d3,d4,d5 and d6
  • 14:31 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12630 and previous config saved to /var/cache/conftool/dbconfig/20200917-141825-marostegui.json
  • 14:02 marostegui: Start mysql on db1125 after PDU maintenance T261459
  • 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12629 and previous config saved to /var/cache/conftool/dbconfig/20200917-140014-marostegui.json
  • 13:33 jayme: ran ipvsadm -D -t 10.2.2.14:8888 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
  • 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:32 jayme: ran ipvsadm -D -t 10.2.2.31:8748 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
  • 13:32 jayme: ran ipvsadm -D -t 10.2.1.31:8748 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
  • 13:32 jayme: ran ipvsadm -D -t 10.2.1.14:8888 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
  • 13:25 kormat@cumin1001: dbctl commit (dc=all): 'Start depooling db2114 T259831', diff saved to https://phabricator.wikimedia.org/P12628 and previous config saved to /var/cache/conftool/dbconfig/20200917-132513-kormat.json
  • 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:19 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet
  • 13:17 marostegui: Stop MySQL on db2125 for on-site maintenance T260670
  • 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:13 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet
  • 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.9
  • 12:18 cmjohnson1: pdu swap maintenance beginning now for racks D1, D2 and C1 eqiad
  • 11:24 matthiasmullie: End Euro B&C
  • 11:24 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/NavigationTiming/: Account for empty layout shift sources array (duration: 01m 05s)
  • 11:22 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/WikimediaEvents/: Disable MediaSearch A/B test (duration: 01m 08s)
  • 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12627 and previous config saved to /var/cache/conftool/dbconfig/20200917-111028-marostegui.json
  • 11:06 vgutierrez: update to acme-chief 0.29 on acmechief[12]001 - T263006
  • 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:04 vgutierrez: upload acme-chief 0.29 to apt.wm.o (buster) - T263006
  • 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:03 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=eqiad
  • 10:58 marostegui: Stop mysql on db1125 for PDU maintenance, lag will appear on s2, s4, s6 and s7 on labsdb hosts T261459
  • 10:58 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=codfw
  • 10:51 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=codfw
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12626 and previous config saved to /var/cache/conftool/dbconfig/20200917-104816-marostegui.json
  • 10:46 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
  • 10:40 oblivian@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=wikifeeds
  • 10:34 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:27 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 10:22 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 10:20 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 10:18 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:17 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 09:14 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 08:49 jayme: deleting some random pods in kubernetes staging to rebalance load back on kubestage1002 - T262527
  • 08:43 jayme: uncordoned kubestage1002 after kernel upgrade - T262527
  • 08:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:37 godog: graphite compress /var/log/carbon logs older than 2d
  • 08:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:25 jayme: reboot kubestage1002 for kernel upgrade - T262527
  • 08:24 godog: graphite add 300G to /srv
  • 07:55 jayme: draining kubestage1002 for kernel upgrade - T262527
  • 07:55 jayme: cordoning kubestage1002 for kernel upgrade - T262527
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12624 and previous config saved to /var/cache/conftool/dbconfig/20200917-070145-marostegui.json
  • 06:55 hashar: Taking a heap dump of Gerrit JVM
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12623 and previous config saved to /var/cache/conftool/dbconfig/20200917-061931-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12622 and previous config saved to /var/cache/conftool/dbconfig/20200917-060312-marostegui.json
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12621 and previous config saved to /var/cache/conftool/dbconfig/20200917-055219-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for on-site maintenance', diff saved to https://phabricator.wikimedia.org/P12620 and previous config saved to /var/cache/conftool/dbconfig/20200917-055158-marostegui.json
  • 05:46 marostegui: Stop mysql on db1131 - T262901
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2031 on es2 for the first time with minimal weight T261717', diff saved to https://phabricator.wikimedia.org/P12619 and previous config saved to /var/cache/conftool/dbconfig/20200917-054226-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12618 and previous config saved to /var/cache/conftool/dbconfig/20200917-053503-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12617 and previous config saved to /var/cache/conftool/dbconfig/20200917-052347-marostegui.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2011 as es1 master and es2017 as es3 master and then depool es2018 and es2012 to clone es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12616 and previous config saved to /var/cache/conftool/dbconfig/20200917-051741-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12615 and previous config saved to /var/cache/conftool/dbconfig/20200917-050739-marostegui.json
  • 04:53 marostegui: Deploy schema change on s1 eqiad primary master - T238966
  • 01:22 Krinkle: krinkle@mwmaint1002 synced docroot/noc – https://gerrit.wikimedia.org/r/620138
  • 01:22 Krinkle: krinkle@mwmaint2001 synced docroot/noc – https://gerrit.wikimedia.org/r/620138
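
Note on reading the staged repools above: entries are listed newest first, so a gradual repool such as db2114 (25%, 50%, 75%, 100%) reads bottom to top. The following is a minimal sketch, not an official tool, that pulls those steps out of the log text and orders them chronologically. The sample lines are copied from the entries above with the paste URL and config path trimmed; the regular expression and the function name `repool_steps` are assumptions made for illustration, and only messages of the form "<host>: repool at N%" are matched.

```python
import re

# Sample entries copied from the 2020-09-17 log (trailing URL/path trimmed).
ENTRIES = [
    "15:57 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 100% T259831'",
    "15:44 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 75% T259831'",
    "15:35 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 50% T259831'",
    "15:20 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 25% T259831'",
]

# Hypothetical pattern: time, actor, then a "<host>: repool at N%" commit message.
REPOOL = re.compile(
    r"^(\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): '(\w+): repool at (\d+)%"
)


def repool_steps(entries):
    """Return (time, host, percent) tuples sorted oldest first."""
    steps = []
    for entry in entries:
        match = REPOOL.match(entry)
        if match:
            steps.append((match.group(1), match.group(2), int(match.group(3))))
    # HH:MM strings within one day sort chronologically as plain text.
    return sorted(steps)


if __name__ == "__main__":
    for time, host, percent in repool_steps(ENTRIES):
        print(f"{time} {host} {percent}%")
```

Entries worded differently (for example "Slowly repool db2125") carry no percentage and are skipped by this pattern.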

2020-09-16

  • 23:41 catrope@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs: T262970 (duration: 01m 06s)
  • 23:40 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs: T262970 (duration: 01m 06s)
  • 23:37 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/GrowthExperiments/: Fix styling for mobile start module (T258008); Revert wider task card on desktop (T263042, T258704); Fix width of sidebar modules in narrow mode in variant A (T263068) (duration: 01m 09s)
  • 22:24 shdubsh: install prometheus-icinga-exporter 0.11 on icinga2001
  • 20:19 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 20:19 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 20:10 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:04 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Vector search in header on testwiki and officewiki (T262207) (duration: 01m 04s)
  • 18:00 brennen@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend: Backport: Check $coords matched some nodes before comparing contents (T263034) (duration: 01m 06s)
  • 17:58 joal@deploy1001: Finished deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0] (duration: 00m 08s)
  • 17:58 joal@deploy1001: Started deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0]
  • 17:51 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:50 joal@deploy1001: Started deploy [analytics/refinery@07056b0]: Regular analytics weekly train [analytics/refinery@07056b0]
  • 17:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:11 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:03 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:45 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:40 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:13 marostegui: Start mysql on db1093, db1109 and db1123 after pdu work is done
  • 16:12 ryankemper: `wdqs` deploy complete, service is healthy
  • 16:09 elukey: reinstall buster on an-tool1009 after a lot of tests (ganeti VM, so it is manual work)
  • 16:00 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:58 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:49 ryankemper: sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'; sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'
  • 15:49 ryankemper: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 15:48 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b7e2d0b]: 0.3.48 (duration: 14m 40s)
  • 15:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Rename wmgWikibaseClientLocalEntitySourceName to wmgWikibaseClientItemAndPropertySourceName on Beta (T258060) (production no-op) (duration: 01m 04s)
  • 15:35 ryankemper: Canary `wdqs1003` query tests look good, proceeding to wdqs deploy for rest of fleet
  • 15:33 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b7e2d0b]: 0.3.48
  • 15:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove `wmgWikibaseClientLocalEntitySourceName` from InitialiseSettings.php (T258060) (duration: 01m 05s)
  • 15:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Use `wmgWikibaseClientItemAndPropertySourceName` instead of `wmgWikibaseClientLocalEntitySourceName` in Wikibase.php (T258060) (duration: 01m 02s)
  • 15:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add `wmgWikibaseClientItemAndPropertySourceName` to InitialiseSettings.php (T258060) (duration: 01m 06s)
  • 14:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 14:41 volans: uploaded spicerack_0.0.43 to apt.wikimedia.org buster-wikimedia
  • 14:39 cmjohnson1: pdu swap rack d7-eqiad, missed this in earlier log entry
  • 14:34 jiji@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 14:02 Urbanecm: Change email address of User:Oversight@enwiki to oversight-en-wp@wikipedia.org as OTRS is back up (T262733)
  • 13:48 marostegui: Start mysql on db1121 after PDU work
  • 13:46 James_F: Restarting CI Jenkins for T262827
  • 13:08 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2256.codfw.wmnet
  • 13:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.9
  • 12:58 elukey: upload hue_4.7.1-1+deb10u1 to buster-wikimedia
  • 12:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 12:56 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 12:49 cmjohnson1: start pdu swap in racks c6 and c7, d8
  • 12:36 moritzm: powercycling mw2256 (went down with overheated CPU)
  • 12:29 moritzm: restarting exim on MXes to pick up GNUTLS update
  • 11:29 moritzm: restarting slapd on LDAP replicas to pick up GNUTLS update
  • 11:18 moritzm: installing gnutls28 security updates on remaining stretch hosts
  • 11:12 jforrester@deploy1001: Synchronized php-1.36.0-wmf.9/includes/filerepo/file: T263014 Revert "Remove support for (Archived|OldLocal)File::userCan without a user" (duration: 01m 04s)
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2027 and es2028 T261717', diff saved to https://phabricator.wikimedia.org/P12606 and previous config saved to /var/cache/conftool/dbconfig/20200916-103324-marostegui.json
  • 10:20 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.9
  • 10:14 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.9 (duration: 46m 07s)
  • 10:10 ema: upload python-acme 0.31.0-2wm1 to buster-wikimedia T263006
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12605 and previous config saved to /var/cache/conftool/dbconfig/20200916-100548-marostegui.json
  • 10:01 akosiaris: T187984 Shutdown mendelevium.
  • 09:43 jynus: deploying max_packet_size change to m3 instances, too
  • 09:28 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.9
  • 09:26 liw: moving train 1.36.0-wmf.9 to testwikis
  • 09:22 jynus: restarting gerrit service on gerrit1001, unresponsive
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12603 and previous config saved to /var/cache/conftool/dbconfig/20200916-091535-marostegui.json
  • 09:13 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 0 - T262290
  • 09:08 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 1 - T262290
  • 08:52 marostegui: Stop mysql on db1121, db1123, db1093 and db1109 for PDU work T261454 T261457
  • 08:52 XioNoX: asw-d-codfw> request system snapshot slice alternate all-members - T262290
  • 08:50 jynus: deploy new max_allowed_packet configuration to m1, m2 and m5 dbs
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12601 and previous config saved to /var/cache/conftool/dbconfig/20200916-084916-marostegui.json
  • 08:42 awight: finished security backport for https://phabricator.wikimedia.org/T262628
  • 08:41 awight@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FileImporter/src/Services/ImportPlanValidator.php: Security patch for T262628 (duration: 00m 59s)
  • 08:41 XioNoX: asw-c-codfw> request system snapshot slice alternate all-members - T262290
  • 08:27 XioNoX: asw-b-codfw> request system snapshot slice alternate all-members - T262290
  • 08:26 awight: beginning security backport for https://phabricator.wikimedia.org/T262628
  • 08:17 XioNoX: asw-a-codfw> request system snapshot slice alternate all-members - T262290
  • 08:04 akosiaris: T187984 Validated that ticket.wikimedia.org works, proceeding with a wider announcement
  • 08:02 XioNoX: asw2-d-eqiad> request system snapshot slice alternate all-members - T262290
  • 07:49 akosiaris: T187984 Switch over ticket.discovery.wmnet to otrs1001
  • 07:48 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:44 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 07:40 XioNoX: asw2-c-eqiad> request system snapshot slice alternate all-members - T262290
  • 07:37 akosiaris: T187984 Tested inbound email successfully
  • 07:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:26 akosiaris: T187984 Tested outbound email, switching inbound email configuration and performing tests
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12600 and previous config saved to /var/cache/conftool/dbconfig/20200916-072614-marostegui.json
  • 07:22 jayme@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:22 jayme@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 07:21 jayme@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:12 akosiaris: T187984 Disable gravatar in system configuration to avoid leaking agent PII through a 3rd party service
  • 07:03 akosiaris: T187984 validated that the OTRS installation is functional over SSH
  • 07:02 akosiaris: T187984 migration script done. Config updates, rebuilds, package upgrades/reinstall and index rebuilds done
  • 06:28 godog: codfw-prod: bump weight for ms-be2057 - T261633
  • 06:20 kart_: Updated cxserver to 2020-08-30-011854-production (T253439, T260557)
  • 06:20 XioNoX: asw2-b-eqiad> request system snapshot slice alternate all-members - T262290
  • 06:15 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:11 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 for the first time with minimum weight T261717', diff saved to https://phabricator.wikimedia.org/P12599 and previous config saved to /var/cache/conftool/dbconfig/20200916-061013-marostegui.json
  • 06:08 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12598 and previous config saved to /var/cache/conftool/dbconfig/20200916-060717-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 to clone es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12597 and previous config saved to /var/cache/conftool/dbconfig/20200916-055535-marostegui.json
  • 05:53 XioNoX: asw2-a-eqiad> request system snapshot slice alternate all-members - T262290
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12596 and previous config saved to /var/cache/conftool/dbconfig/20200916-055108-marostegui.json
  • 05:50 XioNoX: msw1-codfw> request system snapshot slice alternate - T262290
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2027 and es2028 to dbctl T261717', diff saved to https://phabricator.wikimedia.org/P12595 and previous config saved to /var/cache/conftool/dbconfig/20200916-053918-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12594 and previous config saved to /var/cache/conftool/dbconfig/20200916-053507-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 into vslow', diff saved to https://phabricator.wikimedia.org/P12593 and previous config saved to /var/cache/conftool/dbconfig/20200916-052343-marostegui.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12592 and previous config saved to /var/cache/conftool/dbconfig/20200916-052241-marostegui.json
  • 05:07 marostegui: Repool labsdb1010
  • 02:22 mutante: deneb - sudo systemctl start package_builder_Clean_up_build_directory to fix icinga alert after failed build attempts
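
Note on the Cookbook entries above: each run is logged as a START line and a later END line carrying a status and exit code, and a failed attempt is often followed immediately by a retry (see the 07:21 to 07:29 sre.hosts.decommission entries). The sketch below shows one way to pair them up; it is an illustration under stated assumptions, not part of any Wikimedia tooling. The pattern and the name `pair_runs` are hypothetical, and pairing is done per user and cookbook name only, so overlapping runs of the same cookbook by the same user cannot be told apart.

```python
import re

# Sample entries copied from the 2020-09-16 log, newest first as in the archive.
ENTRIES = [
    "07:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)",
    "07:22 jayme@cumin1001: START - Cookbook sre.hosts.decommission",
    "07:22 jayme@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)",
    "07:21 jayme@cumin1001: START - Cookbook sre.hosts.decommission",
]

# Hypothetical pattern for the START / END (STATUS) lines shown in this log.
COOKBOOK = re.compile(
    r"^(\d{2}:\d{2}) (\S+): (START|END \((\w+)\)) - Cookbook (\S+)"
    r"(?: \(exit_code=(\d+)\))?$"
)


def pair_runs(entries_newest_first):
    """Return (start, end, status, exit_code) tuples in chronological order."""
    open_runs = {}  # (actor, cookbook) -> list of start times not yet closed
    runs = []
    # Walk oldest to newest so every START is seen before its END.
    for line in reversed(entries_newest_first):
        match = COOKBOOK.match(line)
        if not match:
            continue
        time, actor, _, status, name, code = match.groups()
        key = (actor, name)
        if status is None:  # START line
            open_runs.setdefault(key, []).append(time)
        elif open_runs.get(key):  # END line with a pending START
            runs.append((open_runs[key].pop(), time, status, int(code)))
    return runs


if __name__ == "__main__":
    for run in pair_runs(ENTRIES):
        print(run)
```

An END line whose START falls outside the supplied entries is ignored rather than guessed at.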

2020-09-15

  • 23:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: 1c0b0d1: Fix APCOND_FR_NEVERBLOCKED handling (T262970) (duration: 00m 56s)
  • 23:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: 5beace3: Fix APCOND_FR_NEVERBLOCKED handling (T262970) (duration: 00m 58s)
  • 23:14 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: ac8bd38: flaggedrevs: Remove non-existent config options (duration: 00m 58s)
  • 23:07 urbanecm@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 23:00 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 62b21d5: Revert "Remove abusefilter-view right grant from wmf-config" (T255506) (duration: 00m 59s)
  • 20:44 brennen: removing extraneous recursive symlink /srv/mediawiki-staging/php-1.36.0-wmf.9/php-1.36.0-wmf.8
  • 18:32 Urbanecm: Morning B&C done
  • 18:28 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 084729b: Remove abusefilter-view right grant from wmf-config (T255506) (duration: 00m 56s)
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1d34565: Enable MediaWiki client errors on frwiki (T255585) (duration: 00m 57s)
  • 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 79004b7: Enable the reverted tag on all wikis (T164307) (duration: 00m 56s)
  • 17:59 krinkle@deploy1001: Synchronized src/ServiceConfig.php: If727ae4335 (duration: 00m 56s)
  • 17:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out (duration: 37m 42s)
  • 17:05 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out
  • 17:05 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint (duration: 86m 46s)
  • 17:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:57 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:38 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint
  • 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:26 shdubsh: manual install prometheus-icinga-exporter upgrade on icinga2001
  • 14:53 godog: switch grafana to eqiad - T259143
  • 14:48 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:42 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:38 XioNoX: remove old SNMP community from all network devices
  • 14:23 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:22 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventlogging_TemplateWizard - T251609 (duration: 00m 56s)
  • 14:21 otto@deploy1001: sync-file aborted: wgEventStreams: Set canary_events_enabled: true for eventlogging_TemplateWizard - T251609 (duration: 00m 06s)
  • 14:01 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:01 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:51 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:51 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:18 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:14 cmjohnson1: beginning work inside racks c2, c3, c4 and c5 eqiad
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 from vslow, s8, add db1092 temporarily', diff saved to https://phabricator.wikimedia.org/P12589 and previous config saved to /var/cache/conftool/dbconfig/20200915-121849-marostegui.json
  • 12:18 jbond42: update libxml2 on stretch and jessie
  • 12:08 jbond42: rolling restart of php7.2-fpm
  • 12:05 elukey: roll restart cassandra on aqs* to pick up openjdk upgrades
  • 12:05 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 11:44 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 294931f: Revert "Disable DynamicPageList on ruwikinews" (T262240; T262391) (duration: 00m 58s)
  • 11:17 effie: roll out scap 3.15.0-1 to all - T261234
  • 11:12 XioNoX: mass update SCS SNMP community in LibreNMS - T246890
  • 10:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:56 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:54 XioNoX: mass update PDU SNMP community in LibreNMS - T246890
  • 10:48 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 10:36 moritzm: uploaded libxml2 2.9.1+dfsg1-5+deb8u8+wmf1 for jessie-wikimedia
  • 10:33 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:22 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "testwikiswikis to 1.36.0-wmf.9"
  • 10:12 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 09:22 marostegui: Stop MySQL on s5 and s8 eqiad primary master - lag will show up on labsdb hosts T261455
  • 09:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 09:08 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 09:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 09:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:04 gehel: restart elasticsearch on elastic2029 (high GC)
  • 09:01 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 08:59 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 08:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 08:53 elukey: roll restart druid zookeeper clusters for openjdk upgrades
  • 08:53 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 08:52 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:13 marostegui: Stop MySQL on labsdb1010 for PDU maintenance T261456
  • 08:05 liw@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_498180604" --store-class=LCStoreCDB --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 11m 10s)
  • 08:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:01 akosiaris: T187984 migration script on otrs1001 proceeding as expected. Still in step 31/44, but that's what we saw in the test migration
  • 07:54 liw@deploy1001: Started scap: testwikis to 1.36.0-wmf.9
  • 07:24 godog: swift codfw add ms-be2057 at object weight 100 - T261633
  • 07:19 elukey: roll restart druid cluster to pick up openjdk updates
  • 07:19 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 07:16 XioNoX: pre-configure SGIX port on cr2-eqsin
  • 06:57 liw: 1.36.0-wmf.9 was branched at 7269b6b for T257977
  • 06:08 marostegui: Stop mysql on es2011 to clone es2028
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2011 to clone es2028', diff saved to https://phabricator.wikimedia.org/P12585 and previous config saved to /var/cache/conftool/dbconfig/20200915-060623-marostegui.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2012 as es1 codfw master T261717', diff saved to https://phabricator.wikimedia.org/P12584 and previous config saved to /var/cache/conftool/dbconfig/20200915-060508-marostegui.json
  • 05:33 marostegui: Depool labsdb1010 for PDU maintenance
  • 05:10 marostegui: Restart sanitarium hosts on eqiad and codfw T262832

2020-09-14

  • 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:49 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:49 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:45 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 21:34 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:32 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:30 cdanis: T257527 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo cumin 'R:Class ~ "(?i)profile::logstash::collector7"' 'enable-puppet "cdanis rolling out Ifa3c68e4"'
  • 21:24 cdanis: T257527 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo cumin 'R:Class ~ "(?i)profile::logstash::collector7"' 'disable-puppet "cdanis rolling out Ifa3c68e4"'
  • 21:05 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:03 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:38 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:36 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:26 cdanis@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a588eb0c6 T262087 modify wgEventStreams to reference NEL schema (duration: 00m 56s)
  • 19:00 Urbanecm: Morning B&C done
  • 18:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a5d56ed: e2f4798: Enable Special:Investigate on eswiki (T262436) (duration: 00m 56s)
  • 18:49 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:47 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:38 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 7d19393: Remove investigate from $wgAvailableRights (T260175) (duration: 00m 56s)
  • 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d2fa653: Remove the investigate right from testwiki and frwiki (T260175) (duration: 00m 56s)
  • 18:30 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/EventStreamConfig/includes/: a4c8608: Default to using API json formatversion=2 (T251609) (duration: 00m 57s)
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 27ba5a1: add new parse* servers to $wgLinterSubmitterWhitelist (T247441) (duration: 00m 56s)
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: 720e6cb: flaggedrevs: Move setting of wgFlaggedRevsAutopromote and wgFlaggedRevsAutoconfirm out of wgExtensionFunctions (T237191) (duration: 00m 56s)
  • 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 699f5e8: Add logo Wordmark and Tagline for hywiki (T259985) (duration: 00m 55s)
  • 18:08 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: 699f5e8: Add logo Wordmark and Tagline for hywiki (T259985) (duration: 00m 56s)
  • 17:51 mutante: all new parse* parsoid hardware pooled now and set to active in netbox, deploy in 10 min will add to $wgLinterSubmitterWhitelist (T247441)
  • 17:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
  • 17:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse200[0-9].codfw.wmnet
  • 17:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2002.codfw.wmnet
  • 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 16:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 16:36 mutante: pooled the first of the new parsoid servers - parse2001 (T247441)
  • 16:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 16:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
  • 16:04 elukey: completed the rollout of restrictive kafka ferm rules on the Kafka jumbo cluster
  • 16:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse200[0-9].codfw.wmnet
  • 16:01 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=parsoid,name=parse20[0-2][0-9].codfw.wmnet
  • 15:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 15:58 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
  • 15:54 moritzm: restarting apache on webperf* to pick up GNU TLS security update
  • 15:45 moritzm: restarting apache/FPM on mw2271/mw2272 (codfw canaries) to pick up GNU TLS update
  • 15:35 moritzm: installing gnutls28 security updates on stretch
  • 15:23 elukey: enable stricter ferm rules on kafka-jumbo1007 and kafka-jumbo1005
  • 15:17 cicalese@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Allow public access to API Portal main page for private launch (duration: 00m 57s)
  • 15:17 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:11 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:11 cmjohnson1: completed pdu swap in eqiad racks d5/d6
  • 14:55 elukey: ferm rules added to kafka-jumbo1009, 1006 and 1008 up to now
  • 14:24 milimetric@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:24 milimetric@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:16 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:14 milimetric@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:14 milimetric@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:11 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:09 milimetric@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:09 milimetric@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:42 moritzm: installing dbus security updates on stretch
  • 13:42 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:32 moritzm: installing websockify stretch updates
  • 13:10 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 12:51 cmjohnson1: correction: it's replacing the PDUs in racks d5 and d6
  • 12:50 Amir1: ladsgroup@mwmaint2001:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P1438 --new-data-type external-id (T262198)
  • 12:49 cmjohnson1: replacing PDUs in racks d4 and d5 eqiad
  • 12:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:32 ayounsi@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-snmp (exit_code=1)
  • 12:30 ayounsi@cumin1001: START - Cookbook sre.pdus.rotate-snmp
  • 12:30 XioNoX: rotate SNMP community on all the PDUs - T246890
  • 12:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:24 moritzm: rebooting sodium for kernel update
  • 12:09 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 12:08 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:06 akosiaris: T187984 migration script on otrs1001 now in step 31/44
  • 12:03 volans@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fea8861: Follow-up 0ee0d8f: [frwiktionary] Create `conj` alias (T262298) (duration: 00m 56s)
  • 11:50 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:48 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:48 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:46 volans@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:45 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:41 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:41 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:40 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:39 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:36 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:35 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:27 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for MCR', diff saved to https://phabricator.wikimedia.org/P12578 and previous config saved to /var/cache/conftool/dbconfig/20200914-112648-marostegui.json
  • 11:24 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:20 marostegui: Remove triggers from db1124:3311 - T238966
  • 11:19 marostegui: Deploy MCR schema change on s1, this will generate lag on s1 labsdb - T238966
  • 11:13 Urbanecm: EU B&C window done
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 47fe87c: [itwiki] Increase $wgAutoConfirmAge and $wgAutoConfirmCount (T262738) (duration: 00m 56s)
  • 11:09 marostegui: Stop MySQL on s5 and s8 eqiad primary master - lag will show up on labsdb hosts T261455
  • 11:05 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=frwiktionary --fix # T262298 # P12576
  • 11:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0ee0d8f: [frwiktionary] Create new namespace "Conjugaison" & associated talk (T262298) (duration: 00m 56s)
  • 11:00 volans: Mass importing IPs from PuppetDB into Netbox T244153
  • 10:59 XioNoX: create LACP bundle to labtestvirt2003
  • 10:50 jbond42: enable git protocol version2 fleet wide
  • 10:43 effie: deploy scap 3.15.0-1 to canaries - T261234
  • 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 09:27 akosiaris: T187984 migration script on otrs1001 now in step 8/44 (correction)
  • 09:26 akosiaris: T187984 migration script on otrs1001 now in step 8/41
  • 09:09 akosiaris: db1077. stop slave ; show slave status > /home/akosiaris/show_slave_status; reset slave all T187984
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2026 on es2 T261717', diff saved to https://phabricator.wikimedia.org/P12575 and previous config saved to /var/cache/conftool/dbconfig/20200914-085842-marostegui.json
  • 08:49 akosiaris: start the OTRS upgrade to 6.0.29 T187984
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12574 and previous config saved to /var/cache/conftool/dbconfig/20200914-084509-marostegui.json
  • 08:42 moritzm: upgrading remaining stretch systems to git 2.20 T262244
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12573 and previous config saved to /var/cache/conftool/dbconfig/20200914-083525-marostegui.json
  • 08:17 _joe_: restarting pybal on lvs2009
  • 08:16 _joe_: repooling mw2297
  • 08:14 _joe_: restarting php on mw2297, php-fpm stuck in SIGILL
  • 08:14 marostegui: Stop MySQL on db2125 for on-site maintenance - T260670
  • 08:12 _joe_: restarting pybal on lvs2010
  • 08:09 _joe_: restarting pybal on lvs1015
  • 08:05 godog: prometheus codfw ops, extend the lv by 100G
  • 08:04 marostegui: Stop MySQL on es2017 to clone es2027
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2017 to clone es2027 - T261717', diff saved to https://phabricator.wikimedia.org/P12572 and previous config saved to /var/cache/conftool/dbconfig/20200914-080344-marostegui.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2018 as es3 codfw master T261717', diff saved to https://phabricator.wikimedia.org/P12571 and previous config saved to /var/cache/conftool/dbconfig/20200914-080239-marostegui.json
  • 07:58 _joe_: restarting pybal on lvs1015
  • 07:52 _joe_: restarting pybal on lvs1016
  • 07:40 jayme: shutting down etcd100[1-3] (scheduled for decommission, replaced by kubetcd100[4-6])
  • 07:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:39 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12570 and previous config saved to /var/cache/conftool/dbconfig/20200914-073919-marostegui.json
  • 06:56 elukey: slowly roll out ferm rules on Kafka-Jumbo hosts (see https://gerrit.wikimedia.org/r/611168)
  • 06:19 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 05:54 elukey: execute "gnt-instance modify -B vcpus=4 an-tool1009.eqiad.wmnet" on ganeti1011 - T258768
  • 05:54 marostegui: Truncate tendril.general_log_sampled on db1115 - T262782
  • 05:47 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:43 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 for the first time with minimum weight T261717', diff saved to https://phabricator.wikimedia.org/P12569 and previous config saved to /var/cache/conftool/dbconfig/20200914-053844-marostegui.json

2020-09-13

  • 23:47 Urbanecm: Change email address of User:Oversight@enwiki to oversight-l@lists.wikimedia.org as part of OTRS downtime preparation (T262733)
  • 05:51 effie: sudo -i depool mw2297

2020-09-12

  • 01:07 mutante: people2001 - rsyncing user home dirs from people1002
  • 00:38 mutante: all issues with hosts doing stuff "on every run" have been fixed except one: analytics1034

2020-09-11

  • 22:54 mutante: starting people2001 VM
  • 17:30 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:29 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:22 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:12 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 12:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:47 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:27 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:49 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:55 jynus: starting snapshot of m2 from db1117
  • 08:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:00 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 07:59 XioNoX: remove BGP to AS64271 in AMS-IX (see peering@ email)
  • 07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:17 moritzm: rebooting ldap-corp server for kernel update
  • 07:02 moritzm: remove git-core from stretch systems, it's a transition package no longer provided by the 2.20 backport from Buster
  • 02:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:54 mutante: downtimes 48h for parse* hosts not in production yet but getting icinga checks from applied role
  • 01:53 mutante: ACKed alerts for eqiad power switches after making T262629
  • 01:53 mutante: initial puppet runs on parse2010 - parse2020, staggered, not in production yet, new hardware, setup WIP (T247441)
  • 01:45 mutante: mw2296 - restarted php7.2-fpm
  • 01:42 mutante: mw2296 - systemctl restart apache2 - rescheduled icinga alerts for apache and php-fpm
  • 01:33 mutante: initial puppet runs on parse2001 - parse2010, staggered, not in production yet, new hardware, setup WIP (T247441)
  • 01:32 milimetric@deploy1001: Finished deploy [analytics/refinery@6057f20] (thin): Simple hql syntax fix (duration: 00m 07s)
  • 01:32 milimetric@deploy1001: Started deploy [analytics/refinery@6057f20] (thin): Simple hql syntax fix
  • 01:32 milimetric@deploy1001: Finished deploy [analytics/refinery@6057f20]: Simple hql syntax fix (duration: 08m 09s)
  • 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:24 milimetric@deploy1001: Started deploy [analytics/refinery@6057f20]: Simple hql syntax fix
  • 00:41 milimetric@deploy1001: Finished deploy [analytics/refinery@7f5a6ca] (thin): Regular analytics weekly train THIN [analytics/refinery@7f5a6ca] (duration: 00m 08s)
  • 00:41 milimetric@deploy1001: Started deploy [analytics/refinery@7f5a6ca] (thin): Regular analytics weekly train THIN [analytics/refinery@7f5a6ca]
  • 00:40 milimetric@deploy1001: Finished deploy [analytics/refinery@7f5a6ca]: Regular analytics weekly train [analytics/refinery@7f5a6ca] (duration: 08m 25s)
  • 00:38 mutante: generating mcrouter certs for parse2001 - parse2019 - mcrouter_generate_certs on puppetmaster1001 (T247441)
  • 00:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:31 milimetric@deploy1001: Started deploy [analytics/refinery@7f5a6ca]: Regular analytics weekly train [analytics/refinery@7f5a6ca]
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:01 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime

2020-09-10

  • 23:44 ejegg: updated payments-wiki from e41ab173e0 to 3c073a6a56
  • 23:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 22:50 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 22:43 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 22:33 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:31 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 22:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 22:11 ejegg: updated payments-wiki from be81063168 to e41ab173e0
  • 22:06 mutante: added mcrouter cert for parse2020, ran mcrouter_generate_certs
  • 21:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 21:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:25 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.8
  • 20:23 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:21 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:20 longma: correction: T257976 - 1.36.0-wmf.8 to all wikis
  • 20:20 longma: deploying 1.36.0-wmf.8 to all wikis
  • 20:02 krinkle@deploy1001: Synchronized php-1.36.0-wmf.8/includes/resourceloader/ResourceLoaderSkinModule.php: Ibe2c9f8d024f6 (duration: 01m 05s)
  • 19:44 Urbanecm: End of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwiktionary --previous-collation=uppercase # T262163
  • 19:12 mholloway-shell@deploy1001: Started restart [recommendation-api/deploy@db7fd80]: (no justification provided)
  • 19:07 Urbanecm: Start of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwiktionary --previous-collation=uppercase # T262163
  • 19:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 95d2b57: Set $wgCategoryCollation = uca-tr on trwiktionary (T262163) (duration: 01m 05s)
  • 18:58 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=frwiktionary --fix # T262398
  • 18:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 09e487e: Add a new namespace to frwiktionary (T262398) (duration: 01m 04s)
  • 18:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/includes/EditPage.php: 8240944: EditPage: Fix member call on boolean when undo is impossible (T262463) (duration: 01m 03s)
  • 18:37 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.6/includes/EditPage.php: 8240944: EditPage: Fix member call on boolean when undo is impossible (T262463) (duration: 01m 07s)
  • 18:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:24 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: 0cde0b1: Add throttle rule for Czech senior citizens course (T262415) (duration: 01m 05s)
  • 18:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 18:00 mutante: helium (former backup host) is being removed from ferm rules on all hosts, it was replaced by backup1001 (T260717)
  • 17:33 bblack: dns servers: upgrading remainder of fleet to gdnsd-3.3.0-1~wmf1
  • 16:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:25 bblack: authdns1001 - upgrade gdnsd to 3.3.0-1~wmf1
  • 16:06 bblack: dns4001 - upgrade gdnsd to 3.3.0-1~wmf1
  • 16:04 bblack: reprepro: uploaded gdnsd-3.3.0-1~wmf1 - T261340
  • 15:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:04 volans: uploaded cumin_4.0.0 to apt.wikimedia.org buster-wikimedia (no code changes)
  • 13:58 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:52 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:42 moritzm: rebooting etherpad1002 (etherpad.wikimedia.org) for kernel update
  • 13:24 moritzm: installing rake security updates on stretch
  • 13:10 ebernhardson: delete lldwiki_{content|general} indices from search.svc.{eqiad|codfw}.wmnet:9643 (psi), they should be on 9443 (omega)
  • 12:57 klausman: Ran puppet-merge to get my dotfiles from https://gerrit.wikimedia.org/r/c/operations/puppet/+/626367 out
  • 12:34 moritzm: installing firejail updates on maps/thumbor/restbase
  • 12:01 moritzm: upgrading deployment servers to git 2.20 T262244
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P12557 and previous config saved to /var/cache/conftool/dbconfig/20200910-113758-marostegui.json
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317, db1101:3318', diff saved to https://phabricator.wikimedia.org/P12556 and previous config saved to /var/cache/conftool/dbconfig/20200910-113426-marostegui.json
  • 11:13 matthiasmullie: Euro B&C done
  • 11:13 moritzm: uploaded git 2.20.1-2+deb10u3~wmf1 to stretch-wikimedia/main T262244
  • 11:11 mlitn@deploy1001: Synchronized php-1.36.0-wmf.8//extensions/WikimediaEvents/: WikimediaEvents: Enable MediaSearch A/B test (duration: 01m 06s)
  • 10:42 duesen_: daniel@mwmaint2001:~$ mwscript maintenance/findBadBlobs.php jvwiki --revisions 214173 --mark T262457
  • 10:34 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:32 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:28 XioNoX: move VRRP master to cr2-esams
  • 10:21 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 09:45 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 09:43 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:42 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12555 and previous config saved to /var/cache/conftool/dbconfig/20200910-093106-marostegui.json
  • 09:26 dcausse: creating missing cirrus indices for jawikivoyage T262518
  • 09:24 dcausse: creating missing cirrus indices for jawikivoyage T260228
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12554 and previous config saved to /var/cache/conftool/dbconfig/20200910-091335-marostegui.json
  • 08:49 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:47 jynus@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12551 and previous config saved to /var/cache/conftool/dbconfig/20200910-082304-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12550 and previous config saved to /var/cache/conftool/dbconfig/20200910-073107-marostegui.json
  • 07:03 elukey: resize search-loader vms (+4 vcores +4GB of ram) on Ganeti - T262385
  • 05:29 marostegui: Deploy schema change on s3 master - T260476
  • 00:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@00b0e20]: Update to current master (duration: 06m 42s)
  • 00:24 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@00b0e20]: Update to current master
  • 00:23 twentyafterfour: done. Phabricator update complete
  • 00:23 twentyafterfour: applying database migrations to phabricator db
  • 00:09 twentyafterfour: deploying phabricator update 2020-09-10 https://phabricator.wikimedia.org/project/view/4755/

2020-09-09

  • 23:51 dpifke@deploy1001: Finished deploy [performance/arc-lamp@55fccc6]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/622915 (duration: 00m 05s)
  • 23:51 dpifke@deploy1001: Started deploy [performance/arc-lamp@55fccc6]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/622915
  • 23:37 ebernhardson@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/CirrusSearch/includes/Search/InterleavedResultSet.php: Repair passing interleaved search metrics from backend to frontend (duration: 01m 04s)
  • 20:13 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:625914 (duration: 01m 03s)
  • 20:03 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:626190 T261425 (duration: 01m 03s)
  • 20:01 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.8/skins/WikimediaApiPortal: Backport gerrit:626044, T261425 (duration: 01m 12s)
  • 19:11 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.8 (duration: 01m 03s)
  • 19:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.8
  • 18:19 _joe_: banning urls ^/api/rest_v1/page/mobile-html-offline-resources/ from varnish caches
  • 18:19 Urbanecm: Morning B&C window done
  • 18:17 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:17 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b226330: Enable $wgAllowCrossOrigin on all wikis (T262425) (duration: 01m 04s)
  • 18:15 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
  • 18:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 85e36ae: Enable MediaWiki client errors on commonswiki and metawiki (T255585) (duration: 01m 06s)
  • 18:10 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 18:02 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout (duration: 02m 55s)
  • 17:59 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout
  • 17:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout (duration: 06m 47s)
  • 17:52 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout
  • 17:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, take 2 (duration: 09m 38s)
  • 17:42 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, take 2
  • 17:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 T262437 (duration: 06m 00s)
  • 17:35 ppchelko@deploy1001: Started deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 T262437
  • 17:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:28 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:25 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:24 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:22 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:18 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:15 marostegui: Stop mysql on db2125 for on-site maintenance T260670
  • 16:10 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 3] (duration: 00m 11s)
  • 16:10 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 3]
  • 16:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 16:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 16:06 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 16:06 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 16:06 bd808: scap3 of Striker to labweb1001 failing. Will investigate.
  • 16:05 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 2] (duration: 00m 11s)
  • 16:05 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 2]
  • 16:04 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) (duration: 01m 21s)
  • 16:03 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111)
  • 15:54 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:48 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:26 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:26 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:11 herron: prometheus1003: systemctl restart thanos-sidecar@ops.service
  • 14:29 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:22 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:02 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:02 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:00 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:57 marostegui: Restart mysql on db1115 T231769
  • 13:54 bblack: deployed https://gerrit.wikimedia.org/r/626153
  • 12:47 _joe_: restarting php-fpm on wtp2003
  • 12:46 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 12:37 cmjohnson1: beginning scheduled PDU maintenance racks D5 and D6 in eqiad
  • 12:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12545 and previous config saved to /var/cache/conftool/dbconfig/20200909-123634-kormat.json
  • 12:31 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12544 and previous config saved to /var/cache/conftool/dbconfig/20200909-123109-kormat.json
  • 12:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:11 moritzm: installing zeromq security updates on Buster
  • 12:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:37 awight: EU Bacon complete
  • 11:34 awight@deploy1001: Synchronized wmf-config: Config: api-portal: required extended configuration (T261425) (duration: 01m 08s)
  • 11:15 moritzm: added Tobias Klausmann to pwstore
  • 11:14 jiji@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 11:03 marostegui: Stop MySQL on s2 eqiad master to prepare for the PDU maintenance (this will generate lag on s2 on labsdb) T261453
  • 10:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:28 volans: restarting ferm on failed hosts: an-test-master1001.eqiad.wmnet,an-worker1116.eqiad.wmnet,db[1075,1101,1116].eqiad.wmnet,labstore1007.wikimedia.org,logstash[1025,1030].eqiad.wmnet leftover from yesterday's network issue
  • 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:11 klausman: Rebooting stat1005 for clearing GPU status and testing new DKMS driver (T260442)
  • 10:09 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:01 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12542 and previous config saved to /var/cache/conftool/dbconfig/20200909-100157-kormat.json
  • 09:52 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12541 and previous config saved to /var/cache/conftool/dbconfig/20200909-095219-kormat.json
  • 09:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:33 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12540 and previous config saved to /var/cache/conftool/dbconfig/20200909-093353-kormat.json
  • 09:26 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12539 and previous config saved to /var/cache/conftool/dbconfig/20200909-092621-kormat.json
  • 09:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:11 moritzm: installing qemu security updates on Buster
  • 09:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 08:53 _joe_: restarting restbase on rb2009 (depooled)
  • 08:53 godog: upgrade kibana to 7.9.1 on the logstash7 cluster
  • 08:51 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12538 and previous config saved to /var/cache/conftool/dbconfig/20200909-085147-kormat.json
  • 08:44 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12537 and previous config saved to /var/cache/conftool/dbconfig/20200909-084433-kormat.json
  • 08:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:40 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:40 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 08:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12536 and previous config saved to /var/cache/conftool/dbconfig/20200909-083616-kormat.json
  • 08:34 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 08:34 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:30 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12535 and previous config saved to /var/cache/conftool/dbconfig/20200909-083038-kormat.json
  • 08:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 07:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable DynamicPageList on ruwikinews (T262240) (duration: 01m 22s)
  • 07:25 elukey: restart varnishkafka-webrequest on cp5010 and cp5012, delivery reports errors happening since yesterday's network outage
  • 06:21 XioNoX: push new pfw policies - T262297
  • 01:58 eileen: civicrm revision changed from 4e40a59d42 to cc1f7e6d13, config revision is 4845a229dc

2020-09-08

  • 23:47 eileen: civicrm revision is 4e40a59d42, config revision is d26334fa36
  • 23:25 eileen: civicrm revision changed from 5e7352e2c3 to 4e40a59d42, config revision is 3cf0913789
  • 22:14 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:12 andrew@deploy1001: Finished deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update (duration: 03m 35s)
  • 22:08 andrew@deploy1001: Started deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update
  • 22:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:57 andrew@deploy1001: Finished deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks (duration: 00m 13s)
  • 21:57 andrew@deploy1001: Started deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks
  • 19:19 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.8
  • 19:12 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.8 (duration: 71m 45s)
  • 18:22 elukey: rm /srv/prometheus/ops/targets/mjolnir_msearch_eqiad.yaml on prometheus100[3,4] as cleanup after https://gerrit.wikimedia.org/r/621988 - T260305
  • 18:00 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.8
  • 17:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:57 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 17:54 Amir1: Deployed patch for T262240
  • 17:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:23 andrewbogott: rebooting cloudvirt1033
  • 17:03 klausman: attempted to add rock-dkms_3.3-19_all.deb to thirdparty/amd-rocm33 for use on analytics servers with GPUs
  • 16:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventgate test streams and eventlogging_Test - T251609 (duration: 00m 58s)
  • 16:34 herron: increased elk5 logstash JVM heaps to 2g (to help decrease kafka-logging consumer lag)
  • 16:12 longma: 1.36.0-wmf.8 was branched at e81e81e for T257976
  • 16:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 16:03 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 16:02 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:34 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1004.*
  • 15:32 jayme@cumin1001: conftool action : set/pooled=yes; selector: service=kubesvc,name=kubernetes1013.*
  • 15:30 elukey: roll restart of hadoop master daemons on an-master100[1,2] after the cookbook failed
  • 15:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 15:20 _joe_: restarted celery-ores-worker.service on ores1007
  • 15:19 _joe_: restarted ferm on wdqs1011
  • 15:18 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 15:16 _joe_: starting wdqs-updater on wdqs1005
  • 15:15 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1090.eqiad.wmnet
  • 15:14 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp108[789].eqiad.wmnet
  • 15:14 bblack: repool cp1087-90 (eqiad row D)
  • 15:13 herron: rolling restart of elk5 logstashes
  • 15:10 marostegui: Start mysql on db1106 after PDU maintenance is done
  • 15:03 jayme@cumin1001: conftool action : set/pooled=inactive; selector: service=kubesvc,name=kubernetes1013.*
  • 15:03 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=kubernetes1004.*
  • 15:03 XioNoX: request virtual-chassis vc-port set pic-slot 1 member 4 port 0
  • 15:03 XioNoX: request virtual-chassis vc-port set pic-slot 0 member 2 port 50
  • 15:02 XioNoX: request virtual-chassis vc-port set pic-slot 1 member 1 port 1
  • 14:53 marostegui: Reload dbproxy1016 to recover the alert
  • 14:45 jynus: restarting bacula-dir @ backup1001
  • 14:44 XioNoX: reboot asw2-d3-eqiad
  • 14:33 moritzm: bouncing ferm on hosts where ferm.service failed due to DNS resolution issues for prometheus hosts
  • 14:31 volans: restarted ssh on mc1033 from console
  • 14:16 XioNoX: request virtual-chassis vc-port delete pic-slot 1 member 4 port 0
  • 14:16 XioNoX: request virtual-chassis vc-port delete pic-slot 0 member 2 port 50
  • 14:13 akosiaris: drain kubernetes1013, kubernetes1004. They are on row D
  • 14:13 bblack: dns1002 - disable puppet + bird service (stop advertising recdns from row D)
  • 14:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1090.eqiad.wmnet
  • 13:59 bblack: depooling cp1087-1090
  • 13:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp108[789].eqiad.wmnet
  • 13:57 XioNoX: asw2-d-eqiad> request system reboot member 3
  • 13:35 cmjohnson1: the power cable was not properly seated, so asw2-d3-eqiad lost power
  • 13:34 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
  • 13:30 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 13:26 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:26 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:25 mateusbs17: Restarted puppetdb on deployment-puppetdb03 (T248041)
  • 13:24 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:24 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:20 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 13:18 cmjohnson1: swapping PDUs in eqiad, mgmt for racks d3 and d4 will go down
  • 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 13:17 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 13:17 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'staging' .
  • 13:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 13:14 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 13:13 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 13:12 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:09 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 13:09 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 13:08 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:08 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:04 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:04 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 12:47 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 12:35 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12523 and previous config saved to /var/cache/conftool/dbconfig/20200908-123546-kormat.json
  • 12:34 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 12:27 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12522 and previous config saved to /var/cache/conftool/dbconfig/20200908-122702-kormat.json
  • 12:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:11 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12521 and previous config saved to /var/cache/conftool/dbconfig/20200908-121139-kormat.json
  • 12:04 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12520 and previous config saved to /var/cache/conftool/dbconfig/20200908-120419-kormat.json
  • 12:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:34 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:33 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 11:33 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 11:18 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:15 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:53 marostegui: Deploy schema change on s3 eqiad master - T253276
  • 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:20 marostegui: Deploy schema change on s4 eqiad master - T253276
  • 10:14 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 10:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:11 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 10:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:08 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12519 and previous config saved to /var/cache/conftool/dbconfig/20200908-100852-kormat.json
  • 09:52 akosiaris: enable puppet, run it on all k8s eqiad nodes and double check that calico-node is fine T239835
  • 09:43 akosiaris: stopped calico-node and kube-apiserver on k8s nodes/masters T239835
  • 09:43 marostegui: Stop mysql on es2014 to clone es2026 T261717
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2014 - T261717', diff saved to https://phabricator.wikimedia.org/P12517 and previous config saved to /var/cache/conftool/dbconfig/20200908-093957-marostegui.json
  • 09:37 volans: running homer 'cr*eqiad*' commit "Update debmonitor IPs (#2), T261489"
  • 09:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:33 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:28 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12515 and previous config saved to /var/cache/conftool/dbconfig/20200908-092755-kormat.json
  • 09:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:20 jayme: disabling puppet on argon.eqiad.wmnet,chlorine.eqiad.wmnet,kubernetes[1001-1016].eqiad.wmnet - Reinitialize eqiad k8s cluster with new etcd - T239835
  • 08:55 marostegui: Deploy schema change on s7 eqiad master - T253276
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2127's weight', diff saved to https://phabricator.wikimedia.org/P12514 and previous config saved to /var/cache/conftool/dbconfig/20200908-084834-marostegui.json
  • 08:45 volans: running homer 'cr*eqiad*' commit "Update debmonitor IPs, T261489"
  • 08:23 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=blubberoid,name=eqiad
  • 08:22 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
  • 08:21 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=codfw
  • 08:20 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 08:16 moritzm: installing 4.19.132 kernel on buster systems (only installing the deb, reboots separately)
  • 07:44 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Revert "Update T250887 mitigations" (T250887; T262242) (duration: 00m 59s)
  • 07:44 elukey: roll restart kafka daemons on kafka-jumbo100[7-9] to pick up openjdk upgrades
  • 07:40 XioNoX: move HE from ix to transit BGP group on cr3-eqsin
  • 07:00 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:58 marostegui: Deploy schema change on s2 eqiad master - T253276
  • 06:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:56 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P12513 and previous config saved to /var/cache/conftool/dbconfig/20200908-065022-marostegui.json
  • 06:47 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:31 marostegui: Deploy schema change on s5 eqiad master - T253276
  • 06:23 elukey: roll restart of Hadoop master daemons on an-master100[1,2] to pick up new openjdk settings
  • 06:14 marostegui: Stop MySQL on db1106 for PDU maintenance T261452
  • 05:34 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime

2020-09-07

  • 23:35 Reedy: Deployed patch for T262213
  • 21:19 reedy@deploy1001: Synchronized private/PrivateSettings.php: Remove old mitigation (duration: 00m 55s)
  • 18:04 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 56s)
  • 16:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:38 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12511 and previous config saved to /var/cache/conftool/dbconfig/20200907-153857-kormat.json
  • 15:32 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12510 and previous config saved to /var/cache/conftool/dbconfig/20200907-153206-kormat.json
  • 15:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12509 and previous config saved to /var/cache/conftool/dbconfig/20200907-152117-kormat.json
  • 15:17 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12508 and previous config saved to /var/cache/conftool/dbconfig/20200907-151718-kormat.json
  • 15:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:09 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12507 and previous config saved to /var/cache/conftool/dbconfig/20200907-150901-kormat.json
  • 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:03 moritzm: rebooting poolcounter1004/1005
  • 15:03 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12506 and previous config saved to /var/cache/conftool/dbconfig/20200907-150310-kormat.json
  • 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1133 from dbctl T253217', diff saved to https://phabricator.wikimedia.org/P12504 and previous config saved to /var/cache/conftool/dbconfig/20200907-143507-marostegui.json
  • 14:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:25 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:48 _joe_: restarting pybal in codfw to pick up the new mobileapps TLS endpoint
  • 13:44 _joe_: restarting pybal in eqiad to pick up the new mobileapps TLS endpoint
  • 13:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:28 hashar@deploy1001: Finished deploy [integration/docroot@e4e3af9]: Support published documents outside of the git checkout # T149924 (duration: 00m 05s)
  • 13:27 hashar@deploy1001: Started deploy [integration/docroot@e4e3af9]: Support published documents outside of the git checkout # T149924
  • 13:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:22 hashar@deploy1001: Finished deploy [integration/docroot@11ab4a0]: (no justification provided) (duration: 00m 10s)
  • 13:22 hashar@deploy1001: Started deploy [integration/docroot@11ab4a0]: (no justification provided)
  • 13:14 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 13:04 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 12:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 12:43 kormat@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
  • 12:42 kormat@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:29 marostegui: Upgrade and reboot db2094 and db2095 (sanitarium hosts in codfw)
  • 12:18 gehel: restarting elasticsearch on elastic2029 (high GC)
  • 12:01 volans: restart uwsgi on debmonitor1002 to test db reconnection
  • 11:58 marostegui: Reboot pc1008 for upgrade
  • 11:36 Urbanecm: EU B&C done
  • 11:30 urbanecm@deploy1001: Synchronized docroot/noc/index.html: bbfe2ce: noc: Remove link to outdated blog (T259978) (duration: 00m 57s)
  • 11:27 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: ff9f104: Update help URL (T256623) (duration: 00m 56s)
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b512d3: [hewiktionary] Enable wikilove (T262181) (duration: 00m 57s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 35224f4: [eswiki] Create an `abusefilter` user group (T262174; 2/2) (duration: 00m 57s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 35224f4: [eswiki] Create an `abusefilter` user group (T262174; 1/2) (duration: 01m 20s)
  • 11:02 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewiktionary wikilove # T262181
  • 11:01 marostegui: Reboot pc1007 for upgrade
  • 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:36 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 09:30 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 09:12 dcausse@deploy1001: Finished deploy [wdqs/wdqs@c96b49e]: deploy wdqs-0.3.47 to wdqs1009 (test server) (duration: 00m 33s)
  • 09:11 dcausse@deploy1001: Started deploy [wdqs/wdqs@c96b49e]: deploy wdqs-0.3.47 to wdqs1009 (test server)
  • 09:10 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:02 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 08:53 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 08:49 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 08:29 jayme@deploy2001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:19 marostegui: Upgrade and restart pc1010
  • 08:18 jayme@deploy2001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:10 jayme@deploy2001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:03 marostegui: Compress InnoDB on s8 eqiad master (db1109) - T232446
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 after MCR schema change', diff saved to https://phabricator.wikimedia.org/P12501 and previous config saved to /var/cache/conftool/dbconfig/20200907-051157-marostegui.json
  • 04:56 marostegui: Compress InnoDB on s1 eqiad master - this will generate a few days of lag on s1 and labsdb for enwiki T254462
  • 04:53 marostegui: Deploy schema change on db1109 (eqiad wikidata master) - T256685

2020-09-06

  • 19:45 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2127's weight a bit', diff saved to https://phabricator.wikimedia.org/P12496 and previous config saved to /var/cache/conftool/dbconfig/20200906-194512-marostegui.json
  • 08:20 elukey: powercycle mw1360 (mgmt console available, network errors while running anything)
  • 08:04 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1360.eqiad.wmnet
  • 08:01 elukey: executed "sudo ipmitool -I lanplus -H mw1360.mgmt.eqiad.wmnet -U root mc reset cold" from cumin (mgmt not available for mw1360)

2020-09-05

  • 00:23 foks: removing 2 files for legal compliance

2020-09-04

  • 22:15 ryankemper: wdqs deploy complete, service is healthy
  • 21:54 ryankemper: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
  • 21:52 ryankemper: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 21:49 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@c7e6b35]: 0.3.47 (duration: 12m 55s)
  • 21:37 ryankemper: Tests on canary `wdqs1003` passing, beginning full wdqs deploy
  • 21:36 ryankemper@deploy1001: Started deploy [wdqs/wdqs@c7e6b35]: 0.3.47
  • 21:31 ryankemper: `ryankemper@wdqs2002:~$ sudo systemctl restart wdqs-blazegraph`
  • 21:06 mutante: apt1001 - removed all libnginx-mod* packages except libnginx-mod-http-echo ; sudo apt-get autoremove ; run puppet ; restarted nginx - apt.wikimedia.org switched to nginx-light (T261962)
  • 21:02 mutante: apt1001 - remove all libnginx-mod* packages except libnginx-mod-http-echo
  • 20:59 mutante: apt2001 - sudo apt-get autoremove
  • 20:51 mutante: apt2001 - apt-get remove --purge libnginx* and run puppet to replace nginx-full with nginx-light (T261962)
  • 20:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:36 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:22 mutante: Icinga - ACKing with sticky - alerts on test and dev hosts
  • 18:10 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing (duration: 07m 35s)
  • 18:02 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing
  • 10:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12492 and previous config saved to /var/cache/conftool/dbconfig/20200904-102955-marostegui.json
  • 10:28 marostegui: Deploy MCR schema change on db1087 (sanitarium master), this will generate lag (probably a few days) on s8 labsdb hosts T238966
  • 09:48 marostegui: Restart prometheus-mysqld-exporter on db2125
  • 09:11 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 08:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 08:31 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 08:29 elukey: roll restart of the hadoop workers (test and analytics cluster) for openjdk upgrades
  • 08:08 moritzm: installing 4.19.132 kernel on buster systems (only installing the deb, reboots separately)
  • 07:30 moritzm: installing 4.9.228 kernel on stretch systems (only installing the deb, reboots separately)
  • 05:13 marostegui: Deploy MCR schema change on s4 eqiad master T238966
  • 01:51 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints (duration: 63m 18s)
  • 01:35 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:30 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 01:23 ryankemper: (Following the restart of blazegraph, service has been restored to `wdqs2003`. See https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1599182219699&to=1599182547699)
  • 01:16 ryankemper: Glancing at https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1599170628749&to=1599182011243, looks like `wdqs2003`'s blazegraph isn't happy based off the null data entries. Restarting blazegraph: `ryankemper@wdqs2003:~$ sudo systemctl restart wdqs-blazegraph`
  • 00:48 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints

2020-09-03

  • 23:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9394739: Start logging log-ins on select wikis (T253802) (duration: 00m 56s)
  • 21:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:55 milimetric@deploy1001: deploy aborted: AQS: Deploying new geoeditors endpoints (duration: 00m 13s)
  • 19:54 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints
  • 19:07 milimetric@deploy1001: Finished deploy [analytics/refinery@e4d5149] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d5149] (duration: 00m 08s)
  • 19:07 milimetric@deploy1001: Started deploy [analytics/refinery@e4d5149] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d5149]
  • 19:06 milimetric@deploy1001: Finished deploy [analytics/refinery@e4d5149]: Regular analytics weekly train [analytics/refinery@e4d5149] (duration: 09m 06s)
  • 18:57 milimetric@deploy1001: Started deploy [analytics/refinery@e4d5149]: Regular analytics weekly train [analytics/refinery@e4d5149]
  • 17:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:46 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:36 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:36 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:28 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:02 papaul: power down ores2009 for DIMM upgrade
  • 16:45 papaul: power down ores2008 for DIMM upgrade
  • 16:33 papaul: power down ores2007 for DIMM upgrade
  • 16:24 elukey: roll restart aqs on aqs1* to pick up new druid settings
  • 16:05 papaul: power down ores2006 for DIMM upgrade
  • 15:51 papaul: power down ores2005 for DIMM upgrade
  • 15:33 papaul: power down ores2004 for DIMM upgrade
  • 15:30 moritzm: installing nginx updates on apt* and htmldumper1001
  • 15:25 moritzm: installing firejail update (along with restarts) on thumbor1001, maps1001, restbase1016 (and -dev)
  • 15:22 papaul: power down ores2003 for DIMM upgrade
  • 15:17 moritzm: installing firejail security updates on parsoid servers
  • 15:08 papaul: power down ores2002 for DIMM upgrade
  • 14:53 papaul: power down ores2001 for DIMM upgrade
  • 14:36 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:30 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:29 jmm@deploy1001: Finished deploy [debmonitor/deploy@fb64c52]: deploy to new buster host (duration: 00m 06s)
  • 14:29 jmm@deploy1001: Started deploy [debmonitor/deploy@fb64c52]: deploy to new buster host
  • 14:13 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:11 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:00 marostegui: Failover m5 (wikitech) master - T260324
  • 13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:43 jmm@deploy1001: Finished deploy [debmonitor/deploy@fb64c52]: deploy to new buster host (duration: 00m 18s)
  • 13:43 jmm@deploy1001: Started deploy [debmonitor/deploy@fb64c52]: deploy to new buster host
  • 13:40 jmm@deploy1001: Finished deploy [debmonitor/deploy@25dbd20]: deploy to new buster host, now the --force is with me (duration: 01m 29s)
  • 13:39 jmm@deploy1001: Started deploy [debmonitor/deploy@25dbd20]: deploy to new buster host, now the --force is with me
  • 13:32 jmm@deploy1001: Finished deploy [debmonitor/deploy@25dbd20]: deploy to new buster host (duration: 00m 05s)
  • 13:32 jmm@deploy1001: Started deploy [debmonitor/deploy@25dbd20]: deploy to new buster host
  • 13:08 marostegui: Start pre m5 failover steps T260324
  • 12:46 marostegui: Deploy MCR schema change on s7 eqiad master (lag might show up) - T238966
  • 12:30 hnowlan: enabling puppet on appservers, finished rollout of api.wikimedia.org https://gerrit.wikimedia.org/r/c/operations/puppet/+/623833
  • 12:19 kormat@cumin1001: dbctl commit (dc=all): 'Shift weights in s2 codfw to account for db2125 being down T260670', diff saved to https://phabricator.wikimedia.org/P12485 and previous config saved to /var/cache/conftool/dbconfig/20200903-121916-kormat.json
  • 12:17 moritzm: installing openexr security updates for stretch
  • 12:03 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2125 after hw issue', diff saved to https://phabricator.wikimedia.org/P12483 and previous config saved to /var/cache/conftool/dbconfig/20200903-120304-kormat.json
  • 11:45 moritzm: installing net-snmp security updates on Stretch
  • 11:45 moritzm: installing net-snmp security updates on Buster
  • 11:33 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage --fix | phaste # T260320 # P12481
  • 11:28 moritzm: installing PHP 7.0 security updates
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 04281a0: Add extra namespaces for jawikivoyage (T260320) (duration: 01m 01s)
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: 976d735: Lift IP cap on 2020-09-08 for Senior Citizen Write Wikipedia course - cs.wikipedia (T261882) (duration: 01m 01s)
  • 11:21 gilles@deploy1001: Synchronized static/images/project-logos: T252108 Deploying lossily optimised Wikipedia logos (duration: 01m 20s)
  • 10:50 hnowlan: disabling apache on appservers for rollout of https://gerrit.wikimedia.org/r/c/operations/puppet/+/623833
  • 10:38 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:07 XioNoX: re-apply vlan 1118 firewall filter and update OSPF/bootp on cr1/2-eqiad - T261866
  • 09:57 XioNoX: rectification: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 on cr1-eqiad - T261866
  • 09:56 XioNoX: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12480 and previous config saved to /var/cache/conftool/dbconfig/20200903-095510-marostegui.json
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12479 and previous config saved to /var/cache/conftool/dbconfig/20200903-095015-marostegui.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12478 and previous config saved to /var/cache/conftool/dbconfig/20200903-094857-marostegui.json
  • 09:48 XioNoX: move VRRP master from cr1-eqiad:ae2.1118 to cr2-eqiad:xe-3/0/4.1118 - T261866
  • 09:46 XioNoX: move vlan 1118 IPv4 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12477 and previous config saved to /var/cache/conftool/dbconfig/20200903-094435-marostegui.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12476 and previous config saved to /var/cache/conftool/dbconfig/20200903-094043-marostegui.json
  • 09:38 XioNoX: move vlan 1118 IPv6 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12475 and previous config saved to /var/cache/conftool/dbconfig/20200903-093629-marostegui.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12474 and previous config saved to /var/cache/conftool/dbconfig/20200903-093454-marostegui.json
  • 09:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:31 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12473 and previous config saved to /var/cache/conftool/dbconfig/20200903-092549-marostegui.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316 db2087:3317 T261917', diff saved to https://phabricator.wikimedia.org/P12472 and previous config saved to /var/cache/conftool/dbconfig/20200903-092028-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12471 and previous config saved to /var/cache/conftool/dbconfig/20200903-091834-marostegui.json
  • 09:13 XioNoX: rolled back: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2122', diff saved to https://phabricator.wikimedia.org/P12470 and previous config saved to /var/cache/conftool/dbconfig/20200903-090901-marostegui.json
  • 09:06 XioNoX: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P12469 and previous config saved to /var/cache/conftool/dbconfig/20200903-090419-marostegui.json
  • 09:01 XioNoX: force ae2.1118 VRRP master on cr1-eqiad - T261866
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317, db1098:3316', diff saved to https://phabricator.wikimedia.org/P12468 and previous config saved to /var/cache/conftool/dbconfig/20200903-090007-marostegui.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1090:3317', diff saved to https://phabricator.wikimedia.org/P12467 and previous config saved to /var/cache/conftool/dbconfig/20200903-085838-marostegui.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12466 and previous config saved to /var/cache/conftool/dbconfig/20200903-085708-marostegui.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12465 and previous config saved to /var/cache/conftool/dbconfig/20200903-084910-marostegui.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1090:3312', diff saved to https://phabricator.wikimedia.org/P12464 and previous config saved to /var/cache/conftool/dbconfig/20200903-084836-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3317, db1090:3312', diff saved to https://phabricator.wikimedia.org/P12463 and previous config saved to /var/cache/conftool/dbconfig/20200903-084358-marostegui.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12462 and previous config saved to /var/cache/conftool/dbconfig/20200903-084147-marostegui.json
  • 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122 T261917', diff saved to https://phabricator.wikimedia.org/P12461 and previous config saved to /var/cache/conftool/dbconfig/20200903-082956-marostegui.json
  • 08:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:28 moritzm: rebooting mwmaint1002 for kernel update
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12460 and previous config saved to /var/cache/conftool/dbconfig/20200903-082655-marostegui.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12459 and previous config saved to /var/cache/conftool/dbconfig/20200903-082034-marostegui.json
  • 08:16 marostegui: Upgrade db1101 (s7 and s8)
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12458 and previous config saved to /var/cache/conftool/dbconfig/20200903-081543-marostegui.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318, db1101:3317', diff saved to https://phabricator.wikimedia.org/P12457 and previous config saved to /var/cache/conftool/dbconfig/20200903-081503-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12456 and previous config saved to /var/cache/conftool/dbconfig/20200903-081337-marostegui.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12455 and previous config saved to /var/cache/conftool/dbconfig/20200903-080714-marostegui.json
  • 08:06 marostegui: Upgrade and reboot db1127
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12454 and previous config saved to /var/cache/conftool/dbconfig/20200903-080634-marostegui.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12453 and previous config saved to /var/cache/conftool/dbconfig/20200903-080024-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12452 and previous config saved to /var/cache/conftool/dbconfig/20200903-075443-marostegui.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12451 and previous config saved to /var/cache/conftool/dbconfig/20200903-074922-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3317 T261917', diff saved to https://phabricator.wikimedia.org/P12450 and previous config saved to /var/cache/conftool/dbconfig/20200903-074827-marostegui.json
  • 07:45 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 07:45 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 07:45 marostegui: Upgrade and reboot db1094
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12449 and previous config saved to /var/cache/conftool/dbconfig/20200903-074426-marostegui.json
  • 07:38 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 07:38 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12448 and previous config saved to /var/cache/conftool/dbconfig/20200903-073718-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12447 and previous config saved to /var/cache/conftool/dbconfig/20200903-073116-marostegui.json
  • 07:29 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12446 and previous config saved to /var/cache/conftool/dbconfig/20200903-072716-marostegui.json
  • 07:24 hashar: contint2001: restarting CI Jenkins for plugins upgrade
  • 07:19 marostegui: Deploy schema change on s8 eqiad master T237120
  • 07:18 marostegui: Stop slave on s8 eqiad master (lag will appear on s8 eqiad) - T237120
  • 07:02 marostegui: Stop db2100:3317 and db2121 in sync to reload metawiki.content T261869
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12445 and previous config saved to /var/cache/conftool/dbconfig/20200903-070104-marostegui.json
  • 06:56 hashar: contint2001: restarting CI Jenkins
  • 06:56 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:56 _joe_: deployment of mobileapps to pick up changes to envoy config, new helmfile layout
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12444 and previous config saved to /var/cache/conftool/dbconfig/20200903-065105-marostegui.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12443 and previous config saved to /var/cache/conftool/dbconfig/20200903-064804-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12442 and previous config saved to /var/cache/conftool/dbconfig/20200903-064623-marostegui.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12441 and previous config saved to /var/cache/conftool/dbconfig/20200903-064334-marostegui.json
  • 06:24 marostegui: Disconnect eqiad -> codfw replication

2020-09-02

  • 22:55 shdubsh: restart rsyslog on centrallog[12]001
  • 22:27 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart wdqs-blazegraph.service"`
  • 22:26 ryankemper: Puppet finished on all external wdqs codfw nodes, nginx automatically reloaded as intended
  • 22:24 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo run-puppet-agent"`
  • 21:48 bd808@deploy1001: Finished deploy [striker/deploy@3c2090a]: Deploying r20200902 tag (T198114, T223610, T245804, T144111, T261810) (duration: 01m 34s)
  • 21:46 bd808@deploy1001: Started deploy [striker/deploy@3c2090a]: Deploying r20200902 tag (T198114, T223610, T245804, T144111, T261810)
  • 21:10 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart wdqs-blazegraph.service"`
  • 21:10 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart nginx.service"`
  • 21:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:01 ryankemper: Restarted nginx on `wdqs2007`
  • 21:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:47 ryankemper: restarted blazegraph on `wdqs2001` as well
  • 20:46 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal and not P{wdqs2001.codfw.wmnet}' "sudo systemctl restart wdqs-blazegraph.service"` (restarted everything but 2001, will restart 2001 next)
  • 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:20 robh: scs-c1-eqiad firmware update complete and back online T238036
  • 19:14 robh: updating firmware on scs-c1-eqiad via T238036
  • 19:14 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Revert "Update T250887 mitigations" (duration: 00m 32s)
  • 18:58 herron: freeing some disk space on centrallog1001 with 'tune2fs -m 0 /dev/centrallog1001-vg/data'
  • 18:43 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622898 Install OAuthRateLimiter III: Install where enabled, ouch, forgot to rebase (duration: 00m 55s)
  • 18:40 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622898 Install OAuthRateLimiter III: Install where enabled (duration: 00m 55s)
  • 18:38 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka jumbo-eqiad (for consistency with main) - T261865
  • 18:37 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka main-codfw - T261865
  • 18:36 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:622897 Install OAuthRateLimiter extension II: Add flag to IS (duration: 00m 56s)
  • 18:34 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka main-eqiad - T261865
  • 18:33 ppchelko@deploy1001: Synchronized wmf-config/extension-list: (no justification provided) (duration: 00m 54s)
  • 18:32 ottomata: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka jumbo-eqiad (for consistency with main) - T261865
  • 18:28 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/DiscussionTools/: Backport Fix parsing localised digits in PHP discussion parser (duration: 00m 56s)
  • 18:19 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/DiscussionTools/: Backport Re-apply new reply API patches (again) (duration: 00m 58s)
  • 17:34 bstorm: re-enabled puppet on labsdb10[09-12]
  • 17:28 bstorm: disabled puppet on labsdb10[09-12]
  • 17:18 herron: restarted elasticsearch on logstash1012
  • 16:39 Pchelolo: creating oauth_ratelimit_client_tier table T258711
  • 15:55 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
  • 15:55 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
  • 15:55 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
  • 15:32 hnowlan: Temporarily disabling apache for configuration change T246945
  • 15:24 godog: prometheus codfw lvextend --resizefs --size +50G /dev/mapper/vg--ssd-prometheus--k8s
  • 15:19 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
  • 15:18 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
  • 15:18 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 15:17 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:16 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 15:15 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
  • 15:15 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main
  • 15:11 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:31 elukey: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka-main eqiad - T261865
  • 14:29 elukey: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka-main codfw - T261865
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12434 and previous config saved to /var/cache/conftool/dbconfig/20200902-141854-marostegui.json
  • 13:05 elukey: run kafka preferred-replica-election on kafka-main codfw
  • 12:07 XioNoX: move vrrp master from cr2-codfw to cr1-codfw
  • 11:52 duesen__: daniel@mwmaint2001:/srv/mediawiki/php-1.36.0-wmf.6$ mwscript findBadBlobs.php testwiki --mark T251778
  • 11:36 Urbanecm: EU B&C done
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 796b4fa: Add title for apiportalwiki (T246945) (duration: 00m 56s)
  • 11:34 Urbanecm: Fetched extra commits to deploy1001's staging dir, commit messages explain it's an accident, continuing; cc Krinkle
  • 11:31 duesen__: Deployed second security fix for T260485
  • 11:07 XioNoX: repool cr1-eqiad
  • 10:58 XioNoX: cr1-eqiad:request chassis routing-engine master switch
  • 10:49 XioNoX: reboot cr1-eqiad:re0 (backup)
  • 10:45 jbond42: install apache updates on buster
  • 10:36 XioNoX: cr1-eqiad:request chassis routing-engine master switch
  • 10:35 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
  • 10:34 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
  • 10:32 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
  • 10:31 jbond42: install apache updates on jessie
  • 10:27 XioNoX: reboot cr1-eqiad:re1 (backup)
  • 10:18 XioNoX: move VRRP master from cr1 to cr2
  • 10:16 XioNoX: drain cr1-eqiad transit/transport/IX
  • 10:13 XioNoX: drain cr1-eqiad-pfw3-eqiad link
  • 10:04 XioNoX: repool cr2-eqiad
  • 09:55 XioNoX: cr2-eqiad:request chassis routing-engine master switch - T259621
  • 09:46 XioNoX: reboot cr2-eqiad:re0 (backup) - T259621
  • 09:28 XioNoX: cr2-eqiad:request chassis routing-engine master switch - T259621
  • 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:18 XioNoX: reboot cr2-eqiad:re1 (backup) - T259621
  • 09:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:13 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:13 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
  • 09:12 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:11 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:08 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
  • 09:07 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
  • 09:01 elukey: reimage kafka-jumbo1004 to Buster
  • 08:58 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1128 from s10 - T260324', diff saved to https://phabricator.wikimedia.org/P12432 and previous config saved to /var/cache/conftool/dbconfig/20200902-085705-marostegui.json
  • 08:52 XioNoX: deactivate cr2-eqiad transit/IX - T259621
  • 08:50 XioNoX: drain cr2-eqiad transport links - T259621
  • 08:20 XioNoX: activate Telia BGP in eqiad
  • 07:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:38 elukey: reimage kafka-jumbo1003 to buster
  • 07:28 marostegui: Reboot dbstore1003 for kernel upgrade - T261389
  • 07:12 XioNoX: configure cr2-eqiad:ae5 as single LACP link to Telia
  • 07:05 marostegui: Drop unused grants on m5 T261152
  • 07:02 elukey: reboot kafka-jumbo1002 to pick up new kernel settings
  • 07:00 XioNoX: deactivate Telia BGP in eqiad
  • 06:38 elukey: powercycle analytics1059 - cpu soft locks on multiple CPUs
  • 06:30 elukey: reboot kafka-jumbo1001 to pick up new kernel settings
  • 06:30 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 06:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'production' .

2020-09-01

  • 22:39 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=sysop_itwiki Pierpao (T261722)
  • 17:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:36 ryankemper: wdqs [canary] rollback complete, tests passing now. Will need to dig into source of failure
  • 17:35 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@7920fbe]: 0.3.46 (duration: 03m 43s)
  • 17:35 ryankemper: `wdqs1003` (the canary instance) is failing tests now, going to rollback
  • 17:32 ryankemper@deploy1001: Started deploy [wdqs/wdqs@7920fbe]: 0.3.46
  • 17:30 ryankemper: Starting wdqs deploy
  • 15:56 chasemp: labsdb* puppet agent --test; sudo /usr/local/sbin/maintain-views --all-databases --table user --replace-all; sudo /usr/local/sbin/maintain-views --all-databases --table user_old --replace-all
  • 15:25 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 14:28 _joe_: restarting envoy on all eqiad jobrunners
  • 14:22 _joe_: restarted confd on mwmaint1002
  • 14:18 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 14:18 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 14:17 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:15 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2083 weight', diff saved to https://phabricator.wikimedia.org/P12429 and previous config saved to /var/cache/conftool/dbconfig/20200901-141521-marostegui.json
  • 14:15 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 14:14 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:07 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:07 rzl@cumin1001: MediaWiki read-only period ends at: 2020-09-01 14:07:36.305500
  • 14:07 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:04 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 14:02 rzl@cumin1001: MediaWiki read-only period starts at: 2020-09-01 14:02:04.851006
  • 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 13:58 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 13:58 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:51 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:50 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:45 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:44 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:40 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:39 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 13:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 13:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 13:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 10:37 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:48 XioNoX: reserve cr2-eqiad:xe-3/3/7 for new Telia port
  • 09:38 jayme: systemctl restart docker-reporter-releng-images.service on deneb to clear out alert because of temporary HTTP 504 from debmonitor
  • 09:01 moritzm: installing Java 8 sec updates on contint*
  • 08:51 moritzm: uploaded apache 2.4.10-10+deb8u16+wmf1 for jessie-wikimedia
  • 07:11 moritzm: installing 4.9.228 kernel on stretch systems (only installing the deb, reboots separately)
  • 07:05 moritzm: restarting jenkins on releases1002 to pick up Java security updates
  • 06:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:44 elukey: reimage kafka-jumbo1002 to Buster
  • 06:20 marostegui: Install query killers on db2137:3314 T243373
  • 01:17 chaomodus: updated the pynetbox package to 5.0.7 and uploaded to buster
  • 00:02 mutante: wb2-grrrri was not running and wikibugs had not posted any Gerrit updates for a while
  • 00:01 mutante: restarting wikibugs

2020-08-31

  • 23:38 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 (final) (duration: 00m 17s)
  • 23:38 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 (final)
  • 23:37 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox2001 (duration: 01m 12s)
  • 23:36 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox2001
  • 23:36 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox1001 (duration: 00m 58s)
  • 23:35 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox1001
  • 23:31 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next pt2 (duration: 00m 05s)
  • 23:31 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next pt2
  • 23:31 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next (duration: 00m 57s)
  • 23:30 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next
  • 23:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable (future) mw-reverted tag for all wikis except testwiki (T254074) (duration: 00m 57s)
  • 21:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:00 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:20 ryankemper: `sudo systemctl restart elasticsearch_6@production-search-psi-eqiad.service` on `elastic1052.eqiad.wmnet`
  • 18:38 Urbanecm: Morning B&C done
  • 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 16197aa: Add two domains to wgCopyUploadsDomains for commonswiki (T261562; T261575) (duration: 00m 54s)
  • 18:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bb28e9d: itwiki: Assign patrol right to autopatrolled instead of autoconfirmed (T261587) (duration: 00m 53s)
  • 18:23 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: a1b0d6e: b609cd5: CommonSettings.php: limit new Echos `push-subscription-manager` group to Meta-Wiki (T261625) (duration: 00m 54s)
  • 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 846c544: wgEventStreams: Stream for MEP-iOS pilot (T260382) (duration: 00m 55s)
  • 17:21 volans: uploaded spicerack_0.0.42 to apt.wikimedia.org buster-wikimedia
  • 15:50 rzl@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
  • 15:49 ejegg: updated payments-wiki from ef7ebd08cb to be81063168
  • 15:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 15:33 rzl@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 15:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=99)
  • 15:32 rzl@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 14:58 ema: Traffic: depool eqiad from user traffic T243316
  • 14:38 moritzm: installing rake security updates on stretch
  • 14:33 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 14:21 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 14:20 rzl@cumin1001: Switching services apertium, termbox, search, api-gateway, ores, sessionstore, eventgate-main, graphoid, eventstreams, wikifeeds, wdqs, parsoid, eventgate-logging-external, wdqs-internal, echostore, mathoid, mobileapps, proton, restbase, kartotherian, recommendation-api, eventgate-analytics-external, restbase-async, citoid, schema, cxserver, eventgate-analytics, zotero: eqiad => codfw
  • 14:20 rzl@cumin1001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 14:18 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 14:13 rzl@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 14:12 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=99)
  • 14:11 rzl@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 13:41 andrewbogott: dropping many databases from m5, as per T261152
  • 13:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:07 marostegui: Failover m3 (phabricator) proxy from dbproxy1016 to dbproxy1020 - T261459
  • 13:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:54 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 12:54 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 12:53 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 12:53 oblivian@cumin2001: Switching services parsoid: eqiad => codfw
  • 12:53 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 12:53 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 12:48 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 12:45 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 12:45 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 12:44 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 12:44 oblivian@cumin2001: Switching services restbase-async: eqiad => codfw
  • 12:44 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 12:43 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 12:37 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 12:14 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 12:14 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 12:13 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 12:13 oblivian@cumin2001: Switching services restbase-async: eqiad => codfw
  • 12:13 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 12:10 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 12:05 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 11:58 elukey: reimage kafka-jumbo1001 to Buster
  • 11:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: 5d583d9: Disable MediaSearch A/B test (duration: 00m 55s)
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 81f88fd: Enable Signature button on Wikiproject for hywiki (T261550) (duration: 00m 54s)
  • 11:22 jbond42: removing old hiera version 1 and 3 backends
  • 11:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b74893f: Enable sitenotice on mobile for closed wikis (T261357) (duration: 00m 56s)
  • 11:02 volans: upgraded spicerack to 0.0.41 on cumin hosts
  • 10:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:51 elukey: executed /srv/phab/phabricator/bin/remove destroy @klausman on phab1001 (following https://wikitech.wikimedia.org/wiki/Phabricator#Delete_a_user) to clear inconsistent state of new account (wrong email address)
  • 08:43 moritzm: installing bind9 security updates on stretch/buster (client-side tools/libs only)
  • 07:53 volans: uploaded spicerack_0.0.41 to apt.wikimedia.org buster-wikimedia
  • 07:30 moritzm: installing squid security updates
  • 07:24 moritzm: installing openexr security updates on buster
  • 07:12 marostegui: Sanitize jawikivoyage on db2094:3325 and db1124:3325 T260482
  • 06:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:06 elukey: reimage kafka-jumbo1005 to Debian Buster
  • 05:21 marostegui: Reload haproxy on dbproxy1017 and dbproxy1021 to test db1128

2020-08-30

  • 16:13 herron: restarted eqiad v5 logstashes

2020-08-29

  • 18:05 Amir1: end of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T261451)
  • 17:45 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T261451)

2020-08-28

  • 21:53 ryankemper: `sudo systemctl reload nginx.service` on `cloudelastic100[5,6].wikimedia.org` to try to resolve certificate warning issues
  • 19:11 andrewbogott: rebooting cloudvirt1006. It's a spare, unused system but showing a bus error and icinga alerts; not worth saving if it needs saving
  • 17:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:39 mutante: shutting down mw2196
  • 17:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:40 rzl: switchdc live test complete
  • 16:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 16:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 16:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 16:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 16:33 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 16:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 16:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 16:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 16:29 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-08-28 16:29:24.432463
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 16:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 16:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 16:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 16:28 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-08-28 16:28:07.882663
  • 16:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 16:19 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 16:19 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 16:13 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 16:12 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 16:09 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 16:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 16:08 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 16:08 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 16:07 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 16:07 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 16:06 rzl: starting one more live test of the data center switchover automation, no production impact is expected but there will be some SAL noise
  • 14:22 moritzm: installing Java security updates on kafka/main and Logstash(5) clusters
  • 13:35 hashar@deploy1001: Finished deploy [integration/docroot@65ec92c]: noop, sync up for README.md (duration: 00m 07s)
  • 13:35 hashar@deploy1001: Started deploy [integration/docroot@65ec92c]: noop, sync up for README.md
  • 13:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:07 elukey: stop kafka on kafka-jumbo1006 and reimage to buster
  • 12:56 moritzm: installing debmonitor1002 T261492
  • 12:46 moritzm: installing debmonitor2002 T261492
  • 11:50 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:40 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:27 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 11:27 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:27 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 09:48 jayme: updated helm to 2.16.9-3 on chartmuseum*, contint*, deploy*
  • 09:19 jayme: imported helm_2.16.9-3 to buster-wikimedia, stretch-wikimedia, jessie-wikimedia
  • 08:22 kormat: enabling replication from db2112 to db1083 (s1) T243373
  • 07:41 jynus: restart backup2001,backup1002
  • 07:10 jynus: restart db2139
  • 07:07 marostegui: Warm up parsercache in codfw - T260042
  • 06:47 jynus: restart db2102
  • 06:28 jynus: restart db2100
  • 06:07 jynus: restart db2099
  • 05:50 jynus: restart db2098
  • 00:06 eileen: process-control config revision is dd541a25dc

2020-08-27

  • 23:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:48 eileen: civicrm revision changed from a942537984 to 3d501e71d9, config revision is dd541a25dc
  • 22:54 eileen: civicrm revision changed from 481ab742db to a942537984, config revision is e2ab4d7c1f
  • 22:28 tzatziki: removing one file for legal compliance
  • 22:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 22:18 volans: uploaded spicerack_0.0.40-1_amd64.deb to apt.wikimedia.org buster-wikimedia
  • 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:57 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:41 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:29 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:25 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:22 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:17 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 21:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:16 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 21:14 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:10 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 21:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:50 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw221[0-4].codfw.wmnet
  • 20:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:49 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw220[0-9].codfw.wmnet
  • 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw214[0-7].codfw.wmnet
  • 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:47 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw213[0-9].codfw.wmnet
  • 20:43 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Streams for testing MEP-based analytics instruments - T259714 (duration: 00m 55s)
  • 19:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:57 marxarelli: 1.36.0-wmf.6 promoted to all wikis (T257974). new errors appear to be related to T261345 but are known since 1.36.0-wmf.5
  • 19:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=appserver,name=mw21[8-9][0-9]*.codfw.wmnet
  • 19:41 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.6
  • 19:22 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 11s)
  • 19:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:19 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:16 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating apiportalwiki (T246945)
  • 19:15 urbanecm@deploy1001: Synchronized dblists: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:14 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:13 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:11 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 18:54 mforns@deploy1001: Finished deploy [analytics/refinery@e85191b] (thin): Regular analytics weekly train THIN [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] (duration: 00m 08s)
  • 18:54 mforns@deploy1001: Started deploy [analytics/refinery@e85191b] (thin): Regular analytics weekly train THIN [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9]
  • 18:53 mforns@deploy1001: Finished deploy [analytics/refinery@e85191b]: Regular analytics weekly train [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] (duration: 10m 01s)
  • 18:43 mforns@deploy1001: Started deploy [analytics/refinery@e85191b]: Regular analytics weekly train [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9]
  • 18:43 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Assign all homepage users to variant A (duration: 01m 03s)
  • 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on ruwiki (T257490) (duration: 01m 03s)
  • 18:17 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=jobrunner,name=mw2250.codfw.wmnet,service=canary
  • 18:17 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=jobrunner,name=mw2249.codfw.wmnet,service=canary
  • 18:16 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 18:16 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 18:14 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=jobrunner,name=mw1318.eqiad.wmnet
  • 18:07 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw229[1-9].codfw.wmnet,cluster=api_appserver
  • 18:06 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2290.codfw.wmnet,cluster=api_appserver
  • 18:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw22[6-8][0-9].codfw.wmnet,cluster=api_appserver
  • 18:03 Urbanecm: Creating jawikivoyage is done (T260320)
  • 18:02 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 59s)
  • 18:02 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw225[0-9].codfw.wmnet,cluster=api_appserver
  • 18:00 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating jawikivoyage (T260320) (duration: 01m 02s)
  • 17:59 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw224[4-5].codfw.wmnet,service=canary
  • 17:59 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw224[4-5].codfw.wmnet
  • 17:59 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating jawikivoyage (T260320) (duration: 01m 03s)
  • 17:58 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating jawikivoyage (T260320)
  • 17:57 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw222[0-3].codfw.wmnet
  • 17:56 urbanecm@deploy1001: Synchronized dblists: Creating jawikivoyage (T260320) (duration: 00m 58s)
  • 17:56 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw221[5-9].codfw.wmnet,service=canary
  • 17:55 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw221[5-9].codfw.wmnet
  • 17:55 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating jawikivoyage (T260320) (duration: 01m 03s)
  • 17:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw221[0-4].codfw.wmnet
  • 17:54 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw221[0-4].codfw.wmnet
  • 17:54 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating jawikivoyage (T260320) (duration: 01m 07s)
  • 17:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw220[1-9].codfw.wmnet
  • 17:52 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw220[1-9].codfw.wmnet
  • 17:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2200.codfw.wmnet
  • 17:50 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2200.codfw.wmnet
  • 17:48 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw214[0-7].codfw.wmnet
  • 17:47 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw213[5-9].codfw.wmnet
  • 17:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw214[0-7].codfw.wmnet
  • 17:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw213[5-9].codfw.wmnet
  • 17:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw23[0-7][0-9].codfw.wmnet
  • 17:31 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[0-7].codfw.wmnet,service=canary
  • 17:30 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw227[0-7].codfw.wmnet
  • 17:29 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 17:29 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 17:18 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:17 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw226[8-9].codfw.wmnet
  • 17:13 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw225[4-8].codfw.wmnet
  • 17:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:11 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw224[0-2].codfw.wmnet
  • 17:04 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw223[2-9].codfw.wmnet
  • 17:01 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2231.codfw.wmnet
  • 16:59 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2230.codfw.wmnet
  • 16:54 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw222[4-9].codfw.wmnet
  • 16:49 mutante: re-weighted appservers and api appservers in eqiad - hardware type G = weight 25, all other types = weight 30 (T261159)
  • 16:48 mutante: depooling mw2187 - mw2199 - old codfw appservers of type A to be decom'ed, previously weight 10 (T260654)
  • 16:47 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw219[0-9].codfw.wmnet
  • 16:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw218[7-9].codfw.wmnet
  • 16:35 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1297.eqiad.wmnet
  • 16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:21 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw127[0-5].eqiad.wmnet
  • 16:19 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw126[1-5].eqiad.wmnet,service=canary
  • 16:14 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw126[1-9].eqiad.wmnet
  • 16:12 elukey: remove some old/stale terms from analytics-in4 on cr1/cr2-eqiad (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622746, https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622744)
  • 16:09 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw127[6-9].eqiad.wmnet,service=canary
  • 16:08 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw127[6-9].eqiad.wmnet
  • 16:06 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1290.eqiad.wmnet
  • 16:05 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw128[0-9].eqiad.wmnet
  • 15:52 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1290.eqiad.wmnet
  • 15:51 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw128[0-9].eqiad.wmnet
  • 15:43 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw127[7-9].eqiad.wmnet,service=canary
  • 15:43 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1276.eqiad.wmnet,service=canary
  • 15:41 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw127[6-9].eqiad.wmnet
  • 15:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1297.eqiad.wmnet
  • 15:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1269.eqiad.wmnet
  • 15:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1267.eqiad.wmnet
  • 14:48 moritzm: installing Java security updates on aqs, hadoop and kafka-jumbo
  • 14:44 moritzm: restarting tomcat on idp-test* hosts to pick up Java update
  • 14:42 elukey: add eventgate-related terms to analytics-in4 filter on cr1/cr2-eqiad (ref https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622705)
  • 14:37 moritzm: imported openjdk 8u265-b01-1~deb10u1 to buster-wikimedia (forward port of latest Java 8 security update)
  • 14:31 papaul: replacing msw-c5,c6,c7 and fmsw-c8
  • 13:58 kormat: disabling GTID on pc2007 (pc1), pc2008 (pc2), pc2009 (pc3) T243373
  • 13:56 kormat: disabling GTID on db2096 (x1), es2021 (es4), es2023 (es5) T243373
  • 13:54 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:53 kormat: disabling GTID on db2129 (s6), db2118 (s7), db2079 (s8) T243373
  • 13:52 kormat: disabling GTID on db2123 (s5) T243373
  • 13:52 kormat: disabling GTID on db2090 (s4) T243373
  • 13:51 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:51 kormat: disabling GTID on db2105 (s3) T243373
  • 13:50 kormat: disabling GTID on db2107 (s2) T243373
  • 13:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:29 elukey: restart jvm daemons on analytics1042, aqs1004, kafka-jumbo1001 to pick up new openjdk upgrades (canaries)
  • 13:18 kormat: enabling replication from db2107 to db1122 (s2) T243373
  • 13:14 kormat: enabling replication from db2096 to db1103 (x1) T243373
  • 13:10 jynus: restart db2097
  • 13:07 jbond42: deploy python3.4 security update to kraz
  • 13:03 jbond42: deploy python3.4 security update to canaries on jessie
  • 13:01 kormat: enabling replication from db2118 to db1086 (s7) T243373
  • 12:52 jynus: restart db1140
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s8 weights T243373', diff saved to https://phabricator.wikimedia.org/P12402 and previous config saved to /var/cache/conftool/dbconfig/20200827-124338-marostegui.json
  • 12:35 jynus: restart db1139
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s7 weights T243373', diff saved to https://phabricator.wikimedia.org/P12401 and previous config saved to /var/cache/conftool/dbconfig/20200827-123028-marostegui.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s7 weights T243373', diff saved to https://phabricator.wikimedia.org/P12400 and previous config saved to /var/cache/conftool/dbconfig/20200827-123003-marostegui.json
  • 12:24 marostegui: Fix password format for in db2129 (s6 codfw master) T243373
  • 12:14 kormat: enabling replication from db2129 to db1093 (s6) T243373
  • 12:13 jynus: restart db1095
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s6 weights T243373', diff saved to https://phabricator.wikimedia.org/P12399 and previous config saved to /var/cache/conftool/dbconfig/20200827-120816-marostegui.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 codfw weights T243373', diff saved to https://phabricator.wikimedia.org/P12398 and previous config saved to /var/cache/conftool/dbconfig/20200827-120211-marostegui.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 eqiad weights T243373', diff saved to https://phabricator.wikimedia.org/P12397 and previous config saved to /var/cache/conftool/dbconfig/20200827-115934-marostegui.json
  • 11:56 Urbanecm: Lift range blocks exceeding wgBlockCIDRLimit via custom script from F32197596 (ruwiki, ruwikiquote; T243980)
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s4 codfw weights T243373', diff saved to https://phabricator.wikimedia.org/P12396 and previous config saved to /var/cache/conftool/dbconfig/20200827-115110-marostegui.json
  • 11:49 moritzm: uploaded python3.4 3.4.2-1+deb8u7+wmf1 for jessie-wikimedia T259102
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 codfw weights T243373', diff saved to https://phabricator.wikimedia.org/P12395 and previous config saved to /var/cache/conftool/dbconfig/20200827-114509-marostegui.json
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust db2126 weight T243373', diff saved to https://phabricator.wikimedia.org/P12394 and previous config saved to /var/cache/conftool/dbconfig/20200827-112213-marostegui.json
  • 11:12 Urbanecm: EU B&C done
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 34994d3: Add $wgTranslateMessageNamespaces[] = NS_MEDIAWIKI; for commonswiki (T131300) (duration: 01m 03s)
  • 10:57 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:56 godog: bounce grafana to apply new settings
  • 10:51 kormat: enabling replication from db2123 to db1100 (s5) T243373
  • 10:48 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:30 kormat: enabling replication from es2023 to es1024 (es5) T243373
  • 10:28 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:23 kormat: enabling replication from es2021 to es1021 (es4) T243373
  • 10:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 10:19 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 10:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 10:03 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 09:54 moritzm: installing Java security updates on IDP* hosts
  • 09:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:44 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:44 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:43 elukey: decommissioning vms schema[12]00[12] (replaced previously by schema[12]00[34] buster vms)
  • 09:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:41 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:39 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:20 kormat: enabling replication from db2105 to db1123 (s3) T243373
  • 09:15 kormat: enabling replication from db2079 to db1109 (s8) T243373
  • 09:07 kormat: enabling replication from db2090 to db1081 (s4) T243373
  • 08:53 kormat: enabling replication from pc2009 to pc1009 (pc3) T243373
  • 08:44 kormat: enabling replication from pc2008 to pc1008 (pc2) T243373
  • 08:13 marostegui: Enable replication codfw -> eqiad on pc1 T243373
  • 08:01 gehel: manual cleanup of stale wdqs deploy crontab on wdqs1009
  • 07:35 marostegui: Move pc2010 under pc2007 T243373
  • 07:16 moritzm: installing ghostscript security updates on stretch
  • 06:50 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 06:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:46 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:45 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:31 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12392 and previous config saved to /var/cache/conftool/dbconfig/20200827-060652-marostegui.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12391 and previous config saved to /var/cache/conftool/dbconfig/20200827-055815-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12390 and previous config saved to /var/cache/conftool/dbconfig/20200827-055522-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12389 and previous config saved to /var/cache/conftool/dbconfig/20200827-055126-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12388 and previous config saved to /var/cache/conftool/dbconfig/20200827-055104-marostegui.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P12387 and previous config saved to /var/cache/conftool/dbconfig/20200827-054259-marostegui.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1074 db1085 db1078', diff saved to https://phabricator.wikimedia.org/P12386 and previous config saved to /var/cache/conftool/dbconfig/20200827-054114-marostegui.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P12385 and previous config saved to /var/cache/conftool/dbconfig/20200827-053814-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12384 and previous config saved to /var/cache/conftool/dbconfig/20200827-053558-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074, db1085, db1079', diff saved to https://phabricator.wikimedia.org/P12383 and previous config saved to /var/cache/conftool/dbconfig/20200827-053509-marostegui.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074, db1085, db1079', diff saved to https://phabricator.wikimedia.org/P12382 and previous config saved to /var/cache/conftool/dbconfig/20200827-053100-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P12381 and previous config saved to /var/cache/conftool/dbconfig/20200827-052925-marostegui.json
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P12380 and previous config saved to /var/cache/conftool/dbconfig/20200827-052818-marostegui.json
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074', diff saved to https://phabricator.wikimedia.org/P12379 and previous config saved to /var/cache/conftool/dbconfig/20200827-052413-marostegui.json
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P12378 and previous config saved to /var/cache/conftool/dbconfig/20200827-051609-marostegui.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12377 and previous config saved to /var/cache/conftool/dbconfig/20200827-051546-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12376 and previous config saved to /var/cache/conftool/dbconfig/20200827-050754-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P12375 and previous config saved to /var/cache/conftool/dbconfig/20200827-050727-marostegui.json
  • 04:53 marostegui: Stop db1074 and db2107 in sync to fix drifts on s2 change_tag - T260042
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P12374 and previous config saved to /var/cache/conftool/dbconfig/20200827-045329-marostegui.json
  • 04:04 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=cloudelastic1006.wikimedia.org
  • 04:03 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=cloudelastic1005.wikimedia.org
  • 04:01 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cloudelastic1005.wikimedia.org
  • 02:03 mutante: shutting down install3001,install4001,install5001 VMs (no OS yet, but please also don't delete, debugging in progress, shutting them down until I continue on T254157)

2020-08-26

  • 23:35 eileen: civicrm revision changed from d2e80f7522 to 481ab742db, config revision is e2ab4d7c1f
  • 23:00 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:36 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:30 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:26 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:51 XioNoX: standardize pfw3-eqiad
  • 19:33 marxarelli: 1.36.0-wmf.6 promoted to group1 (T257974). logs show no new errors
  • 19:24 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.6 (duration: 01m 03s)
  • 19:23 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.6
  • 18:21 Urbanecm: Morning B&C done
  • 18:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 945b97c: Added import sources for mlwiktionary (T260716) (duration: 01m 05s)
  • 18:12 Urbanecm: Purge Thai and Greek taglines, URLs are at P12372 (T258552)
  • 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 4009289: Update Thai and Greek taglines (T258552) (duration: 01m 03s)
  • 18:09 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: 4009289: Update Thai and Greek taglines (T258552) (duration: 01m 05s)
  • 18:08 herron: upgraded eqiad elk v7 cluster from 7.8.0 to 7.9.0 T234854
  • 18:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:41 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable client side error logging on hewiki (T255585) (duration: 01m 04s)
  • 17:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Documentation-only change; sync for line sanity (duration: 01m 04s)
  • 17:12 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T254349 Set wgVisualEditorEnableBetaFeature true on wikis that need it (duration: 01m 03s)
  • 15:59 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 15:53 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 15:41 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 15:11 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for MCR change', diff saved to https://phabricator.wikimedia.org/P12371 and previous config saved to /var/cache/conftool/dbconfig/20200826-145612-marostegui.json
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12370 and previous config saved to /var/cache/conftool/dbconfig/20200826-145531-marostegui.json
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12369 and previous config saved to /var/cache/conftool/dbconfig/20200826-144750-marostegui.json
  • 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema1002.eqiad.wmnet
  • 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema1001.eqiad.wmnet
  • 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema2001.codfw.wmnet
  • 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema2002.codfw.wmnet
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12368 and previous config saved to /var/cache/conftool/dbconfig/20200826-143623-marostegui.json
  • 14:34 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema2003.codfw.wmnet
  • 14:34 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema2004.codfw.wmnet
  • 14:33 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema1004.eqiad.wmnet
  • 14:33 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema1003.eqiad.wmnet
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12367 and previous config saved to /var/cache/conftool/dbconfig/20200826-142746-marostegui.json
  • 14:25 jgleeson: updated civicrm from 0f195c6cca to d2e80f7522
  • 14:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:20 marostegui: Upgrade mysql on db1091 after MCR changes
  • 14:13 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:37 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 100% T261276', diff saved to https://phabricator.wikimedia.org/P12366 and previous config saved to /var/cache/conftool/dbconfig/20200826-133753-kormat.json
  • 13:18 duesen: daniel@mwmaint1002:/srv/mediawiki/php-1.36.0-wmf.5$ mwscript maintenance/findBadBlobs.php dewiki --mark T205936 --revisions - < ~/T205936-dewiki-20050512070000.ids # marking known bad revisions for T205936
  • 13:17 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 75% T261276', diff saved to https://phabricator.wikimedia.org/P12365 and previous config saved to /var/cache/conftool/dbconfig/20200826-131732-kormat.json
  • 13:16 duesen: daniel@mwmaint1002:/srv/mediawiki/php-1.36.0-wmf.5$ mwscript maintenance/findBadBlobs.php oswiki --mark T205936 --revisions - < ~/T205936-oswiki-20090309200000.ids # marking known bad revisions for T205936
  • 13:07 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 50% T261276', diff saved to https://phabricator.wikimedia.org/P12364 and previous config saved to /var/cache/conftool/dbconfig/20200826-130735-kormat.json
  • 13:06 vgutierrez: serve a synthetic warn page to DHE-RSA-AES128-SHA users - T258405
  • 12:47 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 30% T261276', diff saved to https://phabricator.wikimedia.org/P12363 and previous config saved to /var/cache/conftool/dbconfig/20200826-124700-kormat.json
  • 12:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 20% T261276', diff saved to https://phabricator.wikimedia.org/P12362 and previous config saved to /var/cache/conftool/dbconfig/20200826-122059-kormat.json
  • 12:12 godog: upgrade nagios-nrpe-server to 2.15-2 on jessie hosts - T261198
  • 11:58 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling db1110 T261276', diff saved to https://phabricator.wikimedia.org/P12361 and previous config saved to /var/cache/conftool/dbconfig/20200826-115850-kormat.json
  • 11:56 mlitn@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/WikibaseMediaInfo: MediaSearchQueryBuilder should support keyword only queries (duration: 01m 00s)
  • 11:55 mlitn@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/WikibaseMediaInfo: MediaSearchQueryBuilder should support keyword only queries (duration: 01m 08s)
  • 11:53 kart_: Finished manual run of ContentTranslation/scripts/purge-unpublished-drafts.php script on mwmaint1002 (T261189)
  • 11:39 kart_: Started manual run of ContentTranslation/scripts/purge-unpublished-drafts.php script on mwmaint1002 (T261189)
  • 11:29 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Enable propagateChangeVisibility for testwikidata, part 2 (duration: 01m 03s)
  • 11:26 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable propagateChangeVisibility for testwikidata, part 1 (duration: 01m 19s)
  • 10:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 XioNoX: re-enable IPv6 BGP to Init7 in knams
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 replication broken', diff saved to https://phabricator.wikimedia.org/P12360 and previous config saved to /var/cache/conftool/dbconfig/20200826-084044-marostegui.json
  • 08:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 for MCR change', diff saved to https://phabricator.wikimedia.org/P12358 and previous config saved to /var/cache/conftool/dbconfig/20200826-054557-marostegui.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12357 and previous config saved to /var/cache/conftool/dbconfig/20200826-054409-marostegui.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12356 and previous config saved to /var/cache/conftool/dbconfig/20200826-053345-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12355 and previous config saved to /var/cache/conftool/dbconfig/20200826-052355-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12354 and previous config saved to /var/cache/conftool/dbconfig/20200826-050849-marostegui.json
  • 05:03 marostegui: Update db1135 and db1114 after MCR changes

2020-08-25

  • 21:51 mutante: xhgui1001/xhgui2001 - Unpacking xhgui (0.12.0-2-wmf1) over (0.9.0-1-wmf1) (T260397)
  • 21:50 mutante: xhgui1001 - Unpacking xhgui (0.12.0-2-wmf1) over (0.9.0-1-wmf1) ...
  • 21:46 mutante: importing xhgui 0.12.0-2-wmf1 to buster-wikimedia APT repo (T260397)
  • 19:40 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@125cb6d]: test: Add wikidata ttl import (duration: 00m 54s)
  • 19:39 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@125cb6d]: test: Add wikidata ttl import
  • 19:15 marxarelli: 1.36.0-wmf.6 promoted to group0 (T257974). no new errors
  • 19:09 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.6
  • 19:05 moritzm: installing Java security updates on cloudelastic* hosts
  • 19:02 moritzm: installing Java security updates on elastic* hosts
  • 18:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:58 dduvall@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.6 (duration: 41m 58s)
  • 17:30 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@bc2f7f1]: test: Add wikidata ttl import (duration: 01m 52s)
  • 17:28 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@bc2f7f1]: test: Add wikidata ttl import
  • 17:17 dduvall@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.6
  • 17:08 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.4 (duration: 01m 40s)
  • 17:01 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.3 (duration: 19m 12s)
  • 17:01 herron: imported logstash, elasticsearch, and kibana 7.9.0 -oss packages into buster-wikimedia thirdparty/elastic79
  • 16:42 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@89b4f74]: test: Add wikidata ttl import (duration: 00m 49s)
  • 16:41 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@89b4f74]: test: Add wikidata ttl import
  • 16:21 shdubsh: restart logstash on logstash1007 -- gc duration outlier
  • 16:08 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae6dd8d]: test: Add wikidata ttl import (duration: 00m 54s)
  • 16:07 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@ae6dd8d]: test: Add wikidata ttl import
  • 16:00 gehel: repool wdqs1005 - caught up on lag
  • 15:47 elukey: restart mariadb@analytics_meta on db1108 to apply a replication filter (exclude superset_staging database from replication)
  • 15:44 jgleeson: fundraising-tools updated from dcad0bfe75 to 3fe3a23114
  • 15:41 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@cbf2f9d]: Add wikidata ttl import (duration: 01m 38s)
  • 15:39 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@cbf2f9d]: Add wikidata ttl import
  • 15:22 liw: testing upcoming Scap release on beta
  • 14:56 moritzm: installing rake security updates on stretch
  • 14:56 moritzm: installing rake security updates on stretch
  • 14:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 14:32 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 05s)
  • 14:32 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
  • 14:26 XioNoX: disable IPv6 BGP to Init7 in knams
  • 14:10 andrew@deploy1001: Finished deploy [horizon/deploy@7a3221d]: add hostname checking --bug T207538 (duration: 03m 50s)
  • 14:06 andrew@deploy1001: Started deploy [horizon/deploy@7a3221d]: add hostname checking --bug T207538
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for MCR change', diff saved to https://phabricator.wikimedia.org/P12347 and previous config saved to /var/cache/conftool/dbconfig/20200825-135248-marostegui.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'fully repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12346 and previous config saved to /var/cache/conftool/dbconfig/20200825-134736-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12345 and previous config saved to /var/cache/conftool/dbconfig/20200825-133734-marostegui.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12344 and previous config saved to /var/cache/conftool/dbconfig/20200825-132027-marostegui.json
  • 13:17 moritzm: installing firejail security updates on remaining mw* servers in eqiad
  • 12:56 godog: upgrade nagios-nrpe-server on scb2* and mwlog* - T261198
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12343 and previous config saved to /var/cache/conftool/dbconfig/20200825-125108-marostegui.json
  • 12:45 marostegui: Update MySQL on db1111 after MCR change
  • 12:39 marostegui: alter table sites on s6, directly on the primary master T260476
  • 12:39 godog: test nagios-nrpe-server with dh 2048 on scb2001 - T261198
  • 12:35 moritzm: imported ceph packages from stretch-backports to component/ceph T256877
  • 12:10 moritzm: installing ruby-json security updates
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 MCR change', diff saved to https://phabricator.wikimedia.org/P12341 and previous config saved to /var/cache/conftool/dbconfig/20200825-120708-marostegui.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12340 and previous config saved to /var/cache/conftool/dbconfig/20200825-120211-marostegui.json
  • 11:59 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12339 and previous config saved to /var/cache/conftool/dbconfig/20200825-114938-marostegui.json
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12338 and previous config saved to /var/cache/conftool/dbconfig/20200825-113758-marostegui.json
  • 11:36 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:32 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12337 and previous config saved to /var/cache/conftool/dbconfig/20200825-112859-marostegui.json
  • 11:25 marostegui: Upgrade mysql on db1118 after MCR change
  • 11:16 Urbanecm: EU B&C done
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d869e30: Enable ContentTranslation as a default tool in Assamese and Burmese WPs (T258503; T258505) (duration: 01m 00s)
  • 10:59 moritzm: installing remaining libx11 security updates
  • 10:37 arturo: import all binary packages from tesseract-ocr-lang into stretch-wikimedia/component/tesseract-410-bpo (T247422)
  • 10:32 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:28 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 10:28 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:23 moritzm: removed fermium.wikimedia.org from debmonitor
  • 09:45 marostegui: Create missing table cx_notification_log on x1 wikishared T261190
  • 08:50 XioNoX: re-activate eqord peering/transit - T259593
  • 08:19 XioNoX: reconfigure eqord to be AS65020 - T259593
  • 08:18 XioNoX: deactivate eqord peering/transit - T259593
  • 07:22 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
  • 07:13 marostegui: Upgrade MySQL on dbstore1004
  • 07:09 dcausse: depooling wdqs1005 (high lag)
  • 07:04 dcausse: restarting blazegraph on wdqs1005 (T242453)
  • 06:20 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111, db1118 for MCR change', diff saved to https://phabricator.wikimedia.org/P12336 and previous config saved to /var/cache/conftool/dbconfig/20200825-053856-marostegui.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12335 and previous config saved to /var/cache/conftool/dbconfig/20200825-053801-marostegui.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12334 and previous config saved to /var/cache/conftool/dbconfig/20200825-052602-marostegui.json
  • 05:21 moritzm: installing Java security updates on relforge*
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12333 and previous config saved to /var/cache/conftool/dbconfig/20200825-051327-marostegui.json
  • 05:11 marostegui: Remove revisions triggers from db2094:3311 T238966
  • 05:10 marostegui: Deploy MCR schema change on s1 codfw, this will create lag on s1 codfw - T238966
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12332 and previous config saved to /var/cache/conftool/dbconfig/20200825-050451-marostegui.json
  • 04:02 ejegg: updated fundraising python tools from 305f2a4438 to dcad0bfe75
  • 01:49 eileen: civicrm revision changed from ce28723709 to 0f195c6cca, config revision is 96839009f1
  • 01:39 eileen: civicrm revision is ce28723709, config revision is 96839009f1
  • 01:30 eileen: civicrm revision is ce28723709, config revision is 54c8c7abf2
  • 01:17 cdanis: repool esams
  • 01:11 cdanis: T259621 wrong junos version was staged on cr2-esams, abandoning this attempt and putting back in service
  • 01:07 cdanis: cdanis@re0.cr2-esams> request system software add validate re1 /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz
  • 00:56 cdanis: T259621 ❌cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 homer 'cr*' commit 'drain cr2-esams transport link'
  • 00:36 cdanis: T259621 cdanis@re1.cr3-esams> request chassis routing-engine master switch
  • 00:30 cdanis: T259621 cdanis@re1.cr3-esams> request vmhost reboot re0
  • 00:24 cdanis: T259621 cdanis@re1.cr3-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz re0
  • 00:18 cdanis: T259621 cdanis@re0.cr3-esams> request chassis routing-engine master switch
  • 00:14 cdanis: T259621 cdanis@re0.cr3-esams> request vmhost reboot re1
  • 00:08 cdanis: T259621 cdanis@re0.cr3-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz re1

2020-08-24

  • 23:46 cdanis: depool esams T259621
  • 23:16 Urbanecm: Evening B&C window done
  • 23:06 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 778f710: Alternate configuration mechanism for Parsoid (T241961) (duration: 00m 58s)
  • 22:13 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 22:10 rzl@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:29 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Deployed additional mitigations for T257687 (duration: 00m 58s)
  • 20:29 rzl: re-enabled puppet on 'R:File = /etc/nutcracker/nutcracker.yml' T261154
  • 19:25 rzl: disabling puppet on 'R:File = /etc/nutcracker/nutcracker.yml' to swap mc2028 out for mc2037 T261154
  • 18:10 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Increase weight of grants and research namespaces in metawiki search (duration: 00m 58s)
  • 15:20 jynus: shutdown backup2001 T260764
  • 15:13 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:08 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:04 vgutierrez: rolling restart of ats-tls to disable ECDHE-RSA-AES128-SHA - T258405
  • 14:58 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:55 rzl: switchover test complete, puppet re-enabled on cumin1001
  • 14:54 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 14:53 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 14:53 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:52 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:52 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 14:52 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:48 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 14:48 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:47 godog: powercycle ganeti5002 -- host down and nothing in console
  • 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:43 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-08-24 14:43:35.570234
  • 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 14:42 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=99)
  • 14:42 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:42 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 14:41 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-08-24 14:41:55.754938
  • 14:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 14:41 dcausse: creating cirrus indices for lldwiki
  • 14:39 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 14:39 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 14:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 14:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 14:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 14:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:24 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 14:24 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:24 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 14:24 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 14:22 moritzm: installing libexif security updates on stretch
  • 14:18 rzl: disabling puppet on cumin1001 and starting a test of the DC switchover automation, expect some SAL noise but no production impact
  • 14:08 duesen: Deployed patch for T260485
  • 13:59 marostegui: Stop mysql on db1117:3325 to clone db1128 - T260324
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for MCR change', diff saved to https://phabricator.wikimedia.org/P12327 and previous config saved to /var/cache/conftool/dbconfig/20200824-135538-marostegui.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3318 after MCR change', diff saved to https://phabricator.wikimedia.org/P12326 and previous config saved to /var/cache/conftool/dbconfig/20200824-133032-marostegui.json
  • 13:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12325 and previous config saved to /var/cache/conftool/dbconfig/20200824-131305-marostegui.json
  • 13:05 moritzm: installing imagemagick security updates on stretch
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12323 and previous config saved to /var/cache/conftool/dbconfig/20200824-130024-marostegui.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12322 and previous config saved to /var/cache/conftool/dbconfig/20200824-125131-marostegui.json
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for MCR change', diff saved to https://phabricator.wikimedia.org/P12321 and previous config saved to /var/cache/conftool/dbconfig/20200824-122848-marostegui.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311 after MCR change', diff saved to https://phabricator.wikimedia.org/P12320 and previous config saved to /var/cache/conftool/dbconfig/20200824-122752-marostegui.json
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12319 and previous config saved to /var/cache/conftool/dbconfig/20200824-122050-marostegui.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12318 and previous config saved to /var/cache/conftool/dbconfig/20200824-121200-marostegui.json
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12317 and previous config saved to /var/cache/conftool/dbconfig/20200824-120310-marostegui.json
  • 12:01 Urbanecm: EU B&C window completed
  • 12:01 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 8c380d6: Enable tewiki as import source for tewikibooks (T260107) (duration: 00m 57s)
  • 11:58 XioNoX: test advertise CF tunnel endpoint on cr1-eqiad - T259036
  • 11:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5a6d025: Add retrobibliothek.de to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T261012) (duration: 00m 56s)
  • 11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e1ae39a: Enable mapframe at trwiki (T260594) (duration: 00m 58s)
  • 11:43 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: 1066ecb: Enable MediaSearch A/B test (T254388) (duration: 00m 56s)
  • 11:42 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/ContentTranslation/modules/publish/ext.cx.wikibase.link.js: 74a8718: Publish: Fix broken wikidata linking (T249458) (duration: 00m 58s)
  • 11:39 Urbanecm: Purge 13 URLs with purgeList.php, see P12316 for list of them (T260908; T258552; T261076; T261110)
  • 11:34 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:32 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:32 arturo: add liblept5 1.76.0-1~bpo9+1 (and leptonica-progs) to stretch-wikimedia/component/tesseract-410-bpo (T247422)
  • 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fe0449d: 74220d0: 7db8a19: Update Chinese wordmarks and taglines, update zhwikisource project logo (T260908; T258552; T261076; T261110) (duration: 00m 59s)
  • 11:29 urbanecm@deploy1001: Synchronized static/images/: fe0449d: 74220d0: 7db8a19: Update Chinese wordmarks and taglines, update zhwikisource project logo (T260908; T258552; T261076; T261110) (duration: 00m 58s)
  • 11:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:46 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:45 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:43 moritzm: installing ruby2.3 security updates
  • 10:12 moritzm: installing firejail security updates on mw canaries
  • 09:58 oblivian@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=appserver,service=canary
  • 09:46 XioNoX: add PNI to CF on cr1-eqiad with import/export NONE - T259036
  • 09:18 moritzm: restarting mw canaries to pick up libx11 update
  • 09:13 moritzm: installing libx11 security updates on stretch
  • 09:10 vgutierrez: repool cp5002
  • 09:08 _joe_: restarting php-fpm on mw1344 (stuck in SIGILL for new children)
  • 09:00 vgutierrez: restart ats-tls on cp5002
  • 08:54 moritzm: installing net-snmp security updates on buster
  • 08:52 ema: depool cp5002 due to icinga errors
  • 08:24 moritzm: installing json-c security updates on buster
  • 07:36 XioNoX: push new pfw policies - T261007
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318, db1105:3311 for MCR change', diff saved to https://phabricator.wikimedia.org/P12315 and previous config saved to /var/cache/conftool/dbconfig/20200824-052916-marostegui.json

2020-08-23

  • 20:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:23 gehel: repool wdqs1006 - caught up on lag

2020-08-22

  • 19:33 ryankemper: depooled wdqs1006 (still has 2.5 hours to catch up on)
  • 19:31 ryankemper: pooled wdqs1006 now that lag has dissipated
  • 07:36 gehel: restart blazegraph on wdqs1006 + depool to catch up on lag
  • 05:24 legoktm: legoktm@mwmaint1002:~$ echo "https://releases.wikimedia.org/mediawiki/1.35/" | mwscript purgeList.php --wiki=aawiki

2020-08-21

  • 17:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:39 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:17 zpapierski@deploy1001: Finished deploy [search/mjolnir/deploy@c80e2e7]: .. redeploy after theory verification (duration: 00m 50s)
  • 16:16 zpapierski@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: .. redeploy after theory verification
  • 16:15 zpapierski@deploy1001: deploy aborted: .. (duration: 00m 01s)
  • 16:15 zpapierski@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: ..
  • 13:25 jayme@cumin1001: conftool action : set/pooled=True; selector: dnsdisc=termbox,name=codfw
  • 13:25 jayme@cumin1001: conftool action : set/pooled=False; selector: dnsdisc=termbox,name=codfw
  • 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 09:02 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:01 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 01:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime

2020-08-20

  • 22:31 eileen: civicrm revision changed from 27d5900f7d to ce28723709, config revision is 706cf3c898
  • 22:20 eileen: civicrm revision is 27d5900f7d, config revision is 706cf3c898
  • 22:20 mutante: permanently shut down tungsten.eqiad.wmnet T260395 T158837 T180761 T224549
  • 22:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:35 ejegg: updated fundraising CiviCRM from 958a79f660 to 27d5900f7d
  • 20:53 cdanis: repool eqsin
  • 20:37 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:36 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:25 cdanis: cdanis@cr2-eqsin> request vmhost reboot
  • 20:17 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:16 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:13 cdanis: cdanis@cr2-eqsin> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-18.2R3-S5.3.tgz
  • 20:11 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:02 cdanis: depool eqsin for router upgrade
  • 19:57 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
  • 19:37 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:34 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:24 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:17 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 19:17 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 19:17 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.5 refs T257973
  • 19:08 mutante: restarted apache on cont2001 for integration.wikimedia.org docroot change
  • 19:07 mutante: switching document root of integration.wikimedia.org to scap (T149924)
  • 19:02 twentyafterfour: 1.36.0-wmf.5 has no known blockers and logspam is cleaned up, time to roll group2 wikis to wmf.5
  • 18:42 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 18:42 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 18:19 mutante: ores1004 - starting failed celery-ores-worker
  • 18:18 mutante: testreduce1001 - rt_client and vd_client now properly stopped by puppet T257906
  • 17:29 shdubsh: restart elasticsearch on logstash1012 (not 1020) -- high gc runtimes
  • 17:28 shdubsh: restart elasticsearch on logstash1020 -- high gc runtimes
  • 17:23 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 17:23 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
  • 17:23 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 17:22 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 17:22 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 16:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:48 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:46 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:45 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:43 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:40 _joe_: restarted apache2 on icinga1001
  • 16:13 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:11 shdubsh: restart elasticsearch on logstash1011 -- long gc runs
  • 16:10 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:08 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:02 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:06 oblivian@deploy1001: Finished deploy [ores/deploy@8540eec]: various configuration fixes (duration: 09m 03s)
  • 13:57 oblivian@deploy1001: Started deploy [ores/deploy@8540eec]: various configuration fixes
  • 13:53 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:53 oblivian@deploy1001: Finished deploy [ores/deploy@e860508]: switch everything to use envoy as a service proxy T244843 (duration: 14m 00s)
  • 13:39 oblivian@deploy1001: Started deploy [ores/deploy@e860508]: switch everything to use envoy as a service proxy T244843
  • 13:26 oblivian@deploy1001: Finished deploy [ores/deploy@74677b6]: switch testwiki to use envoy as a service proxy T244843 (take 2) (duration: 11m 37s)
  • 13:14 oblivian@deploy1001: Started deploy [ores/deploy@74677b6]: switch testwiki to use envoy as a service proxy T244843 (take 2)
  • 13:11 oblivian@deploy1001: Finished deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy T244843 (duration: 11m 19s)
  • 13:09 gehel: repool wdqs1007 - caught up on lag
  • 13:00 oblivian@deploy1001: Started deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy T244843
  • 12:51 oblivian@deploy1001: Finished deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy T244843 (duration: 07m 03s)
  • 12:44 oblivian@deploy1001: Started deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy T244843
  • 11:49 Lucas_WMDE: EU backport window done
  • 11:44 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/AbuseFilter/includes/AbuseFilterHooks.php: d762e7b: Use $user param when filtering edits (T258717) (duration: 01m 05s)
  • 11:41 eileen: civicrm revision changed from 6c9441a18e to 958a79f660, config revision is 706cf3c898
  • 11:38 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/AbuseFilter/includes/AbuseFilterHooks.php: 00da39b: Use $user param when filtering edits (T258717) (duration: 01m 05s)
  • 11:32 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/Wikibase/client/data-bridge/dist/: Backport: Don't try to load source maps in production (T260852) (duration: 01m 07s)
  • 11:07 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix testwikidata depicts id & CirrusSearchUserTesting config (duration: 01m 06s)
  • 11:07 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=trwiki editor # T260899
  • 10:58 XioNoX: re-pool codfw - T259621
  • 10:53 XioNoX: un-drain cr1-codfw - T259621
  • 10:45 XioNoX: cr1-codfw> request chassis routing-engine master switch - T259621
  • 10:26 hashar: Restarted zuul-merger instances on contint1001 and contint2001
  • 10:24 hashar@deploy1001: Finished deploy [zuul/deploy@8a05b4d]: Support Gerrit replication events (duration: 00m 24s)
  • 10:24 hashar@deploy1001: Started deploy [zuul/deploy@8a05b4d]: Support Gerrit replication events
  • 10:21 XioNoX: cr1-codfw> request chassis routing-engine master switch - T259621
  • 10:12 XioNoX: reboot cr1-codfw:re1 (backup) for upgrade - T259621
  • 09:57 XioNoX: bump cr1-codfw OSPF metrics - T259621
  • 09:51 XioNoX: enable transit/peering and re-set normal OSPF values on cr2-codfw - T259621
  • 09:41 XioNoX: cr2-codfw> request chassis routing-engine master switch - T259621
  • 09:36 eileen: civicrm revision changed from cf9fadbeed to 6c9441a18e, config revision is 706cf3c898
  • 09:33 XioNoX: reboot cr2-codfw:re0 (backup) for upgrade - T259621
  • 09:18 XioNoX: cr2-codfw> request chassis routing-engine master switch - T259621
  • 09:18 kormat: stress-testing db2125 T260670
  • 09:08 XioNoX: reboot cr2-codfw:re1 (backup) for upgrade - T259621
  • 09:03 kormat@cumin1001: dbctl commit (dc=all): 'Repool db2125 after host failure T260670', diff saved to https://phabricator.wikimedia.org/P12303 and previous config saved to /var/cache/conftool/dbconfig/20200820-090313-kormat.json
  • 08:52 kormat: removing /usr/bin/check_mariadb.py from all db hosts T259516
  • 08:52 XioNoX: disable transit/peering on cr2-codfw - T259621
  • 08:48 XioNoX: bump cr2-codfw OSPF metrics - T259621
  • 08:44 jynus: running analyze table on db1115's tendril.global_status_log, may cause some stalls on tendril/dbtree T260876
  • 08:41 XioNoX: depool codfw for routers upgrade - T259621
  • 08:31 XioNoX: enable transit/peering on cr3-knams - T259621
  • 08:21 XioNoX: reboot cr3-knams for upgrade - T259621
  • 08:07 XioNoX: disable transit/peering on cr3-knams - T259621
  • 07:39 hashar: contint2001: restarted zuul
  • 07:29 hashar: contint1001: restarted zuul-merger
  • 07:29 hashar@deploy1001: Finished deploy [zuul/deploy@5989ed0]: Upgrade gear from 0.7.0 to 1.15.1+wmf1 - T258630 (duration: 00m 13s)
  • 07:28 hashar@deploy1001: Started deploy [zuul/deploy@5989ed0]: Upgrade gear from 0.7.0 to 1.15.1+wmf1 - T258630
  • 01:54 ejegg: re-enabled fundraising scheduled jobs
  • 00:51 mutante: ms-be1039 - started failed ferm service
  • 00:35 ejegg: stopped fundraising scheduled jobs
  • 00:27 eileen: civicrm revision changed from c442a09153 to cf9fadbeed, config revision is 3cdffd4fc2

2020-08-19

  • 23:20 Urbanecm: Evening B&C window closed
  • 23:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a808999: Enable VisualEditor in namespaces Draft and Wikiproject on hywiki (T260825) (duration: 01m 05s)
  • 22:41 eileen: civicrm revision changed from 34f95a3311 to c442a09153, config revision is 3cdffd4fc2
  • 21:27 eileen: civicrm revision changed from 154519cc1f to 34f95a3311, config revision is 3cdffd4fc2
  • 21:17 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 21:17 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 20:39 dpifke@deploy1001: Finished deploy [performance/arc-lamp@2ef1af7]: Deploy fixes for notifications and OOM prevention (T259167) (duration: 00m 06s)
  • 20:39 dpifke@deploy1001: Started deploy [performance/arc-lamp@2ef1af7]: Deploy fixes for notifications and OOM prevention (T259167)
  • 19:43 ebernhardson: restart mjolnir-kafka-bulk-daemon on search-loader2001 with debug logging
  • 19:20 mutante: testreduce1001 - re-enabled puppet, confirmed parsoid-rt service was now stopped properly by puppet while it runs as before on scandium, the previous parsoid-testing host. switching it over is now a Hiera one-liner. (T257906)
  • 19:15 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.5 refs T257973 (duration: 01m 04s)
  • 19:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.5 refs T257973
  • 19:02 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 60af096: Add autopatrolled group at arzwiki (T260761) (duration: 01m 04s)
  • 18:52 mutante: testreduce1001 - disable puppet; stop parsoid-rt service
  • 18:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 924a03b: Add clinton.presidentiallibraries.us to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T259927) (duration: 01m 04s)
  • 18:45 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 83b34e1: ClosedWikiProvider: Use testUserForCreation rather than testForAuthentication (T258695) (duration: 01m 04s)
  • 18:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 95d45f6: Dont index Draft (118) and Draft talk (119) on hywiki (T260804) (duration: 01m 04s)
  • 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 803cb1a: Update taglines for various projects (T258552) (duration: 01m 04s)
  • 18:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: 803cb1a: Update taglines for various projects (T258552) (duration: 01m 06s)
  • 18:25 mutante: rebooting webperf1002 VM on ganeti level (outside OS) to upgrade from 8 to 16GB RAM (T260192)
  • 18:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bb4aa44: Configure namespaces on commons to include categories (T198716) (duration: 01m 04s)
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b904333: Update project wordmarks (T254788; sync 2/2) (duration: 01m 04s)
  • 18:19 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: b904333: Update project wordmarks (T254788; sync 1/2) (duration: 01m 06s)
  • 18:15 mutante: rebooting webperf2002 VM on ganeti level (outside OS) to upgrade from 8 to 16GB RAM (T260192)
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a6f8354: Enable $wgMFNoindexPages for all wikis (T255458) (duration: 01m 07s)
  • 18:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:13 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:38 mutante: decom'ing releases2001.codfw.wmnet (
  • 17:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:39 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:37 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:41 rzl: finished exercising the switchdc cookbooks with --live-test for now, all changes reverted including re-enabling puppet on cumin1001
  • 15:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 15:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 15:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 15:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 15:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:31 jbond42: update java.security https://gerrit.wikimedia.org/r/c/operations/puppet/+/593467
  • 15:30 oblivian@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=api-rw
  • 15:26 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 15:26 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 15:22 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 15:22 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:18 godog: prometheus codfw lvextend --resizefs --size +80G /dev/mapper/vg--ssd-prometheus--ops
  • 15:17 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 15:17 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 15:16 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 15:16 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 15:14 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 15:14 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:08 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 15:08 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:50 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 14:50 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:50 rzl: running the switchdc cookbooks with --live-test, simulating a switch to eqiad where we're already running, no production impact is expected
  • 14:47 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 14:47 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 14:41 rzl: disable puppet on cumin1001 for switchdc testing
  • 14:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:33 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:27 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:38 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:34 gehel: depooling wdqs1007 and restarting blazegraph
  • 13:29 _joe_: depooling and disabling puppet on restbase1024 for further investigation
  • 13:27 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:26 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:25 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:10 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:03 _joe_: building and uploading fluent-bit, ratelimit images
  • 13:01 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 12:57 _joe_: building a new version of the base docker images
  • 11:29 awight: EU bacon finished
  • 11:28 effie: restart mwdebug* servers
  • 11:08 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Fix typos in flaggedrevs comments () (duration: 01m 19s)
  • 09:22 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 08:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:36 XioNoX: update firewall policies on pfw - T260585
  • 08:35 jayme: running puppet on A:all-mw-eqiad
  • 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:20 godog: switch grafana.w.o to grafana 7 in codfw - T259143
  • 08:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:18 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:14 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:06 jayme: running puppet on A:all-mw-eqiad
  • 07:46 godog: upgrade to grafana 7 on cloudmetrics hosts - T259143
  • 07:15 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 07:10 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 06:39 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 06:13 eileen: tools revision changed from b4ebd1e564 to 0b9d971bc4
  • 06:07 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 06:04 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 06:03 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 06:00 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 05:55 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 05:53 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 05:47 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 05:37 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 05:31 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 03:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:53 cstone: civicrm revision changed from f5469d0a4c to 154519cc1f
  • 02:00 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 01:05 dpifke@deploy1001: Synchronized wmf-config/profiler.php: Deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/620139 (duration: 01m 18s)
  • 00:49 dpifke@deploy1001: Synchronized wmf-config/ProductionServices.php: Disabling old XHGui backend (T180761) (duration: 05m 13s)
  • 00:15 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster

2020-08-18

  • 23:45 catrope@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/GrowthExperiments: Only fetch task card data for users in variant C/D (T258021) (duration: 01m 05s)
  • 23:44 catrope@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/GrowthExperiments: Only fetch task card data for users in variant C/D (T258021) (duration: 01m 06s)
  • 23:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1301.eqiad.wmnet
  • 23:34 Urbanecm: Run scap pull at mw1301
  • 23:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable static maps on testwiki, disable them on test2wiki (duration: 03m 22s)
  • 23:32 mutante: rebooting mw1301 via mgmt
  • 23:22 mutante: killed reboot-cluster on cumin1001
  • 23:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ac34f72: Enable subpages in NS:0 in techconductwiki (T260350) (duration: 05m 14s)
  • 23:04 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1300.eqiad.wmnet
  • 22:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 22:41 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 22:09 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:07 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:06 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 21:39 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:37 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:34 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 21:24 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 21:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:27 hashar: https://releases-jenkins.wikimedia.org/ changed agent from releases1001 to releases1002
  • 20:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.5 refs T257973
  • 20:11 mutante: running puppet on cp-ats-ulsfo and switching releases-jenkins backend
  • 20:07 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.5 refs T257973 (duration: 53m 12s)
  • 20:00 mutante: releases1001 rm /etc/rsync.d/frag* & run puppet
  • 19:54 mutante: rsyncing /var/lib/jenkins from releases1001 to releases1002/2002 with --delete T256164
  • 19:47 ejegg: updated payments-wiki from a7ee1790e0 to ef7ebd08cb
  • 19:44 hashar: Deleting old jobs from https://releases-jenkins.wikimedia.org/ # T256164
  • 19:41 hashar: releases1001: deleting old legacy mediawiki snapshots under /var/lib/jenkins/{REL1_27,REL1_29,REL1_30} # T256164
  • 19:14 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.5 refs T257973
  • 19:13 twentyafterfour: Promote testwikis from 1.36.0-wmf.4 to 1.36.0-wmf.5 refs T257973
  • 17:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:12 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw14(09|11|13).*
  • 16:03 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 15:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 15:30 jayme@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 15:02 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 14:56 papaul: replacing msw-c1,c2 and c4
  • 14:55 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104', diff saved to https://phabricator.wikimedia.org/P12293 and previous config saved to /var/cache/conftool/dbconfig/20200818-145337-marostegui.json
  • 14:48 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(55|64|65).*
  • 14:46 XioNoX: move v4 HE on cr3-ulsfo from peering to transit bgp group
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12292 and previous config saved to /var/cache/conftool/dbconfig/20200818-144415-marostegui.json
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12291 and previous config saved to /var/cache/conftool/dbconfig/20200818-143758-marostegui.json
  • 14:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12290 and previous config saved to /var/cache/conftool/dbconfig/20200818-142937-marostegui.json
  • 14:28 marostegui: Stop MYSQL on db2125 for on-site maintenance - T260670
  • 13:54 marostegui: Revoke DELETE and CREATE from xhgui user on m2 T260640
  • 13:53 XioNoX: bump Zayo v4 BGP session in eqiad
  • 13:49 XioNoX: move v4 HE on cr2-eqord from peering to transit bgp group
  • 13:37 XioNoX: move v4 cr1-eqiad from peering to transit bgp group
  • 13:04 kormat: disabling puppet on all db machines T259516
  • 12:57 _joe_: rebooting appservers in eqiad, 3 at a time
  • 12:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 12:37 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 12:34 kormat: deploying wmfmariadbpy 0.4
  • 12:21 jayme@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:53 XioNoX: add new icinga hosts to mr policies - T260533
  • 11:40 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:36 Lucas_WMDE: EU backport&config done
  • 11:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Wikisource wordmark for trwikisource (T260658), part 2 (duration: 00m 55s)
  • 11:32 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf '%s\n' 'https://en.wikipedia.org/static/images/mobile/copyright/wikisource-wordmark-tr.svg' | mwscript purgeList.php # T260658
  • 11:32 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/mobile/copyright/wikisource-wordmark-tr.svg: Config: Add Wikisource wordmark for trwikisource (T260658), part 1 (duration: 00m 55s)
  • 11:24 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Data Bridge on Catalan Wikipedia (T232584) (duration: 01m 01s)
  • 11:06 jbond42: deploy net-snmp update to buster
  • 10:56 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw229.*
  • 10:55 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 10:54 marostegui: Reboot db2125 after running a full upgrade - T260670
  • 10:46 marostegui: Powercycle db2125 from the idrac T260670
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - host down T260670', diff saved to https://phabricator.wikimedia.org/P12288 and previous config saved to /var/cache/conftool/dbconfig/20200818-100718-marostegui.json
  • 09:45 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 09:43 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
  • 09:40 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw214[234].*
  • 09:40 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 09:35 kart_: Update cxserver to 2020-08-17-090424-production (T259980)
  • 09:32 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 09:29 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 09:28 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 09:28 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw214[02].*
  • 09:26 volans: upgraded spicerack to v0.0.39 on cumin hosts
  • 09:25 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 09:21 volans: uploaded spicerack_0.0.39-1+deb10u1 to apt.wikimedia.org buster-wikimedia
  • 09:05 hashar: Restarting CI Jenkins
  • 08:44 vgutierrez: restart ats-tls on cp5006
  • 08:24 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 08:17 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:16 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 08:10 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1089', diff saved to https://phabricator.wikimedia.org/P12284 and previous config saved to /var/cache/conftool/dbconfig/20200818-080256-marostegui.json
  • 07:58 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 07:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 07:45 godog: VictorOps ack'd incidents will re-trigger after 24h if not resolved - T259465
  • 07:44 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12283 and previous config saved to /var/cache/conftool/dbconfig/20200818-074325-marostegui.json
  • 07:42 _joe_: performing rolling reboot of all codfw api servers
  • 07:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12282 and previous config saved to /var/cache/conftool/dbconfig/20200818-072349-marostegui.json
  • 07:19 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw213[5-9].codfw.wmnet
  • 07:16 jynus: update rest of phabricator passwords T250361
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12281 and previous config saved to /var/cache/conftool/dbconfig/20200818-071121-marostegui.json
  • 07:08 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 07:07 godog: prometheus eqiad: add 100G to prometheus/global
  • 07:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 07:01 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 07:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 06:53 twentyafterfour: phabricator maintenance successful
  • 06:48 jynus: deploy another password change to phabricator service (potentially disruptive) T250361
  • 06:41 XioNoX: add cloudflare PNI IPs in eqiad - T259036
  • 06:21 jynus: deploy password change to phabricator service T146055
  • 06:06 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 06:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 05:52 _joe_: running puppet on mc1020 T260622
  • 05:02 twentyafterfour: phabricator appears to be fully functional
  • 05:01 twentyafterfour: phabricator read-only ended
  • 05:00 twentyafterfour: phabricator is now read-only
  • 05:00 marostegui: Failover m3 (phabricator) database master from db1128 to db1132 - T259589
  • 04:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1088', diff saved to https://phabricator.wikimedia.org/P12279 and previous config saved to /var/cache/conftool/dbconfig/20200818-043241-marostegui.json
  • 01:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1376.eqiad.wmnet
  • 01:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1343.eqiad.wmnet
  • 01:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1344.eqiad.wmnet
  • 01:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 00:48 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1341.eqiad.wmnet
  • 00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1340.eqiad.wmnet
  • 00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1339.eqiad.wmnet
  • 00:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 00:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 00:15 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1315.eqiad.wmnet
  • 00:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)

2020-08-17

  • 23:59 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1313.eqiad.wmnet
  • 23:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 23:41 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1312.eqiad.wmnet
  • 23:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 23:30 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1297.eqiad.wmnet
  • 23:26 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 23:25 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:11 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1288.eqiad.wmnet
  • 23:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:47 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1286.eqiad.wmnet
  • 22:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:41 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 22:37 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1285.eqiad.wmnet
  • 22:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:26 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:26 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1284.eqiad.wmnet
  • 22:25 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:23 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:09 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1282.eqiad.wmnet
  • 22:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:02 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1281.eqiad.wmnet
  • 22:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:57 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki=aawiktionary --site-group wiktionary (T259360)
  • 21:56 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 21:56 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 21:53 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add api-gateway.request stream config T259736, one host timed out (duration: 00m 55s)
  • 21:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:48 ppchelko@deploy1001: sync-file aborted: Add api-gateway.request stream config T259736 (duration: 05m 01s)
  • 21:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1278.eqiad.wmnet
  • 21:46 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 21:43 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1279.eqiad.wmnet
  • 21:42 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 21:38 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Further mitigations for T257687 (duration: 00m 57s)
  • 21:38 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:34 effie: temporarily blocking traffic to mc1020
  • 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1276.eqiad.wmnet
  • 21:12 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2240.codfw.wmnet
  • 21:08 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 20:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 20:47 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:38 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 20:20 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 20:02 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 19:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 19:30 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:28 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:22 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:01 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 3 (duration: 02m 57s)
  • 18:58 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 3
  • 18:58 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 2 (duration: 11m 19s)
  • 18:46 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 2
  • 18:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002 (duration: 131m 17s)
  • 18:46 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:43 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:39 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:32 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 808c17d: Change logo for lldwiki to match the requested one (T259432) (duration: 00m 56s)
  • 18:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: 67e8f88: Add logo files for lldwiki (T259432) (duration: 00m 56s)
  • 17:17 cdanis@cumin1001: conftool action : set/pooled=yes; selector: name=mw1359.*
  • 17:06 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 17:04 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=codfw,name=mw2246.codfw.wmnet
  • 17:01 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 16:36 jynus: restart backup2001, backup1001 one after the other
  • 16:35 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002
  • 16:31 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 16:27 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 56s)
  • 16:23 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - remove unneeded override for SearchSatisfaction - T259163 (duration: 00m 56s)
  • 16:22 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:21 oblivian@cumin1001: conftool action : set/pooled=inactive; selector: cluster=jobrunner,dc=codfw,name=mw2250.codfw.wmnet
  • 16:20 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=codfw
  • 16:20 cdanis@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:14 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1359.*
  • 16:12 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 16:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 15:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 3. feeds timed out (duration: 01m 31s)
  • 15:43 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 3. feeds timed out
  • 15:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 2. feeds timed out (duration: 20m 40s)
  • 15:36 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ homer 'cr*' commit 'revert skipping RPKI validation for Jio AS55836 I0fd4683 T260452'
  • 15:30 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ homer 'cr*-codfw*' commit 'revert skipping RPKI validation for Jio AS55836 I0fd4683 T260452'
  • 15:22 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 2. feeds timed out
  • 15:22 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054 (duration: 02m 30s)
  • 15:19 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054
  • 15:08 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:06 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:04 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - all wikis (take 2) - T254606 (duration: 00m 53s)
  • 14:57 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 14:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 14:44 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - all wikis - T254606 (duration: 00m 55s)
  • 14:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - group0 - T254606 (duration: 00m 56s)
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12277 and previous config saved to /var/cache/conftool/dbconfig/20200817-141449-marostegui.json
  • 14:09 marostegui: Sanitize thankyouwiki on db1124:3315, db2094:3315 - T260551
  • 14:03 marostegui: Sanitize lldwiki on db1124:3315 and db2094:3315 T259436
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12276 and previous config saved to /var/cache/conftool/dbconfig/20200817-140229-marostegui.json
  • 13:58 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T259432)
  • 13:54 Urbanecm: Creating thankyouwiki and lldwiki is done
  • 13:54 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 52s)
  • 13:54 Urbanecm: Create account Pcoombe (WMF) at thankyouwiki, email set to pcoombe@wikimedia.org (T259002)
  • 13:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:51 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:49 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating thankyouwiki (T259002)
  • 13:48 urbanecm@deploy1001: Synchronized dblists: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:47 marostegui: Deploy MCR change on db1104
  • 13:47 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating thankyouwiki (T259002) (duration: 00m 56s)
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 for MCR change', diff saved to https://phabricator.wikimedia.org/P12275 and previous config saved to /var/cache/conftool/dbconfig/20200817-134701-marostegui.json
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12274 and previous config saved to /var/cache/conftool/dbconfig/20200817-134619-marostegui.json
  • 13:46 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12273 and previous config saved to /var/cache/conftool/dbconfig/20200817-134604-marostegui.json
  • 13:41 jayme: imported td-agent-bit_1.5.3-0 to buster-wikimedia - T260536
  • 13:40 jayme: imported !log imported to buster-wikimedia
  • 13:39 marostegui: Upgrade db1088 (s6) to a newer mysql version (10.4.14)
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for mysql upgrade', diff saved to https://phabricator.wikimedia.org/P12272 and previous config saved to /var/cache/conftool/dbconfig/20200817-133905-marostegui.json
  • 13:34 jbond42: deploy json-c security update to buster
  • 13:33 marostegui: Restart mysql on db2102 (testing new package)
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12271 and previous config saved to /var/cache/conftool/dbconfig/20200817-133043-marostegui.json
  • 13:29 urbanecm@deploy1001: Synchronized langlist: Creating lldwiki (T259432) (duration: 00m 54s)
  • 13:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating lldwiki (T259432) (duration: 00m 55s)
  • 13:27 urbanecm@deploy1001: sync-file aborted: Creating lldwiki (T259432) (duration: 00m 00s)
  • 13:26 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating lldwiki (T259432) (duration: 00m 53s)
  • 13:25 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating lldwiki (T259432)
  • 13:23 urbanecm@deploy1001: Synchronized dblists: Creating lldwiki (T259432) (duration: 00m 56s)
  • 13:22 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating lldwiki (T259432) (duration: 00m 56s)
  • 13:20 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating lldwiki (T259432) (duration: 00m 55s)
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12270 and previous config saved to /var/cache/conftool/dbconfig/20200817-131307-marostegui.json
  • 13:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:09 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12269 and previous config saved to /var/cache/conftool/dbconfig/20200817-130127-marostegui.json
  • 12:58 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for MCR change', diff saved to https://phabricator.wikimedia.org/P12268 and previous config saved to /var/cache/conftool/dbconfig/20200817-124458-marostegui.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12267 and previous config saved to /var/cache/conftool/dbconfig/20200817-124409-marostegui.json
  • 12:44 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 12:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 12:35 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:27 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12266 and previous config saved to /var/cache/conftool/dbconfig/20200817-122234-marostegui.json
  • 12:21 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:20 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:19 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:19 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12265 and previous config saved to /var/cache/conftool/dbconfig/20200817-121600-marostegui.json
  • 12:05 Lucas_WMDE: EU backport window done
  • 12:02 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bjnwiki --fix | tee T259429-fix
  • 12:02 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bjnwiki | tee T259429-dryrun
  • 12:01 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Set Portal and Portal_talk namespaces in bjnwiki as an extra namespace. (T259429) (duration: 00m 55s)
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12264 and previous config saved to /var/cache/conftool/dbconfig/20200817-115741-marostegui.json
  • 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Wiktionary wordmark for eswiktionary (T254059), part 2 (duration: 00m 57s)
  • 11:53 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/mobile/copyright/wiktionary-wordmark-es.svg\n' | mwscript purgeList.php # T254059
  • 11:53 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/mobile/copyright/wiktionary-wordmark-es.svg: Config: Add Wiktionary wordmark for eswiktionary (T254059), part 1 (duration: 00m 56s)
  • 11:46 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/zh_classicalwiki%s.png\n' '-1.5x' '-2x' | mwscript purgeList.php # T259006
  • 11:45 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/project-logos/: Config: Change the logo of lzh Wikipedia (T259006) (duration: 00m 55s)
  • 11:40 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Turkish powered by MW and Wikimedia project icons for Turkish Wikiquote, Turkish Wiktionary, Turkish Wikisource and Turkish Wikibooks (T260493) (duration: 00m 55s)
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Turkish powered by MW and Wikimedia project icons (T260492) (duration: 00m 57s)
  • 11:25 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:14 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:09 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] configure mediasearch A/B test (duration: 01m 08s)
  • 11:08 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:54 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:52 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:52 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:52 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:51 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:49 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:47 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:42 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:36 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:35 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:35 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:30 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:14 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:13 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:13 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:06 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:56 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:55 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:52 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:45 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:42 jynus: updating compiler facts for cloud puppet compiler project to include new host dbprov2003
  • 09:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:38 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:38 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:28 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:27 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:23 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:22 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:18 _joe_: running a full apt-get upgrade on mw1379-1380
  • 09:18 _joe_: re-upgrading imagemagick on mw1378
  • 09:16 _joe_: upgrading packages on mw1377
  • 09:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:06 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:06 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:05 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:25 jayme: forcing a puppet run on all mw-api servers in eqiad - T260329
  • 07:52 _joe_: repooling mw1382
  • 07:37 _joe_: running the same test on mw1382 T260329
  • 07:34 _joe_: repooling mw1381
  • 07:15 _joe_: running the same test on mw1381 T260329
  • 07:15 _joe_: repooled mw1281
  • 06:26 _joe_: stop testing on mw1281, T260329
  • 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:28 marostegui: Stop mysql on db1099:3311, db1099:3318 for reimage
  • 05:28 _joe_: depooling mw1281 for testing for T260329
  • 05:25 marostegui: Deploy schema change on db1139:3311
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311, db1099:3318 for reimage and MCR change', diff saved to https://phabricator.wikimedia.org/P12263 and previous config saved to /var/cache/conftool/dbconfig/20200817-052147-marostegui.json

2020-08-16

  • 11:12 gehel: repooling wdqs1004 - caught up on lag

2020-08-15

  • 21:18 gehel: depooling wdqs1004 and restarting services, will wait to catch up on lag before repooling

2020-08-14

  • 19:41 effie: restart mwdebug1002
  • 16:58 cdanis: done deploying 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8' to all routers T260449
  • 16:44 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr2-esams*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
  • 16:39 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr1-codfw*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
  • 16:36 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr2-codfw*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
  • 02:41 eileen: tools revision changed from 9a89f45974 to b4ebd1e564

2020-08-13

  • 23:39 tzatziki: removing 3 files for legal compliance
  • 22:03 mutante: switching xhgui from tungsten to xhgui1001 - ran puppet on webperf*001 - T180761 T158837
  • 21:54 andrew@deploy1001: Finished deploy [horizon/deploy@f3dcb29]: fix proxy in project-local domain --bug T260388 (duration: 03m 53s)
  • 21:50 andrew@deploy1001: Started deploy [horizon/deploy@f3dcb29]: fix proxy in project-local domain --bug T260388
  • 21:11 mutante: rsyncing /var/lib/jenkins from releases1001 to releases1002 and then all other releases* servers. 57GB, overwriting existing data from manual config (T247652)
  • 20:53 kormat: dropping xhgui.xhgui on m2
  • 19:35 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/DiscussionTools: Revert new reply API (again) T259855 (duration: 00m 57s)
  • 18:06 herron: restarted ES on logstash1010
  • 18:05 dpifke@deploy1001: Synchronized wmf-config/ProductionServices.php: Enabling new XHGui backend (T180761) (duration: 00m 56s)
  • 17:16 hnowlan: deployed ATS and varnish rules to route api.wikimedia.org
  • 16:26 hnowlan: created api.wikimedia.org
  • 15:49 hnowlan: moving api-gateway service to state production. critical set to false
  • 15:41 herron: restart ES on logstash1012
  • 14:56 fdans@deploy1001: Finished deploy [analytics/refinery@ba1a439]: Regular analytics weekly train (duration: 11m 34s)
  • 14:45 ema: repool mw1382 with kernel memory accounting disabled T260281
  • 14:45 fdans@deploy1001: Started deploy [analytics/refinery@ba1a439]: Regular analytics weekly train
  • 14:41 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:40 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:38 ema: reboot mw1382 with kernel memory accounting disabled T260281
  • 14:34 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:34 _joe_: rebooting mw1381 with a newer kernel, mw1383 as control with the old kernel T260329
  • 14:33 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:31 _joe_: installing kernel 4.19.0-0.bpo.9 on mw1381 T260329
  • 14:05 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 14:00 elukey: create schema[12]00[34] in ganeti - T260347
  • 13:59 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:58 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:53 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:51 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:46 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:45 hnowlan: moving api-gateway service to monitoring_setup
  • 13:44 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:44 hashar: Gracefully restarting Zuul
  • 13:39 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:10 _joe_: forcing a puppet run on the api appservers in eqiad T260329
  • 13:07 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: revert enabling of lilypond (again) T257091 T260329 (duration: 00m 59s)
  • 11:24 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:20 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:09 hnowlan: restarting pybal on lvs2010 T254908
  • 11:09 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:06 hnowlan: restarting pybal on lvs2009 T254908
  • 11:05 hnowlan: restarting pybal on lvs1016 T254908
  • 11:05 jayme: depool mw1380 for downgrade of poppler-utils,libpoppler-glib8,libpoppler64,curl,libcurl3,libcurl3-gnutls,libpython3.5,python3.5,libpython3.5-stdlib,python3.5-minimal,libpython3.5-minimal,imagemagick-6-common,libmagickcore-6.q16-3,libmagickwand-6.q16-3,imagemagick-6.q16,imagemagick,e2fslibs,e2fsprogs,libcomerr2,libss2 and reboot - T260329
  • 11:05 hnowlan: restarting pybal on lvs1015 T254908
  • 11:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:42 hnowlan: Moving api-gateway service from service_setup to lvs_setup and running puppet on LVS servers
  • 10:17 jayme: depool mw1379 for downgrade of poppler-utils,libpoppler-glib8,libpoppler64,curl,libcurl3,libcurl3-gnutls,libpython3.5,python3.5,libpython3.5-stdlib,python3.5-minimal,libpython3.5-minimal,imagemagick-6-common,libmagickcore-6.q16-3,libmagickwand-6.q16-3,imagemagick-6.q16,imagemagick,e2fslibs,e2fsprogs,libcomerr2,libss2 and reboot - T260329
  • 10:04 XioNoX: re-order OSPF interfaces on all routers (now partially netbox driven)
  • 09:37 ayounsi@deploy1001: Finished deploy [homer/deploy@89636df]: Homer release v0.2.5 (duration: 03m 03s)
  • 09:34 ayounsi@deploy1001: Started deploy [homer/deploy@89636df]: Homer release v0.2.5
  • 08:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 08:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082', diff saved to https://phabricator.wikimedia.org/P12247 and previous config saved to /var/cache/conftool/dbconfig/20200813-085547-marostegui.json
  • 08:45 _joe_: downgrading imagemagick on mw1378 T260329
  • 08:43 _joe_: downgrading imagemagick on mw1378 T260281
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 07:55 _joe_: downgrading curl/libcurl3/libcurl3-gnutls on mw1377 T260329
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12246 and previous config saved to /var/cache/conftool/dbconfig/20200813-074528-marostegui.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12244 and previous config saved to /var/cache/conftool/dbconfig/20200813-071943-marostegui.json
  • 07:16 marostegui: Stop replication on db1082 to remove triggers on sanitarium for MCR changes
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P12243 and previous config saved to /var/cache/conftool/dbconfig/20200813-071545-marostegui.json
  • 06:48 marostegui: Deploy MCR change on dbstore1003:3311
  • 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126', diff saved to https://phabricator.wikimedia.org/P12242 and previous config saved to /var/cache/conftool/dbconfig/20200813-060135-marostegui.json
  • 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:43 marostegui: Stop MySQL on db2135 (codfw master), haproxy irc alert will fire T260324
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12241 and previous config saved to /var/cache/conftool/dbconfig/20200813-052859-marostegui.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12240 and previous config saved to /var/cache/conftool/dbconfig/20200813-051222-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12239 and previous config saved to /var/cache/conftool/dbconfig/20200813-050107-marostegui.json
  • 02:56 mutante: testreduce1001 - systemctl reset-failed ; fix parsoid-vd systemd state and icinga alert
  • 00:37 mutante: removing jenkins_service_running checks from secondary servers where it's stopped, manually from icinga config, running puppet on icinga
  • 00:14 mutante: re-enabling puppet on releases* servers

2020-08-12

  • 23:44 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:41 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:40 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:39 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:37 wkandek: reboot mw1372
  • 23:36 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:36 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:32 wkandek: reboot mw1373
  • 23:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:31 wkandek: reboot mw1371
  • 23:31 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:31 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:30 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:28 wkandek: reboot mw1384
  • 23:27 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:27 wkandek: reboot mw1385
  • 23:26 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:25 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:24 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:22 wkandek: reboot mw1370
  • 23:22 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:19 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:18 wkandek: reboot mw1369
  • 23:18 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:17 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:17 wkandek: reboot mw1387
  • 23:16 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:16 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:16 wkandek: reboot mw1389
  • 23:15 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:14 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:09 wkandek: reboot mw1368
  • 23:09 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:08 wkandek: reboot mw1367
  • 23:08 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:07 wkandek: reboot mw1391
  • 23:07 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:05 ejegg: updated Fundraising CiviCRM from 72452e28a9 to f5469d0a4c
  • 23:05 wkandek: reboot mw1393
  • 23:04 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:04 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:01 wkandek: reboot mw1395
  • 23:01 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:00 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:53 wkandek: reboot mw1397
  • 22:53 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:52 wkandek: reboot mw1366
  • 22:52 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:52 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:52 wkandek: reboot mw1365
  • 22:51 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:51 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:51 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:47 wkandek: reboot mw1399
  • 22:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:46 wkandek: reboot mw1364
  • 22:46 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:45 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:44 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:42 wkandek: reboot mw1401
  • 22:42 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:41 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:41 wkandek: reboot mw1355
  • 22:40 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:40 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:38 wkandek: reboot mw1354
  • 22:38 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:36 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:36 wkandek: reboot mw1396
  • 22:36 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:35 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:32 wkandek: reboot mw1353
  • 22:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:31 wkandek: reboot mw1352
  • 22:31 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:31 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:30 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:29 wkandek: reboot mw1348
  • 22:29 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:28 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:26 wkandek: reboot mw1347
  • 22:26 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:25 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:23 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:22 wkandek: reboot mw1350
  • 22:22 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:21 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:20 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:19 wkandek: reboot mw1346
  • 22:19 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:18 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:14 wkandek: reboot mw1345
  • 22:13 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:12 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:12 wkandek: reboot mw1349
  • 22:12 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:11 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:08 wkandek: reboot mw1333
  • 22:07 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:07 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1330.eqiad.wmnet
  • 22:03 wkandek: reboot mw1344
  • 22:03 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:02 wkandek: reboot mw1343
  • 22:02 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:02 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:00 wkandek: reboot mw1332
  • 22:00 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:56 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:55 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:53 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:50 wkandek: reboot mw1331
  • 21:50 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:48 wkandek: reboot mw1342
  • 21:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:46 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:46 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1340.eqiad.wmnet
  • 21:41 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:40 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:39 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:39 wkandek: reboot mw1341
  • 21:39 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:37 wkandek@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
  • 21:37 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:36 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:33 wkandek: reboot mw1329
  • 21:33 wkandek: reboot mw1328
  • 21:32 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:29 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:28 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:28 ejegg: updated payments-wiki from 77ff5d70fc to a7ee1790e0
  • 21:25 wkandek: reboot mw1340
  • 21:25 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:23 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:21 wkandek: reboot mw1339
  • 21:20 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:20 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:15 wkandek: reboot mw1327
  • 21:15 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:14 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:13 wkandek: reboot mw1326
  • 21:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:11 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:11 wkandek: reboot mw1317
  • 21:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:10 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:05 wkandek: reboot mw1316
  • 21:04 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:03 wkandek: reboot mw1325
  • 21:03 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:02 wkandek: reboot mw1324
  • 21:02 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:02 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:01 wkandek: reboot mw1315
  • 21:01 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:57 wkandek: reboot mw1323
  • 20:57 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:54 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:52 wkandek: reboot mw1322
  • 20:52 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:51 wkandek: reboot mw1314
  • 20:51 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:50 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:50 wkandek: reboot mw1313
  • 20:50 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:49 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:48 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:44 wkandek: reboot mw1312
  • 20:44 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:43 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:43 wkandek: reboot mw1321
  • 20:42 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:41 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:40 wkandek: reboot mw1297
  • 20:40 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:39 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:39 wkandek: reboot mw1320
  • 20:39 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:38 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:34 wkandek: reboot mw1290
  • 20:34 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:33 wkandek: reboot mw1319
  • 20:33 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:32 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:29 wkandek: reboot mw1275
  • 20:29 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:28 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:26 wkandek: reboot mw1289
  • 20:25 wkandek: reboot mw1288
  • 20:25 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:25 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:23 wkandek: reboot mw1274
  • 20:23 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:23 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:22 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:20 wkandek: reboot mw1273
  • 20:20 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:16 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:13 wkandek: reboot mw1287
  • 20:13 wkandek: reboot mw1286
  • 20:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:11 wkandek: reboot mw1272
  • 20:11 wkandek: reboot mw1271
  • 19:41 hashar: Upgrading Jenkins on contint2001 (primary)
  • 19:25 hashar: contint1001: sudo systemctl mask jenkins # spare server
  • 19:25 mutante: all releases* servers except 1001 - disable puppet; stop jenkins, mask jenkins
  • 19:22 mutante: releases1002 - stopped and masked jenkins service
  • 19:22 mutante: releases2001 - stopped and masked jenkins service
  • 19:20 mutante: upgrading jenkins on releases*001
  • 19:19 hashar: Upgrading Jenkins on contint1001 (spare)
  • 19:16 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.4
  • 19:13 mutante: uploaded new jenkins version to APT repo; upgrading jenkins on releases1002/2002
  • 19:08 effie: pool mw1396
  • 19:06 effie: repool mw1395 mw1397 mw1399
  • 18:56 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 18:55 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 18:50 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/Wikibase/client/includes/Store/Sql/DirectSqlStore.php: Set caching of CachingEntityRevisionLookup to CACHE_NONE in client (duration: 02m 13s)
  • 18:47 wkandek: reboot mw1270
  • 18:47 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:45 wkandek: reboot mw1269
  • 18:41 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:39 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:39 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:38 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 18:25 wkandek: reboot mw1268
  • 18:25 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 18:25 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 18:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:22 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 18:22 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:17 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
  • 18:17 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:16 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on hewiki (T255020) (duration: 01m 03s)
  • 18:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:07 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:07 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:06 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:04 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: Set caching of CachingEntityRevisionLookup to CACHE_NONE in repo (duration: 01m 06s)
  • 18:02 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 18:02 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 18:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:56 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:52 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 17:52 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 17:52 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:51 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:51 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:51 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:49 effie: reboot mw1265 mw1282 mw1283
  • 17:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:45 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:37 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 17:36 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 17:30 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:19 effie: reboot mw1263 mw1264 mw1279 and mw1281
  • 17:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 17:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 17:16 cdanis: for posterity: mw1359 has a bunch of special packages installed (previously recorded in SAL) and also has `sudo memleak-bpfcc -o 60000 -z 31 -Z 33 30` running in a tmux in an attempt to understand what's causing the page fragmentation in the appserver fleet
  • 17:16 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:16 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:13 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 17:13 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 17:00 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 16:57 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Additional mitigations for T257687 (duration: 01m 03s)
  • 16:53 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:52 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:48 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 16:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:35 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:35 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:32 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:31 effie: reboot mw1277 mw1278 && mw1261 mw1262
  • 16:29 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 16:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:04 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: I3726a6364d, T257079 (duration: 01m 02s)
  • 15:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:52 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:50 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:48 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:48 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:42 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:37 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:36 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:32 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:32 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:26 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:22 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:21 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:15 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:12 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo apt install linux-headers-4.9.0-12-amd64
  • 15:10 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo apt install python3-netaddr ieee-data
  • 15:09 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo dpkg -i bpfcc-tools_0.12.0-2_all.deb libbpfcc_0.12.0-2_amd64.deb python3-bpfcc_0.12.0-2_all.deb
  • 15:08 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 15:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 15:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:54 cdanis: again un-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports
  • 14:53 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 14:52 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:44 cdanis: temporarily re-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports, original in my homedir
  • 14:37 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:37 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:35 cdanis: un-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports
  • 14:32 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:31 cdanis: temporarily kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports, original in my homedir
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:02 kormat: uploaded wmfmariadbpy 0.3 to apt
  • 13:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:42 effie: restart mw1383 & mw1386
  • 13:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:27 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.4 (duration: 01m 16s)
  • 13:25 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.4
  • 13:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:19 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:15 cdanis: ✔️ cdanis@mw1357.eqiad.wmnet ~ 🕘☕ sudo sysctl -w vm/compact_memory=1
  • 13:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:07 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:04 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:59 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:52 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:50 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:33 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 12:27 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:15 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:15 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 12:15 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:51 ema: pool mw1363 after reboot
  • 11:49 jynus: creating artificial low replication lag on db2130 to test icinga alerts T253120
  • 11:41 ema@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:37 ema@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:30 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:28 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:21 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:17 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:13 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:10 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:08 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:07 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 11:07 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:00 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 11:00 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:55 _joe_: rebooting mw1361
  • 10:51 jayme: rebooting mw1356
  • 10:49 _joe_: rebooting mw1378
  • 09:45 _joe_: repooling mw1377
  • 09:40 _joe_: rebooting mw1377
  • 09:22 _joe_: depool mw1357 too
  • 09:14 _joe_: depooling mw1377 for inspection
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1110', diff saved to https://phabricator.wikimedia.org/P12220 and previous config saved to /var/cache/conftool/dbconfig/20200812-091211-marostegui.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12219 and previous config saved to /var/cache/conftool/dbconfig/20200812-090831-marostegui.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12218 and previous config saved to /var/cache/conftool/dbconfig/20200812-085021-marostegui.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12217 and previous config saved to /var/cache/conftool/dbconfig/20200812-083548-marostegui.json
  • 07:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for reimage', diff saved to https://phabricator.wikimedia.org/P12215 and previous config saved to /var/cache/conftool/dbconfig/20200812-073130-marostegui.json
  • 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for MCR change', diff saved to https://phabricator.wikimedia.org/P12214 and previous config saved to /var/cache/conftool/dbconfig/20200812-045157-marostegui.json

2020-08-11

  • 23:41 Urbanecm: Evening B&C window completed
  • 23:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0f238f7: Update wgMFRemovableClasses (T231160) (duration: 01m 03s)
  • 23:36 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/MobileFrontend/extension.json: c22d65f: Hide vertical nav-boxes on mobile domain (T231160) (duration: 01m 03s)
  • 23:34 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/MobileFrontend/extension.json: 81d54b0: Hide vertical nav-boxes on mobile domain (T231160) (duration: 01m 05s)
  • 23:07 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 28faa27: Switching to updated license definition (duration: 01m 04s)
  • 21:52 krinkle@deploy1001: Synchronized php-1.36.0-wmf.3/includes/skins/SkinMustache.php: Ibe1f07346, T259872, T259858 (duration: 01m 04s)
  • 19:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 19:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 19:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add streams for eventgate-main - T251935 (duration: 01m 04s)
  • 19:21 ejegg: updated payments-wiki from f199c071c3 to 77ff5d70fc
  • 18:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:48 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 18:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Grant investigate right to checkuser group on frwiki (T260171) (duration: 01m 04s)
  • 18:18 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Beta-only: Configured additional settings for API Portal beta wiki gerrit:619339 (duration: 01m 03s)
  • 18:05 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Direct GrowthExperiments help panel questions to mentors on cswiki (T250235) (duration: 01m 03s)
  • 17:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Remove extraneous mediawiki.api-request stream - T251935 (duration: 01m 01s)
  • 17:53 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:53 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:43 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:43 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:38 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:33 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:28 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:25 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:58 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:53 hashar@deploy1001: Synchronized php-1.36.0-wmf.4/skins/MinervaNeue/: Revert "ServiceWiring: Avoid usage of deprecated Title::getSubjectPage()" - T260155 (duration: 01m 06s)
  • 16:12 herron: migrating lists.wikimedia.org services from fermium to lists1001 T224586
  • 15:36 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.4
  • 15:27 hashar@deploy1001: Finished scap: (no justification provided) (duration: 30m 51s)
  • 14:59 marostegui: Deploy MCR change on db1116:3318
  • 14:56 hashar@deploy1001: Started scap: (no justification provided)
  • 14:56 hashar@deploy1001: Pruned MediaWiki: 1.36.0-wmf.2 (duration: 04m 15s)
  • 14:55 jayme: updated helmfile to 0.125.2-1 on contint* and deploy*
  • 14:52 otto@deploy1001: Finished deploy [analytics/refinery@35c4430]: Deploying to an-launcher1002 to get camus wrapper script changes - T251935 (duration: 01m 14s)
  • 14:51 otto@deploy1001: Started deploy [analytics/refinery@35c4430]: Deploying to an-launcher1002 to get camus wrapper script changes - T251935
  • 14:50 hashar@deploy1001: Pruned MediaWiki: 1.36.0-wmf.1 (duration: 02m 07s)
  • 14:48 jayme: imported helmfile_0.125.2-1 to buster-wikimedia, jessie-wikimedia, stretch-wikimedia
  • 14:47 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.41 (duration: 04m 20s)
  • 14:40 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.40 (duration: 10m 24s)
  • 14:37 papaul: replacing msw-b5,b6,b7 and b8
  • 14:30 hashar: Cleaning old MediaWiki versions that were never removed
  • 14:27 hashar@deploy1001: sync aborted: testwikis wikis to 1.36.0-wmf.4 (duration: 72m 36s)
  • 14:10 hashar: mw1319: scap pull
  • 13:27 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 13:27 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:23 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:16 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:14 hashar@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.4
  • 13:12 hashar: Applied 1.36.0-wmf.4 security patches # T257972
  • 13:03 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 13:03 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:52 kormat: uploaded wmfmariadbpy 0.2 packages to apt1001
  • 12:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 11:54 marostegui: Install new MariaDB 10.4.14 on db2102
  • 11:42 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:18 Urbanecm: EU B&C window done
  • 11:08 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 619255|Enable ContentTranslation in Sundanese WP as a default tool (T258502) (duration: 00m 59s)
  • 10:39 volans: migrating *all* eqiad mgmt DNS records to the autogenerated ones via Netbox - T233183
  • 10:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:34 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:20 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0)
  • 10:01 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh
  • 10:00 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 09:51 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 09:29 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:25 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:11 marostegui: Rename tables on muswiki and mhwiktionary on s3 master (db1123) without replication T260112
  • 09:01 volans: renewed puppet certificate on scb1001.eqiad.wmnet
  • 08:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e6ec237: Revert "Turn muswiki and mhwiktionary to read-only" (T259004) (duration: 00m 58s)
  • 08:45 urbanecm@deploy1001: Synchronized dblists/: 81f4594: Point muswiki and mhwiktionary to s5 (T259004; 3/3) (duration: 00m 58s)
  • 08:44 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: 81f4594: Point muswiki and mhwiktionary to s5 (T259004; 2/3) (duration: 00m 58s)
  • 08:43 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: 81f4594: Point muswiki and mhwiktionary to s5 (T259004; 1/3) (duration: 01m 02s)
  • 08:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a04bc1f: Turn muswiki and mhwiktionary to read-only (T259004) (duration: 01m 01s)
  • 08:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 06:54 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 06:45 XioNoX: Re-prioritize peering over transit eqiad/esams - T259614
  • 01:59 tstarling@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: enabling fast stale mode T250248 (duration: 00m 58s)
  • 00:33 dpifke@deploy1001: Finished deploy [performance/arc-lamp@fc5f1c6]: Deploying latest attempt to fix T259167 (duration: 01m 03s)
  • 00:31 dpifke@deploy1001: Started deploy [performance/arc-lamp@fc5f1c6]: Deploying latest attempt to fix T259167
  • 00:24 mutante: reverting switch of releases.wikimedia.org for today since releases-jenkins.wikimedia.org is tied to it and new jenkins still needs some config and plugins (T247652)
  • 00:08 mutante: releases-jenkins.wikimedia.org currently under maintenance (T247652)

2020-08-10

  • 23:56 eileen: tools revision changed from 22550f38c5 to 9a89f45974
  • 23:53 mutante: https://releases.wikimedia.org switched to new backends running Debian buster. files have been synced. httpbb tests have been created and pass. (T247652)
  • 23:52 mutante: https://releases.wikimedia.org switched to new backends running Debian buster. files have been synced of course.
  • 20:13 hashar: Updated container for Jenkins job operations-puppet-tests-buster-docker https://gerrit.wikimedia.org/r/c/integration/config/+/619359/
  • 20:10 ejegg: updated payments-wiki from 932aacde54 to f199c071c3
  • 18:32 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@3e12dbb]: 0.3.44 (duration: 15m 18s)
  • 18:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:17 ryankemper@deploy1001: Started deploy [wdqs/wdqs@3e12dbb]: 0.3.44
  • 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Special:Investigate on frwiki (T257891) (duration: 00m 58s)
  • 18:07 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Explicitly disable nativeGallery in Parsoid settings (no-op) (duration: 00m 58s)
  • 18:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump the weight of near match for search (T257922) (duration: 00m 59s)
  • 17:56 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:52 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:49 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-analytics streams - T251935 (duration: 01m 02s)
  • 17:46 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:38 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:38 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:34 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:34 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:31 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:31 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:14 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:12 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:04 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:59 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:59 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:59 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:55 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:17 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:11 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:01 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:48 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:15 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:15 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:14 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:55 XioNoX: Re-prioritize peering over transit - codfw - T259614
  • 12:34 XioNoX: Re-prioritize peering over transit - eqsin - T259614
  • 12:07 XioNoX: standardize cr1-eqiad interfaces
  • 11:56 Urbanecm: EU B&C window done
  • 11:55 Urbanecm: Run `mwscript namespaceDupes.php --wiki=tiwiki --fix` at mwmaint1002 (T259295)
  • 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 14b2897: Define Portal namespace for tiwiki (T259295) (duration: 00m 59s)
  • 11:49 urbanecm@deploy1001: Synchronized static/images/project-logos/: bbbf701: Regenerate Bengali Wikipedia logo from source SVG (T259292) (duration: 00m 59s)
  • 11:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0d8366f: Search Work NS by default at bnwikisource (T258982) (duration: 00m 59s)
  • 11:37 Urbanecm: Run `mwscript namespaceDupes.php --wiki=hywiki --fix` at mwmaint1002 (T259987)
  • 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1771487: add two extra namespaces for hywiki (T259987) (duration: 00m 59s)
  • 11:28 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/shnwiktionary*.png with purgeList.php (T260010)
  • 11:27 XioNoX: standardize cr2-eqiad interfaces
  • 11:27 urbanecm@deploy1001: Synchronized static/images/project-logos/: c5c96ca: Regenerate shnwiktionary logo from source svg (T260010) (duration: 00m 58s)
  • 11:21 XioNoX: repool ulsfo
  • 11:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a15e3a2: Increase autoconfirmed threshold for Chinese Wikinews to 7 days and 20 edits at least (T259869) (duration: 00m 58s)
  • 11:13 XioNoX: Re-prioritize peering over transit - ulsfo - T259614
  • 11:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ba0b2ab: Create TemplateEditor group on zhwiki (T260012) (duration: 00m 58s)
  • 11:09 Urbanecm: Run mwscript namespaceDupes.php --wiki=ptwikinews --fix --add-prefix=T259959 (T259959)
  • 11:09 Urbanecm: Run mwscript namespaceDupes.php --wiki=ptwikinews --fix (T259959)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 010f63e: Add WN as an alias to project namespace in Portuguese Wikinews (T259959) (duration: 00m 58s)
  • 11:06 urbanecm@deploy1001: sync-file aborted: 010f63e: Add WN as an alias to project namespace in Portuguese Wikinews (T259959) (duration: 00m 00s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 01s)
  • 10:42 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.pool (exit_code=0)
  • 10:37 jayme@cumin1001: START - Cookbook sre.discovery.pool
  • 10:36 jayme@cumin1001: END (FAIL) - Cookbook sre.discovery.pool (exit_code=99)
  • 10:36 jayme@cumin1001: START - Cookbook sre.discovery.pool
  • 10:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:29 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.depool (exit_code=0)
  • 10:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:23 jayme@cumin1001: START - Cookbook sre.discovery.depool
  • 10:19 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.pool (exit_code=0)
  • 10:18 jayme@cumin1001: START - Cookbook sre.discovery.pool
  • 10:14 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:10 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:56 hashar: Updated container for Jenkins job operations-dns-lint-docker https://gerrit.wikimedia.org/r/619267
  • 09:55 hashar: Updated container for Jenkins job operations-puppet-tests-buster-docker https://gerrit.wikimedia.org/r/619266
  • 09:54 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.depool (exit_code=0)
  • 09:49 jayme@cumin1001: START - Cookbook sre.discovery.depool
  • 09:21 marostegui: Promote dbproxy1019 back T255408
  • 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:43 marostegui: Remove revision triggers from db2094:3318 T238966
  • 06:42 marostegui: Stop replication on s8 codfw master to deploy MCR change, this will generate lag on s8 codfw T238966
  • 04:46 marostegui: Depool dbproxy1019 for reimage T255408

2020-08-09

  • 21:58 ejegg: updated payments-wiki from cd012f37f1 to 932aacde54
  • 03:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)

2020-08-08

  • 02:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 02:21 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload

2020-08-07

  • 16:42 jforrester@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/DiscussionTools/: T259855 Revert new reply API (duration: 01m 06s)
  • 15:01 volans: import DNS names for network devices in Netbox - T258729
  • 13:27 godog: bounce pybal on lvs1016 and then lvs1015 to reset state, logstash1025 reported down but actually up
  • 10:27 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:27 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:02 elukey: reboot deneb via ganeti2021 (hostname config pointing to recdns for some reason)
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1092', diff saved to https://phabricator.wikimedia.org/P12195 and previous config saved to /var/cache/conftool/dbconfig/20200807-091527-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12194 and previous config saved to /var/cache/conftool/dbconfig/20200807-084747-marostegui.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12193 and previous config saved to /var/cache/conftool/dbconfig/20200807-080719-marostegui.json
  • 07:50 godog: prometheus codfw lvextend --resize --size +60G /dev/mapper/vg--hdd-prometheus--global
  • 07:49 godog: prometheus codfw lvextend --resize --size +30G /dev/mapper/vg--ssd-prometheus--k8s
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12192 and previous config saved to /var/cache/conftool/dbconfig/20200807-074658-marostegui.json
  • 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for upgrade', diff saved to https://phabricator.wikimedia.org/P12191 and previous config saved to /var/cache/conftool/dbconfig/20200807-063431-marostegui.json

2020-08-06

  • 23:21 catrope@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/GrowthExperiments/: Fixes for WelcomeSurvey language question (T232410) (duration: 00m 59s)
  • 23:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change GrowthExperiments mentor list on fawiki (T253291) (duration: 00m 59s)
  • 21:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:39 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:35 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:33 brennen@deploy1001: Synchronized php-1.36.0-wmf.3/vendor: Update git submodules (vendor) (T259832) (duration: 01m 08s)
  • 21:32 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 20:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 20:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 20:47 shdubsh: restart logstash -- pipeline appears stuck
  • 20:38 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:38 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 20:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 20:19 brennen: manually updating the vendor submodule on 1.36.0 for T259832
  • 20:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 20:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:47 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:45 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - wgEventStreams - fix another typo in eventgate stream config - T251935 (duration: 00m 58s)
  • 19:40 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - wgEventStreams - fix typo in eventgate stream config - T251935 (duration: 00m 59s)
  • 19:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:04 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.3
  • 18:58 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:57 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:29 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 18:29 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 18:21 Urbanecm: Morning B&C window was completed
  • 18:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/GrowthExperiments/modules/: fb4a808: Fix "Ask mentor" help panel button styling (T250235) (duration: 01m 07s)
  • 18:11 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 18:11 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9db9659: Remove temporary logging for mediamoderation (T259742) (duration: 01m 07s)
  • 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9695811: Enable DiscussionTools as a beta feature on 8 more wikis ("phase 1") (T259574) (duration: 01m 06s)
  • 17:42 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.3 (duration: 01m 06s)
  • 17:41 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.3
  • 17:37 brennen: train 1.36.0-wmf.3: proceeding to group1
  • 17:36 brennen@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/WikibaseMediaInfo/src/View/MediaInfoEntityTermsView.php: Backport: Fix array unpacking as argument list (T259745) (duration: 01m 07s)
  • 16:32 chrisalbon@deploy1001: Finished deploy [ores/deploy@f3c44be]: T258435 (duration: 14m 12s)
  • 16:18 dpifke@deploy1001: Finished deploy [performance/arc-lamp@7838c88]: Deploying fixes for T259167 (duration: 00m 05s)
  • 16:18 dpifke@deploy1001: Started deploy [performance/arc-lamp@7838c88]: Deploying fixes for T259167
  • 16:18 chrisalbon@deploy1001: Started deploy [ores/deploy@f3c44be]: T258435
  • 15:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:10 fdans@deploy1001: Finished deploy [analytics/refinery@97a02a3]: Regular analytics weekly train [analytics/refinery@97a02a3] (duration: 20m 01s)
  • 14:50 fdans@deploy1001: Started deploy [analytics/refinery@97a02a3]: Regular analytics weekly train [analytics/refinery@97a02a3]
  • 14:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-* test.event streams - T251935 (duration: 01m 08s)
  • 13:32 jayme: updated helm to 2.16.9-2 on contint*, deploy* and chartmuseum*
  • 13:24 jayme: imported helm_2.16.9-2 and tiller_2.16.9-2 to buster-wikimedia, jessie-wikimedia and stretch-wikimedia
  • 12:06 kart_: Updated cxserver to 2020-08-05-070016-production (T258919, T199523, T257943, T256194)
  • 12:03 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 11:59 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 11:57 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 11:54 Lucas_WMDE: EU backport window done
  • 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/Flow/: Backport: Pass jQuery objects into jqueryMsg (duration: 01m 09s)
  • 11:53 XioNoX: reboot cr2-eqord - T259621
  • 11:37 XioNoX: drain traffic away cr2-eqord - T259621
  • 11:27 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/Wikibase/lib/: Backport: Fix CachingFallbackLabelDescriptionLookup failing in edge-cases (T259744) (duration: 01m 10s)
  • 11:22 XioNoX: reboot cr2-eqdfw - T259621
  • 11:13 XioNoX: drain traffic away cr2-eqdfw - T259621
  • 10:52 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:48 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:45 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:23 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:16 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:14 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:12 jynus@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:11 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1127', diff saved to https://phabricator.wikimedia.org/P12188 and previous config saved to /var/cache/conftool/dbconfig/20200806-084406-marostegui.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12187 and previous config saved to /var/cache/conftool/dbconfig/20200806-083743-marostegui.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12186 and previous config saved to /var/cache/conftool/dbconfig/20200806-083033-marostegui.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12185 and previous config saved to /var/cache/conftool/dbconfig/20200806-081416-marostegui.json
  • 07:03 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:57 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 06:57 marostegui: Truncate tables on zerowiki T227717
  • 06:53 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:47 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 06:43 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:37 elukey: roll restart of druid clusters' zookeeper and an-conf* zookeeper for openjdk-11 upgrades
  • 06:36 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for MCR', diff saved to https://phabricator.wikimedia.org/P12184 and previous config saved to /var/cache/conftool/dbconfig/20200806-050743-marostegui.json
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079', diff saved to https://phabricator.wikimedia.org/P12182 and previous config saved to /var/cache/conftool/dbconfig/20200806-045622-marostegui.json
  • 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12181 and previous config saved to /var/cache/conftool/dbconfig/20200806-045107-marostegui.json
  • 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12180 and previous config saved to /var/cache/conftool/dbconfig/20200806-044608-marostegui.json
  • 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12179 and previous config saved to /var/cache/conftool/dbconfig/20200806-043758-marostegui.json
  • 03:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2019.codfw.wmnet
  • 02:24 eileen: process-control config revision is 525eb71235 turn off delete deleted contacts
  • 01:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:17 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:35 mutante: wtp2019 - reimaging - parsoid service does not work, unlike on all other wtp*, making sure it's clean
  • 00:00 mutante: LDAP - removed demon from nda group

2020-08-05

  • 23:57 eileen: civicrm revision changed from 150c3476c4 to 72452e28a9, config revision is b6ece03513
  • 23:02 shdubsh: logstash in codfw looks stuck -- restarting
  • 19:41 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.2
  • 19:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:13 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.3 (duration: 01m 44s)
  • 19:11 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.3
  • 18:26 Lucas_WMDE: Morning backport window done
  • 18:25 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/ContentTranslation/: Backport: Pass jQuery objects into jqueryMsg (duration: 01m 11s)
  • 18:14 mutante: test !log
  • 18:11 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Re-enable growth study quick survey (T257015) (duration: 01m 12s)
  • 17:30 shdubsh: test prometheus-icinga-exporter upgrade on icinga2001
  • 16:50 elukey: powercycle stat1005 after GPU issue
  • 15:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-logging-external streams and destination_event_service settings - T251935 (duration: 01m 05s)
  • 15:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:11 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 godog: bounce logstash on logstash100[789] - udp loss reported
  • 15:05 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 14:48 elukey: reboot stat1008 for unexpected maintenance (GPU stuck)
  • 14:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:32 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:27 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:27 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:25 moritzm: installing nmap bugfix updates from buster point release
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:20 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:20 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:14 moritzm: installing pillow security updates
  • 14:03 moritzm: installing node-minimist security updates
  • 13:51 moritzm: installing Linux update to 4.19.132 from buster point update (no reboots, just the package updates)
  • 13:32 jayme: updated helmfile to 0.125.2-0 and helm-diff to 3.1.2-1 on contint* and deploy*
  • 13:28 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:24 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:04 elukey: restart yarn resource managers on an-master100[12] to pick up new Yarn settings - https://gerrit.wikimedia.org/r/c/operations/puppet/+/618529
  • 13:00 moritzm: installing libjpeg-turbo security updates on stretch
  • 12:52 XioNoX: netmon1002:/srv/deployment/librenms/librenms$ sudo -u librenms ./lnms migrate
  • 12:49 jayme: imported helm-diff_3.1.2-1 to buster-wikimedia, jessie-wikimedia and stretch-wikimedia
  • 12:46 moritzm: installing imagemagick security updates on buster
  • 12:33 moritzm: installing net-snmp security updates on icinga hosts
  • 11:36 awight: EU Bacon reclosed
  • 11:36 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Switch test wikis to new version of vector by default (3/3) (T254227) (duration: 01m 07s)
  • 11:29 awight: EU Bacon reopened
  • 11:28 awight: EU Bacon complete
  • 11:26 awight@deploy1001: Synchronized wmf-config: Config: FileImporter: full default deployment (T232542) (duration: 01m 04s)
  • 11:23 jayme: imported helm-diff_3.1.2-0 to jessie-wikimedia and stretch-wikimedia
  • 11:22 jayme: imported helm-diff_3.1.2-0 to buster-wikimedia
  • 11:19 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add import sources for lijwikisource (T259633) (duration: 01m 07s)
  • 11:13 awight@deploy1001: sync-file aborted: Config: Add import sources for lijwikisource (T259633) (duration: 00m 13s)
  • 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Data Bridge on Test Wikidata clients (T232584) (duration: 01m 20s)
  • 10:39 XioNoX: reboot cr3-ulsfo - T259621
  • 10:28 XioNoX: drain traffic away cr3-ulsfo - T259621
  • 10:21 moritzm: installing libssh security updates
  • 10:18 XioNoX: reboot cr4-ulsfo - T259621
  • 09:58 XioNoX: drain traffic away cr4-ulsfo
  • 09:53 XioNoX: depool ulsfo - T259621
  • 09:32 elukey: set ticket max renewable lifetime to 7d on all kerberos clients (was zero, the default)
  • 09:07 jayme: imported helmfile_0.125.2-0 to jessie-wikimedia
  • 09:07 jayme: imported helmfile_0.125.2-0 to stretch-wikimedia
  • 09:05 jayme: imported helmfile_0.125.2-0 to buster-wikimedia
  • 08:39 marostegui: Remove revision triggers on db1125:3317
  • 08:39 marostegui: Stop replication on db1079 for MCR, this will generate lag on s7 on labsdb
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for MCR', diff saved to https://phabricator.wikimedia.org/P12173 and previous config saved to /var/cache/conftool/dbconfig/20200805-083916-marostegui.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P12172 and previous config saved to /var/cache/conftool/dbconfig/20200805-083833-marostegui.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12171 and previous config saved to /var/cache/conftool/dbconfig/20200805-082908-marostegui.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12170 and previous config saved to /var/cache/conftool/dbconfig/20200805-082138-marostegui.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12169 and previous config saved to /var/cache/conftool/dbconfig/20200805-081237-marostegui.json
  • 07:49 marostegui: Stop mysql on db1117:3323 (this will generate haproxy irc alerts) T259589
  • 07:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 07:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:26 moritzm: installing perl security updates on buster
  • 07:20 moritzm: installing libexif security updates on buster
  • 07:14 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 07:13 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 07:04 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 07:04 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 06:50 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:50 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:46 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:46 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 05:53 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for MCR', diff saved to https://phabricator.wikimedia.org/P12167 and previous config saved to /var/cache/conftool/dbconfig/20200805-050907-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1136', diff saved to https://phabricator.wikimedia.org/P12166 and previous config saved to /var/cache/conftool/dbconfig/20200805-050808-marostegui.json
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12165 and previous config saved to /var/cache/conftool/dbconfig/20200805-050308-marostegui.json
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12164 and previous config saved to /var/cache/conftool/dbconfig/20200805-045334-marostegui.json
  • 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12163 and previous config saved to /var/cache/conftool/dbconfig/20200805-043346-marostegui.json

2020-08-04

  • 22:41 brennen: restarting php7.2-fpm on mw1404 for opcache issues
  • 21:45 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:34 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:03 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 21:03 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:52 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:27 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@c80e2e7]: use provided ca certs for elasticsearch (duration: 02m 22s)
  • 20:25 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: use provided ca certs for elasticsearch
  • 20:15 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@b17bfd4]: Move mjolnir daemons from cirrus hosts to dedicated instances (duration: 02m 07s)
  • 20:12 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@b17bfd4]: Move mjolnir daemons from cirrus hosts to dedicated instances
  • 19:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.3
  • 19:11 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.3 (duration: 91m 03s)
  • 19:03 brennen: current 1.36.0-wmf.3 train status (T257971): mid scap-cdb-rebuild for testwiki sync; will proceed with group0 when finished.
  • 18:55 sukhe: upload pdns-recursor_4.3.3-1~deb10u1 to apt.wm.o (buster) - T252132
  • 18:49 mutante: letting puppet install envoy on all ores1* hosts
  • 18:46 mutante: letting puppet install envoy on all ores2* hosts
  • 18:37 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:19 mutante: temp disabling puppet on all ores hosts to add envoy
  • 17:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:40 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.3
  • 17:36 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:17 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:05 brennen: 1.36.0-wmf.3 was branched at 2d0cf09cdf for T257971
  • 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:49 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:24 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:15 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 15:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Set default topic_prefixes - T255888 (duration: 00m 58s)
  • 15:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 15:39 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 15:39 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 15:38 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:18 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove now unused wgEventServiceStreamConfig - T229863 (duration: 00m 58s)
  • 15:18 moritzm: installing jackson-databind security updates
  • 15:08 moritzm: installing qemu security updates on cloudvirt* Stretch hosts
  • 14:54 cmjohnson1: swapping kubernetes1010 network cable T257542
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 14:41 cmjohnson1: powercycling analytics1050 T258370
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for MCR', diff saved to https://phabricator.wikimedia.org/P12161 and previous config saved to /var/cache/conftool/dbconfig/20200804-143524-marostegui.json
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12160 and previous config saved to /var/cache/conftool/dbconfig/20200804-142710-marostegui.json
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12159 and previous config saved to /var/cache/conftool/dbconfig/20200804-142220-marostegui.json
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12158 and previous config saved to /var/cache/conftool/dbconfig/20200804-141556-marostegui.json
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12157 and previous config saved to /var/cache/conftool/dbconfig/20200804-141004-marostegui.json
  • 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 13:51 hashar: Install newer openjdk on contint2001 and restarting CI Jenkins
  • 12:00 jayme: helm was updated: 2.16.7-2 -> 2.16.9-1 on chartmuseum*, contint*, deploy*
  • 11:43 Lucas_WMDE: EU backport window done
  • 11:41 marostegui: Deploy schema change on s3 codfw master, lag might show up on codfw s3 T259238
  • 11:37 moritzm: installing openjdk-11 security updates
  • 11:36 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Load WikibaseRepo using extension registration in production (T257433) (duration: 00m 58s)
  • 11:12 Lucas_WMDE: Deployed patch for T86738 / T259565
  • 11:03 moritzm: installing e2fsprogs security updates for stretch
  • 10:47 moritzm: installing tomcat8 security updates
  • 10:47 vgutierrez: upgrade acme-chief to version 0.28
  • 10:33 vgutierrez: upload acme-chief 0.28 to apt.wm.o (buster) - T259338
  • 10:18 moritzm: installing imagemagick security updates on stretch
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for MCR and PK change T259524', diff saved to https://phabricator.wikimedia.org/P12156 and previous config saved to /var/cache/conftool/dbconfig/20200804-100035-marostegui.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12155 and previous config saved to /var/cache/conftool/dbconfig/20200804-095608-marostegui.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12154 and previous config saved to /var/cache/conftool/dbconfig/20200804-094909-marostegui.json
  • 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:58 moritzm: installing python3.5 security updates
  • 08:15 moritzm: installing remaining cups security updates
  • 08:13 XioNoX: cleaning up a bunch of prefix limit reached issues
  • 08:00 marostegui: Failover m2 from db1132 to db1107 - T257540
  • 07:54 moritzm: installing poppler security updates on stretch
  • 07:43 jayme: imported helm_2.16.9-1 to jessie-wikimedia
  • 07:43 jayme: imported helm_2.16.9-1 to stretch-wikimedia
  • 07:38 jayme: imported helm_2.16.9-1 to buster-wikimedia
  • 07:34 elukey: upgrade druid analytics (backend for Turnilo/Superset/etc..) to 0.19
  • 07:32 XioNoX: remove nonstop-bridging from fasw-c-eqiad switches - T191667
  • 07:29 XioNoX: remove nonstop-bridging from eqiad asw2 switches - T191667
  • 07:28 XioNoX: remove nonstop-bridging from asw2-esams - T191667
  • 07:27 marostegui: Start topology changes on m2 - T257540
  • 07:25 moritzm: installing rails security updates
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P12153 and previous config saved to /var/cache/conftool/dbconfig/20200804-064223-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12152 and previous config saved to /var/cache/conftool/dbconfig/20200804-063026-marostegui.json
  • 06:27 _joe_: restarting docker daemon on kubestage1002, seems like a case of https://github.com/moby/moby/issues/29635
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Restore original weight to db1089 on main traffic', diff saved to https://phabricator.wikimedia.org/P12151 and previous config saved to /var/cache/conftool/dbconfig/20200804-062358-marostegui.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12150 and previous config saved to /var/cache/conftool/dbconfig/20200804-062256-marostegui.json
  • 06:19 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 06:13 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: re-enabling lilypond execution in safe mode 3rd attempt (duration: 00m 58s)
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1089 on main traffic', diff saved to https://phabricator.wikimedia.org/P12149 and previous config saved to /var/cache/conftool/dbconfig/20200804-061255-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12148 and previous config saved to /var/cache/conftool/dbconfig/20200804-061209-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for MCR', diff saved to https://phabricator.wikimedia.org/P12147 and previous config saved to /var/cache/conftool/dbconfig/20200804-061003-marostegui.json
  • 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for reimage', diff saved to https://phabricator.wikimedia.org/P12146 and previous config saved to /var/cache/conftool/dbconfig/20200804-051843-marostegui.json
  • 05:04 marostegui: Reboot db1107 to pick up the last kernel
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into API', diff saved to https://phabricator.wikimedia.org/P12145 and previous config saved to /var/cache/conftool/dbconfig/20200804-050150-marostegui.json
  • 03:56 legoktm: added Arlo to wmf-deployment Gerrit group
  • 03:53 legoktm: added subbu to wmf-deployment Gerrit group

2020-08-03

  • 23:43 mutante: mwdebug1001 - temp installing apt-file for debugging an issue on mwmaint
  • 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on fawiki (T253291) (duration: 00m 59s)
  • 21:35 sbassett: Deployed mitigations for T115888
  • 21:14 sbassett@deploy1001: Synchronized php-1.36.0-wmf.2/resources/src/mediawiki.jqueryMsg/mediawiki.jqueryMsg.js: (no justification provided) (duration: 01m 00s)
  • 18:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:09 dcausse@deploy1001: Finished deploy [wdqs/wdqs@20dcff3]: deploy 0.3.43 and gui update (duration: 15m 53s)
  • 17:53 dcausse@deploy1001: Started deploy [wdqs/wdqs@20dcff3]: deploy 0.3.43 and gui update
  • 17:33 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
  • 17:28 dcausse@deploy1001: Finished deploy [wdqs/wdqs@20dcff3]: (no justification provided) (duration: 00m 35s)
  • 17:28 dcausse@deploy1001: Started deploy [wdqs/wdqs@20dcff3]: (no justification provided)
  • 16:58 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.36.0-wmf.1"
  • 16:21 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 16:16 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 15:55 _joe_: regenerating the TLS certs for blubberoid
  • 15:33 XioNoX: standardize all routers routing-options config
  • 15:27 marostegui: Change PK on frwiktionary.revision on db2087:3317, db2129, db2121 db2086:3317 T259524
  • 15:16 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P12143 and previous config saved to /var/cache/conftool/dbconfig/20200803-145111-marostegui.json
  • 14:40 moritzm: update Buster netboot images to Buster 10.5 T259519
  • 14:33 XioNoX: disable all ALGs from pfw3-codfw
  • 14:28 XioNoX: remove IGMP and PIM from pfw3-codfw security zones
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into dump and depool db1106', diff saved to https://phabricator.wikimedia.org/P12142 and previous config saved to /var/cache/conftool/dbconfig/20200803-142749-marostegui.json
  • 14:27 XioNoX: remove nonstop-bridging from fasw-c-codfw - T191667
  • 14:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:04 filippo@deploy1001: Finished deploy [librenms/librenms@413e006]: Upgrade LibreNMS to 1.66 - T257017 (duration: 00m 23s)
  • 14:03 filippo@deploy1001: Started deploy [librenms/librenms@413e006]: Upgrade LibreNMS to 1.66 - T257017
  • 14:00 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin A:puppetmaster 'enable-puppet "cdanis deploying I92e9a05"'
  • 13:56 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin A:puppetmaster 'disable-puppet "cdanis deploying I92e9a05"'
  • 13:27 moritzm: installing libopenmpt security updates
  • 13:15 XioNoX: remove nonstop-bridging from asw-d-codfw - T191667
  • 13:14 XioNoX: remove nonstop-bridging from asw-c-codfw - T191667
  • 13:12 XioNoX: remove nonstop-bridging from asw-b-codfw - T191667
  • 13:11 XioNoX: remove nonstop-bridging from asw-a-codfw - T191667
  • 13:05 moritzm: installing json-c security updates
  • 12:53 XioNoX: move VRRP master to cr3-eqsin
  • 12:32 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
  • 12:26 moritzm: installing apache-log4j1.2 security updates
  • 12:20 moritzm: restarting nginx on francium to pick up luajit update
  • 12:13 kormat: disabling puppet on cumin hosts T259021
  • 11:55 moritzm: installing luajit security updates
  • 11:20 moritzm: installing ruby-rack security updates
  • 11:19 Urbanecm: EU B&C done
  • 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 346138d: Add extra namespaces for yuewiktionary (T258913) (duration: 01m 06s)
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 8c2a2b2: Add gpophotoeng.gov.il to the wgCopyUploadsDomains allowlist for commonswiki (T258857) (duration: 01m 07s)
  • 11:03 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: ead6b9e: New throttle rule for Czech editathon (T259352) (duration: 01m 06s)
  • 11:03 moritzm: installing ruby2.5 security updates
  • 11:01 moritzm: removing cloudcephmon100[1-3].wikimedia.org from debmonitor (these eventually got re-installed as cloudcephmon100[1-3].eqiad.wmnet)
  • 10:51 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 06s)
  • 10:50 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 08s)
  • 10:29 moritzm: installing NSS security updates on buster
  • 10:26 moritzm: restarting Apache on puppetboard to pick up curl security updates
  • 10:19 moritzm: restarting wtp1025 (parsoid canary) to pick up curl security updates
  • 09:46 moritzm: restarting mw1261-mw1265 to pick up curl security updates
  • 09:42 moritzm: installing curl security updates on stretch
  • 08:59 moritzm: installing ffmpeg security updates on jobrunners/video scalers (3.2.15 rebuilt with VP9/row-mt patches)
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into API', diff saved to https://phabricator.wikimedia.org/P12141 and previous config saved to /var/cache/conftool/dbconfig/20200803-082641-marostegui.json
  • 08:25 moritzm: installing qemu security updates on stretch
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12140 and previous config saved to /var/cache/conftool/dbconfig/20200803-082533-marostegui.json
  • 08:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify s5 wikis T259437 (duration: 01m 05s)
  • 08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify s5 wikis T259437 (duration: 01m 40s)
  • 08:07 elukey: roll restart aqs on aqs* to pick up new druid settings
  • 07:10 marostegui: Remove revision triggers from db2095:3317 for MCR changes T238966
  • 07:09 marostegui: Deploy MCR change on s7 codfw, lag will appear on codfw T238966
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12139 and previous config saved to /var/cache/conftool/dbconfig/20200803-070702-marostegui.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12138 and previous config saved to /var/cache/conftool/dbconfig/20200803-052715-marostegui.json
  • 05:04 marostegui: Remove db1108:3321 and db1108:3322 from tendril and add db1108:3351 and db1108:3352 T254462
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12137 and previous config saved to /var/cache/conftool/dbconfig/20200803-050148-marostegui.json

2020-08-01

  • 16:30 Amir1: wikiadmin@10.64.32.197(avkwiki)> delete from site_identifiers; (T259122)
  • 16:27 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T259122)

