Server Admin Log

From Wikitech

2020-11-27

  • 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:50 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 15:06 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 14:56 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 14:50 elukey: roll restart zookeeper on druid* nodes for openjdk upgrades
  • 14:50 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 10:52 jayme: updated helmfile to 0.135.0-1 on deploy*,contint*
  • 10:51 jayme: updated helm-diff to 3.1.3-1 on contint*
  • 10:49 jayme: updated helm to 2.17.0-1 on deploy*,contint*,chartmuseum*
  • 10:06 jayme: updated helm and helmfile on deploy2001
  • 10:04 jayme@deploy2001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 10:00 jayme: imported helm 2.17.0 into buster-wikimedia and stretch-wikimedia
  • 08:55 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:05 elukey: roll restart druid public cluster for openjdk upgrades
  • 08:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 06:39 marostegui: Stop mysql on es1015 T268810
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1015 from dbctl', diff saved to https://phabricator.wikimedia.org/P13454 and previous config saved to /var/cache/conftool/dbconfig/20201127-063846-marostegui.json
  • 06:30 marostegui: Remove es1016 from tendril and zarcillo T268812
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1015 for decommissioning T268810', diff saved to https://phabricator.wikimedia.org/P13453 and previous config saved to /var/cache/conftool/dbconfig/20201127-061929-marostegui.json

2020-11-26

  • 17:18 jayme: downgrade helmfile to 0.125.2-1 on deploy*
  • 17:05 jayme: updated helm-diff and helmfile on deploy100* and deploy200*
  • 16:34 jayme: imported helm-diff 3.1.3-1 into buster-wikimedia and stretch-wikimedia
  • 15:01 moritzm: installing libonig security updates
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13452 and previous config saved to /var/cache/conftool/dbconfig/20201126-144446-root.json
  • 14:38 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 14:36 moritzm: installing zeromq3 security updates for stretch
  • 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2001.codfw.wmnet
  • 14:35 jbond42: failing idp back to idp2001
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13451 and previous config saved to /var/cache/conftool/dbconfig/20201126-142942-root.json
  • 14:24 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=0; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2001.codfw.wmnet
  • 14:24 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
  • 14:23 moritzm: remove labtestpuppetmaster2001 from debmonitor T258103
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13450 and previous config saved to /var/cache/conftool/dbconfig/20201126-141439-root.json
  • 13:52 elukey: roll restart druid daemons on druid analytics to pick up new openjdk upgrades
  • 13:52 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:52 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:52 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 13:50 moritzm: installing python3.5 security updates
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P13449 and previous config saved to /var/cache/conftool/dbconfig/20201126-133204-marostegui.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13448 and previous config saved to /var/cache/conftool/dbconfig/20201126-132918-root.json
  • 13:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=0; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13447 and previous config saved to /var/cache/conftool/dbconfig/20201126-131414-root.json
  • 13:07 hnowlan: testing depooling kartotherian on maps2004 to reduce load
  • 13:07 hnowlan@puppetmaster1001: conftool action : set/pooled=no:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2004.codfw.wmnet
  • 13:01 jbond42: update puppet_compiler on compiler1003
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13446 and previous config saved to /var/cache/conftool/dbconfig/20201126-125911-root.json
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 for schema change', diff saved to https://phabricator.wikimedia.org/P13445 and previous config saved to /var/cache/conftool/dbconfig/20201126-124253-marostegui.json
  • 12:31 jbond42: fail over idp.wikimedia.org
  • 11:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:53 moritzm: rebooting seaborgium for kernel update
  • 11:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:40 marostegui: Deploy schema change on s8 codfw - there will be lag on s8 codfw - T268004
  • 11:16 moritzm: restarting archiva to pick up Java security update
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13442 and previous config saved to /var/cache/conftool/dbconfig/20201126-104324-root.json
  • 10:41 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13441 and previous config saved to /var/cache/conftool/dbconfig/20201126-102820-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13440 and previous config saved to /var/cache/conftool/dbconfig/20201126-101317-root.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13439 and previous config saved to /var/cache/conftool/dbconfig/20201126-095813-root.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P13438 and previous config saved to /var/cache/conftool/dbconfig/20201126-094729-marostegui.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094 after schema change', diff saved to https://phabricator.wikimedia.org/P13437 and previous config saved to /var/cache/conftool/dbconfig/20201126-094702-marostegui.json
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P13436 and previous config saved to /var/cache/conftool/dbconfig/20201126-094639-marostegui.json
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13435 and previous config saved to /var/cache/conftool/dbconfig/20201126-094538-root.json
  • 09:38 marostegui: Stop mysql on es1016 for decommission
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13434 and previous config saved to /var/cache/conftool/dbconfig/20201126-093035-root.json
  • 09:26 ema: deployment-cache-text06: upgrade Varnish to 6.0.7-1wm1 T268736
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13433 and previous config saved to /var/cache/conftool/dbconfig/20201126-091532-root.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13432 and previous config saved to /var/cache/conftool/dbconfig/20201126-090028-root.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P13431 and previous config saved to /var/cache/conftool/dbconfig/20201126-084903-marostegui.json
  • 08:40 elukey: roll restart cassandra on aqs10* for openjdk upgrades
  • 08:40 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 08:09 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
  • 08:08 marostegui: Deploy schema change on s7 codfw - there will be lag on s7 codfw - T268004
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13430 and previous config saved to /var/cache/conftool/dbconfig/20201126-072506-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13429 and previous config saved to /var/cache/conftool/dbconfig/20201126-071514-root.json
  • 07:12 marostegui: Enable GTID on clouddb1018:3317 clouddb1014:3317 T267090
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13428 and previous config saved to /var/cache/conftool/dbconfig/20201126-071003-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13427 and previous config saved to /var/cache/conftool/dbconfig/20201126-070010-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13426 and previous config saved to /var/cache/conftool/dbconfig/20201126-065500-root.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13425 and previous config saved to /var/cache/conftool/dbconfig/20201126-064507-root.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13424 and previous config saved to /var/cache/conftool/dbconfig/20201126-063956-root.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1016 from dbctl', diff saved to https://phabricator.wikimedia.org/P13423 and previous config saved to /var/cache/conftool/dbconfig/20201126-063234-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After cloning clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13422 and previous config saved to /var/cache/conftool/dbconfig/20201126-063003-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1016 for decommissioning', diff saved to https://phabricator.wikimedia.org/P13421 and previous config saved to /var/cache/conftool/dbconfig/20201126-062811-marostegui.json
  • 06:17 marostegui: Stop mysql on db1124:3315 to clone clouddb1016:3315 T267090
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 for schema change', diff saved to https://phabricator.wikimedia.org/P13420 and previous config saved to /var/cache/conftool/dbconfig/20201126-061552-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143', diff saved to https://phabricator.wikimedia.org/P13419 and previous config saved to /var/cache/conftool/dbconfig/20201126-061459-marostegui.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change', diff saved to https://phabricator.wikimedia.org/P13418 and previous config saved to /var/cache/conftool/dbconfig/20201126-061432-marostegui.json
  • 06:08 ryankemper: T268770 [eqiad] Finished rolling restart of cirrus eqiad. All cirrus elasticsearch restarts are now complete (cloudelastic, relforge, eqiad, codfw)
  • 06:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 04:24 ryankemper: T268770 [eqiad] Begin rolling restart of cirrus eqiad, 3 nodes at a time
  • 04:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 03:07 krinkle@deploy1001: Synchronized wmf-config/mc.php: I805699ecfa (duration: 00m 58s)

2020-11-25

  • 23:28 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:55 mutante: mwdebug1003 - scap pull - which rsyncs from deploy1001 and runs php-fpm restart check script (T245757)
  • 22:47 ejegg: increased Ingenico API call timeout
  • 22:34 shdubsh: beginning rolling restart of logstash cluster - eqiad
  • 22:23 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 21:19 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 20:49 krinkle@deploy1001: Synchronized php-1.36.0-wmf.18/includes/libs/CSSMin.php: I26ed3e5e9a - fix T268308 (duration: 00m 59s)
  • 20:43 mutante: LDAP added user duminasi to group wmf (T266791)
  • 20:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 18:44 elukey: upload new hive* packages 2.2.3-2 to stretch-wikimedia - thirdparty/bigtop14 component
  • 18:42 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 18:38 mutante: LDAP adding swagoel to NDA T267314#6625628
  • 18:31 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
  • 18:05 ryankemper: T268770 [cloudelastic] Thawed writes to cloudelastic cluster following restarts: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic --thaw` on `mwmaint1002`
  • 18:01 ryankemper: [cloudelastic] (forgot to mention this) Thawed writes to cloudelastic cluster following restarts: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic --thaw` on `mwmaint1002`
  • 17:58 ryankemper: T268770 [cloudelastic] restarts complete, service is healthy. This is done.
  • 17:55 ryankemper: T268770 [cloudelastic] restarts on `cloudelastic1006` complete and all 3 elasticsearch clusters are green, all cloudelastic instances are now complete
  • 17:49 ryankemper: T268770 [cloudelastic] restarts on `cloudelastic1005` complete and all 3 elasticsearch clusters are green, proceeding to next instance
  • 17:44 shdubsh: beginning rolling restart of logstash cluster - codfw
  • 17:44 ryankemper: T268770 [cloudelastic] restarts on `cloudelastic1004` complete and all 3 elasticsearch clusters are green, proceeding to next instance
  • 17:39 ryankemper: T268770 [cloudelastic] restarts on `cloudelastic1003` complete and all 3 elasticsearch clusters are green, proceeding to next instance
  • 17:39 ryankemper: T268770 [cloudelastic] restarts on `cloudelastic1002` complete and all 3 elasticsearch clusters are green, proceeding to next instance
  • 17:28 ryankemper: T268770 [cloudelastic] restarts on `cloudelastic1001` complete and all 3 elasticsearch clusters are green, proceeding to next instance
  • 17:22 ryankemper: T268770 Freezing writes to cloudelastic in preparation for restarts: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic` on `mwmaint1002`
  • 17:09 ryankemper: T268770 [cloudelastic] Downtimed `cloudelastic100[1-6]` in icinga in preparation for cloudelastic search elasticsearch cluster restart
  • 17:05 ryankemper: T268770 Begin rolling restart of eqiad cirrus elasticsearch, 3 nodes at a time
  • 17:04 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 17:00 godog: fail sdk on ms-be2031
  • 16:49 godog: clean up sdk1 on / on ms-be2031
  • 16:46 elukey: move analytics1066 to C3 - T267065
  • 16:44 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 16:21 mutante: puppetmaster - revoking old and signing new cert for mwdebug1003
  • 16:11 elukey: move analytics1065 to C3 - T267065
  • 16:10 mutante: shutting down mwdebug1003 - reimaging for T245757
  • 16:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:02 moritzm: installing golang-1.7 updates for stretch
  • 15:57 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:57 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:57 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:38 elukey: move stat1004 to A5 - T267065
  • 15:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 15:34 moritzm: removing maps2002 from debmonitor
  • 15:10 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:04 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:04 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 14:56 moritzm: installing krb5 security updates for Buster
  • 14:55 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 14:55 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 14:55 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 14:26 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:00 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 13:56 akosiaris@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:44 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 13:43 akosiaris: assign IPs to kubestage200{1,2,3}.codfw.wmnet, kubestagemaster2001.codfw.wmnet in netbox T268747
  • 13:14 marostegui: Deploy schema change on commonswiki.watchlist on s4 codfw - there will be lag on s4 codfw - T268004
  • 13:08 akosiaris: assign IPs to kubestage200{1,2,3}.codfw.wmnet, kubestagemaster2001.codfw.wmnet in netbox
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13414 and previous config saved to /var/cache/conftool/dbconfig/20201125-124202-root.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13413 and previous config saved to /var/cache/conftool/dbconfig/20201125-122659-root.json
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13412 and previous config saved to /var/cache/conftool/dbconfig/20201125-121155-root.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13411 and previous config saved to /var/cache/conftool/dbconfig/20201125-115652-root.json
  • 11:49 gilles@deploy1001: Finished deploy [performance/coal@be167b2]: T268724 (duration: 00m 06s)
  • 11:48 gilles@deploy1001: Started deploy [performance/coal@be167b2]: T268724
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P13408 and previous config saved to /var/cache/conftool/dbconfig/20201125-114717-marostegui.json
  • 11:27 gilles@deploy1001: Finished deploy [performance/coal@468bc50]: T268724 (duration: 00m 06s)
  • 11:27 gilles@deploy1001: Started deploy [performance/coal@468bc50]: T268724
  • 11:27 jbond42: install krb5 updates to jessie hosts
  • 10:52 jbond42: failover idp primary to idp2001
  • 10:51 kormat: deployed wmfmariadbpy 0.6.1 to `C:wmfmariadbpy`
  • 10:43 kormat: uploaded wmfmariadbpy 0.6.1 to stretch+buster apt repos
  • 10:21 jynus: upgrade wmfbackup-check package on alert* hosts
  • 10:11 kormat: uploaded wmfmariadbpy 0.6 to stretch+buster apt repos
  • 09:54 moritzm: uploaded krb5 1.12.1+dfsg-19+deb8u5+wmf1 to apt.wikimedia.org
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13405 and previous config saved to /var/cache/conftool/dbconfig/20201125-095239-root.json
  • 09:45 marostegui: Manually install bsd-mailx with apt-get on clouddb1015, labsdb1012 and labsdb1011 - T268725
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13404 and previous config saved to /var/cache/conftool/dbconfig/20201125-093736-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13403 and previous config saved to /var/cache/conftool/dbconfig/20201125-092232-root.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13402 and previous config saved to /var/cache/conftool/dbconfig/20201125-090729-root.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P13401 and previous config saved to /var/cache/conftool/dbconfig/20201125-085216-marostegui.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13400 and previous config saved to /var/cache/conftool/dbconfig/20201125-084603-root.json
  • 08:43 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Re-enable writes to es5 T268469 (duration: 00m 59s)
  • 08:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13399 and previous config saved to /var/cache/conftool/dbconfig/20201125-083059-root.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13398 and previous config saved to /var/cache/conftool/dbconfig/20201125-081556-root.json
  • 08:14 kormat: rebooting es1024 T268469
  • 08:08 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
  • 08:07 kormat: stopping mariadb on es1024 T268469
  • 08:04 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Disable writes to es5 T268469 (duration: 00m 58s)
  • 08:02 marostegui: Upgrade db2108
  • 08:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13397 and previous config saved to /var/cache/conftool/dbconfig/20201125-080053-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P13396 and previous config saved to /var/cache/conftool/dbconfig/20201125-071951-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P13395 and previous config saved to /var/cache/conftool/dbconfig/20201125-071450-marostegui.json
  • 06:38 marostegui: Stop mysql on db1125:3317 to clone clouddb1014:3317 clouddb1018:3317 T267090
  • 06:33 marostegui: Restart clouddb1019:3314, clouddb1019:3316
  • 06:32 marostegui: Restart clouddb1015:3314, clouddb1015:3316
  • 06:28 marostegui: Check private data on clouddb1014:3312 and clouddb1018:3312 T267090
  • 05:48 marostegui: Sanitize clouddb1014:3312 and clouddb1018:3312 T267090
  • 01:10 tgr_: Evening deploys done
  • 01:07 tgr@deploy1001: Finished scap: Backport: GrowthExperiments: Add Russian aliases (T268519) (duration: 32m 09s)
  • 00:35 tgr@deploy1001: Started scap: Backport: GrowthExperiments: Add Russian aliases (T268519)

2020-11-24

  • 23:50 crusnov@deploy1001: Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 p2 (duration: 00m 05s)
  • 23:50 crusnov@deploy1001: Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 p2
  • 23:50 crusnov@deploy1001: Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 (duration: 01m 51s)
  • 23:48 crusnov@deploy1001: Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488
  • 21:27 andrewbogott: restarting slapd on serpens
  • 21:20 cdanis: ✔️ cdanis@seaborgium.wikimedia.org ~ 🕟🍵 sudo systemctl restart prometheus-openldap-exporter.service
  • 21:17 andrewbogott: restarting slapd on seaborgium
  • 20:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:42 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Remove no longer needed EventLoggingSchemas override for NavigationTiming and ResourceTiming - T254606 (duration: 01m 01s)
  • 19:49 ryankemper: [elasticsearch] Restarted all elasticsearch systemd-managed services on `relforge100[1,2]`: `elasticsearch_6@relforge-eqiad.service` and `elasticsearch_6@relforge-eqiad-small-alpha.service`
  • 19:30 gilles@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/NavigationTiming/extension.json: (no justification provided) (duration: 00m 57s)
  • 19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 331a129: Remove temporary feature flags (T258116) (duration: 00m 57s)
  • 19:20 mutante: LDAP - added derick to group nda (T268150)
  • 19:17 moritzm: installing Java security updates on elastic* and relforge*
  • 19:09 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:643260 group1: Switch ParserCache to JSON (duration: 00m 57s)
  • 18:59 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 18:56 elukey@deploy1001: Finished deploy [analytics/refinery@1ff0868]: Regular analytics weekly train (duration: 09m 50s)
  • 18:56 volans: migrating anycast zonefile to the Netbox-generated ones - T258729
  • 18:55 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 18:52 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:51 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 18:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:46 elukey@deploy1001: Started deploy [analytics/refinery@1ff0868]: Regular analytics weekly train
  • 18:46 crusnov@deploy1001: Finished deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next T266488 p2 (duration: 00m 05s)
  • 18:45 crusnov@deploy1001: Started deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next T266488 p2
  • 18:45 crusnov@deploy1001: Finished deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next T266488 (duration: 01m 09s)
  • 18:45 elukey: restart memcached on mw2339 to pick up the correct port (was bound on 11211 rather than 11210)
  • 18:44 crusnov@deploy1001: Started deploy [netbox/deploy@88f61d0]: Test deploy of 2.9.9 to netbox-next T266488
  • 18:19 ejegg: updated Fundraising CiviCRM from 28464df973 to fb0ad7f39b
  • 18:07 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 18:06 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 18:04 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 17:51 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:44 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:10 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:08 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:29 elukey: move analytics1064 from C2 to C3 eqiad - T267065
  • 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:06 hnowlan: finished removing restbase2009 from cassandra cluster
  • 16:01 cmjohnson1: replacing the sfp at cr1-eqiad xe-3/2/1 T267672
  • 15:42 marostegui: Drop kraken user from s4 - T268636
  • 15:38 elukey: move druid1005 from rack B7 to B6 - T267065
  • 15:35 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:33 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:29 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 15:29 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 15:28 jayme: pushed docker-registry.discovery.wmnet/calico/kube-controllers:v3.17.0 docker-registry.discovery.wmnet/calico/node:v3.17.0 docker-registry.discovery.wmnet/calico/typha:v3.17.0
  • 15:23 jayme: imported calico 3.17.0 into component/calico-future for stretch-wikimedia
  • 15:07 godog: swift eqiad-prod: decom ms-be1022 ssd from swift - T267870
  • 15:01 marostegui: Enable GTID on clouddb1013:3311 clouddb1015:3314 clouddb1017:3311 clouddb1019:3314 T267090
  • 14:58 elukey: move analytics1072 from rack B2 to B3 - T267065
  • 14:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:53 jayme: imported helmfile 0.135.0-1 into buster-wikimedia and stretch-wikimedia
  • 14:47 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:43 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change', diff saved to https://phabricator.wikimedia.org/P13392 and previous config saved to /var/cache/conftool/dbconfig/20201124-144219-marostegui.json
  • 14:34 liw: finished testing Scap on Beta cluster in prep for https://phabricator.wikimedia.org/T268634
  • 14:31 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:27 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13391 and previous config saved to /var/cache/conftool/dbconfig/20201124-141912-root.json
  • 14:09 moritzm: reset-failed idp-u2f.service after Hiera change (one time issue, will soon be obsolete)
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13390 and previous config saved to /var/cache/conftool/dbconfig/20201124-140409-root.json
  • 13:52 elukey@deploy1001: Finished deploy [statsv/statsv@b25b6ff]: Deploy https://gerrit.wikimedia.org/r/c/analytics/statsv/+/643252 (duration: 00m 05s)
  • 13:52 elukey@deploy1001: Started deploy [statsv/statsv@b25b6ff]: Deploy https://gerrit.wikimedia.org/r/c/analytics/statsv/+/643252
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13389 and previous config saved to /var/cache/conftool/dbconfig/20201124-134905-root.json
  • 13:40 marostegui: Stop MySQL on db1074 to clone clouddb1018 and clouddb1014 T267090
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to clone clouddb1018 and clouddb1014 T267090', diff saved to https://phabricator.wikimedia.org/P13388 and previous config saved to /var/cache/conftool/dbconfig/20201124-133709-marostegui.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13387 and previous config saved to /var/cache/conftool/dbconfig/20201124-133402-root.json
  • 13:13 jgleeson: civicrm revision is 28464df973, config revision is 928918a9b6
  • 13:01 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.18
  • 13:01 liw: done testing Scap release candidate on beta (failed: disk full on deploy01)
  • 12:49 hnowlan: disabled cassandra service on restbase2009, starting drain
  • 12:30 liw: testing upcoming Scap release on beta
  • 12:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:59 jayme: imported helm3 3.4.1-1 into buster-wikimedia and stretch-wikimedia
  • 11:56 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:52 XioNoX: push CR641949 and CR641949
  • 11:38 effie: rolling depool and pool app and api clusters - T244340
  • 11:25 _joe_: rebuild docker images for T268612
  • 11:20 effie: disable puppet on api and app servers to rollout onhost memcached - T244340
  • 11:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:15 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:14 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:12 marostegui: Stop mysql on db1125:3312 to clone clouddb1014:3312 and clouddb1018:3312 - T267090
  • 10:45 moritzm: upgrading seaborgium to Buster
  • 10:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:31 jbond42: upload new cas package to wikimedia-buster
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2073', diff saved to https://phabricator.wikimedia.org/P13384 and previous config saved to /var/cache/conftool/dbconfig/20201124-100139-marostegui.json
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es2026', diff saved to https://phabricator.wikimedia.org/P13383 and previous config saved to /var/cache/conftool/dbconfig/20201124-100020-marostegui.json
  • 09:48 volans: Migrating codfw private/public primary DNS records to the auto-generated ones from Netbox - T258729
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P13382 and previous config saved to /var/cache/conftool/dbconfig/20201124-094449-marostegui.json
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P13381 and previous config saved to /var/cache/conftool/dbconfig/20201124-094159-marostegui.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P13380 and previous config saved to /var/cache/conftool/dbconfig/20201124-094052-marostegui.json
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P13379 and previous config saved to /var/cache/conftool/dbconfig/20201124-093517-marostegui.json
  • 09:23 marostegui: Deploy schema change on db2114 and db1096:3316 - T268004
  • 09:13 ema: cp4032: switch back to varnish 6.0.6-1wm2 after T264398 experiment, fix T268243
  • 09:09 elukey: drop principals and keytabs for analytics10[42-57] - T267932
  • 09:03 gilles@deploy1001: Finished deploy [performance/navtiming@ba6cd0d]: T260580 Parse user agents in navtiming instead of relying on eventlogging to do it (duration: 00m 05s)
  • 09:03 gilles@deploy1001: Started deploy [performance/navtiming@ba6cd0d]: T260580 Parse user agents in navtiming instead of relying on eventlogging to do it
  • 08:49 _joe_: uploading the base production docker images for MediaWiki, T265324
  • 08:48 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:43 _joe_: refreshing debian buster base image
  • 08:42 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:42 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:31 marostegui: Deploy user for pki database for dbproxy1012, dbproxy1014, dbproxy2001 - T268329
  • 08:28 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 08:27 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:58 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112 after schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13378 and previous config saved to /var/cache/conftool/dbconfig/20201124-074342-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13377 and previous config saved to /var/cache/conftool/dbconfig/20201124-073202-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 after schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13376 and previous config saved to /var/cache/conftool/dbconfig/20201124-073125-marostegui.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 for schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13375 and previous config saved to /var/cache/conftool/dbconfig/20201124-072755-marostegui.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 after schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13374 and previous config saved to /var/cache/conftool/dbconfig/20201124-072715-marostegui.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13373 and previous config saved to /var/cache/conftool/dbconfig/20201124-072249-marostegui.json
  • 07:00 _joe_: changing the mtail recipe for mediawiki/apache to use an actual histogram
  • 06:31 marostegui: Sanitize clouddb1019:3314 T267090
  • 06:28 marostegui: Sanitize clouddb1015:3314 T267090
  • 03:43 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 03:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 03:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 03:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 03:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 03:31 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:42 reedy@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: Unbreak gpg encrypted polls T268583 (duration: 01m 05s)
  • 00:29 reedy@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: Unbreak gpg encrypted polls T268583 (duration: 01m 06s)

2020-11-23

  • 22:56 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:52 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 22:35 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:54 mutante: mwdebug1003 - removing php packages and letting puppet reinstall them after it has the correct APT config T267248
  • 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:26 mutante: mwdebug1003 - scap pull because <+icinga-wm> PROBLEM - Ensure local MW versions match expected deployment on mwdebug1003 is CRITICAL
  • 20:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:09 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.18 (duration: 01m 04s)
  • 20:08 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.18
  • 20:00 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert a110db0: group1: switch ParserCache to JSON (T263579) (duration: 00m 42s)
  • 19:22 Urbanecm: Morning B&C done
  • 19:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a110db0: group1: switch ParserCache to JSON (T263579) (duration: 01m 05s)
  • 19:15 Urbanecm: Synced security patch for T120883 (wmf.18)
  • 19:12 Urbanecm: Synced security patch for T120883 (wmf.16)
  • 19:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7561926: GrowthExperiments: Enable help panel top-posting on svwiki, ruwiki (T268227) (duration: 01m 06s)
  • 17:48 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:46 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:41 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2010.codfw.wmnet
  • 17:36 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:29 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 05s)
  • 17:22 mutante: DNS - new project language 'skr' added - Saraiki ( سرائیکی Sarā'īkī, also spelt Siraiki, or Seraiki) is an Indo-Aryan language of the Lahnda group, spoken in the south-western half of the province of Punjab in Pakistan.
  • 17:12 elukey: move aqs1004 from rack A4 to A3 - T267065
  • 17:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:58 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:37 elukey: move analytics1070 from rack A7 to rack A5 - T267065
  • 15:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 15:13 godog: add ipv6 forward/reverse records for grafana1002 / grafana2001
  • 15:05 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:57 filippo@cumin1001: START - Cookbook sre.dns.netbox
  • 14:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2009.codfw.wmnet
  • 14:10 kormat: cleaning up heartbeat.heartbeat on pc3 T268336
  • 14:09 kormat: cleaning up heartbeat.heartbeat on pc2 T268336
  • 14:04 kormat: cleaning up heartbeat.heartbeat on pc1 T268336
  • 14:01 moritzm: imported prometheus-php-fpm-exporter 0.4.1+git20181018.d0d1837-2 to buster-wikimedia T245757
  • 13:56 XioNoX: push CR641960
  • 13:56 godog: add ms-be106[0-3] to eqiad-prod with minimal weight - T268435
  • 13:17 moritzm: imported ploticus 2.42-4.2~wmf1 to buster-wikimedia T245757
  • 13:11 Lucas_WMDE: EU backport+config window done
  • 13:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/Wikibase: Backport: Calculate page props on-the-fly during RDF dump (T145712) (duration: 01m 14s)
  • 13:01 hnowlan: started cassandra pooling maps2009
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143 after schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13370 and previous config saved to /var/cache/conftool/dbconfig/20201123-125815-marostegui.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13369 and previous config saved to /var/cache/conftool/dbconfig/20201123-125759-marostegui.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1141 after schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13368 and previous config saved to /var/cache/conftool/dbconfig/20201123-125417-marostegui.json
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for schema change T267335 T267399', diff saved to https://phabricator.wikimedia.org/P13367 and previous config saved to /var/cache/conftool/dbconfig/20201123-125345-marostegui.json
  • 12:34 Lucas_WMDE: Undeployed patch for T260349
  • 12:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2008.codfw.wmnet
  • 12:32 Urbanecm: Run scap pull at mwdebug1003
  • 12:28 marostegui: Stop mysql on db1121 to clone clouddb1017:3314 clouddb1019:3314
  • 12:27 Lucas_WMDE: Deployed patch for T260349
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 to clone clouddb1017:3314 clouddb1019:3314 T267090', diff saved to https://phabricator.wikimedia.org/P13366 and previous config saved to /var/cache/conftool/dbconfig/20201123-122549-marostegui.json
  • 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c00d7e8: Move ContentTranslation out of Beta for br, ka, ast, si and ig WPs (T267212, T266217, T266218, T266219, T266220) (duration: 01m 06s)
  • 12:01 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=zhwiki; T246539)
  • 11:49 XioNoX: eqiad row A, split LVS, Ganeti, Cloud, interface-ranges to individual terms
  • 11:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 05s)
  • 11:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 21s)
  • 11:25 hnowlan: starting cassandra bootstrap of maps2008
  • 11:20 effie: enable puppet on cp* hosts
  • 11:16 moritzm: installing poppler security updates on stretch
  • 11:13 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 11:13 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:05 XioNoX: eqiad row A, standardize interfaces descriptions and ranges order
  • 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:26 effie: disable puppet on cp* hosts to merge 641730
  • 10:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:26 moritzm: rebooting serpens
  • 10:21 XioNoX: eqiad row B, split LVS, Ganeti, Cloud, interface-ranges to individual terms
  • 09:48 XioNoX: eqiad row B, standardize interfaces descriptions and ranges order
  • 08:46 elukey: drop kerberos keytabs for analytics10[28-41] from krb1001:/srv/kerberos/keytabs, decommed nodes (old hadoop test cluster)
  • 08:43 godog: start stress testing on ms-be106* - T268435
  • 08:41 elukey: drop kerberos principals from krb1001 for analytics10[29-41], decommed nodes (old hadoop test cluster)
  • 08:36 elukey: drop analytics1028's krb principals from krb1001 - old decommed node
  • 08:35 moritzm: installing remaining krb5 security updates for Stretch
  • 07:27 marostegui: Stop MySQL on db1125:3314 to clone clouddb1015 and clouddb1019 - lag will appear on Commonswiki on wikireplicas - T267090
  • 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:00 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:46 marostegui: Restart clouddb1013 clouddb1015 clouddb1017 clouddb1019 for testing T267090

2020-11-21

  • 09:18 joal: Drop historical logs of 'Wikidata Concepts Monitor ETL' on HDFS keeping one example - freeing 60 TB
  • 09:17 joal: Drop historical logs of '
  • 08:28 ariel@deploy1001: Finished deploy [dumps/dumps@1a76a9a]: revinfo updates (duration: 00m 05s)
  • 08:28 ariel@deploy1001: Started deploy [dumps/dumps@1a76a9a]: revinfo updates
  • 08:10 elukey: remove big stderr log file in /var/lib/hadoop/data/d/yarn/logs/application_1605880843685_1450 on an-worker1110
  • 08:05 elukey: remove big stderr log file in /var/lib/hadoop/data/e/yarn/logs/application_1605880843685_1450 on an-worker1105

2020-11-20

  • 23:38 mutante: synced puppet-compiler facts - new hosts should be usable in compiler
  • 22:30 mutante: cumin1001 - sudo systemctl start cumin-check-aliases -> <+icinga-wm> RECOVERY - Check systemd state on cumin1001 is OK T268369
  • 21:30 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 20:26 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 20:09 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 19:52 mutante: releases2002 - systemctl disable wmf_auto_restart_rsync; rm /usr/lib/systemd/system/wmf_auto_restart_rsync.* ; systemctl daemon-reload ; systemctl reset-failed - clear up systemd unit that was not absented and fix Icinga alerts
  • 19:45 mutante: releases2002 systemctl reset-failed (wmf_auto_restart_rsync.service failed but hopefully fixed)
  • 19:39 mutante: Icinga: ACKing all the "unhandled CRIT" alerts on clouddb* and an-coord* that have disabled notifications, to remove monitoring noise; active alerts went from 72 to 25
  • 19:14 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:47 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:42 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:36 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:31 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:18 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:14 dwisehaupt: shifting 100% of thank_you mail through frmxs ahead of tomorrow's banner test - T267259
  • 17:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:32 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 17:24 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 16:48 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 16:40 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:29 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:29 razzi@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 16:28 razzi: removed canceled ip address records for kafka-test1002 from netbox
  • 16:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:01 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:01 razzi@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:42 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:01 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:59 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 14:58 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:30 elukey: force umount/mount for /mnt/hdfs on all stat1* nodes to pick up new openjdk settings
  • 14:28 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
  • 14:00 elukey: restart hadoop daemons on an-master[1001-1002] (Hadoop masters) to pick up new rack settings and openjdk upgrades
  • 13:59 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 13:34 liw: finished trying to test scap on beta cluster
  • 13:24 bblack: cp*: remove remnants of expiring globalsign-2019 unified cert, including ocsp config+outputs
  • 13:12 liw: testing upcoming Scap release on beta
  • 13:00 bblack: dns*: upgrade gdnsd to 3.4.1 on remainder of fleet
  • 12:54 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 12:29 moritzm: uploaded wmf-sre-laptop 0.3 to buster-wikimedia/component/wmf-sre-laptop
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Set original weight to db1089', diff saved to https://phabricator.wikimedia.org/P13351 and previous config saved to /var/cache/conftool/dbconfig/20201120-121645-marostegui.json
  • 12:14 marostegui: Run check private data on clouddb1013:3311 clouddb1013:3313 clouddb1015:3316 clouddb1017:3311 clouddb1017:3313 clouddb1019:3316 T267090
  • 12:11 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=fawiki; T246539)
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1089', diff saved to https://phabricator.wikimedia.org/P13350 and previous config saved to /var/cache/conftool/dbconfig/20201120-115057-marostegui.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1089', diff saved to https://phabricator.wikimedia.org/P13349 and previous config saved to /var/cache/conftool/dbconfig/20201120-114758-marostegui.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089', diff saved to https://phabricator.wikimedia.org/P13348 and previous config saved to /var/cache/conftool/dbconfig/20201120-114614-marostegui.json
  • 11:15 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:11 volans@cumin2001: START - Cookbook sre.dns.netbox
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13347 and previous config saved to /var/cache/conftool/dbconfig/20201120-104459-root.json
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13345 and previous config saved to /var/cache/conftool/dbconfig/20201120-102955-root.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13344 and previous config saved to /var/cache/conftool/dbconfig/20201120-101452-root.json
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13342 and previous config saved to /var/cache/conftool/dbconfig/20201120-095949-root.json
  • 09:56 elukey: update analytics filters on cr1/cr2 eqiad (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/642346)
  • 09:21 marostegui: Move pc2010 right under pc1007 to investigate lag issues (using orchestrator for this move)
  • 09:07 moritzm: updating krb5 on krb*
  • 08:57 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 08:50 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 08:32 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 08:31 elukey: roll restart kafka daemons on kafka-jumbo100* to pick up openjdk upgrades
  • 08:13 marostegui: Enable GTID on clouddb1015:3316 clouddb1019:3316 - T267090
  • 08:10 elukey: update analytics filters on cr1/cr2 eqiad (ref: https://gerrit.wikimedia.org/r/642268)
  • 08:04 marostegui: Stop db1124:3313 to clone clouddb1013:3313, clouddb1017:3313
  • 08:00 XioNoX: update cloud-in4 filter in codfw
  • 04:57 bblack: dns3001: upgrade gdnsd to 3.4.1
  • 04:55 bblack: authdns1001: upgrade gdnsd to 3.4.1
  • 04:49 bblack: authdns2001: upgrade gdnsd to 3.4.1
  • 04:45 bblack: dns3002: upgrade gdnsd to 3.4.1
  • 04:41 bblack: reprepro: uploaded gdnsd-3.4.1-1~wmf1 to buster-wikimedia

2020-11-19

  • 23:59 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:23 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:21 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:18 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:18 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:17 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:52 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:23 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:07 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:06 krinkle@deploy1001: Synchronized php-1.36.0-wmf.16/includes/filerepo/: T267668 - I1115135ee, and Ic239bb9807 (duration: 01m 07s)
  • 20:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:12 herron: upgraded logstash-next to kibana 7.10
  • 19:23 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:23 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:20 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:20 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:48 mutante: gerrit1001 - re-enabling puppet after merging gerrit:642086 for T268260 (upstream bug 13701)
  • 18:41 mutante: gerrit1001 - added RequestHeader set "X-Forwarded-Proto" expr=%{REQUEST_SCHEME} in apache config, reloaded apache to fix redirect issue
  • 18:37 mutante: gerrit1001 - disabled puppet
  • 18:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:07 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:03 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 17:59 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 17:47 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 17:33 hashar@deploy1001: Finished deploy [gerrit/gerrit@9d27055]: Upgrade gerrit1001 (primary) to Gerrit 3.2.5 (duration: 00m 09s)
  • 17:33 hashar@deploy1001: Started deploy [gerrit/gerrit@9d27055]: Upgrade gerrit1001 (primary) to Gerrit 3.2.5
  • 17:32 hashar: Upgrading Gerrit to 3.2.5 and restarting it
  • 17:05 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 06s)
  • 17:04 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
  • 16:59 ryankemper: T246345 [wdqs] Data-transfer of new wdqs node `wdqs1012` is complete, beginning transfer of `wdqs1004`->`wdqs1013` (public) and `wdqs1003`->`wdqs1011` (internal). Once these transfers are done `wdqs1012` and `wdqs1013` will need to be pooled and have their weights set to 10 after verifying they're healthy
  • 16:58 kormat: started mariadb on pc2010, now with more 🤞
  • 16:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:54 kormat: stopping mariadb on pc2010
  • 16:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:43 hashar: Restarting Gerrit replica instance on gerrit2001
  • 16:42 hashar@deploy1001: Finished deploy [gerrit/gerrit@9d27055]: Upgrade gerrit2001 to Gerrit 3.2.5 (take 2 after rebasing deploy server) (duration: 00m 10s)
  • 16:42 hashar@deploy1001: Started deploy [gerrit/gerrit@9d27055]: Upgrade gerrit2001 to Gerrit 3.2.5 (take 2 after rebasing deploy server)
  • 16:41 kormat: stopped and started replication on pc2010 to see if that would help it recover
  • 16:40 hashar@deploy1001: Finished deploy [gerrit/gerrit@5a41181]: Upgrade gerrit2001 to Gerrit 3.2.5 (duration: 00m 05s)
  • 16:40 hashar@deploy1001: Started deploy [gerrit/gerrit@5a41181]: Upgrade gerrit2001 to Gerrit 3.2.5
  • 16:35 elukey: roll restart hadoop workers for openjdk upgrades
  • 16:35 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 16:06 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
  • 15:58 moritzm: installing jupyter-notebook security updates on an-coord*
  • 15:56 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
  • 15:52 bblack: dns*: upgrade to gdnsd-3.4.0 on remainder of the dns fleet
  • 15:44 bblack: dns3001: upgrade gdnsd to 3.4.0
  • 15:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:41 bblack: dns1001: upgrade gdnsd to 3.4.0
  • 15:40 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:36 bblack: dns3002: upgrade gdnsd to 3.4.0
  • 15:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:31 bblack: authdns1001: upgrade gdnsd to 3.4.0
  • 15:30 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:29 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:26 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:25 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:23 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:22 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:18 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:57 moritzm: installing openldap security updates on buster (client side tools/libs, slapd already updated)
  • 14:54 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:53 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:50 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:49 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:47 marostegui: Sanitize enwiki on clouddb1017 T267090
  • 14:45 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:44 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:43 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:41 marostegui: Sanitize enwiki on clouddb1013 T267090
  • 14:39 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:29 moritzm: rolling restart of app server canaries to pick up latest sec updates
  • 14:21 moritzm: installing krb5 security updates on stretch
  • 14:02 bblack: authdns2001: upgrade gdnsd to 3.4.0
  • 13:45 XioNoX: push current state of audited cloud-in4 filter - T264993
  • 13:42 moritzm: removing stray wireshark 2.2.6 libs on Stretch
  • 13:32 moritzm: installing wireshark security updates
  • 13:30 bblack: dns4002: upgrade gdnsd to 3.4.0
  • 13:28 bblack: reprepro: updated buster-wikimedia gdnsd package to 3.4.0-1~wmf1
  • 12:43 moritzm: installing libproxy security updates on stretch
  • 12:38 marostegui: Stop mysql on db1106 to clone clouddb1013 and clouddb1017 T267090
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 T267090', diff saved to https://phabricator.wikimedia.org/P13334 and previous config saved to /var/cache/conftool/dbconfig/20201119-122459-marostegui.json
  • 12:00 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:46 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 11:44 moritzm: installing Java security updates on Hadoop/Kafka Jumbo hosts
  • 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 11:33 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:00 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=ruwiki; T246539)
  • 10:28 marostegui: Restart mysql on db1115, tendril and dbtree will be down for a few minutes
  • 09:40 marostegui: Stop mysql on db1124:3311 to clone clouddb1013 and clouddb1017, there will be lag on s1 on wikireplicas - T267090
  • 09:29 moritzm: upgrading serpens to Buster
  • 09:26 XioNoX: eqiad row C: move Ganeti/LVS interfaces to individual terms
  • 09:07 elukey: restart kafka daemons on kafka-jumbo1001 for openjdk upgrades (canary)
  • 08:56 effie: disable puppet on mw canaries to merge 641816
  • 08:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
  • 08:49 elukey: restart hadoop daemons on analytics1058 for openjdk upgrades (canary)
  • 08:25 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 08:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
  • 08:19 XioNoX: eqiad row C: standardize interfaces config
  • 07:55 XioNoX: eqiad row D: move Ganeti/LVS interfaces to individual terms
  • 07:47 XioNoX: eqiad row D: standardize interfaces config
  • 07:22 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 07:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 07:05 elukey: roll restart java daemons on Hadoop test for openjdk upgrades
  • 07:05 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 06:22 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 06:21 marostegui: Remove es1014 from tendril and zarcillo T268102
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:08 marostegui: Stop mysql on db1125:3316 to clone clouddb1015 and clouddb1019, there will be lag on s6 on wikireplicas - T267090
  • 02:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 01:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer

2020-11-18

  • 23:34 mutante: disabling puppet on memcache::mediawiki - deploying gerrit:637742
  • 22:56 dpifke@deploy1001: Finished deploy [performance/arc-lamp@6bbac6d]: Fix for bytes/str issue after T267269 (duration: 00m 04s)
  • 22:56 dpifke@deploy1001: Started deploy [performance/arc-lamp@6bbac6d]: Fix for bytes/str issue after T267269
  • 22:24 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:22 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:19 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploy GlobalWatchlist to beta (noop; T268181) (duration: 01m 04s)
  • 22:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy GlobalWatchlist extension: Prepare IS.php to know relevant variables (noop; T268181) (duration: 01m 06s)
  • 22:05 urbanecm@deploy1001: Synchronized wmf-config/extension-list: Deploy GlobalWatchlist extension to beta: add it to extension-list (T268181) (duration: 01m 05s)
  • 21:53 mutante: mwdebug1003 - restarting ferm because config was generated but service not restarted due to puppet dependency errors, breaking NRPE monitoring T267248
  • 21:47 mutante: mwdebug1003 - scap pull - T267248
  • 21:40 mutante: mw1317,mw1318 - back in action and all monitoring activated again
  • 21:17 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1318.eqiad.wmnet,cluster=videoscaler
  • 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1317.eqiad.wmnet
  • 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1318.eqiad.wmnet
  • 21:02 mutante: mw1317,mw1318 - repooled=no after physical move to rack B
  • 20:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet
  • 20:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1317.eqiad.wmnet
  • 20:27 mutante: mw1317, mw1318 shutting down for physical move
  • 20:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1318.eqiad.wmnet
  • 20:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1317.eqiad.wmnet
  • 20:15 mutante: mw1317,mw1318 - downtimed and depooled - they are physically moving from B7 to B5 (T266164)
  • 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1317.eqiad.wmnet
  • 20:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet
  • 20:10 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.18 (duration: 01m 03s)
  • 20:09 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.18
  • 20:03 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
  • 20:03 akosiaris@cumin1001: conftool action : set/weight=0; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
  • 19:53 akosiaris@cumin1001: conftool action : set/pooled=no; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
  • 19:48 otto@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/EventLogging/modules/ext.eventLogging/core.js: EventLogging legacy events should use dt as server side receive time - T240460 (duration: 01m 06s)
  • 19:45 otto@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/EventLogging/modules/ext.eventLogging/core.js: EventLogging legacy events should use dt as server side receive time - T240460 (duration: 01m 07s)
  • 19:26 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:23 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:635607 - Switch ParserCache to JSON for group0 wikis (duration: 01m 05s)
  • 19:19 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:635086 - Enable parsoid on api_appserver (duration: 01m 04s)
  • 19:19 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 19:13 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:641527 - Set to 0 (duration: 01m 04s)
  • 18:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:44 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:38 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 17:18 elukey: shutdown an-presto1004 for hw maintenance
  • 17:13 akosiaris: T241230 pool codfw kubernetes for recommendation-api at a very low weight
  • 17:12 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
  • 17:12 akosiaris@cumin1001: conftool action : set/weight=1; selector: service=recommendation-api,name=kubernetes.*,dc=codfw
  • 16:52 jbond42: drop os_version/requires_os functions from wmflib
  • 16:50 elukey: update /etc/krb5.keytab on krb1001/krb2001 to match the most up to date key version for host/krb2001.codfw.wmnet
  • 16:49 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:49 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:44 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:43 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:38 reedy@deploy1001: Synchronized wmf-config/logging.php: T268141 (duration: 01m 06s)
  • 16:36 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:32 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:27 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 15:59 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 15:56 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 15:51 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 15:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:16 Urbanecm: mwscript deleteEqualMessages.php --wiki=cswiki --delete
  • 15:14 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 15:12 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 15:12 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 15:12 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 15:11 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 15:09 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 15:09 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 15:05 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 15:03 Urbanecm: Purge https://2030.wikimedia.org/ via purgeList.php (T264797)
  • 14:34 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 14:30 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 14:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 14:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 14:13 Urbanecm: Purge https://2030.wikimedia.org/ via purgeList.php (T264797)
  • 14:09 elukey: copied /etc/krb5.keytab from krb1001 to krb2001 (the latter contained only the principal for 2001, the former contained principals for both 1001 and 2001)
  • 14:05 moritzm: installing openldap security updates on ro replicas
  • 14:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 14:02 elukey: restart krb5-kpropd.service on krb2001 to force it to pick up the new client configs
  • 13:35 bblack: cache_text: Executing "varnishadm -n frontend param.set nuke_limit 1000" - T266373
  • 13:34 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
  • 13:30 moritzm: installing openldap security updates on corp replicas
  • 13:08 Urbanecm: EU B&C done (~15 minutes ago)
  • 12:43 akosiaris: sync staging cluster's helmfile.d/admin state. Aside from calico, the rest is a noop
  • 12:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 12:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 12:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/NewcomerTasksCacheRefreshJob.php: 5488f56: Fix NewcomerTasksCacheRefreshJob (T268008) (duration: 01m 05s)
  • 12:25 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/NewcomerTasksCacheRefreshJob.php: 45d71a3: Fix NewcomerTasksCacheRefreshJob (T268008) (duration: 01m 05s)
  • 12:13 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/{bnwiki,bnwiki-1.5x,bnwiki-2x}.png (T265553)
  • 12:13 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=releases
  • 12:11 urbanecm@deploy1001: Synchronized static/images/project-logos/: 70aabf7: Regenerate Bengali Wikipedia logo (T265553) (duration: 01m 06s)
  • 12:06 akosiaris@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=wikifeeds
  • 12:01 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
  • 12:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 in pc3 after restarting mysql T266483 (duration: 01m 06s)
  • 12:00 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=blubberoid,name=eqiad
  • 11:56 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=frwiki; T246539)
  • 11:56 marostegui: Restart mysql on pc1009 T266483
  • 11:56 Urbanecm: End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=nlwiki; T246539)
  • 11:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1009 and place pc1010 instead of it T266483 (duration: 01m 18s)
  • 11:40 XioNoX: eqiad row D: remove un-needed "enable" keywords
  • 10:59 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99)
  • 10:59 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert
  • 10:58 jbond42: renew sretest1002 ssl cert to test cookbook
  • 10:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:25 godog: ms-be1022 - disable failed sdb
  • 10:01 XioNoX: eqiad row D: Standardize interfaces descriptions
  • 09:56 moritzm: uploaded libexif 0.6.21-2+deb8u4+wmf1 to jessie-wikimedia
  • 09:22 elukey: set dns_canonicalize_hostname = false to all kerberos clients
  • 09:13 jbond42: renew puppet certificate of seaborgium
  • 08:34 marostegui: Stop MySQL on es1011, es1012, es1014 T268100 T268101 T268102
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1012 from dbctl T268101', diff saved to https://phabricator.wikimedia.org/P13326 and previous config saved to /var/cache/conftool/dbconfig/20201118-082942-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1012 before decommissioning it', diff saved to https://phabricator.wikimedia.org/P13325 and previous config saved to /var/cache/conftool/dbconfig/20201118-082636-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13324 and previous config saved to /var/cache/conftool/dbconfig/20201118-082618-root.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 80%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13323 and previous config saved to /var/cache/conftool/dbconfig/20201118-081115-root.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13322 and previous config saved to /var/cache/conftool/dbconfig/20201118-075612-root.json
  • 07:45 marostegui: Deploy schema change on db1098:3316 T267335 T267399
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 60%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13321 and previous config saved to /var/cache/conftool/dbconfig/20201118-074108-root.json
  • 07:28 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=nlwiki; T246539)
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13320 and previous config saved to /var/cache/conftool/dbconfig/20201118-072605-root.json
  • 07:16 marostegui: Run check table on s6 on db1125:3316 T267090
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 30%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13319 and previous config saved to /var/cache/conftool/dbconfig/20201118-071101-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13318 and previous config saved to /var/cache/conftool/dbconfig/20201118-065558-root.json
  • 06:53 elukey: also restart mirror maker on kafka-main1001/1003 (seems unrelated, but done to clear old errors and a possibly weird state)
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 100%: Slowly pool es1018 after cloning es1032 T261717', diff saved to https://phabricator.wikimedia.org/P13317 and previous config saved to /var/cache/conftool/dbconfig/20201118-064556-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 20%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13316 and previous config saved to /var/cache/conftool/dbconfig/20201118-064054-root.json
  • 06:37 elukey: restart kafka-mirror-main-codfw_to_main-eqiad@0.service on kafka-main1002 - consumer msg rate low since kafka-main2003 went down for codfw c7 failure
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 75%: Slowly pool es1018 after cloning es1032 T261717', diff saved to https://phabricator.wikimedia.org/P13315 and previous config saved to /var/cache/conftool/dbconfig/20201118-063052-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Slowly pool es1032 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13314 and previous config saved to /var/cache/conftool/dbconfig/20201118-062551-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1014 from dbctl', diff saved to https://phabricator.wikimedia.org/P13313 and previous config saved to /var/cache/conftool/dbconfig/20201118-062547-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 50%: Slowly pool es1018 after cloning es1032 T261717', diff saved to https://phabricator.wikimedia.org/P13312 and previous config saved to /var/cache/conftool/dbconfig/20201118-061549-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1014 before decommissioning it', diff saved to https://phabricator.wikimedia.org/P13311 and previous config saved to /var/cache/conftool/dbconfig/20201118-061340-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1027 as new es1 master', diff saved to https://phabricator.wikimedia.org/P13310 and previous config saved to /var/cache/conftool/dbconfig/20201118-061218-marostegui.json
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1011 from dbctl', diff saved to https://phabricator.wikimedia.org/P13309 and previous config saved to /var/cache/conftool/dbconfig/20201118-061112-marostegui.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1032 with minimum weight on es1 T261717', diff saved to https://phabricator.wikimedia.org/P13308 and previous config saved to /var/cache/conftool/dbconfig/20201118-060641-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 25%: Slowly pool es1018 after cloning es1032 T261717', diff saved to https://phabricator.wikimedia.org/P13307 and previous config saved to /var/cache/conftool/dbconfig/20201118-060045-root.json
  • 05:47 marostegui: Run check table on enwiki on db1124:3311 T267090
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1018 (re)pooling @ 10%: Slowly pool es1018 after cloning es1032 T261717', diff saved to https://phabricator.wikimedia.org/P13306 and previous config saved to /var/cache/conftool/dbconfig/20201118-054542-root.json
  • 00:53 tgr_: also deployed Suggested Edits: Guard against task type not existing (T268012)
  • 00:52 tgr@deploy1001: Synchronized php-1.36.0-wmf.18/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: Suggested edits: Guard against empty topic data (T268015) (duration: 01m 07s)
  • 00:27 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable watchlist expiry feature on Wikidata & Commons (T266874) (duration: 01m 03s)

2020-11-17

  • 22:54 mforns@deploy1001: Finished deploy [analytics/refinery@f19d20c] (thin): Regular analytics weekly train THIN [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13] (duration: 00m 07s)
  • 22:54 mforns@deploy1001: Started deploy [analytics/refinery@f19d20c] (thin): Regular analytics weekly train THIN [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13]
  • 22:53 mforns@deploy1001: Finished deploy [analytics/refinery@f19d20c]: Regular analytics weekly train [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13] (duration: 12m 51s)
  • 22:45 clarakosi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 22:40 mforns@deploy1001: Started deploy [analytics/refinery@f19d20c]: Regular analytics weekly train [analytics/refinery@f19d20c21ada05df230d00c6e0022a7d5c356c13]
  • 22:39 clarakosi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 22:29 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 22:10 mutante: otrs1001 - systemctl start otrs-cache-cleanup
  • 22:08 ppchelko@deploy1001: Finished deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, everywhere (duration: 11m 07s)
  • 22:07 mutante: otrs1001 - removing otrs-cache-cleanup cron from otrs's crontab - adding same command as systemd timer. gerrit:637038 T265138
  • 21:57 ppchelko@deploy1001: Started deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, everywhere
  • 21:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, codfw (duration: 07m 11s)
  • 21:24 ppchelko@deploy1001: Started deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, codfw
  • 20:56 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.18
  • 20:43 Urbanecm: End of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=itwiki; T246539)
  • 20:31 dancy@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.18 (duration: 39m 37s)
  • 19:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:52 dancy@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.18
  • 19:50 ppchelko@deploy1001: Finished deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, canary on 2010 (duration: 02m 03s)
  • 19:48 ppchelko@deploy1001: Started deploy [restbase/deploy@8363aeb]: update to service-runner 2.8.0, canary on 2010
  • 19:46 dancy@deploy1001: Pruned MediaWiki: 1.36.0-wmf.11 (duration: 13m 05s)
  • 19:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 19:21 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:18 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 19:12 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: wgEventStreamsDefaultSettings in beta should only set eqiad as topic prefix - T253069 (duration: 02m 26s)
  • 19:12 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 18:38 ejegg: updated standalone SmashPig deployment from 09f29c1da5 to 63dffcb11f
  • 18:36 ejegg: updated fundraising python tools from 68e054c9ad to 41cab089da
  • 18:09 jynus: stopping db1139 for hw maintenance T261405
  • 17:59 dpifke@deploy1001: Finished deploy [performance/navtiming@8eaf7db]: (no justification provided) (duration: 00m 05s)
  • 17:58 dpifke@deploy1001: Started deploy [performance/navtiming@8eaf7db]: (no justification provided)
  • 17:37 dpifke@deploy1001: Finished deploy [performance/coal@43b91df]: (no justification provided) (duration: 00m 06s)
  • 17:37 dpifke@deploy1001: Started deploy [performance/coal@43b91df]: (no justification provided)
  • 17:34 dpifke@deploy1001: Finished deploy [statsv/statsv@249d073]: (no justification provided) (duration: 00m 05s)
  • 17:34 dpifke@deploy1001: Started deploy [statsv/statsv@249d073]: (no justification provided)
  • 17:27 dpifke@deploy1001: Finished deploy [statsv/statsv@873ea90]: (no justification provided) (duration: 00m 05s)
  • 17:27 dpifke@deploy1001: Started deploy [statsv/statsv@873ea90]: (no justification provided)
  • 17:19 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 17:16 dpifke@deploy1001: Finished deploy [performance/arc-lamp@55d4d41]: (no justification provided) (duration: 00m 04s)
  • 17:16 dpifke@deploy1001: Started deploy [performance/arc-lamp@55d4d41]: (no justification provided)
  • 17:15 dpifke@deploy1001: Finished deploy [performance/arc-lamp@55fccc6]: (no justification provided) (duration: 00m 04s)
  • 17:15 dpifke@deploy1001: Started deploy [performance/arc-lamp@55fccc6]: (no justification provided)
  • 17:08 dpifke@deploy1001: Finished deploy [performance/coal@5a32eb2]: (no justification provided) (duration: 00m 04s)
  • 17:08 dpifke@deploy1001: Started deploy [performance/coal@5a32eb2]: (no justification provided)
  • 16:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:46 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:46 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:42 jbond42: re-enable puppet fleet wide
  • 16:36 clarakosi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:33 clarakosi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:29 clarakosi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 16:22 moritzm: uploaded zeromq3 4.0.5+dfsg-2+deb8u2+wmf1 to jessie-wikimedia
  • 16:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:13 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:04 volans: powercycle ms-be1030.eqiad.wmnet, unresponsive to ping/ssh, no prompt in console, nothing in hw logs
  • 15:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:27 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:16 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:16 jbond42: disable puppet fleet wide
  • 15:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:09 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 15:09 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 15:01 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 15:01 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:59 cdanis@deploy1001: Synchronized docroot/thankyou: Special docroot for thankyouwiki T259312 d2a20ec57 (duration: 00m 55s)
  • 14:58 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:58 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:57 elukey: shutdown stat1008 for ram expansion
  • 14:55 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:55 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:49 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:47 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:47 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 14:43 XioNoX: codfw row A: move ganeti and LVS from interface-range to individual term
  • 14:41 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 14:37 Urbanecm: Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=itwiki; T246539)
  • 14:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:03 XioNoX: codfw row A: standardize interfaces
  • 13:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 13:36 XioNoX: codfw row B: move ganeti, Cloud and LVS from interface-range to individual term
  • 13:29 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:22 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 13:22 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 13:21 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:09 XioNoX: codfw row B: remove extra "enable"
  • 12:59 Lucas_WMDE: EU backport&config window done (again ☺)
  • 12:58 moritzm: updating idp-test* to 6.2.4-2
  • 12:57 XioNoX: codfw row B: Standardize interfaces descriptions
  • 12:55 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: Suggested Edits: Guard against task type not existing (T268012) (duration: 00m 58s)
  • 12:53 bblack: cpNNNN: removing old (30d+) failure reports from /var/cache/ocsp
  • 12:42 moritzm: IDP updated to 6.2.4
  • 12:33 Lucas_WMDE: reopen EU backport&config window
  • 12:23 XioNoX: codfw row C: move ganeti and LVS from interface-range to individual term
  • 12:15 XioNoX: codfw row C: remove extra "enable"
  • 12:15 Lucas_WMDE: EU backport&config window done
  • 12:13 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2006.codfw.wmnet
  • 12:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove migration settings in InitialiseSettings.php (T264286), 2/2 (labs) (duration: 00m 56s)
  • 12:12 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove migration settings in InitialiseSettings.php (T264286), 1/2 (prod) (duration: 00m 56s)
  • 12:05 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Remove migration settings in Wikibase.php (T264286) (duration: 00m 57s)
  • 11:51 XioNoX: codfw row C: Standardize interfaces descriptions
  • 10:46 marostegui: Run a test on check_private_data on clouddb1013 for s1 and s3 - T267090
  • 10:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1008 in pc2 after restarting mysql T266483 (duration: 00m 56s)
  • 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:19 marostegui: Restart mysql on pc1008 T266483
  • 10:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1008 and place pc1010 instead of it T266483 (duration: 00m 57s)
  • 09:29 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 09:17 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:14 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 09:10 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:08 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 09:02 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:01 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:56 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:56 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1028 as new es3 master', diff saved to https://phabricator.wikimedia.org/P13301 and previous config saved to /var/cache/conftool/dbconfig/20201117-085542-marostegui.json
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1011 before decommissioning it and pool es1026 as new es2 master', diff saved to https://phabricator.wikimedia.org/P13300 and previous config saved to /var/cache/conftool/dbconfig/20201117-085432-marostegui.json
  • 08:52 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13299 and previous config saved to /var/cache/conftool/dbconfig/20201117-084744-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13298 and previous config saved to /var/cache/conftool/dbconfig/20201117-084733-root.json
  • 08:43 marostegui: Truncate tendril.global_status_log - T231185
  • 08:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:33 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 80%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13297 and previous config saved to /var/cache/conftool/dbconfig/20201117-083241-root.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 80%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13296 and previous config saved to /var/cache/conftool/dbconfig/20201117-083229-root.json
  • 08:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:24 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:22 volans: restart netbox on netbox1001 to test new logging configuration
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13295 and previous config saved to /var/cache/conftool/dbconfig/20201117-081737-root.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13294 and previous config saved to /var/cache/conftool/dbconfig/20201117-081726-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 60%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13293 and previous config saved to /var/cache/conftool/dbconfig/20201117-080234-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 60%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13292 and previous config saved to /var/cache/conftool/dbconfig/20201117-080222-root.json
  • 07:58 XioNoX: codfw row D: Convert LVS ranges to individual interfaces
  • 07:54 XioNoX: codfw row D: explicitly set access ports to "interface-mode access"
  • 07:49 XioNoX: split codfw row D ganeti switch ports out of the interface group
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13291 and previous config saved to /var/cache/conftool/dbconfig/20201117-074730-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13290 and previous config saved to /var/cache/conftool/dbconfig/20201117-074719-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 30%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13289 and previous config saved to /var/cache/conftool/dbconfig/20201117-073227-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 30%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13288 and previous config saved to /var/cache/conftool/dbconfig/20201117-073216-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 100%: Slowly pool es1019 after cloning es1034 T261717', diff saved to https://phabricator.wikimedia.org/P13287 and previous config saved to /var/cache/conftool/dbconfig/20201117-073057-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 100%: Slowly pool es1015 after cloning es1033 T261717', diff saved to https://phabricator.wikimedia.org/P13286 and previous config saved to /var/cache/conftool/dbconfig/20201117-073032-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13285 and previous config saved to /var/cache/conftool/dbconfig/20201117-071723-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13284 and previous config saved to /var/cache/conftool/dbconfig/20201117-071712-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 75%: Slowly pool es1019 after cloning es1034 T261717', diff saved to https://phabricator.wikimedia.org/P13283 and previous config saved to /var/cache/conftool/dbconfig/20201117-071553-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 75%: Slowly pool es1015 after cloning es1033 T261717', diff saved to https://phabricator.wikimedia.org/P13282 and previous config saved to /var/cache/conftool/dbconfig/20201117-071529-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 20%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13281 and previous config saved to /var/cache/conftool/dbconfig/20201117-070220-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 20%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13280 and previous config saved to /var/cache/conftool/dbconfig/20201117-070209-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 50%: Slowly pool es1019 after cloning es1034 T261717', diff saved to https://phabricator.wikimedia.org/P13278 and previous config saved to /var/cache/conftool/dbconfig/20201117-070050-root.json
  • 07:00 marostegui: Stop mysql on db1124: s1 and s3, this will generate lag on enwiki and s3 on labsdb - T267090
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 50%: Slowly pool es1015 after cloning es1033 T261717', diff saved to https://phabricator.wikimedia.org/P13277 and previous config saved to /var/cache/conftool/dbconfig/20201117-070025-root.json
  • 06:51 marostegui: Upgrade db1077 and pc2010 to 10.4.17
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: Slowly pool es1034 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13276 and previous config saved to /var/cache/conftool/dbconfig/20201117-064716-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: Slowly pool es1033 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13275 and previous config saved to /var/cache/conftool/dbconfig/20201117-064705-root.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 25%: Slowly pool es1019 after cloning es1034 T261717', diff saved to https://phabricator.wikimedia.org/P13274 and previous config saved to /var/cache/conftool/dbconfig/20201117-064546-root.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 25%: Slowly pool es1015 after cloning es1033 T261717', diff saved to https://phabricator.wikimedia.org/P13273 and previous config saved to /var/cache/conftool/dbconfig/20201117-064522-root.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1034 with minimum weight on es3 T261717', diff saved to https://phabricator.wikimedia.org/P13272 and previous config saved to /var/cache/conftool/dbconfig/20201117-063933-marostegui.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1033 with minimum weight on es2 T261717', diff saved to https://phabricator.wikimedia.org/P13271 and previous config saved to /var/cache/conftool/dbconfig/20201117-063805-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1019 (re)pooling @ 10%: Slowly pool es1019 after cloning es1034 T261717', diff saved to https://phabricator.wikimedia.org/P13270 and previous config saved to /var/cache/conftool/dbconfig/20201117-063043-root.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 10%: Slowly pool es1015 after cloning es1033 T261717', diff saved to https://phabricator.wikimedia.org/P13269 and previous config saved to /var/cache/conftool/dbconfig/20201117-063019-root.json
  • 02:37 dwisehaupt: shifted portion of thank you emails flowing through frmx's to 60% of the total volume
  • 01:59 eileen_: civicrm revision is b6fe8bd791, config revision is 61e2000391
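The 2020-11-17 dbctl commits above interleave four staged (re)poolings: es1033 and es1034 after recloning, and es1015 and es1019 after cloning. Read newest-first, the ramps are hard to follow. The sketch below is a hypothetical helper that is not part of dbctl, conftool or any Wikimedia tooling. It assumes each entry is a plain-text line in the format shown above, extracts every `(re)pooling @ N%` step, and returns each host's steps in chronological order.

```python
import re
from collections import defaultdict

# Hypothetical pattern for the dbctl commit entries above, e.g.
# "08:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 80%: ..."
REPOOL_RE = re.compile(r"(\d{2}:\d{2}) \S+: dbctl commit .*?'(\S+) \(re\)pooling @ (\d+)%")


def repool_schedule(lines):
    """Map each host to its (time, percent) pooling steps, oldest first."""
    schedule = defaultdict(list)
    for line in lines:
        match = REPOOL_RE.search(line)
        if match:
            time, host, percent = match.group(1), match.group(2), int(match.group(3))
            schedule[host].append((time, percent))
    # The log lists entries newest-first, so sort each host's steps by time.
    # HH:MM strings sort correctly within a single day.
    return {host: sorted(steps) for host, steps in schedule.items()}


sample = [
    "08:47 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Slowly pool es1034 after being recloned T261717'",
    "08:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 80%: Slowly pool es1034 after being recloned T261717'",
    "07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1015 (re)pooling @ 100%: Slowly pool es1015 after cloning es1033 T261717'",
    "06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1034 with minimum weight on es3 T261717'",
]
print(repool_schedule(sample))
```

The "Pool es1034 with minimum weight" entry is intentionally not matched. It sets an initial weight rather than a percentage step.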

2020-11-16

  • 23:28 mutante: cumin1001 - sudo systemctl start cumin-check-aliases (to confirm switching cron to timer worked) T265138
  • 22:22 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 22:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 22:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 22:09 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 22:06 mutante: planet - fixed updates of uk.planet, which failed due to non-ASCII chars in a URL; since updates are now systemd timers, this affects the entire systemd state monitoring
  • 21:40 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
  • 21:40 rzl@cumin1001: conftool action : set/weight=1; selector: name=mw2250.codfw.wmnet,cluster=videoscaler,service=canary
  • 21:38 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet,cluster=jobrunner
  • 21:30 mutante: peek2001 - mv /var/lib/peek/git to git.old ; run puppet ; let it fix git checkout
  • 21:07 rzl: disable puppet on jobrunners T264991
  • 20:40 mutante: planet1002/planet2002 - delete entire crontab of user planet, drop update cronjobs after switching to systemd timers with gerrit:636105 (T265138)
  • 20:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:06 mutante: releases2002 systemctl reset-failed should clear Icinga systemd alert after gerrit:641228
  • 20:05 dwisehaupt: disabling process-control jobs and moving to maintenance mode for maint window
  • 19:57 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 19:53 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4a953ca]: query_clicks_hourly: handle wmf.webrequest page_id change from int to bigint (duration: 02m 27s)
  • 19:51 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4a953ca]: query_clicks_hourly: handle wmf.webrequest page_id change from int to bigint
  • 19:48 effie: disable puppet on parsoid servers - T264991
  • 19:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 18:59 mutante: mw2255 - is pooled and puppet works on next run, after it removed php 7.2 config files
  • 18:56 mutante: running puppet on mw2313 and mw2255 which were listed in puppetboard as failed puppet runs
  • 18:15 rzl: disable puppet on 'A:mw-api and not A:mw-api-canary' T264991
  • 18:05 effie: disable puppet on all appservers
  • 17:48 elukey: enable and run puppet on kafka-main2003 (it will start kafka services) - T267865
  • 17:42 dwisehaupt: frmon1001 upgraded to buster
  • 17:36 volans: moved interfaces in Netbox from old to new switch - T267865
  • 17:24 vgutierrez: switching back from lvs2010 to lvs2007 - T267865
  • 17:21 vgutierrez: repooling cp2037 and cp2038 - T267865
  • 16:46 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 16:40 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:16 XioNoX: update c7 serial in row C VC config - T267865
  • 16:16 rzl: disable puppet on A:mw-api-canary T264991
  • 16:14 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 16:08 effie: disable puppet in appservers canaries to install ICU 63 - T264991
  • 16:07 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet
  • 16:07 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2037.codfw.wmnet
  • 16:06 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
  • 16:03 hnowlan: joined maps2006 to maps codfw cassandra cluster
  • 16:01 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:57 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 15:57 hnowlan: roll-restarting eqiad restbase for java security updates
  • 15:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 15:50 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:40 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 15:40 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 14:16 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 14:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 in pc1 after restarting mysql T266483 (duration: 00m 59s)
  • 14:06 marostegui: Restart pc1007's mysql T266483
  • 14:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1007 and place pc1010 instead of it T266483 (duration: 01m 00s)
  • 13:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
  • 13:00 kormat: running schema change against s1 in codfw T259831
  • 12:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:43 moritzm: installing tcpdump security updates
  • 12:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:25 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 12:25 hnowlan: roll-restarting restbase-codfw
  • 12:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 12:10 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:49 hnowlan: roll restarting sessionstore for java updates
  • 11:49 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 11:13 moritzm: installing poppler security updates
  • 10:46 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:46 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:45 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:45 dcaro@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:44 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:44 dcaro@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:31 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=99)
  • 09:31 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 08:39 godog: centrallog1001 move invalid config /etc/logrotate.d/logrotate-debug to /etc
  • 08:35 moritzm: installing codemirror-js security updates
  • 08:32 XioNoX: asw-c-codfw> request system power-off member 7 - T267865
  • 08:24 joal@deploy1001: Finished deploy [analytics/refinery@3df51cb] (thin): Analytics special train for webrequest table update THIN [analytics/refinery@3df51cb] (duration: 00m 07s)
  • 08:23 joal@deploy1001: Started deploy [analytics/refinery@3df51cb] (thin): Analytics special train for webrequest table update THIN [analytics/refinery@3df51cb]
  • 08:23 joal@deploy1001: Finished deploy [analytics/refinery@3df51cb]: Analytics special train for webrequest table update [analytics/refinery@3df51cb] (duration: 10m 09s)
  • 08:13 joal@deploy1001: Started deploy [analytics/refinery@3df51cb]: Analytics special train for webrequest table update [analytics/refinery@3df51cb]
  • 08:08 XioNoX: asw-c-codfw> request system power-off member 7 - T267865
  • 06:35 marostegui: Stop replication on s3 codfw master (db2105) for MCR schema change deployment T238966
  • 06:14 marostegui: Stop MySQL on es1018, es1015, es1019 to clone es1032, es1033, es1034 - T261717
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1018, es1015, es1019 - T261717', diff saved to https://phabricator.wikimedia.org/P13262 and previous config saved to /var/cache/conftool/dbconfig/20201116-060624-marostegui.json
  • 06:02 marostegui: Restart mysql on db1115 (tendril/dbtree) due to memory usage
  • 00:55 shdubsh: re-applied mask to kafka and kafka-mirror-main-eqiad_to_main-codfw@0 on kafka-main2003 and disabled puppet to prevent restart - T267865
  • 00:19 elukey: run 'systemctl mask kafka' and 'systemctl mask kafka-mirror-main-eqiad_to_main-codfw@0' on kafka-main2003 (for the brief moment when it was up) to avoid purged issues - T267865
  • 00:09 elukey: sudo cumin 'cp2028* or cp2036* or cp2039* or cp4022* or cp4025* or cp4028* or cp4031*' 'systemctl restart purged' -b 3 - T267865

2020-11-15

  • 22:10 cdanis: restart some purgeds in ulsfo as well T267865 T267867
  • 22:03 cdanis: T267867 T267865 on cumin1001: sudo cumin -b2 -s10 'A:cp and A:codfw' 'systemctl restart purged'
  • 14:00 cdanis: powercycling ms-be1022 via mgmt
  • 11:21 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:21 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:12 vgutierrez: depooling lvs2007, lvs2010 taking over text traffic on codfw - T267865
  • 10:00 elukey: cumin 'cp2042* or cp2036* or cp2039*' 'systemctl restart purged' -b 1
  • 09:57 elukey: restart purged on cp4028 (consumer stuck due to kafka-main2003 down)
  • 09:55 elukey: restart purged on cp4025 (consumer stuck due to kafka-main2003 down)
  • 09:53 elukey: restart purged on cp4031 (consumer stuck due to kafka-main2003 down)
  • 09:50 elukey: restart purged on cp4022 (consumer stuck due to kafka-main2003 down)
  • 09:42 elukey: restart purged on cp2028 (kafka-main2003 is down and there are connection timeout errors)
  • 09:07 Urbanecm: Change email for SUL user Botopol via resetUserEmail.php (T267866)
  • 08:27 elukey: truncate -s 10g /var/lib/hadoop/data/n/yarn/logs/application_1601916545561_173219/container_e25_1601916545561_173219_01_000177/stderr on an-worker1100
  • 08:24 elukey: sudo truncate -s 10g /var/lib/hadoop/data/c/yarn/logs/application_1601916545561_173219/container_e25_1601916545561_173219_01_000019/stderr on an-worker1098

2020-11-13

  • 22:06 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=myvwiki autopatrolled # T105570
  • 22:04 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki editor # T105570
  • 21:42 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=enwikinews reviewer # T105570
  • 21:40 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=bnwiki editor # T105570
  • 21:39 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki flood # T105570
  • 21:38 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=test2wiki upwizcampeditors # T105570
  • 21:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=aawiki communityapplica # T105570
  • 21:28 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=enwiki epadmin # T105570
  • 16:50 _joe_: manually rotate user.log on centrallog1001 and moved it to /srv/user.log.manual-rotation
  • away: updated fundraising CiviCRM from f7954c6659 to 74d795408f
  • 08:15 vgutierrez: restart acme-chief on acmechief1001
  • 01:30 TimStarling: on mwmaint1002 running fixT260485.php unmerged fixup script from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMaintenance/+/640348

2020-11-12

  • 19:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0f0f839: Enable "Cite" button in toolbar for enwiktionary (T267504) (duration: 00m 58s)
  • 19:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3ce18e6: Add artsdatabanken.no to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T267784) (duration: 01m 00s)
  • 16:12 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux at mwmaint1002 (wiki=jawiki; T246539)
  • 16:11 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=cswiki; T246539)
  • 13:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=cswiki; T246539)
  • 11:40 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 11:35 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 11:30 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 11:12 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 11:08 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 11:02 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 09:19 hashar@deploy1001: Synchronized php-1.36.0-wmf.16/includes/filerepo: Revert "filerepo: clean up shared cache keys to avoid key metrics clutter" - T267668 (duration: 01m 01s)
  • 09:12 hashar: Pulled https://gerrit.wikimedia.org/r/640746 on deployment server for # T267668
  • 03:46 ejegg: updated python fundraising tools from 7853f426ee to 68e054c9ad

2020-11-11

  • 16:44 XioNoX: Revert "temporarily route Italy to codfw"
  • 16:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 16:38 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 16:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 16:30 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 15:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 15:52 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 14:29 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet
  • 13:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=cp3054.esams.wmnet
  • 12:25 Lucas_WMDE: EU backport&config window done
  • 12:23 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Remove propagateChangeVisibility repo setting (duration: 00m 58s)
  • 12:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable propagatePageDeletion on Wikidata (duration: 00m 59s)
  • 12:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/DiscussionTools/includes/CommentParser.php: Backport: Fix getHeadlineNodeAndOffset() returning text nodes (T267284) (duration: 01m 01s)
  • 10:34 XioNoX: delete unused interfaces from asw-d-codfw
  • 09:53 XioNoX: prioritized DE-CIX IXP - T262681
  • 02:18 ryankemper: (WDQS deploy completed)
  • 00:48 ryankemper: Restarting `wdqs-categories` one host at a time across all wdqs production instances: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
  • 00:47 ryankemper: Restarted `wdqs-categories` across wdqs test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 00:47 ryankemper: Restarted `wdqs-updater` in batches of 4 across all wdqs hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 00:47 ryankemper: [wdqs deploy] following deploy, example query succeeds on `query.wikidata.org`, proceeding to post deploy steps
  • 00:46 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@03219df]: 0.3.55 (duration: 11m 24s)
  • 00:46 ryankemper: T222669 [Elasticsearch reindex] Began long-running reindex of cirrus elasticsearch for `codfw`, `eqiad`, and `cloudelastic`. 3 tmux sessions on `ryankemper@mwmaint1002`: `reindex_eqiad`, `reindex_codfw`, `reindex_cloudelastic`
  • 00:38 ryankemper: Following deploy to canary `wdqs1003`, automated tests are passing as is a manual test of an example query. Proceeding...
  • 00:34 ryankemper@deploy1001: Started deploy [wdqs/wdqs@03219df]: 0.3.55
  • 00:32 ryankemper: About to begin wdqs deploy; before-deploy tests on canary `wdqs1003` are passing
  • 00:09 eileen: civicrm revision changed from d0cd7f6dbb to e5d12cc46c, config revision is e2d133eff4

2020-11-10

  • 22:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:08 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:08 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:05 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 21:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:58 jgleeson: civicrm revision changed from c36a5cc1b1 to d0cd7f6dbb
  • 21:57 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:55 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 21:47 ebernhardson: unban elastic1050 from eqiad search psi cluster
  • 21:28 cstone: civicrm revision changed from b1342c4129 to c36a5cc1b1
  • 21:24 brennen@deploy1001: sync-file aborted: Testing: README.md sync-file with ssh -n for T223287 (duration: 00m 37s)
  • 21:23 brennen: testing some scap operations, modified to use ssh -n for debugging T223287
  • 21:11 ebernhardson: ban elastic1050 from eqiad psi cluster due to excessive load
  • 21:02 brennen@deploy1001: Finished scap: Backport: language: Honor $wgTranslateNumerals, even if PHP does digit translation (T267614) and Downgrade the severity of the non-numeric argument to formatNum warnings (T267370, T267587) (duration: 34m 46s)
  • 20:27 brennen@deploy1001: Started scap: Backport: language: Honor $wgTranslateNumerals, even if PHP does digit translation (T267614) and Downgrade the severity of the non-numeric argument to formatNum warnings (T267370, T267587)
  • 20:10 brennen@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Turn on formatnum logging (T267587, T267370) (duration: 01m 02s)
  • 19:06 hknust: holger mwmaint1002 Stop T219279
  • 18:31 hknust: holger mwmaint1002 Start T219279
  • 17:57 effie: pool mw1263 mw1264
  • 17:31 effie: briefly depool mw1263 and mw1264
  • 17:30 jynus: about to shutdown db1139 for hw maintenance T261405
  • 17:13 dwisehaupt: increasing thank-you mail flow through the frmx hosts to 30% of total runs
  • 16:32 XioNoX: add cloud-storage1-b-codfw to, well, codfw switches - T267378
  • 16:20 effie: pool mw1263
  • 16:17 hashar: Restarting Gerrit on gerrit1001
  • 16:12 hashar: Restarted Gerrit on gerrit2001 for config change
  • 15:53 zpapierski@deploy1001: Finished deploy [wikimedia/discovery/analytics@1ab89ed]: Deploying venv workaround for Debian 9 (duration: 01m 06s)
  • 15:52 zpapierski@deploy1001: Started deploy [wikimedia/discovery/analytics@1ab89ed]: Deploying venv workaround for Debian 9
  • 15:38 moritzm: installing 4.19.152 kernel packages on buster hosts (only installing the package, reboots will happen separately)
  • 15:28 effie: depool mw1263 - T244340
  • 15:09 ejegg: updated fundraising python tools from 087a596d3a to 7853f426ee
  • 14:21 effie: pooling mw1276 - T244340
  • 13:51 moritzm: imported php-memcached 3.0.1+2.2.0-1~wmf3+buster1 to component/php72 for buster-wikimedia
  • 13:29 marostegui: Restart db2093 to pick up report_host - T266483
  • 13:17 marostegui: Restart db1117* to pick up report_host - T266483
  • 12:46 effie: depool mw1276 to install onhost memcached - T244340
  • 12:33 Lucas_WMDE: EU backport&config window done
  • 12:33 moritzm: installing wireshark security updates
  • 12:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: Switch parser cache to using "mcrouter-with-onhost-tier" (T264604) (duration: 00m 57s)
  • 12:23 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/mc.php: Config: Add "mcrouter-with-onhost-tier" entry to $wgObjectCaches (T264604) (duration: 00m 57s)
  • 12:04 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/Wikibase: Backport: Revert JS parser commits (T266671) (duration: 01m 04s)
  • 08:59 hashar: Restarted Gerrit for plugins deployment
  • 08:06 hashar: Restarting Gerrit on gerrit2001 / gerrit-replica
  • 08:04 hashar@deploy1001: Finished deploy [gerrit/gerrit@5a41181]: jmx and prometheus metrics reporters - T184086 (duration: 00m 10s)
  • 08:04 hashar@deploy1001: Started deploy [gerrit/gerrit@5a41181]: jmx and prometheus metrics reporters - T184086
  • 07:40 elukey: import hue_4.8.0-2 to buster-wikimedia
  • 06:53 marostegui: Restart dbstore* to pick up report_host - T266483
  • 06:44 marostegui: Restart pc1010 to pick up report_host - T266483

2020-11-09

  • 22:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 21:14 mbsantos@deploy1001: Finished deploy [tilerator/deploy@97575e4]: Add new target for beta environment and clean-up old envs (T222377) (duration: 02m 23s)
  • 21:11 mbsantos@deploy1001: Started deploy [tilerator/deploy@97575e4]: Add new target for beta environment and clean-up old envs (T222377)
  • 20:53 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=maps2002.*
  • 20:36 cdanis: depool maps2002
  • 20:26 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932) (duration: 01m 09s)
  • 20:25 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932)
  • 20:24 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932) (duration: 11m 36s)
  • 20:13 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932)
  • 20:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.16
  • 20:04 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 20:01 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 19:58 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 18:32 mepps: updated payments-wiki from 388490e86d to 8612ed1002, config revision is 987e839869
  • 17:53 XioNoX: re-order asw-d-codfw interfaces-ranges
  • 17:51 XioNoX: standardize asw-d-codfw interfaces descriptions
  • 17:33 effie: updating mwdebug2002 to ICU 63 - T264991
  • 17:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 05s)
  • 16:57 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 16:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
  • 16:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:45 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 16:40 moritzm: imported 2.0.2+0.5.7-1~wmf3+php72+buster1 to component/php72 for buster-wikimedia
  • 16:34 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=trwiki; T246539)
  • 16:34 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=kowiki; T246539)
  • 16:20 XioNoX: Netbox prod: mass import from PuppetDB (cables, etc) - T262899
  • 16:04 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:55 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:12 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 62c2e02: abusefilter.php: Enable wgAbuseFilterNotificationsPrivate by default for WMF wikis (T266298) (duration: 01m 07s)
  • 14:34 hashar: Restarting Gerrit
  • 14:07 hashar@deploy1001: Finished deploy [gerrit/gerrit@0a803e2]: Upgrade javamelody to 1.86.0 # T232678 (duration: 00m 18s)
  • 14:07 hashar@deploy1001: Started deploy [gerrit/gerrit@0a803e2]: Upgrade javamelody to 1.86.0 # T232678
  • 14:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:03 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 14:03 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=kowiki; T246539)
  • 14:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:59 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:55 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:40 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 12:13 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=zhwikinews --fix --add-prefix=BROKEN # T266925
  • 12:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 11b8f62: Add wgNamespaceAliases for zhwikinews (T266925) (duration: 01m 06s)
  • 12:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 87b3eed: Enable DiscussionTools as a beta feature on fiwiki (T265446) (duration: 01m 06s)
  • 11:58 moritzm: installing remaining openldap updates on stretch
  • 11:57 jynus: restart dbstore1004 mariadb instances
  • 10:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 10:36 XioNoX: add 185.15.56.240/29 IPs to relevant cloudsw interfaces - T265288
  • 10:35 effie: merging 638109 and roll-restarting ms-fe* hosts to pick up the change
  • 10:11 XioNoX: renumber cloud-xlink1-eqiad
  • 09:56 Urbanecm: Purge https://vote.wikimedia.org/wiki/Main_Page (T262689)
  • 09:54 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session at mwmaint1002 (wiki=svwiki; T246539)
  • 09:52 hashar: Restarting Gerrit on gerrit1001 and gerrit2001 so that the JVM exits after an OutOfMemoryError # T267517
  • 09:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b0a81f: Revert "Change votewiki language temporarily to fa for fawiki elections" (T262689) (duration: 01m 08s)
  • 09:37 moritzm: installing libexif security updates
  • 09:06 godog: enable thanos query-frontend on thanos-fe hosts - T261281
  • 08:24 XioNoX: configure traceoptions on pfw3-eqiad - T263833
  • 08:11 hashar: Restarting Gerrit on gerrit1001 and gerrit2001
  • 07:58 hashar: Restarted CI Jenkins on contint2001 for Java upgrade
  • 07:17 elukey: restart gerrit on gerrit2001 (OOM recorded two days ago while systemctl reports about a month of uptime; probably in a weird state)
  • 01:35 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/tests/phpunit/maintenance/categoryChangesAsRdfTest.php: this was cherry-picked to make CI pass, pushing it out just for a clean staging dir (duration: 01m 06s)
  • 01:32 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/resources/src/mediawiki.api/upload.js: fixing UBN T266903 (duration: 01m 06s)
  • 01:30 tstarling@deploy1001: Synchronized php-1.36.0-wmf.14/resources/src/mediawiki.Upload.js: fixing UBN T266903 (duration: 01m 07s)
  • 01:29 tstarling@deploy1001: sync-file aborted: fixing UBN T266903 (duration: 00m 01s)

2020-11-08

  • 23:08 tstarling@deploy1001: Synchronized php-1.36.0-wmf.16/resources/src/mediawiki.api/upload.js: fixing UBN T266903 (duration: 01m 06s)
  • 23:06 tstarling@deploy1001: Synchronized php-1.36.0-wmf.16/resources/src/mediawiki.Upload.js: fixing UBN T266903 (duration: 01m 35s)
  • 20:34 cdanis: repool esams
  • 19:48 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 19:48 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 19:16 cdanis: depool esams
  • 18:35 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 18:35 cdanis@cumin1001: START - Cookbook sre.network.cf

2020-11-06

  • 23:38 dwisehaupt: frdata1001 upgraded to buster
  • 22:40 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@bfaac0f]: Update to master, primarily updates for ores weekly predictions handling (duration: 01m 08s)
  • 22:39 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@bfaac0f]: Update to master, primarily updates for ores weekly predictions handling
  • 22:29 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@dc63e7e]: Update to master, primarily updates for ores weekly predictions handling (duration: 00m 26s)
  • 22:29 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@dc63e7e]: Update to master, primarily updates for ores weekly predictions handling
  • 20:57 reedy@deploy1001: Synchronized php-1.36.0-wmf.16/skins/CologneBlue/: T267278 (duration: 01m 05s)
  • 20:56 reedy@deploy1001: Synchronized php-1.36.0-wmf.14/skins/CologneBlue/: T267278 (duration: 01m 10s)
  • 20:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:05 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:54 cwhite@cumin1001: conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
  • 17:02 dwisehaupt: rolled out new thank_you_mail_send process_control scripts to utilize frmx hosts
  • 16:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2005.codfw.wmnet
  • 14:46 moritzm: installing wireshark security updates
  • 14:36 hnowlan: resyncing database on maps1001
  • 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:24 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:05 hnowlan: started cassandra bootstrap of maps2005
  • 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:47 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:30 hnowlan: joining maps2005 to cassandra cluster
  • 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:09 moritzm: uploaded openjdk-8 8u272-b10-1~deb10u1 to buster-wikimedia/component/jdk
  • 10:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:06 dcausse: restarted elastic on elastic1063 (T265113)
  • 09:57 moritzm: installing spice security updates
  • 09:32 moritzm: installing libsndfile security updates
  • 09:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:12 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 moritzm: installing openldap security updates on stretch/buster (client-side tools/libs only, slapd updates already deployed)
  • 04:38 ryankemper: [Deploy finished] WDQS deploy is complete; the service is healthy per https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=1604633917530&to=1604637475930
  • 04:36 ryankemper: Finished restarting wdqs categories one host at a time across all wdqs production instances
  • 04:02 ryankemper: Restarting wdqs categories one host at a time across all wdqs production instances: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'` (in progress)
  • 04:01 ryankemper: Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 04:01 ryankemper: Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 04:00 ryankemper: `query.wikidata.org` looks good following deploy, proceeding to post-deploy steps
  • 03:59 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@27a5c54]: 0.3.54 (duration: 11m 22s)
  • 03:51 ryankemper: Tests passing on canary `wdqs1003` following initial deployment, proceeding with deploy to rest of fleet
  • 03:48 ryankemper@deploy1001: Started deploy [wdqs/wdqs@27a5c54]: 0.3.54
  • 03:48 ryankemper: About to begin wdqs deploy, tests passing on canary `wdqs1003`
  • 00:53 brennen@deploy1001: Finished scap: Synchronizing to pick up i18n for gerrit:639505. Will resume moving train to group1 on Monday morning (US) (T263182) (duration: 69m 02s)

2020-11-05

  • 23:44 brennen@deploy1001: Started scap: Synchronizing to pick up i18n for gerrit:639505. Will resume moving train to group1 on Monday morning (US) (T263182)
  • 23:38 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/includes/media/FormatMetadata.php: Backport: media: Support GPSAltitudeRef exif tag - FormatMetadata.php (T267370) (duration: 07m 22s)
  • 23:29 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/languages/i18n/exif: Backport: media: Support GPSAltitudeRef exif tag - i18n/exif files (T267370) (duration: 01m 08s)
  • 23:09 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/vendor: Backport: Bump wikimedia/parsoid to 0.13.0-a16 (T267146) (duration: 01m 14s)
  • 20:54 hnowlan: reenabled tilerator in eqiad
  • 20:47 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.14
  • 20:44 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 39s)
  • 20:42 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
  • 20:39 hnowlan: finished removenode of maps2002 cassandra
  • 20:22 brennen: train: waiting ~15 minutes before rolling forward to group1.
  • 20:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.16
  • 20:15 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/CentralAuth/includes/specials/SpecialCentralAuth.php: Backport: Dont double-format numeric edit count (T267362) (duration: 01m 06s)
  • 19:44 Urbanecm: Morning B&C window done
  • 19:44 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/modules/homepage/: 81cb1c7: Suggested edits: Export task count from start editing dialog (T266868; T263040) (duration: 01m 07s)
  • 19:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 453b9c6: Fix DiscussionTools wikis config for thwiki/tgwiki (T266303) (duration: 01m 08s)
  • 18:32 razzi: shutting down kafka-jumbo1005 to allow dcops to upgrade NIC
  • 17:52 akosiaris: restart uwsgi-ores on all ores1* nodes following a report on IRC that the maximum number of redis clients had been reached T263910
  • 17:51 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.36.0-wmf.14
  • 17:48 razzi: shutting down kafka-jumbo1004 to allow dcops to upgrade NIC
  • 17:46 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.16
  • 17:41 brennen: train is currently unblocked; rolling to group0 (T263182)
  • 17:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:26 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/languages: Backport: language: Clean up $separatorTransformTable in km/la/my (T267091) (duration: 01m 12s)
  • 17:21 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/resources/Resources.php: Backport: mediawiki.action.edit.preview: Add versionCallback to improve startup perf (T266311) (duration: 01m 10s)
  • 17:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2002.codfw.wmnet
  • 17:14 hnowlan: rebuilding cassandra on maps2002
  • 17:14 jayme: imported kubernetes 1.16.15 to component/kubernetes-future stretch-wikimedia
  • 17:05 hnowlan: restarting maps2004 postgres for config change
  • 17:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:57 razzi: shutting down kafka-jumbo1003 to allow dcops to upgrade NIC
  • 16:26 razzi: shutting down kafka-jumbo1002 to allow dcops to upgrade NIC
  • 15:53 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 15:50 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 15:41 moritzm: installing junit4 security updates
  • 14:55 elukey: shutdown kafka-jumbo1001 to swap NICs (1g -> 10g)
  • 14:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:10 jbond42: enable puppet fleet-wide after restarting puppetdb
  • 14:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:57 jbond42: disable puppet fleet-wide to restart puppetdb
  • 13:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:52 jbond42: upgrade freetype on jessie
  • 12:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:34 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:34 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:09 marostegui: Upgrade mysql on pc2010
  • 11:58 jynus: shutting down db1139 in preparation for maintenance T261405
  • 11:55 marostegui: Upgrade mysql on db1077
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1012 to es1 master, es1011 to es2 master, es1014 to es3 (this is a noop) T261717', diff saved to https://phabricator.wikimedia.org/P13230 and previous config saved to /var/cache/conftool/dbconfig/20201105-114223-marostegui.json
  • 11:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:05 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=dewiki; T246539)
  • 10:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:55 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:16 godog: grafana-rw.wikimedia.org active and sso-enabled - T262512
  • 09:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Slowly pool es1031 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13227 and previous config saved to /var/cache/conftool/dbconfig/20201105-094356-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Slowly pool es1030 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13226 and previous config saved to /var/cache/conftool/dbconfig/20201105-094348-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Slowly pool es1029 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13225 and previous config saved to /var/cache/conftool/dbconfig/20201105-094336-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Slowly pool es1031 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13224 and previous config saved to /var/cache/conftool/dbconfig/20201105-092853-root.json
  • 09:28 moritzm: enabling CAS on grafana1002, editing dashboards will be interrupted for a bit
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Slowly pool es1030 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13223 and previous config saved to /var/cache/conftool/dbconfig/20201105-092845-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Slowly pool es1029 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13222 and previous config saved to /var/cache/conftool/dbconfig/20201105-092833-root.json
  • 09:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Slowly pool es1031 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13219 and previous config saved to /var/cache/conftool/dbconfig/20201105-091350-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Slowly pool es1030 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13218 and previous config saved to /var/cache/conftool/dbconfig/20201105-091341-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Slowly pool es1029 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13217 and previous config saved to /var/cache/conftool/dbconfig/20201105-091329-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Slowly pool es1031 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13216 and previous config saved to /var/cache/conftool/dbconfig/20201105-085846-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Slowly pool es1030 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13215 and previous config saved to /var/cache/conftool/dbconfig/20201105-085838-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Slowly pool es1029 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13214 and previous config saved to /var/cache/conftool/dbconfig/20201105-085826-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Slowly pool es1031 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13213 and previous config saved to /var/cache/conftool/dbconfig/20201105-084343-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Slowly pool es1030 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13212 and previous config saved to /var/cache/conftool/dbconfig/20201105-084334-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Slowly pool es1029 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13211 and previous config saved to /var/cache/conftool/dbconfig/20201105-084323-root.json
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312', diff saved to https://phabricator.wikimedia.org/P13210 and previous config saved to /var/cache/conftool/dbconfig/20201105-084250-marostegui.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312', diff saved to https://phabricator.wikimedia.org/P13209 and previous config saved to /var/cache/conftool/dbconfig/20201105-083304-marostegui.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13208 and previous config saved to /var/cache/conftool/dbconfig/20201105-083142-root.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13207 and previous config saved to /var/cache/conftool/dbconfig/20201105-081638-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13206 and previous config saved to /var/cache/conftool/dbconfig/20201105-080135-root.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1031 on es3 with minimum weight after being cloned from es1017 T261717', diff saved to https://phabricator.wikimedia.org/P13205 and previous config saved to /var/cache/conftool/dbconfig/20201105-075625-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1030 on es2 with minimum weight after being cloned from es1013 T261717', diff saved to https://phabricator.wikimedia.org/P13204 and previous config saved to /var/cache/conftool/dbconfig/20201105-075507-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1029 on es1 with minimum weight after being cloned from es1016 T261717', diff saved to https://phabricator.wikimedia.org/P13203 and previous config saved to /var/cache/conftool/dbconfig/20201105-075358-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13202 and previous config saved to /var/cache/conftool/dbconfig/20201105-074631-root.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 T267216', diff saved to https://phabricator.wikimedia.org/P13201 and previous config saved to /var/cache/conftool/dbconfig/20201105-072352-marostegui.json
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 100%: After cloning es1029 T261717', diff saved to https://phabricator.wikimedia.org/P13200 and previous config saved to /var/cache/conftool/dbconfig/20201105-071017-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 100%: After cloning es1030 T261717', diff saved to https://phabricator.wikimedia.org/P13199 and previous config saved to /var/cache/conftool/dbconfig/20201105-070616-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 100%: After cloning es1031 T261717', diff saved to https://phabricator.wikimedia.org/P13198 and previous config saved to /var/cache/conftool/dbconfig/20201105-070610-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 75%: After cloning es1029 T261717', diff saved to https://phabricator.wikimedia.org/P13197 and previous config saved to /var/cache/conftool/dbconfig/20201105-065514-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 75%: After cloning es1030 T261717', diff saved to https://phabricator.wikimedia.org/P13196 and previous config saved to /var/cache/conftool/dbconfig/20201105-065113-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 75%: After cloning es1031 T261717', diff saved to https://phabricator.wikimedia.org/P13195 and previous config saved to /var/cache/conftool/dbconfig/20201105-065107-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 50%: After cloning es1029 T261717', diff saved to https://phabricator.wikimedia.org/P13193 and previous config saved to /var/cache/conftool/dbconfig/20201105-064010-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 50%: After cloning es1030 T261717', diff saved to https://phabricator.wikimedia.org/P13192 and previous config saved to /var/cache/conftool/dbconfig/20201105-063610-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 50%: After cloning es1031 T261717', diff saved to https://phabricator.wikimedia.org/P13191 and previous config saved to /var/cache/conftool/dbconfig/20201105-063603-root.json
  • 06:34 elukey: truncate application_1601916545561_129457's taskmanager.log (~600G) on an-worker1113 because partition 'e' was full
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 25%: After cloning es1029 T261717', diff saved to https://phabricator.wikimedia.org/P13190 and previous config saved to /var/cache/conftool/dbconfig/20201105-062507-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 25%: After cloning es1027 T261717', diff saved to https://phabricator.wikimedia.org/P13189 and previous config saved to /var/cache/conftool/dbconfig/20201105-062454-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 25%: After cloning es1028 T261717', diff saved to https://phabricator.wikimedia.org/P13188 and previous config saved to /var/cache/conftool/dbconfig/20201105-062446-root.json
  • 01:57 milimetric@deploy1001: Finished deploy [analytics/refinery@6913407] (thin): Regular analytics weekly train THIN [analytics/refinery@6913407] (duration: 00m 08s)
  • 01:56 milimetric@deploy1001: Started deploy [analytics/refinery@6913407] (thin): Regular analytics weekly train THIN [analytics/refinery@6913407]
  • 01:56 milimetric@deploy1001: Finished deploy [analytics/refinery@6913407]: Regular analytics weekly train [analytics/refinery@6913407] (duration: 08m 34s)
  • 01:47 milimetric@deploy1001: Started deploy [analytics/refinery@6913407]: Regular analytics weekly train [analytics/refinery@6913407]

2020-11-04

  • 20:36 Urbanecm: Late B&C Morning window completed, deployment host is clear
  • 20:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ee0ba54: Disable the search in header A/B test (T265333) (duration: 01m 06s)
  • 20:33 ejegg: updated payments-wiki from 1ad4ba9639 to 388490e86d
  • 20:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate NewcomerTask event stream to EventGate on testwiki - T259163 (duration: 01m 07s)
  • 20:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 82579bf: Enable wgImagePreconnect on remaining wikis (T123582) (duration: 01m 06s)
  • 20:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d2a5772: Enable DiscussionTools as a beta feature on almost all Wikipedias (T266303) (duration: 01m 07s)
  • 20:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fb5c032: Enable wgCheckUserLogLogins at all wikis but loginwiki (T253802) (duration: 01m 08s)
  • 19:59 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.16 (duration: 62m 44s)
  • 18:57 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.16
  • 18:52 brennen@deploy1001: Pruned MediaWiki: 1.36.0-wmf.10 (duration: 27m 38s)
  • 18:51 Urbanecm: Strip 2FA for Mark83 at SUL (T267257)
  • 18:20 elukey: restart memcached on mc1036 to pick up new settings (see https://gerrit.wikimedia.org/r/639099)
  • 18:15 hknust: holger@mwmaint1002 END - Run updateRestrictions.php
  • 17:44 hknust: holger@mwmaint1002 START - Run updateRestrictions.php
  • 17:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:15 zpapierski@deploy1001: Finished deploy [wikimedia/discovery/analytics@8e8d2d4]: Deploying dc switch (duration: 01m 15s)
  • 17:13 zpapierski@deploy1001: Started deploy [wikimedia/discovery/analytics@8e8d2d4]: Deploying dc switch
  • 17:07 effie: Reimage mc1036 for real this time
  • 16:40 brennen: 1.36.0-wmf.16 was branched at f51ccd2 for T263182
  • 16:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:10 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:39 effie: Reimage mc1036 to buster - T252391
  • 15:25 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate ContentTranslationAbuseFilter event stream to EventGate on all wikis - T259163 (duration: 00m 58s)
  • 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:09 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate ContentTranslationAbuseFilter event stream to EventGate on testwiki - T259163 (duration: 00m 59s)
  • 14:37 jynus: restart mysql at db1133 T266483
  • 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:17 elukey: upload hue 4.8.0-1+deb10u1 to buster-wikimedia
  • 14:15 jynus: restart mysqls at db209[789],db210[01], db2139, db2141 T266483
  • 14:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:59 jynus: restart mysqls at db1150 T266483
  • 13:54 jynus: restart mysqls at db1145 T266483
  • 13:51 jynus: restart mysqls at db1140 T266483
  • 13:47 jynus: restart mysqls at db1139 T266483
  • 13:43 jynus: restart mysqls at db1116 T266483
  • 13:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 jynus: restart mysqls at db1102 T266483
  • 13:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:35 jynus: restart mysqls at db1095 T266483
  • 13:24 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:24 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:50 Lucas_WMDE: EU backport&config done
  • 12:11 Urbanecm: Run scap pull at snapshot1010 manually
  • 12:09 Urbanecm: scap sync-file returned `snapshot1010.eqiad.wmnet returned [255]: Host key verification failed.`
  • 12:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ed3c43d: Add www.irishstatutebook.ie to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T267193) (duration: 01m 02s)
  • 11:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:23 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P13185 and previous config saved to /var/cache/conftool/dbconfig/20201104-102341-kormat.json
  • 10:23 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=fiwiki; T246539)
  • 10:17 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P13184 and previous config saved to /var/cache/conftool/dbconfig/20201104-101729-kormat.json
  • 10:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:08 _joe_: restarting envoyproxy on all of restbase codfw, sending the command in parallel via cumin, to test poolcounter usage by the safe restart scripts
  • 10:05 _joe_: restarting envoyproxy on restbase20{09,10} to test poolcounter usage by the safe restart scripts
  • 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:24 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:24 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 09:19 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 09:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:44 moritzm: uploaded freetype 2.5.2+deb8u4+wmf1 to apt.wikimedia.org/jessie-wikimedia
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: Slowly pool es1028 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13182 and previous config saved to /var/cache/conftool/dbconfig/20201104-080033-root.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Slowly pool es1027 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13181 and previous config saved to /var/cache/conftool/dbconfig/20201104-080024-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: Slowly pool es1026 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13180 and previous config saved to /var/cache/conftool/dbconfig/20201104-075953-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: Slowly pool es1028 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13179 and previous config saved to /var/cache/conftool/dbconfig/20201104-074530-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: Slowly pool es1027 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13178 and previous config saved to /var/cache/conftool/dbconfig/20201104-074520-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: Slowly pool es1026 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13177 and previous config saved to /var/cache/conftool/dbconfig/20201104-074449-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: Slowly pool es1028 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13176 and previous config saved to /var/cache/conftool/dbconfig/20201104-073026-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: Slowly pool es1027 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13175 and previous config saved to /var/cache/conftool/dbconfig/20201104-073017-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: Slowly pool es1026 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13174 and previous config saved to /var/cache/conftool/dbconfig/20201104-072946-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: Slowly pool es1028 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13173 and previous config saved to /var/cache/conftool/dbconfig/20201104-071523-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: Slowly pool es1027 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13172 and previous config saved to /var/cache/conftool/dbconfig/20201104-071513-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: Slowly pool es1026 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13171 and previous config saved to /var/cache/conftool/dbconfig/20201104-071443-root.json
  • 07:09 elukey: manual cleanup of mcelog and its wmf-auto-restart (failing) on mw1381 (kernel 4.19, doesn't support mcelog)
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1016 es1013 es1017 T261717', diff saved to https://phabricator.wikimedia.org/P13170 and previous config saved to /var/cache/conftool/dbconfig/20201104-070121-marostegui.json
  • 07:00 marostegui: Stop mysql on es1016, es1013, es1017 to clone es1029, es1030, es1031 T261717
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: Slowly pool es1028 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13169 and previous config saved to /var/cache/conftool/dbconfig/20201104-070020-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: Slowly pool es1027 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13168 and previous config saved to /var/cache/conftool/dbconfig/20201104-070010-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: Slowly pool es1026 after being recloned T261717', diff saved to https://phabricator.wikimedia.org/P13167 and previous config saved to /var/cache/conftool/dbconfig/20201104-065939-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 100%: After cloning es1028 T261717', diff saved to https://phabricator.wikimedia.org/P13166 and previous config saved to /var/cache/conftool/dbconfig/20201104-065926-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 100%: After cloning es1027 T261717', diff saved to https://phabricator.wikimedia.org/P13165 and previous config saved to /var/cache/conftool/dbconfig/20201104-065905-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 100%: After cloning es1026 T261717', diff saved to https://phabricator.wikimedia.org/P13164 and previous config saved to /var/cache/conftool/dbconfig/20201104-065849-root.json
  • 06:52 elukey: force start of rasdaemon.service on dumpsdata1002 (its auto-restart unit was failing for it)
  • 06:47 elukey: set an-presto1004's netbox status as "active" (was: failed) after hw maintenance - T253438
  • 06:44 elukey: force restart of uwsgi-ores on ores1005 - daemon down after reload, max client reached error messages in the logs
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 75%: After cloning es1028 T261717', diff saved to https://phabricator.wikimedia.org/P13163 and previous config saved to /var/cache/conftool/dbconfig/20201104-064422-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 75%: After cloning es1027 T261717', diff saved to https://phabricator.wikimedia.org/P13162 and previous config saved to /var/cache/conftool/dbconfig/20201104-064402-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 75%: After cloning es1026 T261717', diff saved to https://phabricator.wikimedia.org/P13161 and previous config saved to /var/cache/conftool/dbconfig/20201104-064345-root.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1028 with minimum weight after recloning T261717', diff saved to https://phabricator.wikimedia.org/P13160 and previous config saved to /var/cache/conftool/dbconfig/20201104-063028-marostegui.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 50%: After cloning es1028 T261717', diff saved to https://phabricator.wikimedia.org/P13159 and previous config saved to /var/cache/conftool/dbconfig/20201104-062919-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 50%: After cloning es1027 T261717', diff saved to https://phabricator.wikimedia.org/P13158 and previous config saved to /var/cache/conftool/dbconfig/20201104-062858-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 50%: After cloning es1026 T261717', diff saved to https://phabricator.wikimedia.org/P13157 and previous config saved to /var/cache/conftool/dbconfig/20201104-062842-root.json
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1027 with minimum weight after recloning T261717', diff saved to https://phabricator.wikimedia.org/P13156 and previous config saved to /var/cache/conftool/dbconfig/20201104-061829-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1026 with minimum weight after recloning T261717', diff saved to https://phabricator.wikimedia.org/P13155 and previous config saved to /var/cache/conftool/dbconfig/20201104-061549-marostegui.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 25%: After cloning es1028 T261717', diff saved to https://phabricator.wikimedia.org/P13154 and previous config saved to /var/cache/conftool/dbconfig/20201104-061416-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 25%: After cloning es1027 T261717', diff saved to https://phabricator.wikimedia.org/P13153 and previous config saved to /var/cache/conftool/dbconfig/20201104-061355-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 25%: After cloning es1026 T261717', diff saved to https://phabricator.wikimedia.org/P13152 and previous config saved to /var/cache/conftool/dbconfig/20201104-061339-root.json

2020-11-03

  • 22:56 _joe_: repooling mw1346
  • 22:55 _joe_: depooling mw1346
  • 22:49 cdanis: mw1342 restart-php7.2-fpm
  • 22:37 cdanis: repool mw1278 and mw1279
  • 22:35 cdanis: sudo restart-php7.2-fpm on mw1290.eqiad.wmnet
  • 22:34 cdanis: restart-php7.2-fpm and pool on mw1276
  • 22:31 cdanis: depool mw1276 and mw1279 also
  • 22:25 cdanis: sudo depool on mw1278.eqiad.wmnet
  • 21:16 hashar: Gerrit: triggering java garbage collection # T263008
  • 19:32 gehel: restarting blazegraph on wdqs1007 to reset ban list
  • 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:45 cmjohnson1: shutting elastic1063 down to reseat DIMM T265113
  • 17:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:13 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 16:13 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 16:04 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:03 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:59 elukey: shutdown kafka-jumbo1006 to replace 1G with 10G nic
  • 15:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:08 moritzm: imported php-redis/xdebug to component/php72 for buster-wikimedia
  • 14:37 moritzm: imported php-apcu-bc/php-igbinary/tideways-xhprof to component/php72 for buster-wikimedia
  • 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:33 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:53 moritzm: imported php-mongodb/php-wmerrors/wikidiff2 to component/php72 for buster-wikimedia
  • 13:43 sobanski: Removing db1091 from tendril and zarcillo T267088
  • 13:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:33 lsobanski@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:24 lsobanski@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 moritzm: imported php-apcu/php-geoip/php-imagick/php-mailparse to component/php72 for buster-wikimedia
  • 11:57 moritzm: running "reprepro clearvanished" to prune thirdparty/orchestrator
  • 11:51 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: T266985 (duration: 00m 03s)
  • 11:51 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: T266985
  • 11:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 11:23 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 11:23 hnowlan: resyncing postgres replica maps1001
  • 11:03 Amir1: rolling restart of ores
  • 10:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:45 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: T266985 (duration: 00m 07s)
  • 10:45 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: T266985
  • 10:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:22 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: T266985 (duration: 00m 26s)
  • 10:21 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: T266985
  • 10:16 elukey@deploy1001: Finished deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided) (duration: 02m 15s)
  • 10:14 elukey@deploy1001: Started deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided)
  • 10:13 elukey@deploy1001: Finished deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided) (duration: 01m 45s)
  • 10:11 elukey@deploy1001: Started deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided)
  • 10:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:57 kormat: uploaded orchestrator 3.2.3-2 to apt
  • 09:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:05 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P13139 and previous config saved to /var/cache/conftool/dbconfig/20201103-090523-kormat.json
  • 09:00 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P13138 and previous config saved to /var/cache/conftool/dbconfig/20201103-090013-kormat.json
  • 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:32 godog: Prometheus re-enable compactions - T261281
  • 06:59 marostegui: Remove db1091 from tendril and zarcillo T267088
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1091 from dbctl T267088', diff saved to https://phabricator.wikimedia.org/P13137 and previous config saved to /var/cache/conftool/dbconfig/20201103-065756-marostegui.json
  • 06:46 marostegui: Deploy schema change on s1 codfw master: T265349
  • 06:16 marostegui: Stop MySQL on es1014 to clone es1028 T261717
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1014 to reclone es1028 T261717', diff saved to https://phabricator.wikimedia.org/P13136 and previous config saved to /var/cache/conftool/dbconfig/20201103-061423-marostegui.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1019 to es3 master (this is a noop) T261717', diff saved to https://phabricator.wikimedia.org/P13135 and previous config saved to /var/cache/conftool/dbconfig/20201103-061403-marostegui.json
  • 06:11 marostegui: Stop MySQL on es1012 to clone es1027 T261717
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1012 to reclone es1027 T261717', diff saved to https://phabricator.wikimedia.org/P13134 and previous config saved to /var/cache/conftool/dbconfig/20201103-060727-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1018 to es1 master (this is a noop) T261717', diff saved to https://phabricator.wikimedia.org/P13133 and previous config saved to /var/cache/conftool/dbconfig/20201103-060705-marostegui.json
  • 06:04 marostegui: Stop MySQL on es1011 to clone es1026 T261717
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1011 to reclone es1026 T261717', diff saved to https://phabricator.wikimedia.org/P13132 and previous config saved to /var/cache/conftool/dbconfig/20201103-060054-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1015 to es2 master (this is a noop) T261717', diff saved to https://phabricator.wikimedia.org/P13131 and previous config saved to /var/cache/conftool/dbconfig/20201103-060038-marostegui.json
  • 04:39 cstone: civicrm revision changed from cd13d9e30f to b1342c4129
  • 02:13 shdubsh: restart ES on logstash1009 - oom killed
  • 01:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:59 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:40 robh@cumin1001: START - Cookbook sre.hosts.downtime

2020-11-02

  • 22:19 twentyafterfour: restart php7.3-fpm on phab1001
  • 22:03 twentyafterfour: applied 113a244a66 on phab1001 to hotfix T240862
  • 20:22 eileen: process-control config revision is 313a36312f re-enable thank you
  • 19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:48 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 19:47 eileen: civicrm revision changed from 3317d30356 to cd13d9e30f, config revision is db912e3bba
  • 19:45 eileen: process-control config revision is db912e3bba - thankyou job off for testing
  • 19:07 Urbanecm: Deployed security fix for T205908
  • 19:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:59 andrewbogott: added dcaro to ops and wmf ldap groups
  • 18:59 mutante: decom'ing testvm1001
  • 18:58 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:14 XioNoX: push new pfw policies - T267051
  • 16:39 ejegg: updated payments-wiki from adc3369cb3 to 1ad4ba9639
  • 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:36 moritzm: imported php-excimer/php-luasandbox to component/php72 for buster-wikimedia
  • 14:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:34 moritzm: rolling restart of cassandra in restbase-dev to pick up Java security updates
  • 14:17 kormat: uploaded orchestrator 3.2.3-1 to apt
  • 14:01 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove $wgExtDistListFile, unused - T266024 (duration: 00m 58s)
  • 13:46 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 13:40 elukey: roll restart zookeeper on an-conf* to pick up new openjdk upgrades
  • 13:40 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 13:03 Lucas_WMDE: EU backport&config window done
  • 13:02 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/Wikibase: Backport: Revert JS parser commits (T266671) (duration: 01m 09s)
  • 12:52 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Response namespace at otrs_wikiwiki to namespaces searched by default (T266917) (duration: 00m 58s)
  • 12:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon, 2/2 (Beta) (duration: 00m 57s)
  • 12:20 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon, 1/2 (production) (duration: 01m 02s)
  • 12:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: Stop reading wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon (duration: 00m 58s)
  • 12:15 volans: upgraded python3-wmflib to 0.0.4 on cumin[12]001
  • 12:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Fix array depth for properties array (T266835), Beta part (prod no-op) (duration: 00m 58s)
  • 12:07 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Fix array depth for properties array (T266835) (duration: 00m 59s)
  • 12:02 volans: uploaded python3-wmflib_0.0.4 to apt.wikimedia.org buster-wikimedia
  • 11:51 effie: disable puppet on thumbor1001 and thumbor1002 to test 636024
  • 11:51 effie: disable thumbor on thumbor1001 and thumbor1002 to test 636024
  • 11:34 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 11:33 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 11:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:06 godog: upgrade thanos to 0.16.0 on prometheus hosts - T261281
  • 10:59 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 10:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:50 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 10:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 10:28 oblivian@cumin1001: START - Cookbook sre.network.cf
  • 10:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 10:28 oblivian@cumin1001: START - Cookbook sre.network.cf
  • 10:23 moritzm: installing openldap security updates on corp LDAP replicas
  • 08:46 XioNoX: add uRPF strict to ulsfo office links - T266561
  • 08:41 moritzm: installing openldap security updates on LDAP replicas
  • 08:40 godog: upgrade thanos to 0.16 in codfw/eqiad - T261281
  • 06:09 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 06:09 oblivian@cumin1001: START - Cookbook sre.network.cf
  • 06:09 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 06:09 oblivian@cumin1001: START - Cookbook sre.network.cf

2020-11-01

  • 22:41 Urbanecm: mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=metawiki Turkmen # T266976
  • 09:52 ariel@deploy1001: Finished deploy [dumps/dumps@de4c823]: actually allow per run dir to be made early in the run (duration: 00m 04s)
  • 09:52 ariel@deploy1001: Started deploy [dumps/dumps@de4c823]: actually allow per run dir to be made early in the run
  • 09:16 ariel@deploy1001: Finished deploy [dumps/dumps@6c7d811]: create empty dir for tableinfo if needed (duration: 00m 04s)
  • 09:16 ariel@deploy1001: Started deploy [dumps/dumps@6c7d811]: create empty dir for tableinfo if needed
  • 01:26 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:26 rzl@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:16 rzl@cumin1001: dbctl commit (dc=all): 'Depool db1091', diff saved to https://phabricator.wikimedia.org/P13124 and previous config saved to /var/cache/conftool/dbconfig/20201101-011600-rzl.json

2020-10-31

  • 00:12 mutante: removed Nuria from wmf group, she is already in nda group (T266086)

2020-10-30

  • 23:35 foks: removing two files for legal compliance
  • 23:32 mutante: adding query.wikidata.org to TLS cert for webserver-misc-apps.discovery.wmnet T266702
  • 23:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:04 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:04 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:02 jiji@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:00 jiji@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:59 mutante: mw1267,mw1268 - scap pull and repool - back to prod - T266164
  • 20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet
  • 20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
  • 20:56 mutante: mw1267,mw1268 - scap pull
  • 20:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:32 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:06 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:04 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:48 cdanis: the above scap began (and mostly finished) several minutes ago but is hanging on a couple hosts down for maintenance
  • 18:48 cdanis@deploy1001: Synchronized wmf-config/InitialiseSettings.php: lower frwiki featured feeds limit 1a41ef634 T266865 (duration: 05m 14s)
  • 18:48 cdanis: ✔️ cdanis@deploy1001.eqiad.wmnet /srv/mediawiki-staging 🕝☕ scap sync-file wmf-config/InitialiseSettings.php 'lower frwiki featured feeds limit 1a41ef634 T266865'
  • 18:27 hashar@deploy1001: Finished deploy [integration/docroot@c35e5e9]: Add ECS to doc.wikimedia.org index (duration: 00m 06s)
  • 18:27 hashar@deploy1001: Started deploy [integration/docroot@c35e5e9]: Add ECS to doc.wikimedia.org index
  • 17:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:19 effie: disable puppet on mc1036 and mc2036 - T252391
  • 17:18 effie: enable puppet on all mediawiki and mc* hosts
  • 16:19 elukey: kafka-jumbo1006 still running with 1g NIC
  • 15:36 effie: stopping puppet on mediawiki and mc* hosts
  • 15:11 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:11 rzl@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:09 rzl: downtiming mc2036 for buster reimage
  • 14:42 elukey: stop kafka-jumbo1006 to swap NICs (1g -> 10g, d1 -> d4 rack)
  • 14:14 cmjohnson1: moving mw1267 and mw1268 to rack A8 eqiad T266164
  • 12:29 XioNoX: set normal VRRP balancing on cr2-eqiad
  • 10:08 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:08 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:02 ladsgroup@deploy1001: Synchronized static/images/project-logos: Revert: Changing logo of Wikidata for the birthday (duration: 01m 12s)
  • 09:13 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:07 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 08:58 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:54 elukey: decom an-tool1006 (old analytics test vm) - T255139
  • 08:53 elukey@cumin1001: START - Cookbook sre.hosts.decommission

2020-10-29

  • 23:59 eileen: process-control config revision is 6891d35bce
  • 23:39 Urbanecm: Evening B&C window done
  • 23:38 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikiquote --add-prefix=BROKEN --fix # T266605 # P13112
  • 23:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ddb7e08: Add namespace aliases to Turkish Wikiquote (T266605) (duration: 00m 57s)
  • 23:36 eileen: process-control config revision is 1114512f90
  • 23:29 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikisource --add-prefix=BROKEN --fix # T266606 # P13111
  • 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c3a8555: Add namespace aliases to Turkish Wikisource (T266606) (duration: 00m 56s)
  • 23:23 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikibooks --fix # T266608
  • 23:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1800d11: Add namespace aliases to Turkish Wikibooks (T266608) (duration: 00m 57s)
  • 23:22 eileen: civicrm revision changed from e1d65b0f3a to 3317d30356, config revision is d70fe02cb9
  • 23:18 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwiktionary --fix # T266609
  • 23:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 090f757: Add namespace aliases to Turkish Wiktionary (T266609) (duration: 00m 58s)
  • 22:35 mutante: mw1268 - depooled for T266164
  • 22:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
  • 22:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:32 mutante: mw1269 rsyncd/ferm for scap proxy was enabled - mw1268 rsyncd/ferm for scap proxy was removed - deploy1001 scap-proxies dsh group was adjusted
  • 22:21 mutante: replacing scap proxy for rack A7 eqiad because mw1268 needs to move physically (T266164)
  • 22:21 bstorm: updated packages for thirdparty/kubeadm-k8s-1-17 to prepare for install T263284
  • 22:10 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:08 razzi@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:06 mutante: depooled mw1267 (T266164)
  • 22:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
  • 22:04 mutante: scandium - puppet disabled again (but only until tomorrow), downtimed in Icinga, for ongoing parsoid tests from testreduce1001
  • 22:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:23 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:17 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 20:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:08 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:06 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:31 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 19:31 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 19:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:22 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session on mwmaint1002 (wiki=ukwiki; T246539)
  • 19:13 Amir1: rolling restart of ores uwsgi
  • 19:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:58 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:16 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 18:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikiLove on hewikiquote (T266744) (duration: 00m 57s)
  • 18:09 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:07 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 18:07 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:06 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 18:06 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:06 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master * u=)]$ sudo /usr/local/sbin/fix-staging-perms
  • 18:05 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewikiquote wikilove # T266744
  • 18:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b7eaaab: [cswiki] Set wgGEHomepageManualAssignmentMentorsList to Wikipedie:Potřebuji pomoc/Mentoři/Manuální (T245639) (duration: 00m 57s)
  • 17:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 17:48 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 17:29 hashar: Restarted CI Jenkins a bit ago
  • 17:15 hashar: CI: killed all java agents (java upgrade)
  • 17:12 hashar: Stopping CI Jenkins
  • 16:59 XioNoX: Delete cr1-eqiad:ae2.1120 and related static routes - T265288
  • 16:46 _joe_: restarted kartotherian on all servers in eqiad at the same time
  • 16:38 XioNoX: Move cr2-eqiad:ae2.1120 to cloudsw1-d5:irb.1120 - T265288
  • 16:34 XioNoX: force VRRP master on cr1-eqiad - T265288
  • 16:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
  • 16:25 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet
  • 15:34 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Revert: switch restbase to use envoy, https (duration: 00m 57s)
  • 15:22 moritzm: installing bacula updates from Buster point release
  • 15:22 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/intersection/: 483c3bc: Attempt to add a query cache to DPL (T263220) (duration: 00m 58s)
  • 15:16 papaul: poweroff mc2029 for relocation
  • 15:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 19c5aff: Set wgDLPQueryCacheTime to 120 at all wikis (T263220) (duration: 00m 59s)
  • 15:09 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Switch restbase to use envoy, https (duration: 00m 57s)
  • 15:06 vgutierrez: rolling restart of ATS to upgrade to trafficserver 8.0.8-1wm3 - T265911
  • 14:59 papaul: poweroff sessionstore2002 for relocation
  • 14:36 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:35 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:33 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:29 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:26 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:24 elukey: restart zookeeper on an-conf1001 for openjdk upgrades
  • 14:20 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:08 godog: bump FS for prometheus codfw global instance
  • 13:54 elukey: roll out profile::java on all zookeeper instances
  • 13:53 moritzm: installing Java 11 security updates
  • 13:52 bblack: authdns1001 - restart gdnsd - T266746
  • 13:46 bblack: authdns2001 - restart gdnsd - T266746
  • 13:38 bblack: staggered restart of gdnsd on dns[12345]001 (1/2 recursors in each DC) - T266746
  • 13:29 bblack: staggered restart of gdnsd on dns[12345]002 (1/2 recursors in each DC) - T266746
  • 13:25 Urbanecm: Correction: Obviously 1002 (T246539)
  • 13:23 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint2001 (wiki=idwiki; T246539)
  • 13:21 moritzm: installing bluez security updates on stretch
  • 12:56 marostegui: Make orchestrator discover pc2 T266485
  • 12:55 marostegui: Deploy orchestrator grants on pc2 T266485
  • 12:44 marostegui: Deploy grants for cluster alias on pc1 T266485
  • 12:35 moritzm: upgrade idp-test* hosts to latest Java security updates
  • 12:35 moritzm: restart idp-test
  • 12:34 ariel@deploy1001: Finished deploy [dumps/dumps@4ed2cb9]: revinfo for page content jobs, tableinfo for list of known tables (duration: 00m 05s)
  • 12:33 ariel@deploy1001: Started deploy [dumps/dumps@4ed2cb9]: revinfo for page content jobs, tableinfo for list of known tables
  • 12:01 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 11:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 11:14 Urbanecm: EU B&C window done
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 28152b7: Add another SDC property to search for matching media statements (T264925) (duration: 00m 58s)
  • 11:11 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:07 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:07 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:06 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:06 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
  • 10:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:15 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:12 elukey: restart tilerator on maps100[1,4] - redis errors in the logs
  • 10:11 elukey: restart tilerator on maps1002 - redis errors in the logs
  • 10:03 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 10:03 elukey: drop 10.64.21.6/24 and 2620:0:861:105:10:64:21:6/64 from netbox (an-tool-ui1001 related records)
  • 09:59 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Fix cxserver's configuration to use envoy (duration: 00m 59s)
  • 09:52 elukey: add gdnsd.service to all gdnsd hosts (with LimitNOFILE=infinity as override) - no daemon restart done - T266746
  • 09:41 marostegui: Deploy schema change on s8 wikidata codfw master (db2079) T264109
  • 09:33 elukey: clean up 10.64.21.7/24 and 2620:0:861:105:10:64:21:7/64 from netbox (an-test-ui1001 already has IPs previously allocated by makevm)
  • 09:32 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 09:23 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 08:54 vgutierrez: turn off ECDHE-ECDSA-AES128-SHA support on the main caching cluster - T258405
  • 08:54 moritzm: fixing up stray jenkins auto restart timers on secondary releases server
  • 08:53 vgutierrez: A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 T266567 T264398
  • 08:48 moritzm: fixing up stray mcelog auto restart timers on kubestage*
  • 08:38 moritzm: fixing up stray cas auto restart timers on secondary IDP servers
  • 08:19 moritzm: fixing up stray pmacctd auto restart timers on netflow*
  • 08:19 moritzm: fixing up stray pcacctd auto restart timers on netflow*
  • 08:02 marostegui: Disconnect replication codfw -> eqiad on s1 T266663
  • 07:56 vgutierrez: set LimitNOFILE=500000 for gdnsd on authdns1001
  • 07:54 marostegui: Disconnect replication codfw -> eqiad on s4 T266663
  • 07:50 vgutierrez: restart haproxy on authdns2001
  • 07:49 marostegui: Disconnect replication codfw -> eqiad on s8 T266663
  • 07:48 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 07:46 marostegui: Disconnect replication codfw -> eqiad on s3 T266663
  • 07:43 vgutierrez: restart anycast-healthchecker on authdns2001
  • 07:34 vgutierrez: set LimitNOFILE=500000 for gdnsd on authdns2001
  • 07:27 elukey: "sudo truncate -s 10g /var/log/daemon.log" on authdns2001
  • 06:52 marostegui: Disconnect replication codfw -> eqiad on s2 T266663
  • 06:38 marostegui: Disconnect replication codfw -> eqiad on s7 T266663
  • 06:36 marostegui: Disconnect replication codfw -> eqiad on s6 T266663
  • 06:25 elukey: execute 'truncate -s 10g /var/log/syslog.1' on authdns2001 - root partition full
  • 06:23 marostegui: Disconnect replication codfw -> eqiad on s5 T266663
  • 06:10 marostegui: Disconnect replication codfw -> eqiad on es4 and es5 T266663
  • 06:07 marostegui: Disconnect replication codfw -> eqiad on x1 T266663
  • 05:58 marostegui: Disconnect replication codfw -> eqiad on pc1, pc2 and pc3 T266663
  • 04:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 01:41 mutante: scandium reimaged a second time after making puppet changes to ensure nodejs/npm is NOT installed anymore (T257906)
  • 01:17 ryankemper: T266492 Beginning rolling restart of eqiad cirrus cluster, 3 nodes at a time, on `ryankemper@cumin1001` tmux session `elasticsearch_restart_eqiad`
  • 01:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 00:51 ryankemper: Finished restart of wdqs categories across production hosts; wdqs deploy is complete and the service is healthy
  • 00:14 Amir1: rolling restart of ores
  • 00:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:04 ryankemper: Beginning restart of wdqs categories across production hosts, one at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
  • 00:03 ryankemper: Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 00:03 ryankemper: Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 00:02 ryankemper: Following wdqs deploy, https://query.wikidata.org successfully responds to an example query
  • 00:01 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@8c97b17]: 0.3.53 (duration: 09m 29s)

2020-10-28

  • 23:54 ryankemper: Canary `wdqs1003` tests pass, proceeding with wdqs deploy to rest of fleet
  • 23:52 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8c97b17]: 0.3.53
  • 23:52 ryankemper@deploy1001: deploy aborted: 0.3.53 (duration: 00m 00s)
  • 23:52 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8c97b17]: 0.3.53
  • 22:54 mutante: scandium - scap pull after reinstalling OS
  • 22:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:41 ryankemper: Disabled elasticsearch "saneitizer" systemd timer in eqiad due to checker jobs falling behind: `sudo systemctl disable mediawiki_job_cirrus_sanitize_jobs.timer` on `mwmaint1002`
  • 21:22 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 21:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:50 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 20:22 ladsgroup@deploy1001: Synchronized static/images/project-logos: Changing logo of Wikidata for the birthday (duration: 00m 58s)
  • 19:56 jgleeson: updated Smashpig from 2246685626 to 09f29c1da5
  • 19:53 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 19:53 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:50 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 19:36 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:36 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 19:36 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 tgr_: Morning deploys done
  • 18:55 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Temporary enable 'editpage' warn logging (T251023) (duration: 00m 57s)
  • 18:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:46 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "cirrus: Hardcode more_like to codfw cirrus cluster" (duration: 00m 56s)
  • 18:45 tgr@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: Config: Revert "Revert "Increase cirrus morelike pool counter by 20%"" () (duration: 00m 57s)
  • 18:43 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:40 tgr@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: Suggested edits: Include page ID with task preview data (T266600) (duration: 00m 59s)
  • 18:19 tgr@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: Removing obsolete license definition (duration: 01m 00s)
  • 18:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:02 elukey@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 17:46 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 17:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:30 hnowlan: reimporting OSM data for eqiad
  • 17:24 hnowlan: removing OSM database on maps1004
  • 16:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet
  • 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
  • 16:18 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kartotherian,service=kartotherian,name=maps1004.eqiad.wmnet
  • 16:16 hnowlan: Disabling tilerator in eqiad
  • 16:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:06 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:05 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:03 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:51 Amir1: restarting uwsgi on ores in eqiad
  • 15:49 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 15:24 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 15:24 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 15:23 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 15:10 godog: roll restart logstash5 in codfw
  • 14:50 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 14:05 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 13:54 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 12:39 moritzm: installing libdatetime-timezone-perl updates
  • 11:46 XioNoX: configure urpf strict log-only on cr3-ulsfo:et-0/0/1.501 - T266561
  • 10:39 ema: due to T266651, cancel the entry above: A:cp upgrade libvmod-netmapper to 1.9-1 T266567 T264398
  • 10:38 elukey: clean up 10.64.5.7 and 2620:0:861:104:10:64:5:7 from Netbox (records mistakenly allocated via the makevm cookbook) - T266648
  • 10:35 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 10:25 ema: A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 T266567 T264398
  • 10:20 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:26 jayme: imported kubeyaml 0.0.3~20201027+git5f5556c-1 to buster-wikimedia
  • 09:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:37 jynus: updated dump grants on db2093
  • 07:53 volans: upgraded python3-wmflib to 0.0.3 on the cumin hosts - T257905
  • 07:40 godog: update thanos-fe1002 to thanos 0.16.0 - T261281
  • 07:22 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 04:43 ryankemper: T266492 Finished rolling restart of codfw cirrus cluster
  • 04:43 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 02:58 ryankemper: T266492 Beginning rolling restart of codfw cirrus cluster, 3 nodes at a time, on `ryankemper@cumin2001` tmux session `elasticsearch_restart_codfw`
  • 02:57 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-restart
  • 02:12 eileen: tools revision changed from a2a91d6c6a to 087a596d3a
  • 00:40 eileen: civicrm revision changed from 4fdfb8408b to e1d65b0f3a, config revision is f16003ab62

2020-10-27

  • 22:20 mutante: systemctl reset-failed on various servers to see which are coming back later from failed auto_restart and which don't
  • 21:40 mutante: mwmaint2001 - systemctl reset-failed - mediawiki_job_parser_cache_purging.service
  • 20:56 mutante: ms-be1057 is network down but running, NO-CARRIER on NIC, cable disconnected?
  • 20:43 mutante: releases2002 - systemctl reset-failed .. after removing wmf_auto_restart_rsync
  • 20:13 mutante: gerrit1001/gerrit2001: manually deleting list_mediawiki_extensions cron job (T266024)
  • 19:40 eileen: civicrm revision changed from bb7c08bf6d to 4fdfb8408b, config revision is f16003ab62
  • 18:35 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 17:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:46 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:46 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:22 mutante: gerrit1001/2001 - sudo rm /var/www/mediawiki-extensions.txt
  • 17:18 ejegg: updated payments-wiki from 4c1503ad91 to adc3369cb3
  • 16:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 16:34 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 16:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 16:05 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 16:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 16:05 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 16:05 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:59 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:42 mepps: updated payments-wiki-staging from 5fdd29bc16 to 4c1503ad91
  • 15:25 ema: cp4032: downgrade varnish to 6.0.4 T264398
  • 15:13 ema: cp4032: varnish-frontend-restart with libvmod-netmapper 1.9-1 T266567
  • 14:55 ema: upload libvmod-netmapper 1.9-1 to buster-wikimedia component/varnish6 T266567
  • 14:49 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 14:48 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:40 _joe_: restarting envoyproxy on the jobrunners in codfw
  • 14:36 akosiaris: rolling restart of all pods in codfw changeprop-jobqueue
  • 14:27 _joe_: restart php-fpm on jobrunners in codfw
  • 14:17 cdanis: ran puppet on alert1001
  • 14:16 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 14:15 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 14:15 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
  • 14:11 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
  • 14:11 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:09 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:09 rzl@cumin1001: MediaWiki read-only period ends at: 2020-10-27 14:09:02.873019
  • 14:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:06 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:06 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:05 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:04 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
  • 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 14:01 rzl@cumin1001: MediaWiki read-only period starts at: 2020-10-27 14:01:54.999830
  • 14:01 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 13:56 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 13:56 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:55 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:55 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:53 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:50 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:49 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:47 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:46 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 13:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 13:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 13:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 13:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 13:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 13:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 13:04 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 13:01 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 12:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 12:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 12:51 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:35 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 11:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:25 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 11:21 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 11:14 ema: A:cp remove libvarnishapi1, replaced by libvarnishapi2 a while ago T261487
  • 11:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:12 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 11:06 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 10:54 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 10:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 10:46 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 10:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 10:40 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 10:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 10:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 10:21 XioNoX: update policies from-zone production to-zone junos-host on mr1-eqiad - T265589
  • 10:20 XioNoX: update policies from-zone production to-zone junos-host on mr1-eqsin - T265589
  • 10:19 XioNoX: update policies from-zone production to-zone junos-host on mr1-ulsfo - T265589
  • 10:15 XioNoX: update policies from-zone production to-zone junos-host on mr1-esams - T265589
  • 10:06 XioNoX: update policies from-zone production to-zone junos-host on mr1-codfw - T265589
  • 08:58 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=97)
  • 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 08:39 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=97)
  • 08:32 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 08:30 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 08:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 08:15 godog: update thanos-fe2002 to thanos 0.16.0 - T261281
  • 07:35 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 06:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 06:50 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-4
  • 06:42 ryankemper: T263970 Set number of replicas to 2 (from previous value of 1) for all codfw indices matching `apifeatureusage*`, new shards have been assigned without issue

2020-10-26

  • 23:12 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Fix JS error when no topics set (T266501) (duration: 01m 00s)
  • 22:30 mutante: netflow5001 - systemctl reset-failed
  • 21:44 rzl: live test of sre.switchdc.mediawiki complete, the foregoing logging noise had no actual production impact
  • 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 21:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 21:41 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 21:40 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
  • 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
  • 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 21:37 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-10-26 21:37:17.809596
  • 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 21:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 21:35 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-10-26 21:35:20.837214
  • 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 21:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 21:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 21:32 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 21:32 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 21:31 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 21:31 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 21:31 rzl: starting a live test of sre.switchdc.mediawiki, which will create some logging noise but no actual production impact
  • 20:54 mutante: scandium rm /usr/local/bin/update_parsoid.sh (gerrit:636494)
  • 20:15 ladsgroup@deploy1001: Finished deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata (T261326) (duration: 06m 53s)
  • 20:08 ladsgroup@deploy1001: Started deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata (T261326)
  • 19:31 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:29 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:26 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:59 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Remove variant setting override (no-op) (T265556) (duration: 00m 57s)
  • 18:55 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure $wgBabelCategoryNames on ndswiki (T264990) (duration: 00m 58s)
  • 18:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add www.legislation.gov.uk to $wgCopyUploadsDomains on commonswiki (T265690) (duration: 00m 58s)
  • 18:47 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Make variant D the default, remove variant A (T265372, T265556) (duration: 00m 58s)
  • 18:46 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/vendor/wikimedia/parsoid/: Bump wikimedia/parsoid to v0.13.0-a13, enabling 6-element DSRs (T266285) (duration: 00m 58s)
  • 18:43 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/skins/Vector/: Fix logic in collapsibleTabs code (T71729) (duration: 00m 58s)
  • 18:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wtp2001-wtp2020 from LinterSubmitterWhitelist (T265558) (duration: 00m 59s)
  • 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Make variant D the default on all wikis (T265556) (duration: 00m 58s)
  • 17:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 17:48 mutante: an-worker109* - systemctl reset-failed to clear Icinga alerts related to wmf_auto_restart changes
  • 17:45 mutante: releases2002, netmon2001, various other hosts - systemctl reset-failed to clear Icinga alerts related to wmf_auto_restart changes
  • 17:39 krinkle@deploy1001: Synchronized php-1.36.0-wmf.13/resources/src/mediawiki.util/: T265809, I1011f6 (duration: 01m 00s)
  • 16:41 XioNoX: bounce security log on pfw3-eqiad - T263833
  • 16:29 XioNoX: set security-log traceoptions on pfw3-eqiad - T263833
  • 16:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:00 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:51 rzl@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=apertium|api-gateway|citoid|cxserver|echostore|eventgate-analytics|eventgate-analytics-external|eventgate-logging-external|eventgate-main|eventstreams|graphoid|kartotherian|mathoid|mobileapps|ores|parsoid|proton|push-notifications|recommendation-api|restbase|restbase-async|schema|search|sessionstore|termbox|wdqs|wdqs-internal|wikifeeds|zotero,name=eqiad
  • 15:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=eqiad
  • 15:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
  • 15:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=eqiad
  • 15:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
  • 15:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=termbox,name=eqiad
  • 15:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
  • 15:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=search,name=eqiad
  • 15:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=schema,name=eqiad
  • 15:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=eqiad
  • 15:08 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase,name=eqiad
  • 15:05 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=eqiad
  • 15:02 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=eqiad
  • 14:59 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=proton,name=eqiad
  • 14:56 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=parsoid,name=eqiad
  • 14:53 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
  • 14:50 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mobileapps,name=eqiad
  • 14:47 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mathoid,name=eqiad
  • 14:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@a1a1bd7]: Add api-portal and smnwiki (duration: 16m 43s)
  • 14:44 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 14:41 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=graphoid,name=eqiad
  • 14:38 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams,name=eqiad
  • 14:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main,name=eqiad
  • 14:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external,name=eqiad
  • 14:30 ppchelko@deploy1001: Started deploy [restbase/deploy@a1a1bd7]: Add api-portal and snmwiki
  • 14:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics-external,name=eqiad
  • 14:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=eqiad
  • 14:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=echostore,name=eqiad
  • 14:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=cxserver,name=eqiad
  • 14:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=citoid,name=eqiad
  • 14:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=api-gateway,name=eqiad
  • 14:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=apertium,name=eqiad
  • 14:06 rzl@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=apertium|api-gateway|citoid|cxserver|echostore|eventgate-analytics|eventgate-analytics-external|eventgate-logging-external|eventgate-main|eventstreams|graphoid|kartotherian|mathoid|mobileapps|ores|parsoid|proton|push-notifications|recommendation-api|restbase|restbase-async|schema|search|sessionstore|termbox|wdqs|wdqs-internal|wikifeeds|zotero,name=eqiad
  • 13:48 moritzm: imported cas 6.2.4-1 to apt.wikimedia.org T265857
  • 13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bff6b37: Add foto.digitalarkivet.no to wgCopyUploadsDomains whitelist of Wikimedia Commons (T266390) (duration: 01m 14s)
  • 11:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 11:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 11:26 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 11:11 vgutierrez: upgrade trafficserver to 8.0.8-1wm3 on cp4032 - T265911
  • 11:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 11:02 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 10:51 vgutierrez: manually reloading nginx on cloudelastic[1005-1006]
  • 10:29 vgutierrez: upload trafficserver 8.0.8-1wm3 to apt.wm.org (buster) - T265911
  • 10:18 godog: roll restart pybal to apply latest configuration
  • 09:51 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-3
  • 09:31 moritzm: restarting PHP FPM on mw canaries to pick up freetype update
  • 09:04 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 08:58 moritzm: installing freetype security updates for stretch
  • 08:57 XioNoX: remove down sessions to AS38758
  • 08:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:43 XioNoX: remove down sessions to AS8560
  • 08:41 XioNoX: remove down sessions to AS31334
  • 08:28 XioNoX: remove down sessions to AS6327
  • 08:27 XioNoX: remove down sessions to AS8674
  • 08:25 XioNoX: remove down sessions to AS24429
  • 08:21 XioNoX: remove down sessions to AS16509
  • 06:59 _joe_: rolling restart of php7.2-fpm on the codfw jobrunners, to reduce the number of dangling transcodes after restarting cp-jobqueue for a deploy
  • 06:59 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 06:16 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=jobrunner,dc=codfw,name=mw224.*
  • 06:15 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=videoscaler,dc=codfw,name=mw228.*
  • 06:10 marostegui: Warm up tables T261914

2020-10-25

  • 15:53 dwisehaupt: kernel upgrade and reboot for frdb1003
  • 15:50 dwisehaupt: kernel upgrade and reboot for fran1001

2020-10-23

  • 22:56 mutante: added Nuria to "nda" LDAP group - leaving her in "wmf" until the actual last day - shell account remains so no puppet change needed in ldap_only_admins (T266086)
  • 15:42 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:04 ema: rolling thumbor-instances restart to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/636012/ T266155
  • 12:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
  • 10:57 kormat: uploaded orchestrator v3.2.3 to apt.wikimedia.org buster-wikimedia - T266023 (forgot to log this earlier)
  • 10:56 volans: uploaded python3-wmflib_0.0.3 to apt.wikimedia.org buster-wikimedia - T257905
  • 10:09 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-2
  • 09:51 moritzm: masking slapd on the old Stretch replicas to uncover potential direct access outside of the LVSes T264388
  • 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:32 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 09:31 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-1
  • 09:26 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 09:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 09:09 volans: upgrading spicerack to 0.0.44 on cumin hosts - T257905

2020-10-22

  • 22:42 mutante: ganeti1001 - adding 2 more vcpus to VM testreduce1001 - T257940
  • 22:03 mutante: deploy1002 - armed keyholder, all deployment keys loaded T265963
  • 21:56 mutante: deploy1002 - scap pull and added to mediawiki-installation "dsh" group - will be part of scap trains but just like any appserver (T265963)
  • 20:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:13 mutante: deploy1002 currently cloning ALL the deployment repos - new setup
  • 18:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:54 mutante: applying deployment_server role to new server deploy1002 - might show up in monitoring but is not prod yet, deploy1001 still is
  • 18:34 mutante: adding mcrouter cert for deploy1002.eqiad.wmnet T265963
  • 18:12 dpifke@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Expand to group1 (T123582) (duration: 00m 56s)
  • 18:12 volans: cumin 'A:dns-rec' 'rec_control wipe-cache wikimedia.org$' - T258729
  • 18:07 chaomodus: Updating eqiad public network DNS to automation
  • 17:50 volans: cumin 'A:dns-rec' 'rec_control wipe-cache eqiad.wmnet$' - T258729
  • 17:49 elukey: add thirdparty/bigtop14 to buster-wikimedia
  • 17:46 chaomodus: Updating eqiad private network DNS to automation
  • 17:21 bd808@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 17:21 bd808@cumin1001: Added views for new wiki: smnwiki T264900
  • 17:07 bd808@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 16:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 14:56 moritzm: installing remaining mariadb-10.3 updates for buster (as packaged in Debian, not the wmf-mariadb package)
  • 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:33 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 14:13 andrewbogott: upgrading mariadb on cloudcontrol1003, 1004, 1005
  • 14:05 ottomata: bump camus version to wmf12 for all camus jobs. should be no-op now. - T251609
  • 14:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Enable canary events for all eventgate-analytics-external bound streams - T251609 (duration: 01m 02s)
  • 13:55 moritzm: depooling ldap-eqiad-replica01/ldap-eqiad-replica02 T264388
  • 13:41 moritzm: pooling ldap-replica1001/1002 T264388
  • 13:10 moritzm: depooling ldap-replica2001/2002 T264388
  • 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.14
  • 13:01 moritzm: pooling ldap-replica2004 T264388
  • 12:24 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Enable canary events for 3 eventgate-analytics bound streams - T251609 (duration: 01m 05s)
  • 12:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 52ad2d4: Do not log logins at loginwiki via CU (T253802) (duration: 01m 06s)
  • 12:03 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master * u=)]$ sudo /usr/local/sbin/fix-staging-perms
  • 11:59 Lucas_WMDE: EU backport&config window done
  • 11:58 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Enable propagatePageDeletion on Test Wikidata, 2/2 (duration: 01m 04s)
  • 11:57 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable propagatePageDeletion on Test Wikidata, 1/2 (duration: 01m 02s)
  • 11:54 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint2001 (wiki=huwiki; T246539)
  • 11:39 moritzm: restarting nginx on acmechief*, debmonitor*, schema*, puppetdb* to pick up freetype update
  • 11:38 marostegui: Compare s1-s8 tables - T261914
  • 11:33 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InterwikiSortOrders.php: Config: Add ary, avk, awa, lld, shy and smn to InterwikiSortOrders.php (duration: 01m 08s)
  • 11:31 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 moritzm: restarting apache and smokeping* on netmon* to pick up freetype update
  • 11:21 moritzm: correction: installing freetype security updates for buster (stretch TBD)
  • 10:43 moritzm: installing freetype security updates for stretch/buster
  • 10:33 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:27 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:38 arturo: merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/634050 change to network data yaml
  • 08:31 kormat: enabling replication from eqiad to codfw T261914
  • 08:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:23 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:52 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 03:37 eileen: civicrm revision changed from 4dce7bf535 to bb7c08bf6d, config revision is 9a522d03dd
  • 03:13 eileen: civicrm revision changed from 3c3dcf80ae to 4dce7bf535, config revision is 9a522d03dd
  • 01:12 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@870829c]: 0.3.52 (duration: 09m 07s)
  • 01:04 ryankemper: Tests passing on canary `wdqs1003`, proceeding with wdqs deploy for rest of fleet
  • 01:03 ryankemper@deploy1001: Started deploy [wdqs/wdqs@870829c]: 0.3.52

2020-10-21

  • 23:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: T266033 (duration: 01m 05s)
  • 23:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: T265751 T265754 (duration: 01m 08s)
  • 21:38 mutante: testreduce1001 assigned 2 more GBs of RAM - rebooting (T257940, T257906)
  • 19:44 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T264963)
  • 19:15 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T264963)
  • 18:13 Urbanecm: Morning B&C window done
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 45312d3: [WikibaseMediaInfo] Fix concept chips array nesting structure (T256431) (duration: 01m 05s)
  • 18:12 mepps: updated payments-wiki-staging from db03677b2d to 5fdd29bc16
  • 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d94e33f: cirrus: Hardcode more_like to codfw cirrus cluster (duration: 01m 05s)
  • 17:56 XioNoX: configure FB PNI in eqdfw
  • 17:43 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.14/skins/WikimediaApiPortal: Backport gerrit:635329, T266021 (duration: 01m 06s)
  • 17:34 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch ParserCache to JSON on testwiki gerrit:635382 (duration: 01m 05s)
  • 17:24 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 08s)
  • 17:21 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 06s)
  • 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:57 mutante: scandium - disabling puppet so that Parsoid team can make some tests on testreduce1001 today
  • 16:46 effie: restart php-fpm and pool mw2252 and mw2328
  • 15:58 Lucas_WMDE: Deployed patch for T260349
  • 15:34 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:31 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:28 moritzm: updating prometheus-openldap-exporter to 0+git20171128-3 in buster-wikimedia
  • 15:23 jbond42: upgrade puppetlabs-stdlib to 6.5.0 https://gerrit.wikimedia.org/r/c/operations/puppet/+/634278
  • 15:08 moritzm: imported prometheus-openldap-exporter 0+git20171128-3 to buster-wikimedia T264388
  • 15:02 otto@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster (duration: 02m 56s)
  • 15:01 crusnov@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:00 otto@deploy1001: Started deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster
  • 14:56 crusnov@cumin1001: START - Cookbook sre.dns.netbox
  • 14:44 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Set CURLOPT_RETURNTRANSFER true in gerrit handler T242554 (duration: 01m 07s)
  • 14:34 dcausse: restarting blazegraph on codfw servers (T263952)
  • 13:21 moritzm: pooling ldap-replica2003 T264388
  • 13:04 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.14 (duration: 01m 04s)
  • 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.14
  • 11:40 matthiasmullie: EU B&C done
  • 11:33 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [WikibaseMediaInfo] Add config for related terms API (duration: 01m 04s)
  • 11:17 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 785404f: Disable registrations stat on Special:TranslationStats (T264158) (duration: 01m 05s)
  • 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1156742: Enable ContentTranslation in 5 Wikipedias as a default tool (T264737; T264738; T264739; T264740; T264741) (duration: 01m 30s)
  • 11:00 marostegui: Upgrade db2093's mariadb version T266003
  • 10:58 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:56 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=rowiki; T246539)
  • 10:37 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; T246539)
  • 10:01 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; T246539)
  • 10:00 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; T246539)
  • 09:59 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 100% - T258405
  • 09:42 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; T246539)
  • 09:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; T246539)
  • 09:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; T246539)
  • 09:37 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=warwiki; T246539
  • 09:30 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; T246539)
  • 09:23 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:22 root@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 09:21 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:52 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; T246539)
  • 08:50 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=cebwiki; T246539
  • 08:46 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium/output]$ mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=apiportalwiki # T246539
  • 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
  • 08:38 root@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
  • 08:33 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 08:10 XioNoX: Upgrade Routinator 3000 to 0.8.0 on rpki1001 - T266001
  • 08:09 XioNoX: add Routinator 3000 0.8.0 to apt - T266001
  • 07:58 elukey: update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/635319
  • 04:35 ryankemper: re-enabled icinga notifications on all wdqs hosts now that `wdqs-updater` is healthy

2020-10-20

  • 22:10 dwisehaupt: frmon2001 upgraded to buster with grafana 7.2.1
  • 21:19 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 21:18 cdanis: ✔️ cdanis@mw2252.codfw.wmnet ~ 🕠🍺 sudo depool
  • 20:57 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 00m 08s)
  • 20:56 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
  • 20:39 cdanis: doing some manual testing on mw2221, depooled and puppet disabled
  • 20:33 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 08m 10s)
  • 20:31 ryankemper: [Temporarily] disabled notifications for all wdqs hosts while we figure out how to unstick the updater process. Impact: new updates will be delayed, but queries will keep serving as normal. Fixing this is a priority, but note that there is no availability outage
  • 20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:25 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=canary
  • 19:24 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:56 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:48 effie: depooling mw2328 - T266052
  • 17:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args (duration: 01m 31s)
  • 15:52 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args
  • 15:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:58 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: fee2d3b: Prevent uncaught warnings/exception on Special:AbuseFilter (T265994) (duration: 01m 03s)
  • 14:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: 00ef00f: Prevent uncaught warnings/exception on Special:AbuseFilter (T265994) (duration: 01m 01s)
  • 14:48 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/FileImporter/: 5eee9b7: Set originalRequest (incl. X-Forwarded-For) for remote edits (T265810) (duration: 01m 06s)
  • 14:16 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/FileImporter/: 5f8d3de: Set originalRequest (incl. X-Forwarded-For) for remote edits (T265810) (duration: 01m 09s)
  • 14:15 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master u=)]$ sudo /usr/local/sbin/fix-staging-perms
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13033 and previous config saved to /var/cache/conftool/dbconfig/20201020-135436-root.json
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 80%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13032 and previous config saved to /var/cache/conftool/dbconfig/20201020-133933-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 60%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13031 and previous config saved to /var/cache/conftool/dbconfig/20201020-132430-root.json
  • 13:19 XioNoX: install routinator 3000 0.8.0 on rpki2001 - T266001
  • 13:16 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.14
  • 13:11 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.14 (duration: 58m 03s)
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13030 and previous config saved to /var/cache/conftool/dbconfig/20201020-130926-root.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13029 and previous config saved to /var/cache/conftool/dbconfig/20201020-125423-root.json
  • 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 12:13 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.14
  • 11:37 liw: 1.36.0-wmf.14 was branched at 1b7b5f7 for T263180
  • 11:35 Lucas_WMDE: EU backport/config window done
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Backport: SearchSatisfaction: Set isAnon field (T259250) (duration: 00m 57s)
  • 11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Set Wikidata MF to collapse sections by default (T239195) (duration: 00m 56s)
  • 11:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove noratelimit from Wikidata bot group (T258354) (duration: 00m 56s)
  • 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 10:04 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 09:59 dcausse: T255399: resuming wdqs-data-reload manually from chunk no 776 on wdqs1009
  • 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 09:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 09:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .

2020-10-19

  • 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 23:56 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 23:11 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation (duration: 04m 33s)
  • 23:07 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation
  • 23:02 mutante: etherpad got restarted with new config options related to rate limiting - hopefully this fixed T265490
  • 21:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:19 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions (duration: 04m 48s)
  • 21:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions
  • 21:01 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 20:41 eileen: drush vset match_on_import 1
  • 20:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp2020.codfw.wmnet
  • 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item (duration: 01m 03s)
  • 20:16 mutante: decom'ing wtp201[0-9].codfw.wmnet (pooled=inactive) T265558
  • 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:15 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp201[0-9].codfw.wmnet
  • 20:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item
  • 20:09 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=parsoid,service=canary
  • 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:01 mutante: decom'ing wtp200[1-9].codfw.wmnet (pooled=inactive) T265558
  • 20:00 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp200[1-9].codfw.wmnet
  • 19:57 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:45 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads (duration: 03m 35s)
  • 19:41 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads
  • 19:35 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 19:34 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 19:33 mutante: wtp2001 - sudo confctl decommission
  • 19:29 dzahn@cumin1001: conftool action : set/weight=0; selector: dc=codfw,cluster=parsoid,service=canary
  • 19:01 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Set default variant to D on trwiki (T243445, T265556) (duration: 00m 56s)
  • 18:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 18902aa: Change votewiki language temporarily to fa for fawiki elections (T262689) (duration: 00m 56s)
  • 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on trwiki (T243445) (duration: 00m 57s)
  • 18:29 tzatziki: removing 10 files for legal compliance
  • 18:24 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/MobileFrontend/: Fix mobile diff redirect when curid parameter is present (T265654) (duration: 00m 58s)
  • 18:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable variant C/D for new users (T265556) (duration: 00m 56s)
  • 18:10 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop wgHiddenPrefs hack for VE beta feature (T254349) (duration: 00m 56s)
  • 17:53 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:44 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:16 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:59 Urbanecm: mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=smnwiki --cluster=all
  • 15:31 elukey: update puppet compilers' facts
  • 14:36 bpirkle@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:634841 Add api.wikimedia.org to the list of allowed CORS origins (duration: 00m 57s)
  • 14:32 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 55s)
  • 14:30 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 56s)
  • 14:15 moritzm: installing llvm-toolchain-7 bugfix updates from Buster point release
  • 13:34 Urbanecm: Start of `[urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > output/$wiki.log; done < wikis.dblist` (T246539; wikis.dblist is medium wikis from group2.dblist)
  • 13:33 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:31 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:26 moritzm: import prometheus-openldap-exporter 0+git20171128-2+deb10u1 for buster-wikimedia T264388
  • 12:48 moritzm: installing httpcomponents-client security updates on Buster
  • 12:26 Urbanecm: Creation of smnwiki is done (T264859)
  • 12:25 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 56s)
  • 12:22 urbanecm@deploy1001: Synchronized langlist: Creating smnwiki (T264859) (duration: 00m 56s)
  • 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating smnwiki (T264859) (duration: 00m 55s)
  • 12:16 marostegui: Sanitize smnwiki on db1124:3315 and db2094:3315 - T264900
  • 12:15 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating smnwiki (T264859) (duration: 00m 56s)
  • 12:15 marostegui: Deploy schema change on smnwiki T265321 T264900
  • 12:14 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating smnwiki (T264859)
  • 12:12 urbanecm@deploy1001: Synchronized dblists: Creating smnwiki (T264859) (duration: 00m 55s)
  • 12:11 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating smnwiki (T264859) (duration: 00m 55s)
  • 12:10 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating smnwiki (T264859) (duration: 00m 56s)
  • 11:51 moritzm: updating idp-test1001 to CAS 6.2.4
  • 11:46 moritzm: updating idp-test2001 to CAS 6.2.4
  • 11:43 Urbanecm: End of `[urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist` # T246539 # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
  • 11:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` (T246539)
  • 11:40 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist # T246539 # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
  • 11:31 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:24 Urbanecm: EU B&C window done
  • 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ce92c98: Restore bureaucrat abilities at uzwiki (T265746) (duration: 00m 56s)
  • 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 26b9726: Disable EditorJourney (UnderstandingFirstDay) (T252391) (duration: 01m 10s)
  • 11:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:13 Urbanecm: Manually run `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` for several small group2 wikis (T246539)
  • 10:57 Urbanecm: Start `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` in a tmux session named updateVarDumps at mwmaint2001 (T246539)
  • 10:53 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=jawikivoyage --print-orphaned-records-to=- --progress-markers # T246539
  • 09:09 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 08:40 jayme: updated helm to 2.16.12-1 on deploy*,chartmuseum*,contint*
  • 08:37 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog2001 - T259780
  • 08:31 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 08:26 jayme: updated helm to 2.16.12-1 on deploy2001
  • 08:24 jayme: imported helm 2.16.12-1 to buster-wikimedia stretch-wikimedia jessie-wikimedia - T263616
  • 08:01 godog: re-enable compaction for prometheus[12]003 - T261281
  • 07:53 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 07:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 07:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 ', diff saved to https://phabricator.wikimedia.org/P13022 and previous config saved to /var/cache/conftool/dbconfig/20201019-071614-marostegui.json
  • 06:46 elukey@deploy1001: Finished deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27 (duration: 00m 10s)
  • 06:45 elukey@deploy1001: Started deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27

2020-10-17

  • 13:22 Urbanecm: [urbanecm@mwmaint2001 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Fæ . # T264529

2020-10-16

  • 21:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:43 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:27 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:25 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:39 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:37 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:43 thcipriani: restarting gerrit due to gc thrashing
  • 16:25 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors (duration: 04m 08s)
  • 16:21 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors
  • 15:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 15:36 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 15:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:01 bblack@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:41 effie: pooling mw2279.codfw.wmnet T264698
  • 12:11 jiji@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:09 jiji@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:35 reedy@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/ProofreadPage/: Revert excessive escaping T265571 (duration: 01m 12s)
  • 09:23 ema: text@esams (except for cp3050/cp3052): upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 09:19 ema: upload@esams: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 09:08 ema: upload@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 09:03 XioNoX: eqsin, push CR 634473
  • 09:01 ema: text@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 08:53 ema: upload@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 08:52 XioNoX: add BGP_IXP_RS_in to eqsin RS BGP sessions
  • 08:48 ema: text@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 08:29 ema: upload@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 08:24 ema: text@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 08:09 elukey: reboot stat1005/stat1008 to pick up correct GPU settings
  • 08:09 ema: upload@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest T264074
  • 07:59 ema: text@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances T264074
  • 07:19 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table (duration: 04m 22s)
  • 07:15 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table
  • 06:57 XioNoX: enable cr2-eqdfw:xe-0/1/2
  • 02:14 eileen: civicrm revision changed from 585eb835d8 to 3c3dcf80ae, config revision is f76d7849bc
  • 01:01 ryankemper: Cleaning up a dangling no-longer-puppet-managed udev elasticsearch-readahead rule across all cirrus instances: `sudo cumin -b 36 C:profile::elasticsearch::cirrus 'sudo rm -fv /etc/udev/rules.d/elasticsearch-readahead.rules && sudo /sbin/udevadm control --reload && sudo /sbin/udevadm trigger'`
  • 00:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 00:56 cdanis@cumin1001: START - Cookbook sre.network.cf

2020-10-15

  • 23:49 ryankemper: Began in-place reindex of `eqiad`, `codfw`, and `cloudelastic`. Running on `ryankemper@mwmaint2001` under tmux sessions `inplace_reindex_[eqiad, codfw, cloudelastic]`
  • 23:00 krinkle@deploy1001: Synchronized wmf-config/env.php: I245e84e0b8c (duration: 01m 10s)
  • 22:09 cdanis: previous sre.network.cf invocation was a no-op; just checking status
  • 22:08 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 22:08 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 22:06 mutante: depooled remaining wtp* servers in codfw (old parsoid servers; the new servers are parse2*) (T265558)
  • 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp2020.codfw.wmnet
  • 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[6-9].codfw.wmnet
  • 21:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[0-5].codfw.wmnet
  • 20:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 20:27 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 19:46 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources (duration: 06m 22s)
  • 19:43 marxarelli: all wikis promoted to 1.36.0-wmf.13 (T263179)
  • 19:39 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources
  • 19:33 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.13
  • 19:30 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:23 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:20 catrope@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing (T265500) (duration: 01m 29s)
  • 19:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing (T265500) (duration: 01m 51s)
  • 19:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/Echo/: Drop text indent in modern Vector (T264339) (duration: 01m 51s)
  • 19:09 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/: Vertically align personal tools (T264339) (duration: 01m 43s)
  • 19:07 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Revert "clientError: Adds is_logged_in tag to aid filtering" (T256173) (duration: 01m 58s)
  • 19:04 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/UploadWizard/: Work around LESS calculating calc() values wrong (T265560) (duration: 02m 07s)
  • 18:32 mutante: depooling wtp2005 through wtp2009 (parsoid, old server generation) T265558
  • 18:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[6-9].codfw.wmnet
  • 18:07 mutante: mx1001/mx2001: made previous live hack official and added benefactors@wikipedia alias, re-enabling puppet
  • 17:51 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:19 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:17 jbond42: deleting old pcc reports on compiler1002 to free disk space
  • 17:12 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 17:05 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 17:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 16:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 16:57 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 16:56 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 16:54 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 16:51 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 16:48 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 16:46 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 16:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 16:14 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 16:11 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:53 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:53 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/CheckUser/includes/specials/: fd94002: Revert "Validate username input before constructing subpage links" (T265606) (duration: 02m 48s)
  • 15:50 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 15:47 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:35 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:19 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:09 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 15:07 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs (duration: 00m 59s)
  • 15:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs
  • 14:51 elukey: roll restart druid-historical daemons on druid1004-1008 to pick up new conn pooling changes
  • 14:51 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 14:45 jbond42: enable puppet after deploying puppetdb change blacklisting dynamic facts
  • 14:41 ema: varnish 6.0.6-1wm2 uploaded to apt.wikimedia.org component/varnish6 T264074
  • 14:38 jbond42: disable puppet to deploy puppetdb change blacklisting dynamic facts
  • 14:21 ema: cp3050: systemctl reload varnishkafka-webrequest.service T264074
  • 14:21 jayme: imported doxygen_1.8.19-1~deb10+wmf1 to component/ci buster-wikimedia - T265579
  • 14:12 ema: cp3050: restart varnishkafka-webrequest w/ libvarnishapi2 6.0.6-1wm2 T264074
  • 14:11 ema: cp3050: upgrade varnish to 6.0.6-1wm2 T264074
  • 14:10 ema: cp3050: upgrade varnish to 6.0.6-1wm2 T26407
  • 12:58 gilles@deploy1001: Finished deploy [performance/navtiming@dff55f8]: (no justification provided) (duration: 00m 05s)
  • 12:58 gilles@deploy1001: Started deploy [performance/navtiming@dff55f8]: (no justification provided)
  • 12:12 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 10:47 vgutierrez: restart ats-backend on cp3050
  • 10:00 akosiaris: T264209. Initiate a docker pull of docker-registry.discovery.wmnet/mwcachedir:0.0.1 from all kubernetes and kubernetes staging nodes.
  • 08:17 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 04:27 ryankemper: Rolling upgrade for cirrus `codfw` complete
  • 04:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 02:18 ryankemper: Rolling upgrade for cirrussearch `codfw` beginning
  • 02:18 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 02:14 ryankemper: Rolling upgrade for cirrussearch `eqiad` is complete
  • 02:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 00:36 ryankemper: Beginning rolling upgrade for cirrussearch `eqiad`. Cookbook will restart elasticsearch on 36 nodes total, 3 nodes at a time
  • 00:36 eileen: tools revision changed from d4e08c52de to a2a91d6c6a
  • 00:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 00:24 twentyafterfour: phabricator update was uneventful
  • 00:13 twentyafterfour: updating phabricator

2020-10-14

  • 23:35 foks: Removing one further file for legal compliance
  • 23:28 foks: Removing nine files for legal compliance
  • 23:11 ebernhardson: Synchronized wmf-config/InitialiseSettings.php to reduce the cirrus morelike query cache from 3 days back to 1 day
  • 23:08 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 01m 04s)
  • 23:00 dwisehaupt: all payments hosts in eqiad are now running the REL1_35 code.
  • 22:41 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression (duration: 02m 25s)
  • 22:38 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression
  • 22:13 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
  • 22:12 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
  • 22:08 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive (duration: 03m 44s)
  • 22:04 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive
  • 22:01 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/NavigationTiming: BACON: Make attribution source logic more defensive T263599 (duration: 01m 05s)
  • 21:51 dpifke@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling image preconnect in group0 (T123582) (duration: 01m 03s)
  • 21:33 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/resources/skins.vector.styles/Menu.less: BACON: Stylesheet needs to be compatible with cached HTML T265543 (duration: 01m 07s)
  • 20:39 marxarelli: group1 rolled back to 1.36.0-wmf.11 due to malformed html in nav. task incoming (cc: T263179)
  • 20:37 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.11
  • 20:32 marxarelli: rolling back group1 due to malformed html in nav menu
  • 19:46 marxarelli: 1.36.0-wmf.13 promoted to group1. no new or concerning errors or changes in error rates (T263179)
  • 19:39 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
  • 19:38 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
  • 19:33 mutante: mx1001/mx2001 - temp. disabled puppet, live hacking urgent alias change since private repo needs to be fixed
  • 19:14 mutante: depooling 5 of the older parsoid servers in codfw
  • 19:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[1-5].codfw.wmnet
  • 18:28 Urbanecm: wikiadmin@10.192.0.6(wikidatawiki)> DELETE FROM watchlist WHERE wl_user=104889; # T265347
  • 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d6a56bb: Add rollbacker right on uzwiki (T265509) (duration: 01m 04s)
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 0da8999: Add spamblacklistlog as a default right for the CU log user (T239288) (duration: 01m 05s)
  • 16:12 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 15:59 elukey: drain + reboot an-worker1100 to pick up GPU settings - T255138
  • 15:58 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 15:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 15:29 elukey: drain + reboot an-worker110[1,2] to pick up GPU settings - T255138
  • 15:28 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 15:24 jayme: enabled and ran puppet on deploy1001 - T260917
  • 14:56 elukey: drain + reboot an-worker109[8,9] to pick up GPU settings - T255138
  • 14:55 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 14:12 jayme: disable-puppet on deploy1001 to test a change in helmfile puppet on deploy2001 only - T260917
  • 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. T264209
  • 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. T265183
  • 13:53 jbond42: enable puppet fleet-wide after converting puppetdb stockpile queue to tmpfs
  • 13:48 jbond42: disable puppet fleet wide to convert puppetdb stockpile queue to tmpfs
  • 12:46 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 10% - T258405
  • 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:43 moritzm: imported php-memcached, php-redis to component/icu63 T264991
  • 11:25 Urbanecm: EU B&C window completed
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c63632d: Enable DiscussionTools as a beta feature on 30 more wikis (T264693) (duration: 01m 15s)
  • 11:16 moritzm: imported php-igbinary, php-apcu-bc to component/icu63 T264991
  • 09:59 moritzm: imported php-wmerrors, tideways, tideways-xhprof, wikidiff2, xdebug to component/icu63 T264991
  • 08:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:28 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 08:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:09 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12988 and previous config saved to /var/cache/conftool/dbconfig/20201014-071440-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12987 and previous config saved to /var/cache/conftool/dbconfig/20201014-065936-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12986 and previous config saved to /var/cache/conftool/dbconfig/20201014-064433-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12985 and previous config saved to /var/cache/conftool/dbconfig/20201014-062930-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12984 and previous config saved to /var/cache/conftool/dbconfig/20201014-061426-root.json
  • 06:12 marostegui: Change UNIQUE into KEY on enwikivoyage.imagelinks T265445
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 30%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12983 and previous config saved to /var/cache/conftool/dbconfig/20201014-055923-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 10%: Slowly repool db2125 after on-site maintenance T260670 ', diff saved to https://phabricator.wikimedia.org/P12982 and previous config saved to /var/cache/conftool/dbconfig/20201014-054420-root.json

2020-10-13

  • 23:22 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: Revert removal of variant A (T265372) (duration: 01m 04s)
  • 23:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Rename GrowthExperiments help desk on ptwiki (T265214) (duration: 01m 04s)
  • 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable event logging in MediaViewer (T260582) (duration: 01m 04s)
  • 23:07 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry on frwiki, fawiki, dewiki, cswiki (T264780) (duration: 01m 04s)
  • 21:16 mutante: icinga raised a gerrit health alert, but I did not notice an issue myself and it was gone at the next check
  • 21:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:44 mutante: bast1002 - apt-get autoremove - cleans up golang and ruby packages
  • 20:44 mutante: bast1002 - apt-get remove nmap (it can be used on netmon hosts and was not consistent with other bast hosts)
  • 20:15 ebernhardson: unban elastic2029 from production-search-psi-codfw
  • 20:14 ebernhardson: restart production-search-psi-codfw on elastic2029 to reset any wonkiness from gc hell
  • 20:06 marxarelli: 1.36.0-wmf.13 promoted to group0. no new or concerning errors or changes in error rates (T263179)
  • 20:03 ebernhardson: add elastic2029-production-search-psi-codfw to cluster.routing.allocation.exclude._name to drain active shards, instance currently in gc hell
  • 19:54 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.13
  • 19:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:40 dduvall@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.13 (duration: 40m 51s)
  • 19:00 dduvall@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.13
  • 18:58 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.9 (duration: 01m 56s)
  • 18:56 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.8 (duration: 02m 10s)
  • 18:53 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.6 (duration: 13m 00s)
  • 18:23 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.11
  • 18:21 marxarelli: 1.36.0-wmf.11 promoted to group1. no new errors (T263177). promoting to all wikis
  • 18:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:09 robh: scs-c1-codfw mgmt firmware updated, updating scs-a1-codfw T238036
  • 18:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:01 robh: scs-c1-codfw firmware update via T238036
  • 17:47 marxarelli: 1.36.0-wmf.13 branched at a6be801 for T263179
  • 17:35 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 07s)
  • 17:34 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
  • 17:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:30 marxarelli: 1.36.0-wmf.11 promoted to group0. no new errors (T263177). preparing to promote to group1
  • 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 17:17 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 17:16 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 16:39 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
  • 16:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc (duration: 05m 29s)
  • 16:26 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc
  • 15:56 papaul: power down ms-be2036 for maintenance
  • 15:02 godog: bounce logstash on logstash1007, GC death
  • 14:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:18 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 5b28fd6: Add setmentor to wgAvailableRights (duration: 00m 59s)
  • 13:42 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 13:40 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 13:15 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=BROKEN --fix # T265336
  • 13:08 moritzm: imported php-mailparse, php-mongodb, php-msgpack to component/icu63 T264991
  • 12:50 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=FIXME --fix # T265336
  • 12:49 Urbanecm: End of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix` # T265336
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2026 for on-site maintenance T263837 ', diff saved to https://phabricator.wikimedia.org/P12975 and previous config saved to /var/cache/conftool/dbconfig/20201013-124940-marostegui.json
  • 12:20 moritzm: imported dh-php, php-apcu, php-imagick to component/icu63 T264991
  • 11:22 moritzm: imported php-defaults, php-excimer, php-luasandbox, php-geoip to component/icu63 T264991
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 90028b4: Add suppressredirect right to reviewers on bnwiki (T265169) (duration: 00m 58s)
  • 11:14 Urbanecm: Start of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix # T265336`
  • 11:13 volans: installed spicerack_0.0.43-1+deb10u1_amd64.deb on cumin2001; need to wait for a long-running cookbook to end to upgrade both hosts
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e61fceb: Add namespace aliases for Turkish Wikipedia (T265336) (duration: 00m 59s)
  • 10:47 jayme: no-change rolling restart of push-notifications in codfw - T265258
  • 10:29 volans: upgrading spicerack on cumin2001 to 0.0.44
  • 10:19 ema: cp3050: clear varnishkafka-webrequest's vut->sighup via stap T264074
  • 10:09 ema: cp3050: *reload* varnishkafka-webrequest T264074
  • 10:04 volans: uploaded spicerack_0.0.44 to apt.wikimedia.org buster-wikimedia
  • 09:55 ema: cp3054: systemctl restart varnishkafka-webrequest.service T264074
  • 09:51 ema: cp3052: systemctl restart varnishkafka-webrequest.service T264074
  • 09:39 kormat: running schema change against s1 in eqiad T259831
  • 09:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:32 ema: cp3050: set grouping by request (vut->g_arg = 2) on varnishkafka-webrequest T264074
  • 08:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:11 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:55 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:55 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:43 kormat: running schema change against s3 in eqiad T259831
  • 07:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:37 moritzm: installing ruby security updates on stretch
  • 07:02 moritzm: installing PHP 7.0 security updates
  • 06:39 moritzm: Installing httpcomponents-client security updates for Stretch
  • 05:35 marostegui: Set global innodb_change_buffering = inserts; on pc2009 T263443

2020-10-12

  • 17:03 jayme: fixed /var/lock/ permission (1777) on ms-be2036 - T265208
  • 15:41 godog: roll-restart logstash5 in codfw
  • 14:44 _joe_: freed 1.5 GB of space on ms-be2036 by running "apt-get clean"
  • 14:05 moritzm: uploaded php7.2 7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1+icu63 to component/icu63 T264991
  • 12:39 moritzm: installing rails security updates on Stretch
  • 12:26 moritzm: installing spice security updates on Buster
  • 11:38 Urbanecm: EU B&C done
  • 11:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fff2532: [testwiki, test2wiki] Allow bureaucrats to grant import rights (duration: 00m 58s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 4966e8a: Enable wgCheckUserLogLogins at all wikis but few large wikis (T253802) (duration: 00m 58s)
  • 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Require autoconfirmed status to edit Wikidata Properties (T254280) (duration: 01m 00s)
  • 10:26 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 10:26 hnowlan: roll-restarting restbase201[345678] for cert refresh
  • 08:50 moritzm: uploaded libxml2 2.9.4+dfsg1-2.2+deb9u3+wmf1 to component/icu63 T264991
  • 07:54 godog: reboot ms-be2036 - T265208
  • 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:53 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime

2020-10-10

2020-10-09

  • 23:44 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on Wikidata (T264799) (duration: 00m 59s)
  • 23:25 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on Commons (T264799) (duration: 00m 59s)
  • 23:13 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL, and the only related ticket says resolved - powercycling it - boots normally but doesn't have a prod role (T260271)
  • 23:07 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL or tickets
  • 23:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:52 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on group1, except Commons/Wikidata (T264799) (duration: 00m 57s)
  • 22:23 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/includes/: Backport: Log IP/device changes within the same session (T264799) & SessionManager: Always log IP/UA in session-ip (duration: 01m 04s)
  • 22:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on group0 (T264799) (duration: 00m 59s)
  • 22:09 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/: Backport: Log IP/device changes within the same session (T264799) & SessionManager: Always log IP/UA in session-ip (duration: 01m 06s)
  • 22:01 tgr_: rolling out T264799#6533622
  • 21:53 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=dewiki --userlist users.txt # users.txt contains Almeida # T263935
  • 20:41 dwisehaupt: upgrading pay-lvs1001 to buster
  • 20:31 dwisehaupt: upgrading pay-lvs1002 to buster
  • 20:04 dwisehaupt: upgrading payments1001 to buster
  • 19:14 dwisehaupt: upgrading payments1002 to buster
  • 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:44 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:30 dwisehaupt: upgrading payments1003 to buster
  • 17:53 dwisehaupt: upgrading payments1004 to buster
  • 17:52 cstone: civicrm revision changed from b86a15a430 to 585eb835d8, config revision is 57843925bb
  • 16:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:40 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:41 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 14:32 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 14:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 13:48 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:45 jayme: helm rollback push-notification in eqiad to revision 8
  • 13:31 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 13:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:12 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 12:55 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 12:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:16 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 12:13 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:38 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 11:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 11:13 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:13 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 10:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 10:41 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 10:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 09:55 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 09:53 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 09:47 elukey: roll restart of hadoop-yarn-nodemanager on all hadoop workers to pick up new settings
  • 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 09:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:07 XioNoX: remove user from all network devices
  • 08:22 marostegui: Restart dbstore1005 mysql to pick up new buffer pool sizes
  • 08:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:11 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:36 moritzm: installing xen security updates for buster (libs only)
  • 07:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:34 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.decommission

2020-10-08

  • 23:42 ryankemper: `cloudelastic1006` done. Writes thawed, maintenance window lifted; restarts are done for `cloudelastic`
  • 23:37 ryankemper: `cloudelastic1005` done
  • 23:31 ryankemper: `cloudelastic1004` done
  • 23:27 ryankemper: `cloudelastic1003` done
  • 23:23 ryankemper: `cloudelastic1002` done
  • 23:16 tgr_: Evening deploys done
  • 23:16 ryankemper: `cloudelastic1001` is done restarting and cluster is green again. Proceeding to `cloudelastic1002`
  • 23:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes everywhere (T264793) (duration: 01m 01s)
  • 23:04 ryankemper: Beginning cluster restarts one server at a time. For each server, the process is depool->restart elasticsearch services->wait for services to restart and then pool->wait for cluster to return to green status before starting next server
  • 23:01 ryankemper: Writes are frozen for `cloudelastic`: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic` on `mwmaint2001` => `Applied cluster-wide freeze`
  • 22:56 ryankemper: `sudo apt policy wmf-elasticsearch-search-plugins` shows correct state: `Installed: 6.5.4-4~stretch`
  • 22:56 ryankemper: `sudo -E cumin -b 6 C:role::elasticsearch::cloudelastic 'DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install wmf-elasticsearch-search-plugins'`
  • 22:54 ryankemper: About to start plugin upgrade followed by restarts of `cloudelastic`. Maintenance window set for the next 2 hours on `cloudelastic100[1-6]`
  • 21:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data (duration: 01m 04s)
  • 21:53 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data
  • 21:52 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session/SessionBackend.php: Deduplicate SessionBackend::logPersistenceChange calls - T264793 (duration: 01m 01s)
  • 21:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 21:00 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 20:50 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:45 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 20:43 volans: deploying Netbox DNS zone consolidation - T264273
  • 20:11 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name (duration: 01m 09s)
  • 19:23 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name
  • 18:57 volker-e@deploy1001: Finished deploy [design/style-guide@b1166af]: Deploy design/style-guide: (duration: 00m 06s)
  • 18:57 volker-e@deploy1001: Started deploy [design/style-guide@b1166af]: Deploy design/style-guide:
  • 18:17 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Special:Investigate by default on production (T264357) (duration: 01m 06s)
  • 17:50 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data (duration: 11m 55s)
  • 17:44 root@cumin1001: START - Cookbook sre.dns.netbox
  • 17:37 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data
  • 17:31 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:30 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:23 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:16 shdubsh: install prometheus-rsyslog-exporter_0.0.0+git20201008 on centrallog1001 - T210137
  • 16:25 mutante: rebooting cloudvirt1023 - trying PXE boot
  • 16:19 hashar: Restarting CI Jenkins
  • 16:15 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:09 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:08 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 16:08 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:21 marostegui: Set global innodb_change_buffering = all; on pc2009 T263443
  • 14:17 moritzm: importing icu 63.1-6+deb10u1~wmf5 to component/icu63 T264991
  • 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:29 kart_: Updated cxserver to 2020-10-08-053343-production (T264407, T264859)
  • 12:26 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:24 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:21 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:54 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:52 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1030.eqiad.wmnet
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1030.eqiad.wmnet
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1030.eqiad.wmnet
  • 10:37 moritzm: installing Postgres security updates on netboxdb1001
  • 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1029.eqiad.wmnet
  • 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1029.eqiad.wmnet
  • 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1029.eqiad.wmnet
  • 10:32 moritzm: installing Postgres security updates on netboxdb2001
  • 10:29 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
  • 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
  • 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan: pooling restbase1028,restbase1029,restbase1030
  • 10:22 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:14 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:40 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 09:10 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:09 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:38 godog: roll-restart swift-object-replicator on ms-be2* - T261633
  • 08:19 kormat: running schema change against s8 in eqiad T259831
  • 08:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:06 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:04 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:02 gehel: repooling wdqs2002
  • 07:55 marostegui: Rebuild db2125 from snapshots - T260670
  • 07:45 marostegui: Stop MySQL on db1077 to build it from s1 snapshot
  • 07:40 gehel: depooled wdqs2002 to catch up on lag
  • 07:29 jayme: updated envoyproxy to 1.15.1-2 on all codfw hosts
  • 07:23 moritzm: installing pyzmq updates from Buster point release
  • 07:00 dcausse: depooling wdqs2002 (catching-up lag)
  • 06:57 dcausse: restart blazegraph on wdqs2002 (stuck) T242453
  • 06:51 _joe_: enable notifications for wdqs-ssl-codfw
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:05 ejegg: updated fundraising python tools from 5515923ef7 to d4e08c52de
  • 00:31 tgr_: evening deploys done
  • 00:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes in group1 (T264793) (again, forgot to rebase the previous time) (duration: 00m 59s)
  • 00:15 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes in group1 (T264793) (duration: 00m 57s)
  • 00:03 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes in group0 (T264793) (duration: 00m 58s)
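The 23:04 entry on 2020-10-08 describes the per-server loop used to restart `cloudelastic`: depool, restart elasticsearch services, wait for them to come back, pool, then wait for the cluster to return to green before moving on. What follows is only a minimal sketch of that ordering. It assumes each step is supplied as a callable, and the hook names are hypothetical. They are not a real Spicerack, cumin, or conftool API.

```python
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    depool: Callable[[str], None],             # hypothetical hook: take host out of rotation
    restart: Callable[[str], None],            # hypothetical hook: restart elasticsearch services
    wait_for_services: Callable[[str], None],  # hypothetical hook: block until services are up
    pool: Callable[[str], None],               # hypothetical hook: put host back into rotation
    wait_for_green: Callable[[], None],        # hypothetical hook: block until cluster is green
) -> List[str]:
    """Restart hosts strictly one at a time; return the order they were processed."""
    done: List[str] = []
    for host in hosts:
        depool(host)
        restart(host)
        wait_for_services(host)
        pool(host)
        # Do not touch the next server until the whole cluster is healthy again.
        wait_for_green()
        done.append(host)
    return done
```

Because hosts are handled serially, at most one server is out of rotation at a time. Waiting for green before moving to the next host keeps shard recovery from overlapping between servers. The cluster-wide write freeze applied at 23:01 happens outside this loop and is not modeled here.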

2020-10-07

  • 23:58 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session: Backport: Log when SessionManager is emitting cookies (T264793) (duration: 01m 00s)
  • 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 23:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 21:55 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 21:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 21:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 20:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 20:09 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset (duration: 03m 23s)
  • 20:05 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset
  • 19:36 mutante: blog post: The latest addition to our family of Wikimedia languages is "Inari Sami" with language code "smn". It is a Sami language spoken by the Inari Sami of Finland and has about 400 native speakers. It's in the Uralic language family. Wikipedia will be created in T264859. https://en.wikipedia.org/wiki/Inari_Sami | https://iso639-3.sil.org/code/smn |
  • 18:30 ryankemper: search team's backport deploy is complete
  • 18:30 ryankemper@deploy1001: Synchronized wmf-config/ProductionServices.php: Config: cloudelastic: envoy sits in front now (T263073) (duration: 00m 58s)
  • 18:29 ryankemper: Above tests are as expected, syncing changes everywhere: `scap sync-file wmf-config/ProductionServices.php 'Config: cloudelastic: envoy sits in front now (T263073)'`
  • 18:27 ryankemper: `scap pull`ed onto `mwdebug2001`; talking to cloudelastic via MediaWiki from codfw shows the expected decrease in latency due to TLS connection pooling
  • 18:24 ryankemper: `scap pull`ed onto `mwdebug1002`. Talking to cloudelastic on localhost (which routes through envoy), 6105 is `cloudelastic-chi-eqiad`, 6106 is `cloudelastic-omega-eqiad`, and 6107 is `cloudelastic-psi-eqiad` as expected
  • 18:20 ryankemper: (backport) HEAD set to 834b457 as expected
  • 18:12 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/HeaderCallback.php: Preload class used in HeaderCallback - T261260 (duration: 01m 01s)
  • 17:58 hashar: Pulled https://gerrit.wikimedia.org/r/c/mediawiki/core/+/632680 on deployment staging area and mw2001
  • 17:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:39 jgleeson: updated civicrm from 39b4f954ed to b86a15a430
  • 16:35 mutante: switching webproxy service names to the new local install servers in esams/eqsin/ulsfo T242602
  • 15:12 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog1001 - T259780
  • 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:33 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:22 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 14:04 hoo: Ran "mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P1820 --new-data-type external-id" on mwmaint2001 (T263986)
  • 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 14:03 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 14:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 13:42 jayme: updated envoyproxy to 1.15.1-2 on all eqiad hosts
  • 13:39 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 13:18 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 04s)
  • 13:18 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
  • 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:24 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:22 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 11:55 _joe_: rolling restart of restbase due to running puppet with changed config-vars (a noop for the actual configuration)
  • 11:22 Urbanecm: EU B&C window done
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: f85bc30: Enable bot passwords at all fishbowl and private wikis (T258356) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 5729736: Fix OAuthRateLimiter rate limit configuration (duration: 00m 59s)
  • 11:14 urbanecm@deploy1001: sync-file aborted: 5729736: Fix OAuthRateLimiter rate limit configuration (duration: 00m 02s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 6cdeea2: Set CXMTThresholdForPublish to 95% for Vietnamese Wikipedia (T264161) (duration: 00m 59s)
  • 10:58 marostegui: Set innodb_change_buffering = inserts on pc2009 T263443
  • 09:53 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from mw load groups T259831', diff saved to https://phabricator.wikimedia.org/P12945 and previous config saved to /var/cache/conftool/dbconfig/20201007-095355-kormat.json
  • 09:44 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 100%: 75', diff saved to https://phabricator.wikimedia.org/P12944 and previous config saved to /var/cache/conftool/dbconfig/20201007-094412-kormat.json
  • 09:21 moritzm: imported icu63 63.1-6+deb10u1~wmf1 to component/icu63 for stretch-wikimedia
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 T264755 ', diff saved to https://phabricator.wikimedia.org/P12943 and previous config saved to /var/cache/conftool/dbconfig/20201007-090943-marostegui.json
  • 08:39 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12942 and previous config saved to /var/cache/conftool/dbconfig/20201007-083903-kormat.json
  • 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:32 godog: roll-restart statsd-exporter across ms-be* after puppet run - T264588
  • 08:09 jayme: updated envoyproxy to 1.15.1-2 on all non mw and restbase hosts
  • 08:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:58 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2015 from dbctl T264700', diff saved to https://phabricator.wikimedia.org/P12941 and previous config saved to /var/cache/conftool/dbconfig/20201007-074951-marostegui.json
  • 07:14 marostegui: Stop MySQL es2015 for decommissioning T264700
  • 05:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 02:37 eileen: civicrm revision changed from a30da7f92a to 39b4f954ed, config revision is 0ca9a3a055
  • 01:00 cdanis: repool esams; cr2-esams router upgrade complete
  • 00:43 cdanis: T259621 cdanis@re1.cr2-esams> request chassis routing-engine master switch
  • 00:40 cdanis: T259621 cdanis@re1.cr2-esams> request system reboot other-routing-engine
  • 00:36 cdanis: T259621 cdanis@re1.cr2-esams> request system software add /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz re0 no-validate
  • 00:26 cdanis: T259621 cdanis@re0.cr2-esams> request chassis routing-engine master switch
  • 00:22 cdanis: T259621 cdanis@re0.cr2-esams> request system reboot other-routing-engine
  • 00:15 cdanis: T259621 cdanis@re0.cr2-esams> request system software add re1 no-validate /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz
  • 00:01 mutante: reinstalling testvm[345]001 to confirm OS installs work as normal after switching DHCP servers in POPs (T252526)

2020-10-06

  • 23:55 mutante: 🖧 switched DHCP server for eqsin from install2003 to install5001 - homer deployed to cr*eqsin* (T252526) 🖧
  • 23:53 mutante: 🖧 switched DHCP server for ulsfo from install2003 to install4001 - homer deployed to cr*ulsfo* (T252526) 🖧
  • 23:52 mutante: 🖧 switched DHCP server for esams from install1003 to install3001 - homer deployed to cr*esams* (T252526) 🖧
  • 23:43 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 23:11 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 23:07 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 22:32 ryankemper: Restart of `wdqs-categories` done. WDQS deploy is complete
  • 21:57 ryankemper: Restarting `wdqs-categories` across production instances one-at-a-time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
  • 21:57 ryankemper: Restarting `wdqs-categories` across all test instances (not public facing): `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 21:56 ryankemper: Restarting `wdqs-updater` across the fleet: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 21:55 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@e56a20e]: 0.3.51 (duration: 13m 09s)
  • 21:43 ryankemper: All tests passing on canary `wdqs1003`, proceeding to rest of fleet
  • 21:42 ryankemper@deploy1001: Started deploy [wdqs/wdqs@e56a20e]: 0.3.51
  • 21:14 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:632535 (duration: 01m 00s)
  • 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:40 Urbanecm: Morning B&C done
  • 18:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/skins/MinervaNeue/: 2118d26: Hot fix: Use display for hiding/showing sidebar on OS 14_0 (T264376) (duration: 01m 00s)
  • 18:37 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/skins/MinervaNeue/: d428ccb: Hot fix: Use display for hiding/showing sidebar on OS 14_0 (T264376) (duration: 01m 03s)
  • 18:25 ppchelko@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase.php gerrit:631775 T263493 T259622 (duration: 00m 58s)
  • 18:23 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: IS.php gerrit:631775 T263493 T259622 (duration: 00m 59s)
  • 18:19 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632516 T264043 (duration: 00m 59s)
  • 18:15 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632323 T264637 (duration: 00m 58s)
  • 18:12 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632484 T264637 (duration: 00m 58s)
  • 15:41 godog: centrallog* delete archived logs from old, single file, organization
  • 15:23 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:23 jayme: updated envoyproxy to 1.15.1-2 on mw-canary and restbase-canary
  • 14:57 sukhe: upload dnsdist_1.5.0-1wm1 to apt.wm.o (buster) - T263789
  • 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12936 and previous config saved to /var/cache/conftool/dbconfig/20201006-144701-kormat.json
  • 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:45 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:45 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 5% - T262946
  • 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:40 jayme: updated envoyproxy to 1.15.1-2 on mw2295.codfw.wmnet,restbase2017.codfw.wmnet
  • 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-backend,name=restbase2009.codfw.wmnet
  • 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase2009.codfw.wmnet
  • 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2009.codfw.wmnet
  • 14:36 hnowlan: repooling restbase2009
  • 14:31 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12935 and previous config saved to /var/cache/conftool/dbconfig/20201006-143157-kormat.json
  • 14:19 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 05s)
  • 14:19 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
  • 14:15 jayme: installed envoyproxy 1.15.1-2 on mwdebug1001
  • 14:08 marostegui: Reboot db1076 for kernel upgrade T264755
  • 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 14:03 marostegui: Power cycle db1076 T264755
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 ', diff saved to https://phabricator.wikimedia.org/P12934 and previous config saved to /var/cache/conftool/dbconfig/20201006-135810-marostegui.json
  • 13:41 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12932 and previous config saved to /var/cache/conftool/dbconfig/20201006-134149-kormat.json
  • 13:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from dump/vslow, add to all other contributions/logpager/recentchanges*/watchlist temporarily T259831', diff saved to https://phabricator.wikimedia.org/P12931 and previous config saved to /var/cache/conftool/dbconfig/20201006-134020-kormat.json
  • 13:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:14 jayme: pushed docker-registry.discovery.wmnet/envoy:1.15.1-2 - T264157
  • 13:04 marostegui: Change innodb_change_buffering = inserts on db2075 db2089 db2099 db2111 db2128 T263443
  • 12:55 godog: swift codfw-prod: bump weight for ms-be2057 - T261633
  • 12:20 elukey: update HDFS Namenode GC/Heap settings on an-master100[1,2]
  • 12:13 jayme: imported envoyproxy_1.15.1-2 to buster-wikimedia and stretch-wikimedia
  • 12:08 jbond42: deploy puppetlabs-stdlib 5.2
  • 11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:34 Urbanecm: EU B&C window done
  • 11:34 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=arbcom_ruwiki --fix # T264430 # P12930
  • 11:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 07c19f9: arbcom_ruwiki: Set AK as alias for NS_PROJECT (T264430) (duration: 00m 58s)
  • 11:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7e4e811: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons (T264430) (duration: 00m 58s)
  • 11:30 urbanecm@deploy1001: Synchronized static/favicon/arbcom_ruwiki.ico: 7e4e811: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons (T264430) (duration: 00m 58s)
  • 11:20 XioNoX: push L3 prep work to cloudsw1-c8-eqiad
  • 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b1a4fa: ruewiki: Add rollbacker, grantable and revokable by sysops (T264147) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5cc7027: Allow bureaucrats to remove sysop permissions on Commons (T261481) (duration: 00m 58s)
  • 11:07 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 03m 14s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5f9721b: GrowthExperiments: Change Help Page URL for kowiki (T254364) (duration: 01m 00s)
  • 11:04 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
  • 11:02 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 00m 12s)
  • 11:02 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
  • 11:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:48 effie: set mw2279.codfw.wmnet as inactive T264698
  • 10:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2279.codfw.wmnet
  • 10:45 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
  • 10:44 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
  • 10:43 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
  • 10:41 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
  • 10:37 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009 (duration: 00m 15s)
  • 10:37 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009
  • 10:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:33 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: (no justification provided) (duration: 03m 01s)
  • 10:31 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:30 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: (no justification provided)
  • 10:01 marostegui: Restart mysql on dbstore1004 to pick up new buffer pool sizes
  • 09:59 effie: enable puppet on mc20*
  • 09:41 effie: enable puppet on mc10*
  • 09:38 effie: disable puppet on mc*
  • 09:27 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:26 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 08:33 jayme: imported envoyproxy_1.15.1-1+deb9u1 to stretch-wikimedia
  • 08:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:02 volans: removing unused ms-fe and ms-fe-thumbs svc records from DNS (gerrit/628086)
  • 07:53 marostegui: Change innodb_change_buffering = inserts on db2087:3316 db2089:3316 db2076 db2097:3316 db2114 T263443
  • 07:39 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 07:35 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 07:31 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 07:17 marostegui: Remove es2015 and es2017 from tendril and zarcillo T264700 T264386
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 T264700 ', diff saved to https://phabricator.wikimedia.org/P12926 and previous config saved to /var/cache/conftool/dbconfig/20201006-071451-marostegui.json
  • 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2017 from dbctl T264386', diff saved to https://phabricator.wikimedia.org/P12925 and previous config saved to /var/cache/conftool/dbconfig/20201006-052849-marostegui.json

2020-10-05

  • 23:11 ejegg: updated payments staging from 52704ffe24 to db03677b2d
  • 22:27 mutante: removing shinken puppet module and role
  • 22:01 ebernhardson: restore wikidatawiki_content enwiki_content enwiki_general and commonswiki_file to default index.merge.policy.deletes_pct_allowed on eqiad cirrus cluster T264053
  • 21:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:26 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (32 sectors, 16kB) readahead settings T264053
  • 20:13 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (64 sectors, 32kB) readahead settings T264053
  • 19:56 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2050 to take reduced (128kB) readahead settings T264053
  • 19:31 mutante: ran sre.dns.netbox to push the addition of an-worker1113, which was committed in the prod repo but not in the Netbox data
  • 19:30 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:27 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 18:59 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 00m 08s)
  • 18:59 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
  • 18:58 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 12m 08s)
  • 18:46 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
  • 18:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 18:17 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 18:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 18:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 18:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 18:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 17:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 15:15 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:56 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:41 elukey: shutdown stat1005 and stat1008 for ram expansion (1005 again)
  • 14:36 ppchelko@deploy1001: Finished deploy [restbase/deploy@366a543]: T263133 T264035 (duration: 22m 23s)
  • 14:25 elukey: shutdown an-master1001 for ram expansion
  • 14:13 ppchelko@deploy1001: Started deploy [restbase/deploy@366a543]: T263133 T264035
  • 14:01 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:58 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:55 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:54 elukey: shutdown stat1005 for ram upgrade
  • 13:31 elukey: shutdown an-master1002 for ram expansion (64 -> 128G)
  • 12:39 moritzm: installing curl security updates on remaining hosts
  • 11:34 hoo@deploy1001: Synchronized wmf-config/: Revert "Remove $wgExtraLanguageNames from Wikidata and Commons" (T264295) (duration: 00m 59s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: be73f15: Move changetags right from users to sysop [trwiki] (T264508) (duration: 00m 59s)
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cd30b62: wgSkipSkins: Exclude contenttranslation skin from skin options for users (T263093) (duration: 00m 59s)
  • 11:05 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 11:04 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:34 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 10:32 ema: cp3052: pool with varnish 5.1.3-1wm15 T264398
  • 10:28 ema: cp3052: depool and downgrade varnish to 5.1.3-1wm15 T264398
  • 10:08 moritzm: installing ldap-replica1002 T264390
  • 09:52 moritzm: installing ldap-replica1001 T264390
  • 09:22 moritzm: installing ldap-replica2003 T264390
  • 09:02 hnowlan: bootstrapping restbase1030-b
  • 08:57 moritzm: installing ldap-replica2004 T264390
  • 08:40 kormat@cumin1001: dbctl commit (dc=all): 'db2073 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12918 and previous config saved to /var/cache/conftool/dbconfig/20201005-084022-kormat.json
  • 08:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:38 kormat@cumin1001: dbctl commit (dc=all): 'Add db2119 to s4 dump/vslow temporarily T259831', diff saved to https://phabricator.wikimedia.org/P12917 and previous config saved to /var/cache/conftool/dbconfig/20201005-083822-kormat.json
  • 08:23 godog: prometheus codfw/ops, add 100G to the LV
  • 08:06 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 07:46 marostegui: Stop mysql on es2017 T264386
  • 07:30 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 06:52 XioNoX: add static NAT to pfw3-eqiad - T264356
  • 06:33 elukey: reboot stat1005 to resolve weird GPU state (scheduled last week)
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2017 T264386 ', diff saved to https://phabricator.wikimedia.org/P12916 and previous config saved to /var/cache/conftool/dbconfig/20201005-050636-marostegui.json

2020-10-03

  • 15:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: emergency: 840545f: Restrict flow-hide right to autoconfirmed users on zhwiki (T264489) (duration: 01m 17s)
  • 00:08 ejegg: updated fundraising CiviCRM from 256adda03c to a30da7f92a

2020-10-02

  • 22:00 mutante: depooling mw2271 because Icinga alerts about memcached and SAL shows there were ongoing tests of some kind on it
  • 21:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=mw2271.codfw.wmnet
  • 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 21:26 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 19:14 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:35 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:27 effie: enable puppet on mw2271
  • 18:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events (duration: 02m 01s)
  • 18:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events
  • 17:15 mutante: submitted puppet refactoring change on maps servers
  • 16:49 effie: disable puppet on mw2271 and briefly depool it
  • 15:39 _joe_: restarting redis on rdb2003, instance 6380
  • 15:28 hnowlan: bootstrapping restbase1030-a
  • 15:25 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
  • 14:45 cdanis@deploy1001: Synchronized docroot/wikimediafoundation.org: Separate foundation.wikimedia.org docroot & add .well-known/matrix/server T261531 4573776bd 2fb4c20ae (duration: 01m 01s)
  • 14:19 moritzm: installing LLVM 7 bugfix updates from Buster point release
  • 14:08 effie: enable puppet on mwdebug1001
  • 14:08 moritzm: purging some unused kernels on ping* (these only have 3GB "disks")
  • 13:37 Urbanecm: Create bot_passwords table at fishbowl wikis (T258356)
  • 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12905 and previous config saved to /var/cache/conftool/dbconfig/20201002-133545-kormat.json
  • 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12904 and previous config saved to /var/cache/conftool/dbconfig/20201002-132042-kormat.json
  • 13:00 moritzm: installing Linux 4.19.146 updates on Buster (from the latest Buster point release; only installing the updates at this point, no reboots yet)
  • 12:26 effie: disable puppet on mwdebug1001
  • 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db2140 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12903 and previous config saved to /var/cache/conftool/dbconfig/20201002-121830-kormat.json
  • 12:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:08 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12902 and previous config saved to /var/cache/conftool/dbconfig/20201002-120825-kormat.json
  • 12:05 hnowlan: bootstrapping restbase1029-c
  • 11:53 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12901 and previous config saved to /var/cache/conftool/dbconfig/20201002-115322-kormat.json
  • 11:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:59 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 10:57 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 10:47 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 10:47 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 10:44 kormat@cumin1001: dbctl commit (dc=all): 'db2110 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12900 and previous config saved to /var/cache/conftool/dbconfig/20201002-104453-kormat.json
  • 10:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:43 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12899 and previous config saved to /var/cache/conftool/dbconfig/20201002-104320-kormat.json
  • 10:40 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 10:36 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:28 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 67%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12898 and previous config saved to /var/cache/conftool/dbconfig/20201002-102817-kormat.json
  • 10:13 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 33%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12897 and previous config saved to /var/cache/conftool/dbconfig/20201002-101313-kormat.json
  • 10:06 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 09:56 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 09:48 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 09:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:27 kormat@cumin1001: dbctl commit (dc=all): 'db2106 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12896 and previous config saved to /var/cache/conftool/dbconfig/20201002-092715-kormat.json
  • 09:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:19 jayme: running ipvsadm -D -t 10.2.1.20:10042; ipvsadm -D -t 10.2.1.16:1969 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255875 T255869
  • 09:18 jayme: running ipvsadm -D -t 10.2.2.20:10042; ipvsadm -D -t 10.2.2.16:1969 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255875 T255869
  • 09:17 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255875 T255869
  • 09:14 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255875 T255869
  • 09:12 jayme: running puppet on lvs servers - T255875 T255869
  • 09:11 arturo: added helm3 package to buster-wikimedia/thirdparty/kubeadm-k8s-1-17 (T264221)
  • 09:09 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:08 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 09:08 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:07 hnowlan: bootstrapping restbase1029-b cassandra
  • 09:05 hashar: gerrit: running garbage collector
  • 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:59 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:54 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 03s)
  • 08:54 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
  • 08:42 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 34s)
  • 08:41 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
  • 08:30 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 00m 33s)
  • 08:30 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
  • 08:29 moritzm: installing pyzmq bugfix update from buster point release
  • 08:24 moritzm: installing nginx security updates on puppetdb*
  • 08:17 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 01m 35s)
  • 08:16 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
  • 07:42 moritzm: installing libcommons-compress-java security updates
  • 07:35 godog: swift codfw-prod bump weight for ms-be2057 - T261633
  • 07:29 godog: prometheus codfw/k8s, add 50G to the LV
  • 07:23 moritzm: installing libx11 security updates on buster
  • 06:51 _joe_: restarting php-fpm on all appservers in eqiad, in batches of 10%, for testing the procedure suggested at T264362
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2011 from dbctl T264261', diff saved to https://phabricator.wikimedia.org/P12893 and previous config saved to /var/cache/conftool/dbconfig/20201002-053020-marostegui.json

2020-10-01

  • 23:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 34s)
  • 23:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
  • 23:33 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 23:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 24s)
  • 23:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
  • 23:07 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:36 James_F: Manually created mediawiki/extensions.git REL1_35 at 7ab9a74 for T264365
  • 22:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 22:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 22:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:29 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 as well T264363
  • 21:29 James_F: Manually created mediawiki/skins.git REL1_35 at 796693c for T264365
  • 21:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group1
  • 20:48 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 refs T263177 (duration: 01m 06s)
  • 20:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11 refs T263177
  • 20:19 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
  • 20:08 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.11/includes/parser/: sync ParserCache patches to unblock the train T264257 T263177 (duration: 00m 59s)
  • 18:40 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: increase more_like recommendation cache from one to three days T264053 (duration: 00m 59s)
  • 17:49 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339 (duration: 13m 42s)
  • 17:35 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339
  • 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:24 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339 (duration: 01m 34s)
  • 17:24 mutante: etherpad1002 - attempted to upgrade Etherpad to a newer version, but it wasn't working; reverted to the previous one
  • 17:22 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339
  • 17:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:46 volans: migrating esams DNS records to the autogenerated ones from Netbox - T258729
  • 16:19 bblack: rebooting lvs1016 to a fresh state for interface config and error counters, etc - T264227
  • 15:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:53 bblack: lvs1016: re-disabled puppet with ticket ref in comment, downed interface enp5s0f0 since it's flapping furiously - T264227
  • 14:55 jayme: running ipvsadm -D -t 10.2.2.10:8081; ipvsadm -D -t 10.2.2.47:8889 on lvs1015.eqiad.wmnet - T244843 T255878
  • 14:55 moritzm: installing npm security updates on buster
  • 14:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:53 jayme: running ipvsadm -D -t 10.2.1.10:8081; ipvsadm -D -t 10.2.1.47:8889 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T244843 T255878
  • 14:52 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:50 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T244843 T255878
  • 14:48 jayme: restarting pybal on lvs2010.codfw.wmnet - T244843 T255878
  • 14:42 jayme: running puppet on lvs servers - T244843 T255878
  • 14:35 Urbanecm: Create bot_passwords table at all private wikis (T258356)
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:21 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12886 and previous config saved to /var/cache/conftool/dbconfig/20201001-142156-kormat.json
  • 14:14 andrewbogott: reimaging cloudvirt-wdqs1001 to buster
  • 14:12 effie: enable puppet on mw2271
  • 14:08 moritzm: installing pillow security updates
  • 14:06 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 67%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12885 and previous config saved to /var/cache/conftool/dbconfig/20201001-140653-kormat.json
  • 13:59 moritzm: installing nginx security updates on schema*
  • 13:51 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 33%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12884 and previous config saved to /var/cache/conftool/dbconfig/20201001-135149-kormat.json
  • 13:50 klausman: rebooting an-worker1096 for cluster maintenance
  • 13:49 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:49 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:43 vgutierrez: use synthetic warning for 2% of ECDHE-ECDSA-AES128-SHA pageviews - T258405
  • 13:29 moritzm: restarting mw canaries to pick up curl update
  • 13:22 moritzm: installing curl security updates on stretch
  • 12:57 kormat@cumin1001: dbctl commit (dc=all): 'db2136 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12883 and previous config saved to /var/cache/conftool/dbconfig/20201001-125707-kormat.json
  • 12:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12882 and previous config saved to /var/cache/conftool/dbconfig/20201001-123925-kormat.json
  • 12:24 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12881 and previous config saved to /var/cache/conftool/dbconfig/20201001-122422-kormat.json
  • 12:15 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: 500d0c7: Prevent returning the full templatelinks table in TemplateFilter (T264029) (duration: 00m 59s)
  • 12:12 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: 500d0c7: Prevent returning the full templatelinks table in TemplateFilter (T264029) (duration: 01m 00s)
  • 12:09 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12880 and previous config saved to /var/cache/conftool/dbconfig/20201001-120919-kormat.json
  • 11:54 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12879 and previous config saved to /var/cache/conftool/dbconfig/20201001-115415-kormat.json
  • 11:14 arturo: pulling packages into reprepro for buster-wikimedia/thirdparty/kubeadm-k8s-1-17 (T263284)
  • 11:09 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=kuwiktionary --fix # T262046
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 58a8c82: kuwiktionary: Create Jinûvesazî namespace (T262046) (duration: 01m 01s)
  • 10:47 kormat@cumin1001: dbctl commit (dc=all): 'db2119 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12878 and previous config saved to /var/cache/conftool/dbconfig/20201001-104716-kormat.json
  • 10:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:55 hnowlan: adding buster host restbase1028-b to cassandra
  • 08:53 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:38 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2109', diff saved to https://phabricator.wikimedia.org/P12877 and previous config saved to /var/cache/conftool/dbconfig/20201001-083321-marostegui.json
  • 08:28 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:27 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:25 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 08:22 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:16 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109 ', diff saved to https://phabricator.wikimedia.org/P12875 and previous config saved to /var/cache/conftool/dbconfig/20201001-081308-marostegui.json
  • 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091', diff saved to https://phabricator.wikimedia.org/P12874 and previous config saved to /var/cache/conftool/dbconfig/20201001-071442-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091 ', diff saved to https://phabricator.wikimedia.org/P12873 and previous config saved to /var/cache/conftool/dbconfig/20201001-071413-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12872 and previous config saved to /var/cache/conftool/dbconfig/20201001-071347-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12871 and previous config saved to /var/cache/conftool/dbconfig/20201001-071321-marostegui.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2083', diff saved to https://phabricator.wikimedia.org/P12870 and previous config saved to /var/cache/conftool/dbconfig/20201001-071241-marostegui.json
  • 07:12 elukey: restart hdfs namenodes on an-worker100[1,2] to pick up new hadoop workers settings
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2083', diff saved to https://phabricator.wikimedia.org/P12869 and previous config saved to /var/cache/conftool/dbconfig/20201001-071155-marostegui.json
  • 06:42 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 06:40 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Make es2033 master of es2 T261717', diff saved to https://phabricator.wikimedia.org/P12867 and previous config saved to /var/cache/conftool/dbconfig/20201001-063104-marostegui.json
  • 06:18 jayme: imported envoyproxy 1.15.1 to buster-wikimedia, stretch-wikimedia - T264157
  • 05:45 marostegui: Stop MySQL on es2011 T264261
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2011 T264261', diff saved to https://phabricator.wikimedia.org/P12866 and previous config saved to /var/cache/conftool/dbconfig/20201001-054335-marostegui.json
  • 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:29 marostegui: Deploy schema change on s3 (testwikidatawiki) T264109
  • 05:19 marostegui: Repool labsdb1011
  • 04:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:27 krinkle@deploy1001: Synchronized php-1.36.0-wmf.10/includes/parser/: Ia3357b2f593c (duration: 00m 58s)
  • 01:12 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: 1721d2aa0 - Reject ParserCache entries from the last wmf.11 deployment (duration: 05m 13s)

2020-09-30

  • 22:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:10 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:46 cdanis: depool mw2356 and mw2319
  • 21:45 eileen: civicrm revision changed from 5a53bfe6ed to 256adda03c, config revision is 646817a2c0
  • 21:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 also
  • 21:19 ejegg: updated fundraising CiviCRM from 6e843649ac to 5a53bfe6ed
  • 21:04 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback
  • 21:00 twentyafterfour@deploy1001: scap failed: average error rate on 5/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
  • 20:58 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 20s)
  • 20:56 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
  • 20:47 mutante: temp disabling puppet on C:profile::swift::stats_reporter hosts, applying gerrit:631158 refactoring change
  • 20:36 mutante: temp disabling puppet on swift::storage (swift-be) hosts, applying gerrit:631157 refactoring change
  • 19:21 mutante: activating DHCP and squid on install[345]001.wikimedia.org
  • 19:12 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
  • 19:01 effie: disable puppet on mw2271 and use onhost memcached - T263958
  • 19:00 hoo@deploy1001: Synchronized wmf-config/: Revert "labs: Turn on termbox v2 on wikidatawiki" (T264066) (duration: 00m 58s)
  • 18:58 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Revert "labs: Turn on termbox v2 on wikidatawiki" (T264066) (duration: 00m 58s)
  • 18:38 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on svwiki (T257220) (duration: 00m 58s)
  • 18:36 bblack: lvs1016 pybal diff alerts downtimed in icinga for ~48h to reduce annoying flappy alert spam, with reference to https://phabricator.wikimedia.org/T264227
  • 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments for newcomers on ptwiki (T225027) (duration: 00m 58s)
  • 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Put search in header for anons on all wikis, not just desktop-improvements wikis (T263032) (duration: 00m 59s)
  • 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable clientError on Wikidata and all Wikipedias except enwiki (T255585) (duration: 00m 58s)
  • 18:08 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move search in header for anons (T263032) (duration: 00m 59s)
  • 17:52 bblack: lvs1016: restart pybal
  • 17:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:01 hnowlan: finished adding restbase2018-a to the cassandra cluster
  • 16:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:33 cicalese@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Add beta config for API Portal/OAuth communications (duration: 00m 58s)
  • 16:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:21 mutante: re-enabled puppet on install2003
  • 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:28 moritzm: removed librsvg 2.40.20-3+wmf1+stretch1 from component/thumbor, superseded by 2.40.21-0+deb9u1 released via stretch-security
  • 14:23 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:10 cmjohnson1: powering down ores100[3-9] to upgrade memory in each T259909
  • 14:05 elukey: create thirdparty/amd-rocm33 for stretch-wikimedia
  • 14:03 cmjohnson1: powering down ores1002 to upgrade memory T259909
  • 13:55 cmjohnson1: powering down ores1001 to upgrade memory T259909
  • 13:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:12 hnowlan: started bootstrapping restbase1028-a, first buster restbase host
  • 12:39 marostegui: Deploy schema change on db2080, db2081 T264109
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2081', diff saved to https://phabricator.wikimedia.org/P12858 and previous config saved to /var/cache/conftool/dbconfig/20200930-123851-marostegui.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P12857 and previous config saved to /var/cache/conftool/dbconfig/20200930-123824-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080', diff saved to https://phabricator.wikimedia.org/P12856 and previous config saved to /var/cache/conftool/dbconfig/20200930-123753-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080', diff saved to https://phabricator.wikimedia.org/P12855 and previous config saved to /var/cache/conftool/dbconfig/20200930-123659-marostegui.json
  • 11:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:33 effie: enable puppet P:mediawiki::mcrouter_wancache for 630845 - T244340
  • 11:21 nikerabbit@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: Enable Special:TranslationStats (T263004) (duration: 00m 59s)
  • 11:06 effie: disable puppet on P:mediawiki::mcrouter_wancache for 630845 - T244340
  • 10:57 moritzm: installing librsvg security updates
  • 10:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:21 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:07 kormat: deploying schema change to s4/eqiad T259831
  • 10:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:50 jayme: imported envoyproxy 1.15.1 to buster-wikimedia component/envoy-future - T264157
  • 09:12 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:10 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:45 kormat: deploying schema change to s7/eqiad T259831
  • 08:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2016 from dbctl T264156', diff saved to https://phabricator.wikimedia.org/P12853 and previous config saved to /var/cache/conftool/dbconfig/20200930-080817-marostegui.json
  • 08:06 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 08:00 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 07:56 akosiaris: upgrade termbox to latest chart, fixing various minor prometheus-statsd-exporter configuration issues.
  • 07:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 07:55 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1131 on s6 eqiad master T263227, also give weight to db1093 as new API host', diff saved to https://phabricator.wikimedia.org/P12852 and previous config saved to /var/cache/conftool/dbconfig/20200930-074417-marostegui.json
  • 07:41 marostegui: Starting s6 eqiad failover from db1093 to db1131 - T263227
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 T263227', diff saved to https://phabricator.wikimedia.org/P12851 and previous config saved to /var/cache/conftool/dbconfig/20200930-071841-marostegui.json
  • 07:05 marostegui: Stop mysql on es2016 before decommissioning T264156
  • 07:01 elukey@deploy1001: Finished deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2 (duration: 00m 49s)
  • 07:00 elukey@deploy1001: Started deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2016 T264156', diff saved to https://phabricator.wikimedia.org/P12850 and previous config saved to /var/cache/conftool/dbconfig/20200930-065838-marostegui.json
  • 06:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 06:19 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2082', diff saved to https://phabricator.wikimedia.org/P12849 and previous config saved to /var/cache/conftool/dbconfig/20200930-061036-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2082', diff saved to https://phabricator.wikimedia.org/P12848 and previous config saved to /var/cache/conftool/dbconfig/20200930-061005-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318', diff saved to https://phabricator.wikimedia.org/P12847 and previous config saved to /var/cache/conftool/dbconfig/20200930-060754-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318', diff saved to https://phabricator.wikimedia.org/P12846 and previous config saved to /var/cache/conftool/dbconfig/20200930-060705-marostegui.json
  • 05:43 marostegui: Remove es2019 from tendril and zarcillo T264063
  • 05:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:29 marostegui: Reduce busy-time from 3600 to 1800 on labsdb1010
  • 02:30 eileen: process-control config revision is 646817a2c0
  • 00:41 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/GrowthExperiments/: Backport: Ensure variant A homepage sidebar is always at least 300px (T263905) (duration: 01m 01s)

2020-09-29

  • 23:35 mutante: created testvm3001.esams.wmnet to test install3001
  • 23:31 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 23:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Echo app push on all Wikipedias (T262936) (duration: 00m 59s)
  • 23:20 Urbanecm: Evening B&C window completed
  • 23:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 68d7af9: Enable watchlist expiry feature (wikisource; T260461) (duration: 00m 58s)
  • 23:18 eileen: process-control config revision is 8b39770e93
  • 23:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bc6dda2: Enable watchlist expiry feature (T260461) (duration: 00m 58s)
  • 23:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:52 eileen: process-control config revision is 16a6dcafd6
  • 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:48 eileen: civicrm revision changed from 035ad1c351 to 06a5289d1a, config revision is 2622fd2c09
  • 22:45 eileen: process-control config revision is 2622fd2c09, jobs disabled
  • 22:33 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:26 mutante: phab1001 - re-enabled puppet and running it
  • 22:24 ejegg: CiviCRM rolled back from 4aa0aeccd1 to 035ad1c351
  • 22:16 eileen: civicrm revision changed from 035ad1c351 to 4aa0aeccd1, config revision is b9120969bf
  • 21:59 mutante: temp. disabled puppet on phab1001
  • 21:49 mutante: restarted aphlict service on aphlict1001
  • 21:47 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.10 (duration: 13m 45s)
  • 21:34 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.10
  • 21:30 mutante: started DHCP service on install2003 again
  • 21:22 mutante: temp stopping DHCP service on install2003 for a test
  • 21:09 mutante: rebooting testvm5001 for install test after switching DHCP/TFTP in eqsin to new dedicated VM
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:00 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:54 cdanis@cumin1001: dbctl commit (dc=all): 'depool db2125', diff saved to https://phabricator.wikimedia.org/P12843 and previous config saved to /var/cache/conftool/dbconfig/20200929-205453-cdanis.json
  • 20:51 mutante: DHCP server for EQSIN switched from bast5001 to install5001 (T252526)
  • 20:45 twentyafterfour@deploy1001: Finished scap: testwikis to 1.36.0-wmf.11 refs T263177 (duration: 69m 57s)
  • 19:44 andrewbogott: apt-get update && apt-get upgrade on wikitech-static
  • 19:40 mutante: temp. disabling puppet on ms-fe (swift-proxy) hosts, applying puppet refactoring change carefully
  • 19:35 twentyafterfour@deploy1001: Started scap: testwikis to 1.36.0-wmf.11 refs T263177
  • 19:29 twentyafterfour: Checked out mediawiki 1.36.0-wmf.11 on deploy1001 see T263177
  • 17:30 hnowlan: ported cassandra-tools-wmf to wikimedia-buster
  • 17:12 jbond42: update libdbi-perl on dbmonitor1001 and helium
  • 17:02 jbond42: re-enable puppet to post deploy puppetdb change
  • 16:57 jbond42: disable puppet to deploy puppetdb change
  • 16:34 chaomodus: deploying eqsin automated DNS
  • 15:51 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:47 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:39 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:23 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:02 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:00 vgutierrez: restarting acme-chief on acmechief1001
  • 14:48 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:41 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:32 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 14:30 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 14:30 bblack: switching eqsin and esams public-facing unified certs to letsencrypt - https://gerrit.wikimedia.org/r/c/operations/puppet/+/630847
  • 14:06 moritzm: installing facter updates from Buster 10.6 point release
  • 13:57 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:49 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2126 from dump/vslow T259831', diff saved to https://phabricator.wikimedia.org/P12841 and previous config saved to /var/cache/conftool/dbconfig/20200929-134926-kormat.json
  • 13:47 ema: text@esams: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 13:40 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12840 and previous config saved to /var/cache/conftool/dbconfig/20200929-134018-kormat.json
  • 13:36 ema: upload@esams: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 13:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:28 moritzm: installing lua5.3 security updates
  • 13:25 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12839 and previous config saved to /var/cache/conftool/dbconfig/20200929-132515-kormat.json
  • 13:10 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12838 and previous config saved to /var/cache/conftool/dbconfig/20200929-131011-kormat.json
  • 12:56 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 12:55 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12837 and previous config saved to /var/cache/conftool/dbconfig/20200929-125508-kormat.json
  • 12:53 moritzm: installing QT security updates
  • 12:29 kormat@cumin1001: dbctl commit (dc=all): 'db2108 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12836 and previous config saved to /var/cache/conftool/dbconfig/20200929-122914-kormat.json
  • 12:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:28 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db2126 to dump/vslow T259831', diff saved to https://phabricator.wikimedia.org/P12835 and previous config saved to /var/cache/conftool/dbconfig/20200929-122811-kormat.json
  • 12:05 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 11:54 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 11:28 vgutierrez: disabling DHE-RSA-AES128-SHA support - T258405
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12834 and previous config saved to /var/cache/conftool/dbconfig/20200929-111804-root.json
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12833 and previous config saved to /var/cache/conftool/dbconfig/20200929-110300-root.json
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12832 and previous config saved to /var/cache/conftool/dbconfig/20200929-104757-root.json
  • 10:42 XioNoX: re-enable TFTP ALGs on all mr
  • 10:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:40 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:39 moritzm: installing libdbi-perl security updates for stretch/buster
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12831 and previous config saved to /var/cache/conftool/dbconfig/20200929-103253-root.json
  • 10:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:07 kormat@cumin1001: dbctl commit (dc=all): 'Promote db1104 on s8 eqiad master T239238', diff saved to https://phabricator.wikimedia.org/P12830 and previous config saved to /var/cache/conftool/dbconfig/20200929-100723-kormat.json
  • 10:05 kormat: Starting s8 eqiad failover from db1109 to db1104 - T239238
  • 10:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:59 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:59 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:51 kormat@cumin1001: dbctl commit (dc=all): 'Set db1104 with weight 0 T239238', diff saved to https://phabricator.wikimedia.org/P12829 and previous config saved to /var/cache/conftool/dbconfig/20200929-095135-kormat.json
  • 09:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:17 marostegui: Depool labsdb1010 from web role
  • 09:08 jbond42: update rails on puppetmasters
  • 08:21 jayme: switching esams pybal back to conf1006 - T196487
  • 08:01 ema: cp3050: varnish upgrade to 6.0.6-1wm1 T263557
  • 07:55 gehel: badblocks check on wdqs1009 - T263125
  • 07:46 marostegui: Stop MySQL on es2019 before decommissioning T264063
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2019 from dbctl T264063', diff saved to https://phabricator.wikimedia.org/P12825 and previous config saved to /var/cache/conftool/dbconfig/20200929-074602-marostegui.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2019 T264063', diff saved to https://phabricator.wikimedia.org/P12824 and previous config saved to /var/cache/conftool/dbconfig/20200929-060538-marostegui.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2034 as es3 master in codfw T261717', diff saved to https://phabricator.wikimedia.org/P12823 and previous config saved to /var/cache/conftool/dbconfig/20200929-060253-marostegui.json
  • 05:13 marostegui: Stop mysql and reboot es2026 - T263837
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2026 T263837', diff saved to https://phabricator.wikimedia.org/P12822 and previous config saved to /var/cache/conftool/dbconfig/20200929-051236-marostegui.json
  • 05:10 marostegui: Remove es2013 from tendril and zarcillo T263740
  • 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 03:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 03:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 03:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:32 tgr_: B&C done
  • 00:31 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/CacheDecorator.php: Backport: Add (and increment) CacheDecorator cache version ([PHABRICATOR-TASK]) (duration: 00m 58s)
  • 00:09 mutante: TFTP/install server for eqsin switched from bast5001 to install5001 - T252526

2020-09-28

  • 23:56 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T264053: Remove commonswiki from sidebar search (duration: 01m 09s)
  • 23:42 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/ConfigurationLoader/PageConfigurationLoader.php: Backport: Properly handle namespaces in tasktype template configuration (T264029) (duration: 01m 03s)
  • 22:27 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:25 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:24 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 22:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:58 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:25 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:22 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:51 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:46 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:45 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:17 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:17 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:13 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:13 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:10 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 19:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:16 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:12 ejegg: updated staging payments-wiki from 43470629cc to 885d87a905
  • 18:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:15 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:15 Urbanecm: Morning B&C done
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c7e08bc: Enable search in header A/B test for logged in users (T263032) (duration: 00m 58s)
  • 17:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:15 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:58 ejegg: updated payments-wiki from b2eb456ed1 to 2083498811
  • 16:34 cdanis@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 16:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:24 cdanis@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:20 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 16:20 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 16:20 cdanis@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 16:20 cdanis@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:08 hnowlan: reimaging new restbase hosts - restbase1028, restbase1029, restbase1030
  • 16:08 XioNoX: push pfw policies - T264013
  • 15:51 papaul: poweroff elastic2037 for DIMM replacing
  • 15:26 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1114 T196487', diff saved to https://phabricator.wikimedia.org/P12818 and previous config saved to /var/cache/conftool/dbconfig/20200928-152635-kormat.json
  • 15:25 hashar: Restarting CI Jenkins for plugins uninstallation T260565
  • 15:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:13 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:13 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:12 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:12 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:59 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:49 moritzm: installing glib-networking security updates
  • 14:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:40 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1006.eqiad.wmnet
  • 14:33 XioNoX: repool eqiad
  • 14:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:05 moritzm: uploaded libdbi-perl 1.631-3+wmf1 for jessie-wikimedia T259102
  • 13:58 XioNoX: asw2-d-eqiad# run request system power-off member 4
  • 13:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:46 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1006.eqiad.wmnet
  • 13:45 XioNoX: downtiming all eqiad row D hosts - T196487
  • 13:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:38 godog: roll restart object-replicator on ms-be2* for higher concurrency - T261633
  • 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:20 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:19 moritzm: reimaging sretest1001 to validate puppetised sources.list with a new installation T158562
  • 13:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 12:57 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:37 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 12:31 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 12:29 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript resetUserEmail.php --wiki=arbcom_ruwiki 'Adamant.pwn' 'adamant.pwn@hotmail.com' # T262812
  • 12:28 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript createAndPromote.php --wiki=arbcom_ruwiki --bureaucrat --sysop 'Adamant.pwn' <PASSWORD REDACTED> # T262812
  • 12:26 Urbanecm: arbcom_ruwiki is created (T262812)
  • 12:26 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 48s)
  • 12:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating arbcom_ruwiki (T262812) (duration: 00m 56s)
  • 12:23 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating arbcom_ruwiki (T262812) (duration: 00m 56s)
  • 12:21 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating arbcom_ruwiki (T262812)
  • 12:20 urbanecm@deploy1001: Synchronized dblists: Creating arbcom_ruwiki (T262812) (duration: 00m 57s)
  • 12:19 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating arbcom_ruwiki (T262812) (duration: 00m 57s)
  • 12:17 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating arbcom_ruwiki (T262812) (duration: 00m 56s)
  • 12:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:59 kormat@cumin1001: dbctl commit (dc=all): 'db1114 depooling: prep for rack switch upgrade T196487', diff saved to https://phabricator.wikimedia.org/P12815 and previous config saved to /var/cache/conftool/dbconfig/20200928-115904-kormat.json
  • 11:43 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 483beb2: ContentTranslation: Do not use wikishared DB for testwiki (T263417; follow-up af09303 also included in this sync) (duration: 00m 56s)
  • 11:34 Urbanecm: EU B&C window done
  • 11:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 61eac95: Creation of patroller group on arz.wikipedia (T262218) (duration: 00m 57s)
  • 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 483beb2: ContentTranslation: Do not use wikishared DB for testwiki (T263417; follow-up af09303 also included in this sync) (duration: 00m 57s)
  • 10:45 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:37 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:35 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:35 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:33 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:32 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:25 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:48 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 09:48 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 09:48 ema: upload@codfw: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 09:29 ema: text@codfw: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 09:17 _joe_: changing the restbase public TLS certs to include restbase-async.discovery.wmnet
  • 09:17 XioNoX: restart bird on dns2001 - T262372
  • 09:15 jynus: restart db1077 for upgrade and cleanup T187984
  • 09:06 XioNoX: restart bird on centrallog2001 - T262372
  • 09:02 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:00 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:56 dcausse: T263970: recovering lost apifeature indices (copying eqiad indices -> codfw)
  • 08:55 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:53 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:46 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 08:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:37 elukey: decommission the hadoop test cluster (analytics1028->41)
  • 08:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:36 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 08:35 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:34 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:34 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:32 ema: text@eqiad: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 08:28 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12813 and previous config saved to /var/cache/conftool/dbconfig/20200928-082825-kormat.json
  • 08:21 ema: upload@eqiad: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 08:21 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2113 from contributions/logpager/recentchanges*/watchlist T263842', diff saved to https://phabricator.wikimedia.org/P12812 and previous config saved to /var/cache/conftool/dbconfig/20200928-082114-kormat.json
  • 08:13 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12811 and previous config saved to /var/cache/conftool/dbconfig/20200928-081321-kormat.json
  • 08:07 jayme: restarting pybal on lvs3005 for switching to conf1005 - T196487
  • 08:06 jayme: restarting pybal on lvs3006 for switching to conf1005 - T196487
  • 08:02 jayme: restarting pybal on lvs3007 for switching to conf1005 - T196487
  • 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 07:58 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12810 and previous config saved to /var/cache/conftool/dbconfig/20200928-075817-kormat.json
  • 07:54 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 07:43 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12809 and previous config saved to /var/cache/conftool/dbconfig/20200928-074313-kormat.json
  • 07:29 _joe_: restarting pybal on the LVS primaries
  • 07:24 dcausse: T263970: forcing allocation of enwiki_general_1587198756 (chi@eqiad)
  • 07:18 _joe_: restarting pybal on the backup LVS in eqiad, codfw to pick up the new wikifeeds endpoint
  • 07:17 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
  • 07:09 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2028 as es1 master in codfw T261717', diff saved to https://phabricator.wikimedia.org/P12806 and previous config saved to /var/cache/conftool/dbconfig/20200928-065938-marostegui.json
  • 06:15 marostegui: Set innodb_change_buffering = inserts; on db2089 (s5), db2106 (s4), db2108 (s2), db2085 (s1), db2085 (s8), db2087 (s7), db2087 (s6), db2109 (s3) T263443
  • 05:55 marostegui: Stop MySQL on es2013 before decommissioning it T263740
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2013 from dbctl T263740', diff saved to https://phabricator.wikimedia.org/P12805 and previous config saved to /var/cache/conftool/dbconfig/20200928-055410-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2013 T263740', diff saved to https://phabricator.wikimedia.org/P12804 and previous config saved to /var/cache/conftool/dbconfig/20200928-054846-marostegui.json
  • 05:22 marostegui: Decrease labsdb1011 weight

2020-09-27

  • 06:36 elukey: powercycle analytics1048

2020-09-26

  • 19:20 chrisalbon: sudo service uwsgi-ores restart
  • 02:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 02:04 cdanis@cumin2001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=eqiad
  • 02:04 cdanis@cumin2001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=codfw
  • 01:56 cdanis: ❌cdanis@cumin2001.codfw.wmnet ~ 🕙🍺 sudo cumin 'A:ores and A:codfw' 'systemctl restart celery-ores-worker.service uwsgi-ores.service '
  • 01:48 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=codfw
  • 01:48 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
  • 01:17 cdanis: ❌cdanis@ores2001.codfw.wmnet ~ 🕤🍺 sudo systemctl restart uwsgi-ores.service
  • 01:11 cdanis: ✔️ cdanis@ores2001.codfw.wmnet ~ 🕘🍺 sudo systemctl restart celery-ores-worker.service
  • 00:56 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:46 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm

2020-09-25

  • 23:03 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a135388]: correct scap variable reference in airflow_variables (duration: 26m 57s)
  • 22:36 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a135388]: correct scap variable reference in airflow_variables
  • 22:17 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity (duration: 10m 42s)
  • food: updated fundraising CiviCRM from eb90dbcfd3 to 035ad1c351
  • 22:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity
  • 21:23 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment (duration: 11m 33s)
  • 21:11 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment
  • 20:26 effie: installing memcached 1.4.33-1+deb9u1 on mwdebug1001
  • 19:34 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1 (duration: 53m 58s)
  • 18:40 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1
  • 17:47 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/MobileFrontend/: Backport: Make all section `collapsible` during server side rendering (T263832) (duration: 00m 59s)
  • 17:37 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3 (duration: 02m 01s)
  • 17:35 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3
  • 16:35 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import (duration: 01m 10s)
  • 16:34 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import
  • 16:33 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Promote 1.35.0 to stable in extensiondistributor (duration: 00m 57s)
  • 16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:23 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:23 jynus: fixing enwikivoyage ipblocks inconsistency cluster-wide T263842
  • 14:54 elukey: install linux-image-4.19-amd64 on an-worker1096 + reboot
  • 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:13 kormat@cumin1001: dbctl commit (dc=all): 'Add db2113 to various groups T263842', diff saved to https://phabricator.wikimedia.org/P12797 and previous config saved to /var/cache/conftool/dbconfig/20200925-121332-kormat.json
  • 11:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:23 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:10 moritzm: reimaging sretest1001 to validate puppetised sources.list with a new installation T158562
  • 10:42 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:28 moritzm: reimaging sretest1002 to validate puppetised sources.list with a new installation T158562
  • 09:58 moritzm: restarting archiva to pick up Java security update
  • 09:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:22 ema: upload@eqsin: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 09:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:02 ema: text@eqsin: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 06:50 elukey: shutdown ganeti5002 (mistakenly powercycled it without seeing T261130)
  • 06:40 elukey: powercycle ganeti5002 (no instances running on it, mgmt console shows no tty usable)
  • 06:34 elukey: reboot stat1004 to pick up kernel settings
  • 03:10 ejegg: updated payments-wiki from f89c594e12 to b2eb456ed1
  • 02:29 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: new codfw, T263798 (duration: 09m 05s)
  • 02:27 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 00m 07s)
  • 02:27 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
  • 02:20 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: new codfw, T263798
  • 02:20 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: eqiad-only, T263798 (duration: 06m 09s)
  • 02:14 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: eqiad-only, T263798

2020-09-24

  • 23:39 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 01m 58s)
  • 23:37 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
  • 21:40 mutante: mw1349 - systemctl reset-failed
  • 21:03 cdanis: reprepro: add backported ipvsadm 1:1.31-1+deb10u1 to buster-wikimedia
  • 21:00 andrew@deploy1001: Finished deploy [horizon/deploy@404e205]: (no justification provided) (duration: 01m 05s)
  • 20:59 andrew@deploy1001: Started deploy [horizon/deploy@404e205]: (no justification provided)
  • 20:41 andrew@deploy1001: Finished deploy [horizon/deploy@24368a5]: (no justification provided) (duration: 02m 10s)
  • 20:39 andrew@deploy1001: Started deploy [horizon/deploy@24368a5]: (no justification provided)
  • 20:35 andrew@deploy1001: Finished deploy [horizon/deploy@85125d1]: (no justification provided) (duration: 00m 52s)
  • 20:34 andrew@deploy1001: Started deploy [horizon/deploy@85125d1]: (no justification provided)
  • 19:57 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 19:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 19:54 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 19:47 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: Revert: cloudelastic: envoy sits in front now (duration: 00m 59s)
  • 19:41 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: (no justification provided) (duration: 00m 36s)
  • 19:41 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: (no justification provided)
  • 19:39 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: (no justification provided) (duration: 01m 08s)
  • 19:38 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: (no justification provided)
  • 19:30 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: dev (duration: 00m 44s)
  • 19:29 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: dev
  • 19:08 dancy@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.10
  • 19:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bcf9fcb: Enable mobile block notice tracking in MobileFrontend (T260218) (duration: 01m 04s)
  • 18:58 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Special:Investigate on itwiki and svwiki (T262436) (duration: 01m 05s)
  • 18:01 mutante: temp. disabled puppet on install4001/install5001 - applying install_server role to new servers, starting with install3001
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:24 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:21 jbond42: enable puppet fleet wide post update puppetdb postgres logging
  • 17:19 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:17 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 17:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 17:15 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:15 jbond42: disable puppet fleet wide to update puppetdb postgres logging
  • 17:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 17:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:11 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 17:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 17:09 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:04 mutante: syncing facts to puppet compiler hosts
  • 17:01 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 17:00 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:56 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:26 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 16:26 robh: properly pooled mw1360 this time T262151
  • 16:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 16:04 XioNoX: pfw3-eqiad> restart security-log gracefully
  • 15:58 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/AbuseFilter/includes/Hooks/AbuseFilterHookRunner.php: 5e88c36: HookRunner: onAbuseFilterGenerateUserVars should run generateUserVars (T263750) (duration: 01m 06s)
  • 15:46 Urbanecm: Run `mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=simplewiki --username="Oversight~simplewiki"` (T263760)
  • 15:44 Urbanecm: Run `mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=enwiki --username=Oversight` (T263760)
  • 15:43 Urbanecm: Rename all local Oversight accounts but enwiki to Oversight~dbname, see task for full list (T263760)
  • 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12794 and previous config saved to /var/cache/conftool/dbconfig/20200924-152626-root.json
  • 15:15 robh: mw1360 scap and repooled post work via T262151
  • 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 66%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12793 and previous config saved to /var/cache/conftool/dbconfig/20200924-151120-root.json
  • 15:10 jayme: switched zotero service-proxy listener to use TLS - T255869
  • 15:00 XioNoX: repool eqiad - T256112
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 33%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12792 and previous config saved to /var/cache/conftool/dbconfig/20200924-145617-root.json
  • 14:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:52 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:28 XioNoX: [Netops] In window: turn VC-ports on/off for proper cabling: - T256112
  • 14:19 XioNoX: remove damping on anycast group for cr2-codfw
  • 14:18 jayme: restart pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255869
  • 14:16 jayme: restart pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255869
  • 14:16 XioNoX: [Netops] Disable unused VC ports to not risk them going online at connect: - T256112
  • 14:09 jayme: running puppet on lvs servers - T255869
  • 14:09 cmjohnson1: removing the cable connected to FPC1:1/0 (DAC 3m) FPC8:1/0 (DAC 3m)
  • 13:58 moritzm: upgrading mariadb on cloudcontrol-2001/2003/2004
  • 13:52 XioNoX: depool eqiad for row D recabling - T256112
  • 13:32 ottomata: Increased retention time for *.mediawiki.job.processMediaModeration topics in kafka main-eqiad and main-codfw to 31 days (as per request from Pchelolo )
  • 13:22 elukey: moved the hadoop cluster to puppet TLS certificates - T253957
  • 13:17 XioNoX: add damping to anycast BGP - T262372
  • 12:58 jayme: switched mathoid service-proxy listener to use TLS - T255875
  • 12:50 moritzm: upgrading bird on centrallog1001
  • 12:43 gehel: restarting wdqs-categories on wdqs1009
  • 12:43 moritzm: installing netty-3.9 security updates
  • 12:42 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 12:30 ema: upload@ulsfo: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 12:29 godog: swift codfw-prod: rebalance only, no weight change
  • 12:27 kormat: powering off db2125 for maintenance T260670
  • 12:25 moritzm: installing xorg-server security updates
  • 12:09 ema: text@ulsfo: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 12:02 ema: cp4022: upgrade varnish to 6.0.6-1wm1 T263557
  • 11:40 Urbanecm: EU B&C window done
  • 11:34 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/Translate/tag/TPSection.php: fa4900e: Fix validation of translation unit section names (T263546) (duration: 01m 07s)
  • 11:25 jbond42: re-enable puppet fleet wide
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fdab74c: Enable ContentTranslation in Bashkir, Urdu and Welsh WPs as a default tool (T258504; T260022; T260024) (duration: 01m 05s)
  • 11:21 jbond42: disable puppet fleet wide to reduce log level on puppetdb
  • 11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 90c7291: Move DiscussionTools out of beta on arwiki, cswiki, huwiki (T249394); d8553f3: Simplify DiscussionTools config (duration: 01m 11s)
  • 11:06 moritzm: installing imagemagick security updates on stretch
  • 11:02 jbond42: re-enable puppet fleet wide
  • 10:51 jbond42: disable puppet fleet wide to deploy a puppetmaster change
  • 10:49 moritzm: installing libproxy security updates
  • 10:23 volans: uploaded python3-wmflib_0.0.2 to apt.wikimedia.org buster-wikimedia
  • 10:20 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12789 and previous config saved to /var/cache/conftool/dbconfig/20200924-102025-kormat.json
  • 10:05 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12788 and previous config saved to /var/cache/conftool/dbconfig/20200924-100521-kormat.json
  • 10:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:50 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12787 and previous config saved to /var/cache/conftool/dbconfig/20200924-095018-kormat.json
  • 09:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 09:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 09:48 jayme: restart pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255875
  • 09:46 jayme: restart pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255875
  • 09:43 jayme: running puppet on lvs servers - T255875
  • 09:35 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12786 and previous config saved to /var/cache/conftool/dbconfig/20200924-093514-kormat.json
  • 09:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 09:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 09:20 ema: cp4021: repool with varnish 6.0.6-1wm1 T263557
  • 09:19 ema: cp4021: redepool with varnish to 6.0.6-1wm1 T263557
  • 09:14 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12785 and previous config saved to /var/cache/conftool/dbconfig/20200924-091445-kormat.json
  • 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:14 ema: cp4021: depool and upgrade varnish to 6.0.6-1wm1 T263557
  • 09:05 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 09:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 08:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 08:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2127 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12784 and previous config saved to /var/cache/conftool/dbconfig/20200924-082443-marostegui.json
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 100%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12783 and previous config saved to /var/cache/conftool/dbconfig/20200924-082319-root.json
  • 08:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 08:15 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:15 XioNoX: configure vrrp_master_pinning in codfw - T263212
  • 08:10 moritzm: installing mariadb-10.1/mariadb-10.3 updates (packaged version from Debian, not the wmf-mariadb variants we used for mysqld)
  • 08:09 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:08 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 66%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12782 and previous config saved to /var/cache/conftool/dbconfig/20200924-080816-root.json
  • 07:58 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:57 marostegui: Remove es2018 from tendril and zarcillo T263613
  • 07:57 XioNoX: configure vrrp_master_pinning in eqiad - T263212
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 33%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12781 and previous config saved to /var/cache/conftool/dbconfig/20200924-075312-root.json
  • 07:52 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:49 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:49 godog: roll restart logstash codfw, gc death
  • 07:25 XioNoX: push pfw policies - T263674
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Place db2073 into vslow, not api in s4', diff saved to https://phabricator.wikimedia.org/P12780 and previous config saved to /var/cache/conftool/dbconfig/20200924-064018-marostegui.json
  • 06:22 elukey: powercycle elastic2037 (host stuck, no mgmt serial console working, DIMM errors in racadm getsel)
  • 05:57 marostegui: Remove es2012 from tendril and zarcillo T263613
  • 05:41 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2012 and es2018 from dbctl - T263615 T263613', diff saved to https://phabricator.wikimedia.org/P12778 and previous config saved to /var/cache/conftool/dbconfig/20200924-053001-marostegui.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12777 and previous config saved to /var/cache/conftool/dbconfig/20200924-052207-marostegui.json
  • 01:25 ryankemper: Root cause of sigkill of `elasticsearch_5@production-logstash-eqiad.service` appears to be OOMKill of the java process: `Killed process 1775 (java) total-vm:8016136kB, anon-rss:4888232kB, file-rss:0kB, shmem-rss:0kB`. Service appears to have restarted itself and is healthy again
  • 01:21 ryankemper: Observed that `elasticsearch_5@production-logstash-eqiad.service` is in a `failed` state since `Thu 2020-09-24 00:53:53 UTC`; appears the process received a SIGKILL - not sure why
  • 01:19 ryankemper: Getting `connection refused` when trying to `curl -X GET 'http://localhost:9200/_cluster/health'` on `logstash1009`
  • 01:16 ryankemper: (after) `{"cluster_name":"production-elk7-codfw","status":"green","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":868,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
  • 01:16 ryankemper: Ran `curl -X POST 'http://localhost:9200/_cluster/reroute?retry_failed=true'`, cluster status is green again
  • 01:15 ryankemper: (before) `{"cluster_name":"production-elk7-codfw","status":"yellow","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":866,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":2,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
  • 01:14 ryankemper: (before) `{"cluster_name":"production-elk7-codfw","status":"yellow","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":866,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":2,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
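The 01:14–01:16 entries above follow a check-then-retry pattern: read `_cluster/health`, and if the cluster is yellow with unassigned shards, POST `_cluster/reroute?retry_failed=true`. The sketch below captures that decision under stated assumptions. The two endpoint paths and the `localhost:9200` base come from the log; the helper names (`needs_retry_reroute`, `reroute_url`, `get_health`, `retry_failed_allocations`) are hypothetical, and the "not green and has unassigned shards" rule is an illustrative heuristic, not an established runbook.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint paths as used in the log entries; helper names below are hypothetical.
HEALTH_PATH = "/_cluster/health"
REROUTE_PATH = "/_cluster/reroute"


def needs_retry_reroute(health: dict) -> bool:
    """Assumed heuristic: retry failed allocations only when the cluster is
    not green and still reports unassigned shards."""
    return health.get("status") != "green" and health.get("unassigned_shards", 0) > 0


def reroute_url(base: str) -> str:
    """Build the reroute URL; retry_failed=true asks Elasticsearch to retry
    shard allocations that previously gave up after repeated failures."""
    return f"{base.rstrip('/')}{REROUTE_PATH}?{urlencode({'retry_failed': 'true'})}"


def get_health(base: str) -> dict:
    """Fetch and decode the cluster health document (GET request)."""
    with urllib.request.urlopen(f"{base.rstrip('/')}{HEALTH_PATH}") as resp:
        return json.load(resp)


def retry_failed_allocations(base: str = "http://localhost:9200") -> bool:
    """Check health and POST a reroute if needed; returns True if one was sent."""
    if not needs_retry_reroute(get_health(base)):
        return False
    req = urllib.request.Request(reroute_url(base), method="POST")
    with urllib.request.urlopen(req):
        pass
    return True
```

Feeding the logged "before" document (closing brace added, since the logged copy is cut off) into `needs_retry_reroute` yields `True`, and the "after" document yields `False`. The network calls are kept separate from the decision logic so the heuristic can be checked without a live cluster.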

2020-09-23

  • 23:52 mutante: alert1001 - systemctl restart ircecho because icinga-wm left the chat
  • 23:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cbd77e3: Add new Racine namespace to frwiktionary (T263525) (duration: 01m 05s)
  • 23:44 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:40 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:37 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 22382a9: remove wtp2005 from wgLinterSubmitterWhitelist (T257903) (duration: 01m 04s)
  • 23:14 eileen: civicrm revision changed from 32a82aa1b7 to eb90dbcfd3, config revision is 2a55766237
  • 23:13 eileen: civicrm revision is 32a82aa1b7, config revision is 2a55766237
  • 23:10 mutante: ganeti5003 - rebooting install5001 - OS install on 3001/4001/5001 T263684
  • 23:04 mutante: ganeti4003 - rebooting install4001
  • 22:51 mutante: ganeti5003 - rebooting install5001
  • 22:27 mutante: ganeti5003 - gnt-instance start install5001
  • 21:40 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:38 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:30 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.10 (duration: 01m 04s)
  • 21:29 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.10
  • 21:24 dancy@deploy1001: Finished scap: (no justification provided) (duration: 42m 52s)
  • 21:12 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:06 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 20:57 mepps: updated payments-wiki from 7bb99ce03a to f89c594e12
  • 20:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 20:42 dancy: dancy@deploy1001 Started scap: Deploying fixes for T263601 and T263675 to 1.36.0-wmf.10
  • 20:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:41 dancy@deploy1001: Started scap: (no justification provided)
  • 20:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:36 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:36 eileen: civicrm revision changed from a789afd79b to 32a82aa1b7, config revision is 2a55766237
  • 20:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:30 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 20:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:28 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 20:27 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 20:22 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 20:18 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 20:15 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 20:08 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 20:06 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 20:02 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 19:42 robh: ganeti5002 firmware update before hw testing via T261130
  • 18:57 ryankemper: (Above deploy complete)
  • 18:54 ryankemper: `scap sync-file wmf-config/ProductionServices.php 'Config: cloudelastic: envoy sits in front now (T263073)'` from `ryankemper@deploy1001:/srv/mediawiki-staging`
  • 18:47 ryankemper: Above deploy appears successful, test requests seem to be taking 40ms instead of the previous 140ms
  • 18:31 ryankemper: HEAD of `/srv/mediawiki-staging` is now at 7a96d63 as expected
  • 18:13 Urbanecm: End of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # T263628
  • 18:13 Urbanecm: Start of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # T263628
  • 18:12 Urbanecm: urbanecm@deploy1001: scap sync-file wmf-config/InitialiseSettings.php 'b1554f36be68106c9364f4aa2fd70d759ad74356: Set $wgCategoryCollation = uca-tr on trwikiquote (T263628)'
  • 18:11 Urbanecm: Logmsgbot seems to be down
  • 17:29 robh: migrating ganeti instances off ganeti5002 for troubleshooting per T261130
  • 16:37 sukhe: upload dnsdist_1.4.0-1~deb10u2 to apt.wm.o (buster) - T252132
  • 16:00 herron: switching icinga over from icinga1001 to alert1001 T247966
  • 16:00 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2088:3312 from api now that db2104/db2126 are done T259831', diff saved to https://phabricator.wikimedia.org/P12775 and previous config saved to /var/cache/conftool/dbconfig/20200923-160010-kormat.json
  • 15:58 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12774 and previous config saved to /var/cache/conftool/dbconfig/20200923-155819-kormat.json
  • 15:57 robh: updating firmware on mw1360, troubleshooting nic failure issue T262151
  • 15:57 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/includes/specials/SpecialBlock.php: 3234fad: SpecialUnblock: Allow getTargetAndType to accept null $par (T263642) (duration: 01m 07s)
  • 15:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/includes/specials/SpecialUnblock.php: 3234fad: SpecialUnblock: Allow getTargetAndType to accept null $par (T263642) (duration: 01m 08s)
  • 15:53 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:51 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:48 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:48 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:45 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:45 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:44 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 15:44 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 15:43 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:43 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12773 and previous config saved to /var/cache/conftool/dbconfig/20200923-154315-kormat.json
  • 15:40 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:37 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:33 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:30 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:28 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12772 and previous config saved to /var/cache/conftool/dbconfig/20200923-152812-kormat.json
  • 15:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 15:21 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:13 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12771 and previous config saved to /var/cache/conftool/dbconfig/20200923-151308-kormat.json
  • 14:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:44 kormat@cumin1001: dbctl commit (dc=all): 'db2126 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12770 and previous config saved to /var/cache/conftool/dbconfig/20200923-144441-kormat.json
  • 14:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:37 herron: grew prometheus1004 prometheus-ops filesystem to 1.6T
  • 14:35 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable repo config propagateChangeVisibility everywhere, 2/2 (duration: 01m 06s)
  • 14:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Enable repo config propagateChangeVisibility everywhere, 1/2 (duration: 01m 06s)
  • 13:50 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12769 and previous config saved to /var/cache/conftool/dbconfig/20200923-135028-kormat.json
  • 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12768 and previous config saved to /var/cache/conftool/dbconfig/20200923-133525-kormat.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 100%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12766 and previous config saved to /var/cache/conftool/dbconfig/20200923-132918-root.json
  • 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12765 and previous config saved to /var/cache/conftool/dbconfig/20200923-132022-kormat.json
  • 13:20 moritzm: installing ruby-json security updates
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 75%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12764 and previous config saved to /var/cache/conftool/dbconfig/20200923-131414-root.json
  • 13:05 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12763 and previous config saved to /var/cache/conftool/dbconfig/20200923-130518-kormat.json
  • 13:04 moritzm: installing multipath-tools bugfix updates from buster 10.5 point release
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12762 and previous config saved to /var/cache/conftool/dbconfig/20200923-125911-root.json
  • 12:49 moritzm: installing libunwind bugfix updates from buster 10.5 point release
  • 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2104 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12761 and previous config saved to /var/cache/conftool/dbconfig/20200923-123922-kormat.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074', diff saved to https://phabricator.wikimedia.org/P12760 and previous config saved to /var/cache/conftool/dbconfig/20200923-123806-marostegui.json
  • 12:37 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:37 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:36 kormat@cumin1001: dbctl commit (dc=all): 'Add db2088:3312 to api while db2104 gets depooled T259831', diff saved to https://phabricator.wikimedia.org/P12759 and previous config saved to /var/cache/conftool/dbconfig/20200923-123649-kormat.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly db2074 ', diff saved to https://phabricator.wikimedia.org/P12758 and previous config saved to /var/cache/conftool/dbconfig/20200923-123528-root.json
  • 12:22 ema: cp4027: repool with varnish 6.0.6-1wm1 T263557
  • 12:09 ema: cp4027: depool and upgrade varnish to 6.0.6-1wm1 T263557
  • 11:52 moritzm: installing GNUTLS bugfix updates from buster 10.5 point release
  • 11:51 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.Homepage.GrowthTasksApi.js: 73b5ce8: Fix GrowthTasksApi lazy-loading flags for pages with no views (T263611) (duration: 01m 05s)
  • 11:49 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/help/ext.growthExperiments.PostEdit.js: 1ab31a9: Mark pageviews as not used in the mobile postedit notice (T263611) (duration: 01m 06s)
  • 11:38 Urbanecm: Revert https://gerrit.wikimedia.org/r/c/mediawiki/core/+/629188 and fetch to deploy1001 to unblock EU B&C deployment (T237467; cc twentyafterfour)
  • 11:27 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12756 and previous config saved to /var/cache/conftool/dbconfig/20200923-112712-kormat.json
  • 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12755 and previous config saved to /var/cache/conftool/dbconfig/20200923-111209-kormat.json
  • 11:08 Urbanecm: Create ContentTranslation tables at testwiki using SQL files from `/srv/mediawiki/php-1.36.0-wmf.10/extensions/ContentTranslation/sql` (T263417)
  • 10:57 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12754 and previous config saved to /var/cache/conftool/dbconfig/20200923-105705-kormat.json
  • 10:42 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12753 and previous config saved to /var/cache/conftool/dbconfig/20200923-104202-kormat.json
  • 10:21 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12752 and previous config saved to /var/cache/conftool/dbconfig/20200923-102120-kormat.json
  • 10:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12751 and previous config saved to /var/cache/conftool/dbconfig/20200923-100156-marostegui.json
  • 10:01 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Configure entityDataCachePaths for Wikibase (duration: 01m 05s)
  • 09:59 elukey: update puppet compiler's facts
  • 09:57 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wgExtraLanguageNames from Wikidata and Commons (T260118), part 2/2 (production no-op) (duration: 01m 04s)
  • 09:55 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wgExtraLanguageNames from Wikidata and Commons (T260118), part 1/2 (duration: 01m 16s)
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12750 and previous config saved to /var/cache/conftool/dbconfig/20200923-094511-marostegui.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12748 and previous config saved to /var/cache/conftool/dbconfig/20200923-083200-marostegui.json
  • 08:29 moritzm: installing dbus security updates on buster
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12747 and previous config saved to /var/cache/conftool/dbconfig/20200923-080651-marostegui.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12746 and previous config saved to /var/cache/conftool/dbconfig/20200923-071129-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 to re-add change_revision_id index T262856', diff saved to https://phabricator.wikimedia.org/P12745 and previous config saved to /var/cache/conftool/dbconfig/20200923-070926-marostegui.json
  • 06:34 marostegui: Stop MySQL on es2012 and es2018 T263613 T263615
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2018 T263615', diff saved to https://phabricator.wikimedia.org/P12744 and previous config saved to /var/cache/conftool/dbconfig/20200923-063140-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2012 for decommmissioning', diff saved to https://phabricator.wikimedia.org/P12743 and previous config saved to /var/cache/conftool/dbconfig/20200923-060812-marostegui.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index removal T262856', diff saved to https://phabricator.wikimedia.org/P12742 and previous config saved to /var/cache/conftool/dbconfig/20200923-055850-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 T262856', diff saved to https://phabricator.wikimedia.org/P12741 and previous config saved to /var/cache/conftool/dbconfig/20200923-055531-marostegui.json
  • 05:37 marostegui: Purge global_status_log table on tendril - T252331
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:03 marostegui: Remove triggers from db2094:3313 for MCR schema change T238966
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12739 and previous config saved to /var/cache/conftool/dbconfig/20200923-050234-marostegui.json
  • 04:25 eileen: civicrm revision changed from 8f32b6301f to a789afd79b, config revision is 9933605187

2020-09-22

  • 23:27 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: clientError: enable on ja,es,de,ru,it,zh,pt wikipedias (T255585) (duration: 01m 04s)
  • 23:24 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry feature (T261249) (duration: 01m 06s)
  • 21:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:46 ebernhardson: T259539 enabled adaptive replica selection on elasticsearch at search.svc.eqiad.wmnet:9[246]43
  • 20:44 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:43 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.10
  • 20:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:31 dancy@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.10 (duration: 42m 21s)
  • 20:30 mutante: gerrit2001 (gerrit-replica) restarting gerrit service
  • 19:49 dancy@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.10
  • 19:44 dancy@deploy1001: Pruned MediaWiki: 1.36.0-wmf.5 (duration: 17m 59s)
  • 19:31 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:29 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 16:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:00 robh: running dell epsa test on down host mw1360 per T262151
  • 14:34 moritzm: installing nginx security updates on buster
  • 14:33 shdubsh: restart apache on prometheus nodes to pick up new ext endpoint
  • 14:24 ema: upload libvmod-re2 1.5.3-1 to buster-wikimedia component/varnish6 T261632
  • 14:24 papaul: rebooting ms-be2019
  • 14:15 XioNoX: upgrade FNM on netflow2001 - T257035
  • 14:12 jayme: running ipvsadm -D -t 10.2.1.19:1970; ipvsadm -D -t 10.2.1.21:24766 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255868 T255877
  • 14:12 jayme: running ipvsadm -D -t 10.2.2.19:1970; ipvsadm -D -t 10.2.2.21:24766 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255868 T255877
  • 14:11 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255868 T255877
  • 14:10 XioNoX: upgrade FNM on netflow5001 - T257035
  • 14:09 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255868 T255877
  • 14:09 shdubsh: restart statsv on webperf[1-2]001 to route metrics through statsd-exporter
  • 14:09 XioNoX: upgrade FNM on netflow1001 - T257035
  • 14:06 XioNoX: upgrade FNM on netflow3001 - T257035
  • 14:05 jayme: running puppet on lvs servers - T255868 T255877
  • 14:03 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 14:02 hnowlan: roll-restarting restbase codfw for java updates
  • 13:59 XioNoX: add fastnetmon_1.1.7 to buster-wikimedia repo - T257035
  • 13:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:55 ema: upload varnish-modules 0.15.0-1+wmf1 to buster-wikimedia component/varnish6 T261632
  • 13:49 marostegui: Deploy MCR change on db2098:3313 - T238966
  • 13:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:39 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:35 ema: upload libvmod-netmapper 1.8-1 to buster-wikimedia component/varnish6 T261632
  • 12:54 ema: upload varnishkafka 1.1.0-1 to buster-wikimedia component/varnish6 T261632
  • 12:11 moritzm: installing python3.7 security updates on Buster
  • 12:09 moritzm: installing bundler updates on buster
  • 11:59 Urbanecm: Reset password for SUL User:Freibo
  • 11:58 Lucas_WMDE: EU backport&config window done
  • 11:56 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2001:~$ mwscript namespaceDupes.php trwikisource --fix | tee T263358.fix # 1350 to fix, 1350 resolvable, 0 deleted
  • 11:55 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2001:~$ mwscript namespaceDupes.php trwikisource | tee T263358.dryrun # 1350 to fix, 1350 resolvable, 0 deleted
  • 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Create Portal and Portal_talk namespaces on trwikisource, and fix an incorrect alias (T263358) (duration: 00m 57s)
  • 11:47 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Removing Wikipedia store link from enwiki (T262329) (duration: 00m 57s)
  • 11:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Set timezone for wikis of the CWIRP to Europe/Rome (T263123) (duration: 00m 59s)
  • 11:35 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 11:35 hnowlan: roll-restarting restbase eqiad for java updates
  • 11:25 ema: upload varnish 6.0.6-1wm1 to buster-wikimedia component/varnish6 T261632
  • 11:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:13 moritzm: installing intel-microcode 3.20200616.1 on Buster baremetal servers (compared to the currently installed packages, this reverts microcode changes for some Skylake CPUs we don't use)
  • 11:00 moritzm: installing intel-microcode 3.20200616.1 on Stretch baremetal servers (compared to the currently installed packages, this reverts microcode changes for some Skylake CPUs we don't use)
  • 10:51 XioNoX: Add policy-options for primary IXPs to all routers - T262517
  • 10:48 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 10:48 hnowlan: roll-restarting sessionstore for java security updates
  • 10:44 moritzm: installing bacula security updates on stretch
  • 10:33 moritzm: installing remaining libx11 security updates
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12733 and previous config saved to /var/cache/conftool/dbconfig/20200922-101342-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12732 and previous config saved to /var/cache/conftool/dbconfig/20200922-101324-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: Slowly repool es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12731 and previous config saved to /var/cache/conftool/dbconfig/20200922-101308-root.json
  • 10:00 kormat: deploying schema change to s2 in eqiad. labsdb will have s2 lag until this finishes. T259831
  • 09:59 jayme: running ipvsadm -D -t 10.2.1.45:34192; ipvsadm -D -t 10.2.1.42:35192 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255873 T255870
  • 09:59 jayme: running ipvsadm -D -t 10.2.2.45:34192; ipvsadm -D -t 10.2.2.42:35192 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255873 T255870
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12730 and previous config saved to /var/cache/conftool/dbconfig/20200922-095839-root.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 75%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12729 and previous config saved to /var/cache/conftool/dbconfig/20200922-095821-root.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: Slowly repool es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12728 and previous config saved to /var/cache/conftool/dbconfig/20200922-095805-root.json
  • 09:57 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255873 T255870
  • 09:55 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255873 T255870
  • 09:51 jayme: running puppet on lvs servers - T255873 T255870
  • 09:46 jbond@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-password (exit_code=99)
  • 09:46 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 50%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12727 and previous config saved to /var/cache/conftool/dbconfig/20200922-094336-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 50%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12726 and previous config saved to /var/cache/conftool/dbconfig/20200922-094317-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 50%: Slowly repool es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12725 and previous config saved to /var/cache/conftool/dbconfig/20200922-094302-root.json
  • 09:30 volans: repooling ulsfo after merging DNS migration to Netbox zonefiles - T258729
  • 09:30 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.uptime (exit_code=0)
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 25%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12724 and previous config saved to /var/cache/conftool/dbconfig/20200922-092832-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 25%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12723 and previous config saved to /var/cache/conftool/dbconfig/20200922-092814-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: Slowly repool es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12722 and previous config saved to /var/cache/conftool/dbconfig/20200922-092758-root.json
  • 09:26 jbond@cumin1001: START - Cookbook sre.pdus.uptime
  • 09:24 XioNoX: replace BGP_IXP_in with BGP_IXP_PRIMARY_in on cr3-ulsfo IX BGP group - T262517
  • 09:22 XioNoX: add BGP_IXP_PRIMARY_in to cr3-ulsfo - T262517
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12721 and previous config saved to /var/cache/conftool/dbconfig/20200922-091329-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12720 and previous config saved to /var/cache/conftool/dbconfig/20200922-091310-root.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: Slowly repool es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12719 and previous config saved to /var/cache/conftool/dbconfig/20200922-091255-root.json
  • 09:11 jbond42: update snmp string on ps1-a8-codfw
  • 09:05 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12718 and previous config saved to /var/cache/conftool/dbconfig/20200922-090520-kormat.json
  • 08:58 _joe_: restart pybal on lvs2009
  • 08:56 _joe_: restarting pybal on lvs2010
  • 08:54 _joe_: restarted pybal on lvs1015
  • 08:50 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12717 and previous config saved to /var/cache/conftool/dbconfig/20200922-085017-kormat.json
  • 08:36 _joe_: restarting pybal low-traffic in eqiad to pick up lvs changes
  • 08:35 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12715 and previous config saved to /var/cache/conftool/dbconfig/20200922-083514-kormat.json
  • 08:22 volans: migrating ulsfo public DNS records to the Netbox-generated ones - T258729
  • 08:20 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12714 and previous config saved to /var/cache/conftool/dbconfig/20200922-082010-kormat.json
  • 08:13 kormat: uploaded wmfmariadbpy v0.5 to apt. deploying now to fleet
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2032, es2033 and es2034 for the first time with minimal weight T261717', diff saved to https://phabricator.wikimedia.org/P12713 and previous config saved to /var/cache/conftool/dbconfig/20200922-081154-marostegui.json
  • 07:57 volans: migrating ulsfo private DNS records to the Netbox-generated ones - T258729
  • 07:54 kormat@cumin1001: dbctl commit (dc=all): 'db2076 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12712 and previous config saved to /var/cache/conftool/dbconfig/20200922-075429-kormat.json
  • 07:51 jayme: running ipvsadm -D -t 10.2.1.18:8080; ipvsadm -D -t 10.2.1.46:3030 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255879 T254581
  • 07:49 jayme: running ipvsadm -D -t 10.2.2.18:8080; ipvsadm -D -t 10.2.2.46:3030 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255879 T254581
  • 07:46 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255879 T254581
  • 07:42 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255879 T254581
  • 07:39 jayme: running puppet on lvs servers - T255879 T254581
  • 07:34 volans: depooling ulsfo to merge DNS migration to Netbox zonefiles - T258729
  • 07:24 marostegui: Stop MySQL on es2014 - host will be decommissioned T262889
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2014 from dbctl T262889', diff saved to https://phabricator.wikimedia.org/P12711 and previous config saved to /var/cache/conftool/dbconfig/20200922-071435-marostegui.json
  • 07:11 XioNoX: cr1-codfw# run clear bfd session address fe80::f27c:c7ff:fe11:2c1b
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2014 for decommissioning T262889', diff saved to https://phabricator.wikimedia.org/P12710 and previous config saved to /var/cache/conftool/dbconfig/20200922-061815-marostegui.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 100%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12709 and previous config saved to /var/cache/conftool/dbconfig/20200922-054455-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 100%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12708 and previous config saved to /var/cache/conftool/dbconfig/20200922-054438-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 100%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12707 and previous config saved to /var/cache/conftool/dbconfig/20200922-054430-root.json
  • 05:41 marostegui: Log remove triggers on revision table on db1124:3313 T238966
  • 05:39 marostegui: Deploy MCR schema change on s3 eqiad, this will generate lag on s3 on labsdb T238966
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2032, es2033 and es2034 into dbctl T261717', diff saved to https://phabricator.wikimedia.org/P12706 and previous config saved to /var/cache/conftool/dbconfig/20200922-053346-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 75%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12705 and previous config saved to /var/cache/conftool/dbconfig/20200922-052951-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 75%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12704 and previous config saved to /var/cache/conftool/dbconfig/20200922-052935-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 75%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12703 and previous config saved to /var/cache/conftool/dbconfig/20200922-052926-root.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 50%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12702 and previous config saved to /var/cache/conftool/dbconfig/20200922-051448-root.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 50%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12701 and previous config saved to /var/cache/conftool/dbconfig/20200922-051431-root.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 50%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12700 and previous config saved to /var/cache/conftool/dbconfig/20200922-051423-root.json
  • 05:00 marostegui: Add es2032 es2033 and es2034 to tendril and zarcillo T261717
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 25%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12699 and previous config saved to /var/cache/conftool/dbconfig/20200922-045944-root.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 25%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12698 and previous config saved to /var/cache/conftool/dbconfig/20200922-045928-root.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 25%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12697 and previous config saved to /var/cache/conftool/dbconfig/20200922-045919-root.json
  • 01:35 ryankemper: `sudo cumin C:profile::services_proxy::envoy 'enable-puppet "adding cloudelastic to the service proxy --rkemper"'` done
  • 01:35 ryankemper: woot! `curl -X GET -s 'http://localhost:6105/_cluster/health'` gives a response as expected. (As do 6106 and 6107). Re-enabling puppet across the fleet...
  • 01:32 ryankemper: `sudo run-puppet-agent -e "adding cloudelastic to the service proxy --rkemper"` on `mwdebug1002.eqiad.wmnet`
  • 01:28 ryankemper: `sudo puppet-merge` done, now will run puppet on a single eqiad appserver and verify we can curl `localhost:610{5,6,7}`
  • 01:17 ryankemper: Disabling puppet on affected nodes via `sudo cumin C:profile::services_proxy::envoy 'disable-puppet "adding cloudelastic to the service proxy --rkemper"'`
  • 01:17 ryankemper: Going to test patch to stick envoy in front of `cloudelastic`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/628243

2020-09-21

  • 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:39 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:37 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:36 mutante: debmonitor2002 - systemctl reset-failed
  • 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 22:57 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 22:55 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 22:20 mutante: releases.wikimedia.org has been converted to an active-active service with geodns/backends in both DCs
  • 21:56 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 21:54 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 21:51 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 21:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:18 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:49 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: adjust enwiktionary completion search ranking (duration: 00m 57s)
  • 20:47 ebernhardson@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/CirrusSearch/: Remove pages from completion search by page id (duration: 01m 00s)
  • 20:04 herron: moving prometheus instance from bast3004 to prometheus3001 T243057
  • 19:46 herron: moving prometheus instance from bast4002 to prometheus4001 T243057
  • 19:38 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Push notifications deployment (4/5) (duration: 00m 57s)
  • 19:34 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Push notifications deployment (3/5) (duration: 00m 57s)
  • 19:28 mholloway-shell@deploy1001: Synchronized wmf-config/ProductionServices.php: Push notifications deployment (2/5) (duration: 00m 57s)
  • 19:26 mholloway-shell@deploy1001: Synchronized wmf-config/LabsServices.php: Push notifications deployment (1/5) (duration: 00m 57s)
  • 19:19 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 19:18 mepps: updated crm to 8f32b6301f
  • 19:15 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 19:14 ejegg: updated fundraising CiviCRM from e5ebf9d18a to 8f32b6301f
  • 19:13 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 18:59 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622863 T249745 (duration: 00m 56s)
  • 18:57 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update I336365 (duration: 06m 54s)
  • 18:53 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on plwiki (T254239) and ptwiki (T255027) (duration: 00m 56s)
  • 18:50 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update I336365
  • 18:33 mepps: updated crm from cc1f7e6d13 to e5ebf9d18a
  • 18:26 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Define Chinese logo variants for Modern Vector (no-op) (part 2) (T261153) (duration: 00m 56s)
  • 18:25 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Define Chinese logo variants for Modern Vector (no-op) (T261153) (duration: 00m 57s)
  • 18:21 catrope@deploy1001: Synchronized static/images/mobile/copyright/: Update Chinese logo variants for Modern Vector (T261153) (duration: 00m 56s)
  • 18:08 XioNoX: add NAT rule to pfw3-codfw - T263488
  • 17:42 papaul: rebooting ps1-a8-codfw for firmware upgrade
  • 16:46 papaul: shutting down ms-be2019 for BBU replacement
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12696 and previous config saved to /var/cache/conftool/dbconfig/20200921-162433-root.json
  • 16:17 papaul: replacing msw-c8-codfw
  • 16:16 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12695 and previous config saved to /var/cache/conftool/dbconfig/20200921-160929-root.json
  • 16:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 50%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12694 and previous config saved to /var/cache/conftool/dbconfig/20200921-155426-root.json
  • 15:51 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/: Introduce and use StatsdMonitoring trait in term store (T262923), Part I (duration: 00m 56s)
  • 15:50 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/Util/StatsdMonitoring.php: Introduce and use StatsdMonitoring trait in term store (T262923), Part I (duration: 00m 59s)
  • 15:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12693 and previous config saved to /var/cache/conftool/dbconfig/20200921-153923-root.json
  • 15:24 hnowlan: roll-restarting restbase-dev for java security updates
  • 15:24 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Take db2124 back out of dump/vslow T259831', diff saved to https://phabricator.wikimedia.org/P12692 and previous config saved to /var/cache/conftool/dbconfig/20200921-151210-kormat.json
  • 15:10 moritzm: rolling restart of mw canaries in codfw to pick up libx11 update
  • 15:07 moritzm: installing libx11 security updates on stretch
  • 15:02 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12691 and previous config saved to /var/cache/conftool/dbconfig/20200921-150233-kormat.json
  • 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12690 and previous config saved to /var/cache/conftool/dbconfig/20200921-144729-kormat.json
  • 14:40 moritzm: installing qemu security updates on ganeti* stretch nodes
  • 14:37 papaul: firmware upgrade on db2127
  • 14:36 moritzm: installing qemu security updates on ganeti2011 and gnt-instance reboot debmonitor2001
  • 14:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:32 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12689 and previous config saved to /var/cache/conftool/dbconfig/20200921-143226-kormat.json
  • 14:30 herron: moving prometheus from bast5001 to prometheus5001 T243057
  • 14:24 papaul: disconnecting mgmt on msw-c1-codfw to re-do cable end T263138
  • 14:21 marostegui: Set innodb_change_buffering = inserts; on db2125 (s2 slave) for performance testing T263443
  • 14:17 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12688 and previous config saved to /var/cache/conftool/dbconfig/20200921-141722-kormat.json
  • 14:11 papaul: disconnecting mgmt on msw-d6-codfw to re-do cable end T263138
  • 14:00 moritzm: installing Java security updates on restbase/sessionstore*
  • 13:58 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2117 for schema change, add db2124 to dump/vslow in the interim T259831', diff saved to https://phabricator.wikimedia.org/P12687 and previous config saved to /var/cache/conftool/dbconfig/20200921-135821-kormat.json
  • 13:21 moritzm: installing glib-networking security updates for Stretch
  • 13:21 marostegui: Set innodb_change_buffering = inserts; on db2081 (s8 slave) for performance testing T263443
  • 12:59 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=codfw
  • 12:38 XioNoX: set same OSPF metric on both eqiad/codfw links - T263230
  • 12:26 marostegui: Set innodb_change_buffering = all; on db2071 (s1 slave) for performance testing T263443
  • 12:26 marostegui: Set innodb_change_buffering = all; on db2129 (s6 master) for performance testing T263443
  • 11:38 effie: restart pybal on lvs2009 and lvs1015 - T256973
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - crashed', diff saved to https://phabricator.wikimedia.org/P12684 and previous config saved to /var/cache/conftool/dbconfig/20200921-113708-marostegui.json
  • 11:35 Urbanecm: EU B&C done
  • 11:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend/includes/Transforms/MoveLeadParagraphTransform.php: 3fab588: Simplify lead paragraph check (duration: 00m 59s)
  • 11:22 effie: restart pybal on lvs2010 and lvs1016 - T256973
  • 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a62212a: Allow local steward group members to bigdelete (duration: 00m 57s)
  • 11:12 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=shnwiktionary --fix # T256348 # P12683
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1cf4664: Set WT namespace alias to NS_PROJECT in shn.wiktionary (T256348) (duration: 00m 57s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 01ba828: Add archive.wul.waseda.ac.jp to the wgCopyUploadDomains (T261037) (duration: 00m 57s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bd51f47: Add *.70yearsindonesiaaustralia.com to the wgCopyUploadsDomains allowlist of commonswiki (T262238) (duration: 00m 57s)
  • 11:02 effie: restart pybal on lvs2010 and lvs1016 - T256973
  • 10:36 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:35 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 12s)
  • 09:03 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12682 and previous config saved to /var/cache/conftool/dbconfig/20200921-090343-kormat.json
  • 08:48 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12681 and previous config saved to /var/cache/conftool/dbconfig/20200921-084840-kormat.json
  • 08:48 marostegui: Stop MySQL on db2127 for on-site maintenance - T262247
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2127 T262247', diff saved to https://phabricator.wikimedia.org/P12680 and previous config saved to /var/cache/conftool/dbconfig/20200921-084730-marostegui.json
  • 08:33 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12679 and previous config saved to /var/cache/conftool/dbconfig/20200921-083337-kormat.json
  • 08:21 godog: swift codfw-prod: bump weight for ms-be2057 - T261633
  • 08:18 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12678 and previous config saved to /var/cache/conftool/dbconfig/20200921-081833-kormat.json
  • 08:15 godog: roll-restart swift-object-replicator in codfw and eqiad for increased concurrency
  • 07:53 hashar: Upgrading all CI Jenkins jobs to Quibble 0.0.45
  • 07:05 XioNoX: upgrade FNM to 1.1.7 in ulsfo - T257035
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12677 and previous config saved to /var/cache/conftool/dbconfig/20200921-060053-marostegui.json
  • 05:48 marostegui: Set innodb_change_buffering = inserts; on db2129 (s6 master) for performance testing
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12676 and previous config saved to /var/cache/conftool/dbconfig/20200921-054730-marostegui.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12675 and previous config saved to /var/cache/conftool/dbconfig/20200921-052704-marostegui.json
  • 05:18 marostegui: Stop mysql on: es2013 es2016 es2019 to clone es2032 es2033 es2034 - T261717
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12674 and previous config saved to /var/cache/conftool/dbconfig/20200921-050632-marostegui.json
  • 05:06 marostegui: Deploy MCR schema change on s8 eqiad master, lag will appear on s8 (wikidata) on labsdb hosts T238966
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2013,es2016 and es2019 to clone new hosts T261717', diff saved to https://phabricator.wikimedia.org/P12673 and previous config saved to /var/cache/conftool/dbconfig/20200921-050305-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2015 as es2 codfw master T261717', diff saved to https://phabricator.wikimedia.org/P12672 and previous config saved to /var/cache/conftool/dbconfig/20200921-050228-marostegui.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12671 and previous config saved to /var/cache/conftool/dbconfig/20200921-045919-marostegui.json
  • 04:37 marostegui: Set innodb_change_buffering = inserts; on db2116 for performance testing
  • 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 for the first time with minimal weight T261717', diff saved to https://phabricator.wikimedia.org/P12670 and previous config saved to /var/cache/conftool/dbconfig/20200921-043154-marostegui.json

2020-09-20

  • 08:46 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Tepig10102020' 'Davidfromtheworld' # T263317
  • 07:42 gehel: depooling wdqs2002 to catch up on lag
  • 07:36 gehel: restarting blazegraph + updater on wdqs2002

2020-09-19

  • 19:03 ariel@deploy1001: Finished deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed (duration: 00m 04s)
  • 19:02 ariel@deploy1001: Started deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed
  • 16:49 ejegg: reverted PayPal failmail diversion - IPN verification is working again
  • 16:27 ejegg: Diverted SmashPig PayPal failmail to eeggleston only

2020-09-18

  • 21:48 tzatziki: changed password for Millennium bug@ptwiki
  • 19:28 eileen: process-control config revision is 739ea754ca
  • 18:52 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:46 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 18:44 ryankemper: `sudo kill 254017 254018 254028 254029` to kill some dangling serdi / gzip processes, all the wikidata cleanup should be complete
  • 18:38 ryankemper: `sudo kill 126121 126122 126124 126128 249520 249521 254016 254027` on `snapshot1008` to terminate wikidata dump jobs that are in a bad state
  • 18:10 ryankemper: Removed stale `wikidatardf-dumps` crontab entry from `dumpsgen@snapshot1008`, stored backup of previous state of crontab in the (admittedly verbose) `/tmp/dumpsgen_crontab_before_removing_stale_wikidata_dump_entry_see_gerrit_puppet_patch_622342`
  • 17:15 mutante: lists1001 - apt-get install pwgen to generate passwords (this was installed on previous list server but apparently not puppetized, puppet patch coming up)
  • 16:23 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:21 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:09 mutante: restarting gerrit service to apply gerrit::628338 to make it dump heap if out of memory (T263008)
  • 14:15 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync (T261488) (duration: 00m 56s)
  • 14:13 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync (T261488) (duration: 01m 00s)
  • 13:02 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:00 kormat@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:48 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
  • 12:41 kormat: reimaging db2125 T263244
  • 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12665 and previous config saved to /var/cache/conftool/dbconfig/20200918-123947-kormat.json
  • 12:24 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12664 and previous config saved to /var/cache/conftool/dbconfig/20200918-122444-kormat.json
  • 12:09 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12663 and previous config saved to /var/cache/conftool/dbconfig/20200918-120940-kormat.json
  • 11:54 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12662 and previous config saved to /var/cache/conftool/dbconfig/20200918-115437-kormat.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125', diff saved to https://phabricator.wikimedia.org/P12661 and previous config saved to /var/cache/conftool/dbconfig/20200918-113509-marostegui.json
  • 11:15 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12660 and previous config saved to /var/cache/conftool/dbconfig/20200918-111529-kormat.json
  • 10:56 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12659 and previous config saved to /var/cache/conftool/dbconfig/20200918-105645-kormat.json
  • 10:45 jiji@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:41 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12658 and previous config saved to /var/cache/conftool/dbconfig/20200918-104141-kormat.json
  • 10:35 jiji@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:34 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:28 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:26 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12657 and previous config saved to /var/cache/conftool/dbconfig/20200918-102638-kormat.json
  • 10:11 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12656 and previous config saved to /var/cache/conftool/dbconfig/20200918-101135-kormat.json
  • 09:55 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12655 and previous config saved to /var/cache/conftool/dbconfig/20200918-095554-kormat.json
  • 09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:47 twentyafterfour: deployed hotfix for T263063 to phab1001
  • 09:47 jayme: deleting some random pods in kubernetes staging to rebalance load back on kubestage1001 - T262527
  • 09:46 jayme: uncordoned kubestage1001 - T262527
  • 09:46 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12654 and previous config saved to /var/cache/conftool/dbconfig/20200918-094608-kormat.json
  • 09:31 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 80%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12653 and previous config saved to /var/cache/conftool/dbconfig/20200918-093105-kormat.json
  • 09:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:22 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:16 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 60%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12652 and previous config saved to /var/cache/conftool/dbconfig/20200918-091601-kormat.json
  • 09:00 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 40%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12651 and previous config saved to /var/cache/conftool/dbconfig/20200918-090058-kormat.json
  • 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:56 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:56 jayme: reboot kubestage1001 for clean state - T262527
  • 08:54 elukey: change analytics-in4/in6 filters on cr1/cr2 after https://gerrit.wikimedia.org/r/628300
  • 08:47 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:45 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 20%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12650 and previous config saved to /var/cache/conftool/dbconfig/20200918-084554-kormat.json
  • 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:43 jayme: reboot kubestage1001 for kernel upgrade - T262527
  • 08:30 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:25 jayme: reboot kubestage1001 for clean state testing - T262527
  • 08:22 kormat@cumin1001: dbctl commit (dc=all): 'db2124 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12648 and previous config saved to /var/cache/conftool/dbconfig/20200918-082223-kormat.json
  • 08:16 klausman: reinstalling stat1004 with Buster
  • 07:17 moritzm: installing xdg-utils security updates
  • 07:14 XioNoX: push pfw policies - T263168
  • 07:12 jayme: draining kubestage1001 for kernel upgrade - T262527
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12647 and previous config saved to /var/cache/conftool/dbconfig/20200918-062127-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12646 and previous config saved to /var/cache/conftool/dbconfig/20200918-060815-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1131 after rack move', diff saved to https://phabricator.wikimedia.org/P12645 and previous config saved to /var/cache/conftool/dbconfig/20200918-060724-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12644 and previous config saved to /var/cache/conftool/dbconfig/20200918-060103-marostegui.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12643 and previous config saved to /var/cache/conftool/dbconfig/20200918-053758-marostegui.json
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2029 and es2030 to dbctl depooled - T261717', diff saved to https://phabricator.wikimedia.org/P12642 and previous config saved to /var/cache/conftool/dbconfig/20200918-053604-marostegui.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12641 and previous config saved to /var/cache/conftool/dbconfig/20200918-052608-marostegui.json
  • 05:15 marostegui: Restart wikibugs

2020-09-17

  • 23:41 ejegg: updated payments-wiki from 86c997fdb2 to 7bb99ce03a
  • 23:01 ejegg: updated payments-wiki from 1e5a52ed26 to 86c997fdb2
  • 20:47 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: 19b9b98: Fix APCOND_FR_NEVERBLOCKED handling (part 3; T262970) (duration: 00m 57s)
  • 19:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:02 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=wikidatawiki --logwiki=metawiki 'Filomena ciavarella' 'Filomena Ciavarella' #T262657
  • 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:29 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 18:11 Urbanecm: Morning B&C done
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 40591d3: Enable DiscussionTools beta on jawiki & viwiki (T261654; T262109) (duration: 00m 56s)
  • 18:06 Urbanecm: Move /srv/mediawiki-staging/grep (owned by tstarling) to /home/urbanecm to make working directory clean (cc TimStarling)
  • 17:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 17:20 rzl: repooled eqiad at 17:11
  • 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:12 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:03 papaul: restarting ps1-d8-codfw
  • 16:45 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 01m 12s)
  • 16:44 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
  • 16:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 02m 50s)
  • 16:41 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
  • 16:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 07m 26s)
  • 16:33 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
  • 16:33 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema (duration: 06m 14s)
  • 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:27 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema
  • 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:21 marostegui: Restart wikibugs
  • 16:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:15 papaul: replacing msw-d8-codfw
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1131 IP after moving it to a different rack T262901', diff saved to https://phabricator.wikimedia.org/P12639 and previous config saved to /var/cache/conftool/dbconfig/20200917-160540-marostegui.json
  • 16:03 marostegui: Recreate db1131 on tendril T262901
  • 15:59 marostegui: Update rack location on zarcillo for db1131 T262901
  • 15:57 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 100% T259831', diff saved to https://phabricator.wikimedia.org/P12638 and previous config saved to /var/cache/conftool/dbconfig/20200917-155708-kormat.json
  • 15:44 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 75% T259831', diff saved to https://phabricator.wikimedia.org/P12637 and previous config saved to /var/cache/conftool/dbconfig/20200917-154431-kormat.json
  • 15:43 mepps: updated payments-wiki from 3c073a6a56 to 1e5a52ed26
  • 15:35 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 50% T259831', diff saved to https://phabricator.wikimedia.org/P12636 and previous config saved to /var/cache/conftool/dbconfig/20200917-153514-kormat.json
  • 15:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 25% T259831', diff saved to https://phabricator.wikimedia.org/P12635 and previous config saved to /var/cache/conftool/dbconfig/20200917-152019-kormat.json
  • 15:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12634 and previous config saved to /var/cache/conftool/dbconfig/20200917-151347-marostegui.json
  • 15:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12633 and previous config saved to /var/cache/conftool/dbconfig/20200917-150234-marostegui.json
  • 15:02 jynus: deploying extended grants for admin account on sys/p_s at s8@codfw T195578
  • 15:00 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:00 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:54 kormat@cumin1001: dbctl commit (dc=all): 'db2114: depool for schema change T259831', diff saved to https://phabricator.wikimedia.org/P12632 and previous config saved to /var/cache/conftool/dbconfig/20200917-145451-kormat.json
  • 14:49 cmjohnson1: ending pdu maintenance in eqiad
  • 14:40 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12631 and previous config saved to /var/cache/conftool/dbconfig/20200917-143914-marostegui.json
  • 14:32 papaul: replacing msw-d1,d2,d3,d4,d5 and d6
  • 14:31 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12630 and previous config saved to /var/cache/conftool/dbconfig/20200917-141825-marostegui.json
  • 14:02 marostegui: Start mysql on db1125 after PDU maintenance T261459
  • 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12629 and previous config saved to /var/cache/conftool/dbconfig/20200917-140014-marostegui.json
  • 13:33 jayme: ran ipvsadm -D -t 10.2.2.14:8888 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
  • 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:32 jayme: ran ipvsadm -D -t 10.2.2.31:8748 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
  • 13:32 jayme: ran ipvsadm -D -t 10.2.1.31:8748 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
  • 13:32 jayme: ran ipvsadm -D -t 10.2.1.14:8888 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
  • 13:25 kormat@cumin1001: dbctl commit (dc=all): 'Start depooling db2114 T259831', diff saved to https://phabricator.wikimedia.org/P12628 and previous config saved to /var/cache/conftool/dbconfig/20200917-132513-kormat.json
  • 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:19 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet
  • 13:17 marostegui: Stop MySQL on db2125 for on-site maintenance T260670
  • 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:13 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet
  • 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.9
  • 12:18 cmjohnson1: pdu swap maintenance beginning now for racks D1, D2 and C1 eqiad
  • 11:24 matthiasmullie: End Euro B&C
  • 11:24 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/NavigationTiming/: Account for empty layout shift sources array (duration: 01m 05s)
  • 11:22 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/WikimediaEvents/: Disable MediaSearch A/B test (duration: 01m 08s)
  • 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12627 and previous config saved to /var/cache/conftool/dbconfig/20200917-111028-marostegui.json
  • 11:06 vgutierrez: update to acme-chief 0.29 on acmechief[12]001 - T263006
  • 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:04 vgutierrez: upload acme-chief 0.29 to apt.wm.o (buster) - T263006
  • 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:03 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=eqiad
  • 10:58 marostegui: Stop mysql on db1125 for PDU maintenance, lag will appear on s2, s4, s6 and s7 on labsdb hosts T261459
  • 10:58 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=codfw
  • 10:51 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=codfw
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12626 and previous config saved to /var/cache/conftool/dbconfig/20200917-104816-marostegui.json
  • 10:46 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
  • 10:40 oblivian@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=wikifeeds
  • 10:34 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:27 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 10:22 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 10:20 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 10:18 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:17 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 09:14 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 08:49 jayme: deleting some random pods in kubernetes staging to rebalance load back on kubestage1002 - T262527
  • 08:43 jayme: uncordoned kubestage1002 after kernel upgrade - T262527
  • 08:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:37 godog: graphite compress /var/log/carbon logs older than 2d
  • 08:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:25 jayme: reboot kubestage1002 for kernel upgrade - T262527
  • 08:24 godog: graphite add 300G to /srv
  • 07:55 jayme: draining kubestage1002 for kernel upgrade - T262527
  • 07:55 jayme: cordoning kubestage1002 for kernel upgrade - T262527
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12624 and previous config saved to /var/cache/conftool/dbconfig/20200917-070145-marostegui.json
  • 06:55 hashar: Taking a heap dump of Gerrit JVM
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12623 and previous config saved to /var/cache/conftool/dbconfig/20200917-061931-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12622 and previous config saved to /var/cache/conftool/dbconfig/20200917-060312-marostegui.json
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12621 and previous config saved to /var/cache/conftool/dbconfig/20200917-055219-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for on-site maintenance', diff saved to https://phabricator.wikimedia.org/P12620 and previous config saved to /var/cache/conftool/dbconfig/20200917-055158-marostegui.json
  • 05:46 marostegui: Stop mysql on db1131 - T262901
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2031 on es2 for the first time with minimal weight T261717', diff saved to https://phabricator.wikimedia.org/P12619 and previous config saved to /var/cache/conftool/dbconfig/20200917-054226-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12618 and previous config saved to /var/cache/conftool/dbconfig/20200917-053503-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12617 and previous config saved to /var/cache/conftool/dbconfig/20200917-052347-marostegui.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2011 as es1 master and es2017 as es3 master and then depool es2018 and es2012 to clone es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12616 and previous config saved to /var/cache/conftool/dbconfig/20200917-051741-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12615 and previous config saved to /var/cache/conftool/dbconfig/20200917-050739-marostegui.json
  • 04:53 marostegui: Deploy schema change on s1 eqiad primary master - T238966
  • 01:22 Krinkle: krinkle@mwmaint1002 synced docroot/noc – https://gerrit.wikimedia.org/r/620138
  • 01:22 Krinkle: krinkle@mwmaint2001 synced docroot/noc – https://gerrit.wikimedia.org/r/620138

2020-09-16

  • 23:41 catrope@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs: T262970 (duration: 01m 06s)
  • 23:40 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs: T262970 (duration: 01m 06s)
  • 23:37 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/GrowthExperiments/: Fix styling for mobile start module (T258008); Revert wider task card on desktop (T263042, T258704); Fix width of sidebar modules in narrow mode in variant A (T263068) (duration: 01m 09s)
  • 22:24 shdubsh: install prometheus-icinga-exporter 0.11 on icinga2001
  • 20:19 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 20:19 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 20:10 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:04 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Vector search in header on testwiki and officewiki (T262207) (duration: 01m 04s)
  • 18:00 brennen@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend: Backport: Check $coords matched some nodes before comparing contents (T263034) (duration: 01m 06s)
  • 17:58 joal@deploy1001: Finished deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0] (duration: 00m 08s)
  • 17:58 joal@deploy1001: Started deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0]
  • 17:51 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:50 joal@deploy1001: Started deploy [analytics/refinery@07056b0]: Regular analytics weekly train [analytics/refinery@07056b0]
  • 17:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:11 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:03 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:45 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:40 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:13 marostegui: Start mysql on db1093, db1109 and db1123 after pdu work is done
  • 16:12 ryankemper: `wdqs` deploy complete, service is healthy
  • 16:09 elukey: reinstall Buster on an-tool1009 after a lot of tests (Ganeti VM, so it is manual work)
  • 16:00 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:58 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:49 ryankemper: sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'; sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'
  • 15:49 ryankemper: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 15:48 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b7e2d0b]: 0.3.48 (duration: 14m 40s)
  • 15:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Rename wmgWikibaseClientLocalEntitySourceName to wmgWikibaseClientItemAndPropertySourceName on Beta (T258060) (production no-op) (duration: 01m 04s)
  • 15:35 ryankemper: Canary `wdqs1003` query tests look good, proceeding to wdqs deploy for rest of fleet
  • 15:33 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b7e2d0b]: 0.3.48
  • 15:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove `wmgWikibaseClientLocalEntitySourceName` from InitialiseSettings.php (T258060) (duration: 01m 05s)
  • 15:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Use `wmgWikibaseClientItemAndPropertySourceName` instead of `wmgWikibaseClientLocalEntitySourceName` in Wikibase.php (T258060) (duration: 01m 02s)
  • 15:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add `wmgWikibaseClientItemAndPropertySourceName` to InitialiseSettings.php (T258060) (duration: 01m 06s)
  • 14:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 14:41 volans: uploaded spicerack_0.0.43 to apt.wikimedia.org buster-wikimedia
  • 14:39 cmjohnson1: pdu swap rack d7-eqiad, missed this in earlier log entry
  • 14:34 jiji@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 14:02 Urbanecm: Change email address of User:Oversight@enwiki to oversight-en-wp@wikipedia.org as OTRS is back up (T262733)
  • 13:48 marostegui: Start mysql on db1121 after PDU work
  • 13:46 James_F: Restarting CI Jenkins for T262827
  • 13:08 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2256.codfw.wmnet
  • 13:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.9
  • 12:58 elukey: upload hue_4.7.1-1+deb10u1 to buster-wikimedia
  • 12:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 12:56 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 12:49 cmjohnson1: start pdu swap in racks c6, c7 and d8
  • 12:36 moritzm: powercycling mw2256 (went down with overheated CPU)
  • 12:29 moritzm: restarting exim on MXes to pick up GNUTLS update
  • 11:29 moritzm: restarting slapd on LDAP replicas to pick up GNUTLS update
  • 11:18 moritzm: installing gnutls28 security updates on remaining stretch hosts
  • 11:12 jforrester@deploy1001: Synchronized php-1.36.0-wmf.9/includes/filerepo/file: T263014 Revert "Remove support for (Archived|OldLocal)File::userCan without a user" (duration: 01m 04s)
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2027 and es2028 T261717', diff saved to https://phabricator.wikimedia.org/P12606 and previous config saved to /var/cache/conftool/dbconfig/20200916-103324-marostegui.json
  • 10:20 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.9
  • 10:14 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.9 (duration: 46m 07s)
  • 10:10 ema: upload python-acme 0.31.0-2wm1 to buster-wikimedia T263006
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12605 and previous config saved to /var/cache/conftool/dbconfig/20200916-100548-marostegui.json
  • 10:01 akosiaris: T187984 Shutdown mendelevium.
  • 09:43 jynus: deploying max_packet_size change to m3 instances, too
  • 09:28 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.9
  • 09:26 liw: moving train 1.36.0-wmf.9 to testwikis
  • 09:22 jynus: restarting gerrit service on gerrit1001, unresponsive
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12603 and previous config saved to /var/cache/conftool/dbconfig/20200916-091535-marostegui.json
  • 09:13 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 0 - T262290
  • 09:08 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 1 - T262290
  • 08:52 marostegui: Stop mysql on db1121, db1123, db1093 and db1109 for PDU work T261454 T261457
  • 08:52 XioNoX: asw-d-codfw> request system snapshot slice alternate all-members - T262290
  • 08:50 jynus: deploy new max_allowed_packet configuration to m1, m2 and m5 dbs
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12601 and previous config saved to /var/cache/conftool/dbconfig/20200916-084916-marostegui.json
  • 08:42 awight: finished security backport for https://phabricator.wikimedia.org/T262628
  • 08:41 awight@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FileImporter/src/Services/ImportPlanValidator.php: Security patch for T262628 (duration: 00m 59s)
  • 08:41 XioNoX: asw-c-codfw> request system snapshot slice alternate all-members - T262290
  • 08:27 XioNoX: asw-b-codfw> request system snapshot slice alternate all-members - T262290
  • 08:26 awight: beginning security backport for https://phabricator.wikimedia.org/T262628
  • 08:17 XioNoX: asw-a-codfw> request system snapshot slice alternate all-members - T262290
  • 08:04 akosiaris: T187984 Validated that ticket.wikimedia.org works, proceeding with a wider announcement
  • 08:02 XioNoX: asw2-d-eqiad> request system snapshot slice alternate all-members - T262290
  • 07:49 akosiaris: T187984 Switch over ticket.discovery.wmnet to otrs1001
  • 07:48 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:44 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 07:40 XioNoX: asw2-c-eqiad> request system snapshot slice alternate all-members - T262290
  • 07:37 akosiaris: T187984 Tested inbound email successfully
  • 07:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:26 akosiaris: T187984 Tested outbound email, switching inbound email configuration and performing tests
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12600 and previous config saved to /var/cache/conftool/dbconfig/20200916-072614-marostegui.json
  • 07:22 jayme@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:22 jayme@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 07:21 jayme@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:12 akosiaris: T187984 Disable gravatar in system configuration to avoid leaking agent PII through a 3rd party service
  • 07:03 akosiaris: T187984 validated that the OTRS installation is functional over SSH
  • 07:02 akosiaris: T187984 migration script done. Config updates, rebuilds, package upgrades/reinstall and index rebuilds done
  • 06:28 godog: codfw-prod: bump weight for ms-be2057 - T261633
  • 06:20 kart_: Updated cxserver to 2020-08-30-011854-production (T253439, T260557)
  • 06:20 XioNoX: asw2-b-eqiad> request system snapshot slice alternate all-members - T262290
  • 06:15 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:11 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 for the first time with minimum weight T261717', diff saved to https://phabricator.wikimedia.org/P12599 and previous config saved to /var/cache/conftool/dbconfig/20200916-061013-marostegui.json
  • 06:08 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12598 and previous config saved to /var/cache/conftool/dbconfig/20200916-060717-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 to clone es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12597 and previous config saved to /var/cache/conftool/dbconfig/20200916-055535-marostegui.json
  • 05:53 XioNoX: asw2-a-eqiad> request system snapshot slice alternate all-members - T262290
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12596 and previous config saved to /var/cache/conftool/dbconfig/20200916-055108-marostegui.json
  • 05:50 XioNoX: msw1-codfw> request system snapshot slice alternate - T262290
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2027 and es2028 to dbctl T261717', diff saved to https://phabricator.wikimedia.org/P12595 and previous config saved to /var/cache/conftool/dbconfig/20200916-053918-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12594 and previous config saved to /var/cache/conftool/dbconfig/20200916-053507-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 into vslow', diff saved to https://phabricator.wikimedia.org/P12593 and previous config saved to /var/cache/conftool/dbconfig/20200916-052343-marostegui.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12592 and previous config saved to /var/cache/conftool/dbconfig/20200916-052241-marostegui.json
  • 05:07 marostegui: Repool labsdb1010
  • 02:22 mutante: deneb - sudo systemctl start package_builder_Clean_up_build_directory to fix icinga alert after failed build attempts

2020-09-15

  • 23:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: 1c0b0d1: Fix APCOND_FR_NEVERBLOCKED handling (T262970) (duration: 00m 56s)
  • 23:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: 5beace3: Fix APCOND_FR_NEVERBLOCKED handling (T262970) (duration: 00m 58s)
  • 23:14 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: ac8bd38: flaggedrevs: Remove non-existent config options (duration: 00m 58s)
  • 23:07 urbanecm@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 23:00 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 62b21d5: Revert "Remove abusefilter-view right grant from wmf-config" (T255506) (duration: 00m 59s)
  • 20:44 brennen: removing extraneous recursive symlink /srv/mediawiki-staging/php-1.36.0-wmf.9/php-1.36.0-wmf.8
  • 18:32 Urbanecm: Morning B&C done
  • 18:28 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 084729b: Remove abusefilter-view right grant from wmf-config (T255506) (duration: 00m 56s)
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1d34565: Enable MediaWiki client errors on frwiki (T255585) (duration: 00m 57s)
  • 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 79004b7: Enable the reverted tag on all wikis (T164307) (duration: 00m 56s)
  • 17:59 krinkle@deploy1001: Synchronized src/ServiceConfig.php: If727ae4335 (duration: 00m 56s)
  • 17:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out (duration: 37m 42s)
  • 17:05 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out
  • 17:05 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint (duration: 86m 46s)
  • 17:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:57 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:38 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint
  • 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:26 shdubsh: manual install prometheus-icinga-exporter upgrade on icinga2001
  • 14:53 godog: switch grafana to eqiad - T259143
  • 14:48 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:42 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:38 XioNoX: remove old SNMP community from all network devices
  • 14:23 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:22 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventlogging_TemplateWizard - T251609 (duration: 00m 56s)
  • 14:21 otto@deploy1001: sync-file aborted: wgEventStreams: Set canary_events_enabled: true for eventlogging_TemplateWizard - T251609 (duration: 00m 06s)
  • 14:01 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:01 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:51 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:51 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:18 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:14 cmjohnson1: beginning work inside racks c2, c3, c4 and c5 eqiad
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 from vslow, s8, add db1092 temporarily', diff saved to https://phabricator.wikimedia.org/P12589 and previous config saved to /var/cache/conftool/dbconfig/20200915-121849-marostegui.json
  • 12:18 jbond42: update libxml2 on stretch and jessie
  • 12:08 jbond42: rolling restart of php7.2-fpm
  • 12:05 elukey: roll restart cassandra on aqs* to pick up openjdk upgrades
  • 12:05 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 11:44 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 294931f: Revert "Disable DynamicPageList on ruwikinews" (T262240; T262391) (duration: 00m 58s)
  • 11:17 effie: roll out scap 3.15.0-1 to all - T261234
  • 11:12 XioNoX: mass update SCS SNMP community in LibreNMS - T246890
  • 10:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:56 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:54 XioNoX: mass update PDU SNMP community in LibreNMS - T246890
  • 10:48 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 10:36 moritzm: uploaded libxml2 2.9.1+dfsg1-5+deb8u8+wmf1 for jessie-wikimedia
  • 10:33 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:22 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "testwikiswikis to 1.36.0-wmf.9"
  • 10:12 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 09:22 marostegui: Stop MySQL on s5 and s8 eqiad primary master - lag will show up on labsdb hosts T261455
  • 09:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 09:08 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 09:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 09:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:04 gehel: restart elasticsearch on elastic2029 (high GC)
  • 09:01 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 08:59 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 08:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 08:53 elukey: roll restart druid zookeeper clusters for openjdk upgrades
  • 08:53 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 08:52 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:13 marostegui: Stop MySQL on labsdb1010 for PDU maintenance T261456
  • 08:05 liw@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_498180604" --store-class=LCStoreCDB --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 11m 10s)
  • 08:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:01 akosiaris: T187984 migration script on otrs1001 proceeding as expected. Still in step 31/44, but that's what we saw in the test migration
  • 07:54 liw@deploy1001: Started scap: testwikis to 1.36.0-wmf.9
  • 07:24 godog: swift codfw add ms-be2057 at object weight 100 - T261633
  • 07:19 elukey: roll restart druid cluster to pick up openjdk updates
  • 07:19 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 07:16 XioNoX: pre-configure SGIX port on cr2-eqsin
  • 06:57 liw: 1.36.0-wmf.9 was branched at 7269b6b for T257977
  • 06:08 marostegui: Stop mysql on es2011 to clone es2028
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2011 to clone es2028', diff saved to https://phabricator.wikimedia.org/P12585 and previous config saved to /var/cache/conftool/dbconfig/20200915-060623-marostegui.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2012 as es1 codfw master T261717', diff saved to https://phabricator.wikimedia.org/P12584 and previous config saved to /var/cache/conftool/dbconfig/20200915-060508-marostegui.json
  • 05:33 marostegui: Depool labsdb1010 for PDU maintenance
  • 05:10 marostegui: Restart sanitarium hosts on eqiad and codfw T262832

2020-09-14

  • 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:49 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:49 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:45 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 21:34 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:32 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:30 cdanis: T257527 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo cumin 'R:Class ~ "(?i)profile::logstash::collector7"' 'enable-puppet "cdanis rolling out Ifa3c68e4"'
  • 21:24 cdanis: T257527 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo cumin 'R:Class ~ "(?i)profile::logstash::collector7"' 'disable-puppet "cdanis rolling out Ifa3c68e4"'
  • 21:05 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:03 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:38 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:36 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:26 cdanis@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a588eb0c6 T262087 modify wgEventStreams to reference NEL schema (duration: 00m 56s)
  • 19:00 Urbanecm: Morning B&C done
  • 18:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a5d56ed: e2f4798: Enable Special:Investigate on eswiki (T262436) (duration: 00m 56s)
  • 18:49 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:47 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:38 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 7d19393: Remove investigate from $wgAvailableRights (T260175) (duration: 00m 56s)
  • 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d2fa653: Remove the investigate right from testwiki and frwiki (T260175) (duration: 00m 56s)
  • 18:30 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/EventStreamConfig/includes/: a4c8608: Default to using API json formatversion=2 (T251609) (duration: 00m 57s)
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 27ba5a1: add new parse* servers to $wgLinterSubmitterWhitelist (T247441) (duration: 00m 56s)
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: 720e6cb: flaggedrevs: Move setting of wgFlaggedRevsAutopromote and wgFlaggedRevsAutoconfirm out of wgExtensionFunctions (T237191) (duration: 00m 56s)
  • 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 699f5e8: Add logo Wordmark and Tagline for hywiki (T259985) (duration: 00m 55s)
  • 18:08 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: 699f5e8: Add logo Wordmark and Tagline for hywiki (T259985) (duration: 00m 56s)
  • 17:51 mutante: all new parse* parsoid hardware pooled now and set to active in netbox, deploy in 10 min will add to $wgLinterSubmitterWhitelist (T247441)
  • 17:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
  • 17:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse200[0-9].codfw.wmnet
  • 17:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2002.codfw.wmnet
  • 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 16:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 16:36 mutante: pooled the first of the new parsoid servers - parse2001 (T247441)
  • 16:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 16:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
  • 16:04 elukey: completed the rollout of restrictive kafka ferm rules on the Kafka jumbo cluster
  • 16:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse200[0-9].codfw.wmnet
  • 16:01 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=parsoid,name=parse20[0-2][0-9].codfw.wmnet
  • 15:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 15:58 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
  • 15:54 moritzm: restarting apache on webperf* to pick up GNU TLS security update
  • 15:45 moritzm: restarting apache/FPM on mw2271/mw2272 (codfw canaries) to pick up GNU TLS update
  • 15:35 moritzm: installing gnutls28 security updates on stretch
  • 15:23 elukey: enable stricter ferm rules on kafka-jumbo1007 and kafka-jumbo1005
  • 15:17 cicalese@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Allow public access to API Portal main page for private launch (duration: 00m 57s)
  • 15:17 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:11 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:11 cmjohnson1: completed pdu swap in eqiad racks d5/d6
  • 14:55 elukey: ferm rules added to kafka-jumbo1009, 1006 and 1008 so far
  • 14:24 milimetric@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:24 milimetric@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:16 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:14 milimetric@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:14 milimetric@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:11 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:09 milimetric@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:09 milimetric@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:42 moritzm: installing dbus security updates on stretch
  • 13:42 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:32 moritzm: installing websockify stretch updates
  • 13:10 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 12:51 cmjohnson1: correction: it's replacing the PDUs in racks d5 and d6
  • 12:50 Amir1: ladsgroup@mwmaint2001:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P1438 --new-data-type external-id (T262198)
  • 12:49 cmjohnson1: replacing pdu's in racks d4 and d5 eqiad
  • 12:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:32 ayounsi@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-snmp (exit_code=1)
  • 12:30 ayounsi@cumin1001: START - Cookbook sre.pdus.rotate-snmp
  • 12:30 XioNoX: rotate SNMP community on all the PDUs - T246890
  • 12:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:24 moritzm: rebooting sodium for kernel update
  • 12:09 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 12:08 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:06 akosiaris: T187984 migration script on otrs1001 now in step 31/44
  • 12:03 volans@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fea8861: Follow-up 0ee0d8f: [frwiktionary] Create `conj` alias (T262298) (duration: 00m 56s)
  • 11:50 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:48 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:48 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:46 volans@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:45 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:41 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:41 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:40 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:39 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:36 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:35 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:27 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for MCR', diff saved to https://phabricator.wikimedia.org/P12578 and previous config saved to /var/cache/conftool/dbconfig/20200914-112648-marostegui.json
  • 11:24 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:20 marostegui: Remove triggers from db1124:3311 - T238966
  • 11:19 marostegui: Deploy MCR schema change on s1, this will generate lag on s1 labsdb - T238966
  • 11:13 Urbanecm: EU B&C window done
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 47fe87c: [itwiki] Increase $wgAutoConfirmAge and $wgAutoConfirmCount (T262738) (duration: 00m 56s)
  • 11:09 marostegui: Stop MySQL on s5 and s8 eqiad primary master - lag will show up on labsdb hosts T261455
  • 11:05 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=frwiktionary --fix # T262298 # P12576
  • 11:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0ee0d8f: [frwiktionary] Create new namespace "Conjugaison" & associated talk (T262298) (duration: 00m 56s)
  • 11:00 volans: Mass importing IPs from PuppetDB into Netbox T244153
  • 10:59 XioNoX: create LACP bundle to labtestvirt2003
  • 10:50 jbond42: enable git protocol version2 fleet wide
  • 10:43 effie: deploy scap 3.15.0-1 to canaries - T261234
  • 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 09:27 akosiaris: T187984 migration script on otrs1001 now in step 8/44 (correction)
  • 09:26 akosiaris: T187984 migration script on otrs1001 now in step 8/41
  • 09:09 akosiaris: db1077. stop slave ; show slave status > /home/akosiaris/show_slave_status; reset slave all T187984
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2026 on es2 T261717', diff saved to https://phabricator.wikimedia.org/P12575 and previous config saved to /var/cache/conftool/dbconfig/20200914-085842-marostegui.json
  • 08:49 akosiaris: start the OTRS upgrade to 6.0.29 T187984
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12574 and previous config saved to /var/cache/conftool/dbconfig/20200914-084509-marostegui.json
  • 08:42 moritzm: upgrading remaining stretch systems to git 2.20 T262244
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12573 and previous config saved to /var/cache/conftool/dbconfig/20200914-083525-marostegui.json
  • 08:17 _joe_: restarting pybal on lvs2009
  • 08:16 _joe_: repooling mw2297
  • 08:14 _joe_: restarting php on mw2297, php-fpm stuck in SIGILL
  • 08:14 marostegui: Stop MySQL on db2125 for on-site maintenance - T260670
  • 08:12 _joe_: restarting pybal on lvs2010
  • 08:09 _joe_: restarting pybal on lvs1015
  • 08:05 godog: prometheus codfw ops, extend the lv by 100G
  • 08:04 marostegui: Stop MySQL on es2017 to clone es2027
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2017 to clone es2027 - T261717', diff saved to https://phabricator.wikimedia.org/P12572 and previous config saved to /var/cache/conftool/dbconfig/20200914-080344-marostegui.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2018 as es3 codfw master T261717', diff saved to https://phabricator.wikimedia.org/P12571 and previous config saved to /var/cache/conftool/dbconfig/20200914-080239-marostegui.json
  • 07:58 _joe_: restarting pybal on lvs1015
  • 07:52 _joe_: restarting pybal on lvs1016
  • 07:40 jayme: shutting down etcd100[1-3] (scheduled for decommission, replaced by kubetcd100[4-6])
  • 07:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:39 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12570 and previous config saved to /var/cache/conftool/dbconfig/20200914-073919-marostegui.json
  • 06:56 elukey: slowly rollout ferm rules on Kafka-Jumbo hosts (see https://gerrit.wikimedia.org/r/611168)
  • 06:19 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 05:54 elukey: execute "gnt-instance modify -B vcpus=4 an-tool1009.eqiad.wmnet" on ganeti1011 - T258768
  • 05:54 marostegui: Truncate tendril.general_log_sampled on db1115 - T262782
  • 05:47 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:43 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 for the first time with minimum weight T261717', diff saved to https://phabricator.wikimedia.org/P12569 and previous config saved to /var/cache/conftool/dbconfig/20200914-053844-marostegui.json

2020-09-13

  • 23:47 Urbanecm: Change email address of User:Oversight@enwiki to oversight-l@lists.wikimedia.org as part of OTRS downtime preparation (T262733)
  • 05:51 effie: sudo -i depool mw2297

2020-09-12

  • 01:07 mutante: people2001 - rsyncing user home dirs from people1002
  • 00:38 mutante: all issues with hosts doing stuff "on every run" have been fixed except one is left: analytics1034

2020-09-11

  • 22:54 mutante: starting people2001 VM
  • 17:30 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:29 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:22 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:12 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 12:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:47 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:27 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:49 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:55 jynus: starting snapshot of m2 from db1117
  • 08:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:00 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 07:59 XioNoX: remove BGP to AS64271 in AMS-IX (see peering@ email)
  • 07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:17 moritzm: rebooting ldap-corp server for kernel update
  • 07:02 moritzm: remove git-core from stretch systems, it's a transition package no longer provided by the 2.20 backport from Buster
  • 02:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:54 mutante: downtimes 48h for parse* hosts not in production yet but getting icinga checks from applied role
  • 01:53 mutante: ACKed alerts for eqiad power switches after making T262629
  • 01:53 mutante: initial puppet runs on parse2010 - parse2020, staggered, not in production yet, new hardware, setup WIP (T247441)
  • 01:45 mutante: mw2296 - restarted php7.2-fpm
  • 01:42 mutante: mw2296 - systemctl restart apache2 - rescheduled icinga alerts for apache and php-fpm
  • 01:33 mutante: initial puppet runs on parse2001 - parse2010, staggered, not in production yet, new hardware, setup WIP (T247441)
  • 01:32 milimetric@deploy1001: Finished deploy [analytics/refinery@6057f20] (thin): Simple hql syntax fix (duration: 00m 07s)
  • 01:32 milimetric@deploy1001: Started deploy [analytics/refinery@6057f20] (thin): Simple hql syntax fix
  • 01:32 milimetric@deploy1001: Finished deploy [analytics/refinery@6057f20]: Simple hql syntax fix (duration: 08m 09s)
  • 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:24 milimetric@deploy1001: Started deploy [analytics/refinery@6057f20]: Simple hql syntax fix
  • 00:41 milimetric@deploy1001: Finished deploy [analytics/refinery@7f5a6ca] (thin): Regular analytics weekly train THIN [analytics/refinery@7f5a6ca] (duration: 00m 08s)
  • 00:41 milimetric@deploy1001: Started deploy [analytics/refinery@7f5a6ca] (thin): Regular analytics weekly train THIN [analytics/refinery@7f5a6ca]
  • 00:40 milimetric@deploy1001: Finished deploy [analytics/refinery@7f5a6ca]: Regular analytics weekly train [analytics/refinery@7f5a6ca] (duration: 08m 25s)
  • 00:38 mutante: generating mcrouter certs for parse2001 - parse2019 - mcrouter_generate_certs on puppetmaster1001 (T247441)
  • 00:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:31 milimetric@deploy1001: Started deploy [analytics/refinery@7f5a6ca]: Regular analytics weekly train [analytics/refinery@7f5a6ca]
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:01 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime

2020-09-10

  • 23:44 ejegg: updated payments-wiki from e41ab173e0 to 3c073a6a56
  • 23:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 22:50 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 22:43 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 22:33 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:31 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 22:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 22:11 ejegg: updated payments-wiki from be81063168 to e41ab173e0
  • 22:06 mutante: added mcrouter cert for parse2020, ran mcrouter_generate_certs
  • 21:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 21:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:25 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.8
  • 20:23 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:21 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:20 longma: correction: T257976 - 1.36.0-wmf.8 to all wikis
  • 20:20 longma: deploying 1.36.0-wmf.8 to all wikis
  • 20:02 krinkle@deploy1001: Synchronized php-1.36.0-wmf.8/includes/resourceloader/ResourceLoaderSkinModule.php: Ibe2c9f8d024f6 (duration: 01m 05s)
  • 19:44 Urbanecm: End of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwiktionary --previous-collation=uppercase # T262163
  • 19:12 mholloway-shell@deploy1001: Started restart [recommendation-api/deploy@db7fd80]: (no justification provided)
  • 19:07 Urbanecm: Start of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwiktionary --previous-collation=uppercase # T262163
  • 19:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 95d2b57: Set $wgCategoryCollation = uca-tr on trwiktionary (T262163) (duration: 01m 05s)
  • 18:58 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=frwiktionary --fix # T262398
  • 18:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 09e487e: Add a new namespace to frwiktionary (T262398) (duration: 01m 04s)
  • 18:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/includes/EditPage.php: 8240944: EditPage: Fix member call on boolean when undo is impossible (T262463) (duration: 01m 03s)
  • 18:37 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.6/includes/EditPage.php: 8240944: EditPage: Fix member call on boolean when undo is impossible (T262463) (duration: 01m 07s)
  • 18:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:24 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: 0cde0b1: Add throttle rule for Czech senior citizens course (T262415) (duration: 01m 05s)
  • 18:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 18:00 mutante: helium (former backup host) is being removed from ferm rules on all hosts, it was replaced by backup1001 (T260717)
  • 17:33 bblack: dns servers: upgrading remainder of fleet to gdnsd-3.3.0-1~wmf1
  • 16:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:25 bblack: authdns1001 - upgrade gdnsd to 3.3.0-1~wmf1
  • 16:06 bblack: dns4001 - upgrade gdnsd to 3.3.0-1~wmf1
  • 16:04 bblack: reprepro: uploaded gdnsd-3.3.0-1~wmf1 - T261340
  • 15:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:04 volans: uploaded cumin_4.0.0 to apt.wikimedia.org buster-wikimedia (no code changes)
  • 13:58 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:52 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:42 moritzm: rebooting etherpad1002 (etherpad.wikimedia.org) for kernel update
  • 13:24 moritzm: installing rake security updates on stretch
  • 13:10 ebernhardson: delete lldwiki_{content|general} indices from search.svc.{eqiad|codfw}.wmnet:9643 (psi), they should be on 9443 (omega)
  • 12:57 klausman: Ran puppet-merge to get my dotfiles from https://gerrit.wikimedia.org/r/c/operations/puppet/+/626367 out
  • 12:34 moritzm: installing firejail updates on maps/thumbor/restbase
  • 12:01 moritzm: upgrading deployment servers to git 2.20 T262244
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P12557 and previous config saved to /var/cache/conftool/dbconfig/20200910-113758-marostegui.json
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317, db1101:3318', diff saved to https://phabricator.wikimedia.org/P12556 and previous config saved to /var/cache/conftool/dbconfig/20200910-113426-marostegui.json
  • 11:13 matthiasmullie: Euro B&C done
  • 11:13 moritzm: uploaded git 2.20.1-2+deb10u3~wmf1 to stretch-wikimedia/main T262244
  • 11:11 mlitn@deploy1001: Synchronized php-1.36.0-wmf.8//extensions/WikimediaEvents/: WikimediaEvents: Enable MediaSearch A/B test (duration: 01m 06s)
  • 10:42 duesen_: daniel@mwmaint2001:~$ mwscript maintenance/findBadBlobs.php jvwiki --revisions 214173 --mark T262457
  • 10:34 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:32 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:28 XioNoX: move VRRP master to cr2-esams
  • 10:21 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 09:45 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 09:43 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:42 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12555 and previous config saved to /var/cache/conftool/dbconfig/20200910-093106-marostegui.json
  • 09:26 dcausse: creating missing cirrus indices for jawikivoyage T262518
  • 09:24 dcausse: creating missing cirrus indices for jawikivoyage T260228
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12554 and previous config saved to /var/cache/conftool/dbconfig/20200910-091335-marostegui.json
  • 08:49 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:47 jynus@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12551 and previous config saved to /var/cache/conftool/dbconfig/20200910-082304-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12550 and previous config saved to /var/cache/conftool/dbconfig/20200910-073107-marostegui.json
  • 07:03 elukey: resize search-loader vms (+4 vcores +4GB of ram) on Ganeti - T262385
  • 05:29 marostegui: Deploy schema change on s3 master - T260476
  • 00:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@00b0e20]: Update to current master (duration: 06m 42s)
  • 00:24 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@00b0e20]: Update to current master
  • 00:23 twentyafterfour: done. Phabricator update complete
  • 00:23 twentyafterfour: applying database migrations to phabricator db
  • 00:09 twentyafterfour: deploying phabricator update 2020-09-10 https://phabricator.wikimedia.org/project/view/4755/

2020-09-09

  • 23:51 dpifke@deploy1001: Finished deploy [performance/arc-lamp@55fccc6]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/622915 (duration: 00m 05s)
  • 23:51 dpifke@deploy1001: Started deploy [performance/arc-lamp@55fccc6]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/622915
  • 23:37 ebernhardson@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/CirrusSearch/includes/Search/InterleavedResultSet.php: Repair passing interleaved search metrics from backend to frontend (duration: 01m 04s)
  • 20:13 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:625914 (duration: 01m 03s)
  • 20:03 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:626190 T261425 (duration: 01m 03s)
  • 20:01 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.8/skins/WikimediaApiPortal: Backport gerrit:626044, T261425 (duration: 01m 12s)
  • 19:11 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.8 (duration: 01m 03s)
  • 19:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.8
  • 18:19 _joe_: banning urls ^/api/rest_v1/page/mobile-html-offline-resources/ from varnish caches
  • 18:19 Urbanecm: Morning B&C window done
  • 18:17 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:17 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b226330: Enable $wgAllowCrossOrigin on all wikis (T262425) (duration: 01m 04s)
  • 18:15 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
  • 18:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 85e36ae: Enable MediaWiki client errors on commonswiki and metawiki (T255585) (duration: 01m 06s)
  • 18:10 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 18:02 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout (duration: 02m 55s)
  • 17:59 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout
  • 17:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout (duration: 06m 47s)
  • 17:52 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout
  • 17:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, take 2 (duration: 09m 38s)
  • 17:42 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, take 2
  • 17:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 T262437 (duration: 06m 00s)
  • 17:35 ppchelko@deploy1001: Started deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 T262437
  • 17:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:28 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:25 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:24 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:22 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:18 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:15 marostegui: Stop mysql on db2125 for on-site maintenance T260670
  • 16:10 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 3] (duration: 00m 11s)
  • 16:10 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 3]
  • 16:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 16:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 16:06 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 16:06 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 16:06 bd808: scap3 of Striker to labweb1001 failing. Will investigate.
  • 16:05 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 2] (duration: 00m 11s)
  • 16:05 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 2]
  • 16:04 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) (duration: 01m 21s)
  • 16:03 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111)
  • 15:54 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:48 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:26 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:26 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:11 herron: prometheus1003: systemctl restart thanos-sidecar@ops.service
  • 14:29 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:22 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:02 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:02 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:00 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:57 marostegui: Restart mysql on db1115 T231769
  • 13:54 bblack: deployed https://gerrit.wikimedia.org/r/626153
  • 12:47 _joe_: restarting php-fpm on wtp2003
  • 12:46 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 12:37 cmjohnson1: beginning scheduled PDU maintenance racks D5 and D6 in eqiad
  • 12:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12545 and previous config saved to /var/cache/conftool/dbconfig/20200909-123634-kormat.json
  • 12:31 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12544 and previous config saved to /var/cache/conftool/dbconfig/20200909-123109-kormat.json
  • 12:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:11 moritzm: installing zeromq security updates on Buster
  • 12:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:37 awight: EU Bacon complete
  • 11:34 awight@deploy1001: Synchronized wmf-config: Config: api-portal: required extended configuration (T261425) (duration: 01m 08s)
  • 11:15 moritzm: added Tobias Klausmann to pwstore
  • 11:14 jiji@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 11:03 marostegui: Stop MySQL on s2 eqiad master to prepare for the PDU maintenance (this will generate lag on s2 on labsdb) T261453
  • 10:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:28 volans: restarting ferm on failed hosts: an-test-master1001.eqiad.wmnet,an-worker1116.eqiad.wmnet,db[1075,1101,1116].eqiad.wmnet,labstore1007.wikimedia.org,logstash[1025,1030].eqiad.wmnet, left over from yesterday's network issue
  • 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:11 klausman: Rebooting stat1005 for clearing GPU status and testing new DKMS driver (T260442)
  • 10:09 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:01 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12542 and previous config saved to /var/cache/conftool/dbconfig/20200909-100157-kormat.json
  • 09:52 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12541 and previous config saved to /var/cache/conftool/dbconfig/20200909-095219-kormat.json
  • 09:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:33 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12540 and previous config saved to /var/cache/conftool/dbconfig/20200909-093353-kormat.json
  • 09:26 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12539 and previous config saved to /var/cache/conftool/dbconfig/20200909-092621-kormat.json
  • 09:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:11 moritzm: installing qemu security updates on Buster
  • 09:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 08:53 _joe_: restarting restbase on rb2009 (depooled)
  • 08:53 godog: upgrade kibana to 7.9.1 on the logstash7 cluster
  • 08:51 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12538 and previous config saved to /var/cache/conftool/dbconfig/20200909-085147-kormat.json
  • 08:44 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12537 and previous config saved to /var/cache/conftool/dbconfig/20200909-084433-kormat.json
  • 08:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:40 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:40 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 08:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12536 and previous config saved to /var/cache/conftool/dbconfig/20200909-083616-kormat.json
  • 08:34 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 08:34 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:30 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12535 and previous config saved to /var/cache/conftool/dbconfig/20200909-083038-kormat.json
  • 08:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 07:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable DynamicPageList on ruwikinews (T262240) (duration: 01m 22s)
  • 07:25 elukey: restart varnishkafka-webrequest on cp5010 and cp5012, delivery reports errors happening since yesterday's network outage
  • 06:21 XioNoX: push new pfw policies - T262297
  • 01:58 eileen: civicrm revision changed from 4e40a59d42 to cc1f7e6d13, config revision is 4845a229dc

2020-09-08

  • 23:47 eileen: civicrm revision is 4e40a59d42, config revision is d26334fa36
  • 23:25 eileen: civicrm revision changed from 5e7352e2c3 to 4e40a59d42, config revision is 3cf0913789
  • 22:14 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:12 andrew@deploy1001: Finished deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update (duration: 03m 35s)
  • 22:08 andrew@deploy1001: Started deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update
  • 22:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:57 andrew@deploy1001: Finished deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks (duration: 00m 13s)
  • 21:57 andrew@deploy1001: Started deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks
  • 19:19 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.8
  • 19:12 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.8 (duration: 71m 45s)
  • 18:22 elukey: rm /srv/prometheus/ops/targets/mjolnir_msearch_eqiad.yaml on prometheus100[3,4] as cleanup after https://gerrit.wikimedia.org/r/621988 - T260305
  • 18:00 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.8
  • 17:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:57 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 17:54 Amir1: Deployed patch for T262240
  • 17:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:23 andrewbogott: rebooting cloudvirt1033
  • 17:03 klausman: attempted to add rock-dkms_3.3-19_all.deb to thirdparty/amd-rocm33 for use on analytics servers with GPUs
  • 16:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventgate test streams and eventlogging_Test - T251609 (duration: 00m 58s)
  • 16:34 herron: increased elk5 logstash JVM heaps to 2g (to help decrease kafka-logging consumer lag)
  • 16:12 longma: 1.36.0-wmf.8 was branched at e81e81e for T257976
  • 16:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 16:03 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 16:02 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:34 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1004.*
  • 15:32 jayme@cumin1001: conftool action : set/pooled=yes; selector: service=kubesvc,name=kubernetes1013.*
  • 15:30 elukey: roll restart of hadoop master daemons on an-master100[1,2] after the cookbook failed
  • 15:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 15:20 _joe_: restarted celery-ores-worker.service on ores1007
  • 15:19 _joe_: restarted ferm on wdqs1011
  • 15:18 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 15:16 _joe_: starting wdqs-updater on wdqs1005
  • 15:15 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1090.eqiad.wmnet
  • 15:14 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp108[789].eqiad.wmnet
  • 15:14 bblack: repool cp1087-90 (eqiad row D)
  • 15:13 herron: rolling restart of elk5 logstashes
  • 15:10 marostegui: Start mysql on db1106 after PDU maintenance is done
  • 15:03 jayme@cumin1001: conftool action : set/pooled=inactive; selector: service=kubesvc,name=kubernetes1013.*
  • 15:03 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=kubernetes1004.*
  • 15:03 XioNoX: request virtual-chassis vc-port set pic-slot 1 member 4 port 0
  • 15:03 XioNoX: request virtual-chassis vc-port set pic-slot 0 member 2 port 50
  • 15:02 XioNoX: request virtual-chassis vc-port set pic-slot 1 member 1 port 1
  • 14:53 marostegui: Reload dbproxy1016 to recover the alert
  • 14:45 jynus: restarting bacula-dir @ backup1001
  • 14:44 XioNoX: reboot asw2-d3-eqiad
  • 14:33 moritzm: bouncing ferm on hosts where ferm.service failed due to DNS resolution issues for prometheus hosts
  • 14:31 volans: restarted ssh on mc1033 from console
  • 14:16 XioNoX: request virtual-chassis vc-port delete pic-slot 1 member 4 port 0
  • 14:16 XioNoX: request virtual-chassis vc-port delete pic-slot 0 member 2 port 50
  • 14:13 akosiaris: drain kubernetes1013, kubernetes1004. They are on row D
  • 14:13 bblack: dns1002 - disable puppet + bird service (stop advertising recdns from row D)
  • 14:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1090.eqiad.wmnet
  • 13:59 bblack: depooling cp1087-1090
  • 13:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp108[789].eqiad.wmnet
  • 13:57 XioNoX: asw2-d-eqiad> request system reboot member 3
  • 13:35 cmjohnson1: the power cable was not properly seated, so asw2-d3-eqiad lost power
  • 13:34 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
  • 13:30 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 13:26 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:26 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:25 mateusbs17: Restarted puppetdb on deployment-puppetdb03 (T248041)
  • 13:24 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:24 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:20 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 13:18 cmjohnson1: swapping PDUs in eqiad, mgmt for racks d3 and d4 will go down
  • 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 13:17 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 13:17 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'staging' .
  • 13:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 13:14 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 13:13 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 13:12 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:09 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 13:09 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 13:08 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:08 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:04 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:04 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 12:47 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 12:35 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12523 and previous config saved to /var/cache/conftool/dbconfig/20200908-123546-kormat.json
  • 12:34 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 12:27 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12522 and previous config saved to /var/cache/conftool/dbconfig/20200908-122702-kormat.json
  • 12:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:11 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12521 and previous config saved to /var/cache/conftool/dbconfig/20200908-121139-kormat.json
  • 12:04 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12520 and previous config saved to /var/cache/conftool/dbconfig/20200908-120419-kormat.json
  • 12:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:34 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:33 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 11:33 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 11:18 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:15 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:53 marostegui: Deploy schema change on s3 eqiad master - T253276
  • 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:20 marostegui: Deploy schema change on s4 eqiad master - T253276
  • 10:14 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 10:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:11 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 10:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:08 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12519 and previous config saved to /var/cache/conftool/dbconfig/20200908-100852-kormat.json
  • 09:52 akosiaris: enable puppet, run it on all k8s eqiad nodes and double check that calico-node is fine T239835
  • 09:43 akosiaris: stopped calico-node and kube-apiserver on k8s nodes/masters T239835
  • 09:43 marostegui: Stop mysql on es2014 to clone es2026 T261717
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2014 - T261717', diff saved to https://phabricator.wikimedia.org/P12517 and previous config saved to /var/cache/conftool/dbconfig/20200908-093957-marostegui.json
  • 09:37 volans: running homer 'cr*eqiad*' commit "Update debmonitor IPs (#2), T261489"
  • 09:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:33 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:28 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12515 and previous config saved to /var/cache/conftool/dbconfig/20200908-092755-kormat.json
  • 09:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:20 jayme: disabling puppet on argon.eqiad.wmnet,chlorine.eqiad.wmnet,kubernetes[1001-1016].eqiad.wmnet - Reinitialize eqiad k8s cluster with new etcd - T239835
  • 08:55 marostegui: Deploy schema change on s7 eqiad master - T253276
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2127's weight', diff saved to https://phabricator.wikimedia.org/P12514 and previous config saved to /var/cache/conftool/dbconfig/20200908-084834-marostegui.json
  • 08:45 volans: running homer 'cr*eqiad*' commit "Update debmonitor IPs, T261489"
  • 08:23 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=blubberoid,name=eqiad
  • 08:22 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
  • 08:21 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=codfw
  • 08:20 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 08:16 moritzm: installing 4.19.132 kernel on buster systems (only installing the deb, reboots separately)
  • 07:44 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Revert "Update T250887 mitigations" (T250887; T262242) (duration: 00m 59s)
  • 07:44 elukey: roll restart kafka daemons on kafka-jumbo100[7-9] to pick up openjdk upgrades
  • 07:40 XioNoX: move HE from ix to transit BGP group on cr3-eqsin
  • 07:00 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:58 marostegui: Deploy schema change on s2 eqiad master - T253276
  • 06:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:56 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P12513 and previous config saved to /var/cache/conftool/dbconfig/20200908-065022-marostegui.json
  • 06:47 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:31 marostegui: Deploy schema change on s5 eqiad master - T253276
  • 06:23 elukey: roll restart of Hadoop master daemons on an-master100[1,2] to pick up new openjdk settings
  • 06:14 marostegui: Stop MySQL on db1106 for PDU maintenance T261452
  • 05:34 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime

2020-09-07

  • 23:35 Reedy: Deployed patch for T262213
  • 21:19 reedy@deploy1001: Synchronized private/PrivateSettings.php: Remove old mitigation (duration: 00m 55s)
  • 18:04 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 56s)
  • 16:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:38 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12511 and previous config saved to /var/cache/conftool/dbconfig/20200907-153857-kormat.json
  • 15:32 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12510 and previous config saved to /var/cache/conftool/dbconfig/20200907-153206-kormat.json
  • 15:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12509 and previous config saved to /var/cache/conftool/dbconfig/20200907-152117-kormat.json
  • 15:17 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12508 and previous config saved to /var/cache/conftool/dbconfig/20200907-151718-kormat.json
  • 15:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:09 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12507 and previous config saved to /var/cache/conftool/dbconfig/20200907-150901-kormat.json
  • 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:03 moritzm: rebooting poolcounter1004/1005
  • 15:03 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12506 and previous config saved to /var/cache/conftool/dbconfig/20200907-150310-kormat.json
  • 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1133 from dbctl T253217', diff saved to https://phabricator.wikimedia.org/P12504 and previous config saved to /var/cache/conftool/dbconfig/20200907-143507-marostegui.json
  • 14:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:25 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:48 _joe_: restarting pybal in codfw to pick up the new mobileapps TLS endpoint
  • 13:44 _joe_: restarting pybal in eqiad to pick up the new mobileapps TLS endpoint
  • 13:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:28 hashar@deploy1001: Finished deploy [integration/docroot@e4e3af9]: Support published documents outside of the git checkout # T149924 (duration: 00m 05s)
  • 13:27 hashar@deploy1001: Started deploy [integration/docroot@e4e3af9]: Support published documents outside of the git checkout # T149924
  • 13:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:22 hashar@deploy1001: Finished deploy [integration/docroot@11ab4a0]: (no justification provided) (duration: 00m 10s)
  • 13:22 hashar@deploy1001: Started deploy [integration/docroot@11ab4a0]: (no justification provided)
  • 13:14 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 13:04 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 12:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 12:43 kormat@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
  • 12:42 kormat@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:29 marostegui: Upgrade and reboot db2094 and db2095 (sanitarium hosts in codfw)
  • 12:18 gehel: restarting elasticsearch on elastic2029 (high GC)
  • 12:01 volans: restart uwsgi on debmonitor1002 to test db reconnection
  • 11:58 marostegui: Reboot pc1008 for upgrade
  • 11:36 Urbanecm: EU B&C done
  • 11:30 urbanecm@deploy1001: Synchronized docroot/noc/index.html: bbfe2ce: noc: Remove link to outdated blog (T259978) (duration: 00m 57s)
  • 11:27 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: ff9f104: Update help URL (T256623) (duration: 00m 56s)
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b512d3: [hewiktionary] Enable wikilove (T262181) (duration: 00m 57s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 35224f4: [eswiki] Create an `abusefilter` user group (T262174; 2/2) (duration: 00m 57s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 35224f4: [eswiki] Create an `abusefilter` user group (T262174; 1/2) (duration: 01m 20s)
  • 11:02 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewiktionary wikilove # T262181
  • 11:01 marostegui: Reboot pc1007 for upgrade
  • 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:36 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 09:30 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 09:12 dcausse@deploy1001: Finished deploy [wdqs/wdqs@c96b49e]: deploy wdqs-0.3.47 to wdqs1009 (test server) (duration: 00m 33s)
  • 09:11 dcausse@deploy1001: Started deploy [wdqs/wdqs@c96b49e]: deploy wdqs-0.3.47 to wdqs1009 (test server)
  • 09:10 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:02 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 08:53 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 08:49 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 08:29 jayme@deploy2001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:19 marostegui: Upgrade and restart pc1010
  • 08:18 jayme@deploy2001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:10 jayme@deploy2001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:03 marostegui: Compress InnoDB on s8 eqiad master (db1109) - T232446
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 after MCR schema change', diff saved to https://phabricator.wikimedia.org/P12501 and previous config saved to /var/cache/conftool/dbconfig/20200907-051157-marostegui.json
  • 04:56 marostegui: Compress InnoDB on s1 eqiad master - this will generate a few days of lag on s1 and labsdb for enwiki T254462
  • 04:53 marostegui: Deploy schema change on db1109 (eqiad wikidata master) - T256685

2020-09-06

  • 19:45 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2127's weight a bit', diff saved to https://phabricator.wikimedia.org/P12496 and previous config saved to /var/cache/conftool/dbconfig/20200906-194512-marostegui.json
  • 08:20 elukey: powercycle mw1360 (mgmt console available, network errors while running anything)
  • 08:04 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1360.eqiad.wmnet
  • 08:01 elukey: executed "sudo ipmitool -I lanplus -H mw1360.mgmt.eqiad.wmnet -U root mc reset cold" from cumin (mgmt not available for mw1360)

2020-09-05

  • 00:23 foks: removing 2 files for legal compliance

2020-09-04

  • 22:15 ryankemper: wdqs deploy complete, service is healthy
  • 21:54 ryankemper: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
  • 21:52 ryankemper: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 21:49 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@c7e6b35]: 0.3.47 (duration: 12m 55s)
  • 21:37 ryankemper: Tests on canary `wdqs1003` passing, beginning full wdqs deploy
  • 21:36 ryankemper@deploy1001: Started deploy [wdqs/wdqs@c7e6b35]: 0.3.47
  • 21:31 ryankemper: `ryankemper@wdqs2002:~$ sudo systemctl restart wdqs-blazegraph`
  • 21:06 mutante: apt1001 - removed all libnginx-mod* packages except libnginx-mod-http-echo ; sudo apt-get autoremove ; run puppet ; restarted nginx - apt.wikimedia.org switched to nginx-light (T261962)
  • 21:02 mutante: apt1001 - remove all libnginx-mod* packages except libnginx-mod-http-echo
  • 20:59 mutante: apt2001 - sudo apt-get autoremove
  • 20:51 mutante: apt2001 - apt-get remove --purge libnginx* and run puppet to replace nginx-full with nginx-light (T261962)
  • 20:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:36 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:22 mutante: Icinga - ACKing with sticky - alerts on test and dev hosts
  • 18:10 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing (duration: 07m 35s)
  • 18:02 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing
  • 10:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12492 and previous config saved to /var/cache/conftool/dbconfig/20200904-102955-marostegui.json
  • 10:28 marostegui: Deploy MCR schema change on db1087 (sanitarium master), this will generate lag (probably a few days) on s8 labsdb hosts T238966
  • 09:48 marostegui: Restart prometheus-mysqld-exporter on db2125
  • 09:11 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 08:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 08:31 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 08:29 elukey: roll restart of the hadoop workers (test and analytics cluster) for openjdk upgrades
  • 08:08 moritzm: installing 4.19.132 kernel on buster systems (only installing the deb, reboots separately)
  • 07:30 moritzm: installing 4.9.228 kernel on stretch systems (only installing the deb, reboots separately)
  • 05:13 marostegui: Deploy MCR schema change on s4 eqiad master T238966
  • 01:51 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints (duration: 63m 18s)
  • 01:35 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:30 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 01:23 ryankemper: (Following the restart of blazegraph, service has been restored to `wdqs2003`. See https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1599182219699&to=1599182547699)
  • 01:16 ryankemper: Glancing at https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1599170628749&to=1599182011243, it looks like `wdqs2003`'s blazegraph isn't happy based on the null data entries. Restarting blazegraph: `ryankemper@wdqs2003:~$ sudo systemctl restart wdqs-blazegraph`
  • 00:48 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints

2020-09-03

  • 23:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9394739: Start logging log-ins on select wikis (T253802) (duration: 00m 56s)
  • 21:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:55 milimetric@deploy1001: deploy aborted: AQS: Deploying new geoeditors endpoints (duration: 00m 13s)
  • 19:54 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints
  • 19:07 milimetric@deploy1001: Finished deploy [analytics/refinery@e4d5149] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d5149] (duration: 00m 08s)
  • 19:07 milimetric@deploy1001: Started deploy [analytics/refinery@e4d5149] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d5149]
  • 19:06 milimetric@deploy1001: Finished deploy [analytics/refinery@e4d5149]: Regular analytics weekly train [analytics/refinery@e4d5149] (duration: 09m 06s)
  • 18:57 milimetric@deploy1001: Started deploy [analytics/refinery@e4d5149]: Regular analytics weekly train [analytics/refinery@e4d5149]
  • 17:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:46 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:36 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:36 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:28 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:02 papaul: power down ores2009 for DIMM upgrade
  • 16:45 papaul: power down ores2008 for DIMM upgrade
  • 16:33 papaul: power down ores2007 for DIMM upgrade
  • 16:24 elukey: roll restart aqs on aqs1* to pick up new druid settings
  • 16:05 papaul: power down ores2006 for DIMM upgrade
  • 15:51 papaul: power down ores2005 for DIMM upgrade
  • 15:33 papaul: power down ores2004 for DIMM upgrade
  • 15:30 moritzm: installing nginx updates on apt* and htmldumper1001
  • 15:25 moritzm: installing firejail update (along with restarts) on thumbor1001, maps1001, restbase1016 (and -dev)
  • 15:22 papaul: power down ores2003 for DIMM upgrade
  • 15:17 moritzm: installing firejail security updates on parsoid servers
  • 15:08 papaul: power down ores2002 for DIMM upgrade
  • 14:53 papaul: power down ores2001 for DIMM upgrade
  • 14:36 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:30 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:29 jmm@deploy1001: Finished deploy [debmonitor/deploy@fb64c52]: deploy to new buster host (duration: 00m 06s)
  • 14:29 jmm@deploy1001: Started deploy [debmonitor/deploy@fb64c52]: deploy to new buster host
  • 14:13 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:11 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:00 marostegui: Failover m5 (wikitech) master - T260324
  • 13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:43 jmm@deploy1001: Finished deploy [debmonitor/deploy@fb64c52]: deploy to new buster host (duration: 00m 18s)
  • 13:43 jmm@deploy1001: Started deploy [debmonitor/deploy@fb64c52]: deploy to new buster host
  • 13:40 jmm@deploy1001: Finished deploy [debmonitor/deploy@25dbd20]: deploy to new buster host, now the --force is with me (duration: 01m 29s)
  • 13:39 jmm@deploy1001: Started deploy [debmonitor/deploy@25dbd20]: deploy to new buster host, now the --force is with me
  • 13:32 jmm@deploy1001: Finished deploy [debmonitor/deploy@25dbd20]: deploy to new buster host (duration: 00m 05s)
  • 13:32 jmm@deploy1001: Started deploy [debmonitor/deploy@25dbd20]: deploy to new buster host
  • 13:08 marostegui: Start pre m5 failover steps T260324
  • 12:46 marostegui: Deploy MCR schema change on s7 eqiad master (lag might show up) - T238966
  • 12:30 hnowlan: enabling puppet on appservers, finished rollout of api.wikimedia.org https://gerrit.wikimedia.org/r/c/operations/puppet/+/623833
  • 12:19 kormat@cumin1001: dbctl commit (dc=all): 'Shift weights in s2 codfw to account for db2125 being down T260670', diff saved to https://phabricator.wikimedia.org/P12485 and previous config saved to /var/cache/conftool/dbconfig/20200903-121916-kormat.json
  • 12:17 moritzm: installing openexr security updates for stretch
  • 12:03 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2125 after hw issue', diff saved to https://phabricator.wikimedia.org/P12483 and previous config saved to /var/cache/conftool/dbconfig/20200903-120304-kormat.json
  • 11:45 moritzm: installing net-snmp security updates on Stretch
  • 11:45 moritzm: installing net-snmp security updates on Buster
  • 11:33 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage --fix | phaste # T260320 # P12481
  • 11:28 moritzm: installing PHP 7.0 security updates
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 04281a0: Add extra namespaces for jawikivoyage (T260320) (duration: 01m 01s)
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: 976d735: Lift IP cap on 2020-09-08 for Senior Citizen Write Wikipedia course - cs.wikipedia (T261882) (duration: 01m 01s)
  • 11:21 gilles@deploy1001: Synchronized static/images/project-logos: T252108 Deploying lossily optimised Wikipedia logos (duration: 01m 20s)
  • 10:50 hnowlan: disabling apache on appservers for rollout of https://gerrit.wikimedia.org/r/c/operations/puppet/+/623833
  • 10:38 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:07 XioNoX: re-apply vlan 1118 firewall filter and update OSPF/bootp on cr1/2-eqiad - T261866
  • 09:57 XioNoX: rectification: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 on cr1-eqiad - T261866
  • 09:56 XioNoX: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12480 and previous config saved to /var/cache/conftool/dbconfig/20200903-095510-marostegui.json
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12479 and previous config saved to /var/cache/conftool/dbconfig/20200903-095015-marostegui.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12478 and previous config saved to /var/cache/conftool/dbconfig/20200903-094857-marostegui.json
  • 09:48 XioNoX: move VRRP master from cr1-eqiad:ae2.1118 to cr2-eqiad:xe-3/0/4.1118 - T261866
  • 09:46 XioNoX: move vlan 1118 IPv4 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12477 and previous config saved to /var/cache/conftool/dbconfig/20200903-094435-marostegui.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12476 and previous config saved to /var/cache/conftool/dbconfig/20200903-094043-marostegui.json
  • 09:38 XioNoX: move vlan 1118 IPv6 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12475 and previous config saved to /var/cache/conftool/dbconfig/20200903-093629-marostegui.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12474 and previous config saved to /var/cache/conftool/dbconfig/20200903-093454-marostegui.json
  • 09:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:31 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12473 and previous config saved to /var/cache/conftool/dbconfig/20200903-092549-marostegui.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316 db2087:3317 T261917', diff saved to https://phabricator.wikimedia.org/P12472 and previous config saved to /var/cache/conftool/dbconfig/20200903-092028-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12471 and previous config saved to /var/cache/conftool/dbconfig/20200903-091834-marostegui.json
  • 09:13 XioNoX: rolled back: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2122', diff saved to https://phabricator.wikimedia.org/P12470 and previous config saved to /var/cache/conftool/dbconfig/20200903-090901-marostegui.json
  • 09:06 XioNoX: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P12469 and previous config saved to /var/cache/conftool/dbconfig/20200903-090419-marostegui.json
  • 09:01 XioNoX: force ae2.1118 VRRP master on cr1-eqiad - T261866
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317, db1098:3316', diff saved to https://phabricator.wikimedia.org/P12468 and previous config saved to /var/cache/conftool/dbconfig/20200903-090007-marostegui.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1090:3317', diff saved to https://phabricator.wikimedia.org/P12467 and previous config saved to /var/cache/conftool/dbconfig/20200903-085838-marostegui.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12466 and previous config saved to /var/cache/conftool/dbconfig/20200903-085708-marostegui.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12465 and previous config saved to /var/cache/conftool/dbconfig/20200903-084910-marostegui.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1090:3312', diff saved to https://phabricator.wikimedia.org/P12464 and previous config saved to /var/cache/conftool/dbconfig/20200903-084836-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3317, db1090:3312', diff saved to https://phabricator.wikimedia.org/P12463 and previous config saved to /var/cache/conftool/dbconfig/20200903-084358-marostegui.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12462 and previous config saved to /var/cache/conftool/dbconfig/20200903-084147-marostegui.json
  • 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122 T261917', diff saved to https://phabricator.wikimedia.org/P12461 and previous config saved to /var/cache/conftool/dbconfig/20200903-082956-marostegui.json
  • 08:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:28 moritzm: rebooting mwmaint1002 for kernel update
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12460 and previous config saved to /var/cache/conftool/dbconfig/20200903-082655-marostegui.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12459 and previous config saved to /var/cache/conftool/dbconfig/20200903-082034-marostegui.json
  • 08:16 marostegui: Upgrade db1101 (s7 and s8)
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12458 and previous config saved to /var/cache/conftool/dbconfig/20200903-081543-marostegui.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318, db1101:3317', diff saved to https://phabricator.wikimedia.org/P12457 and previous config saved to /var/cache/conftool/dbconfig/20200903-081503-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12456 and previous config saved to /var/cache/conftool/dbconfig/20200903-081337-marostegui.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12455 and previous config saved to /var/cache/conftool/dbconfig/20200903-080714-marostegui.json
  • 08:06 marostegui: Upgrade and reboot db1127
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12454 and previous config saved to /var/cache/conftool/dbconfig/20200903-080634-marostegui.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12453 and previous config saved to /var/cache/conftool/dbconfig/20200903-080024-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12452 and previous config saved to /var/cache/conftool/dbconfig/20200903-075443-marostegui.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12451 and previous config saved to /var/cache/conftool/dbconfig/20200903-074922-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3317 T261917', diff saved to https://phabricator.wikimedia.org/P12450 and previous config saved to /var/cache/conftool/dbconfig/20200903-074827-marostegui.json
  • 07:45 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 07:45 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 07:45 marostegui: Upgrade and reboot db1094
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12449 and previous config saved to /var/cache/conftool/dbconfig/20200903-074426-marostegui.json
  • 07:38 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 07:38 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12448 and previous config saved to /var/cache/conftool/dbconfig/20200903-073718-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12447 and previous config saved to /var/cache/conftool/dbconfig/20200903-073116-marostegui.json
  • 07:29 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12446 and previous config saved to /var/cache/conftool/dbconfig/20200903-072716-marostegui.json
  • 07:24 hashar: contint2001: restarting CI Jenkins for plugins upgrade
  • 07:19 marostegui: Deploy schema change on s8 eqiad master T237120
  • 07:18 marostegui: Stop slave on s8 eqiad master (lag will appear on s8 eqiad) - T237120
  • 07:02 marostegui: Stop db2100:3317 and db2121 in sync to reload metawiki.content T261869
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12445 and previous config saved to /var/cache/conftool/dbconfig/20200903-070104-marostegui.json
  • 06:56 hashar: contint2001: restarting CI Jenkins
  • 06:56 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:56 _joe_: deployment of mobileapps to pick up changes to envoy config, new helmfile layout
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12444 and previous config saved to /var/cache/conftool/dbconfig/20200903-065105-marostegui.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12443 and previous config saved to /var/cache/conftool/dbconfig/20200903-064804-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12442 and previous config saved to /var/cache/conftool/dbconfig/20200903-064623-marostegui.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12441 and previous config saved to /var/cache/conftool/dbconfig/20200903-064334-marostegui.json
  • 06:24 marostegui: Disconnect eqiad -> codfw replication

2020-09-02

  • 22:55 shdubsh: restart rsyslog on centrallog[12]001
  • 22:27 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart wdqs-blazegraph.service"`
  • 22:26 ryankemper: Puppet finished on all external wdqs codfw nodes, nginx automatically reloaded as intended
  • 22:24 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo run-puppet-agent"`
  • 21:48 bd808@deploy1001: Finished deploy [striker/deploy@3c2090a]: Deploying r20200902 tag (T198114, T223610, T245804, T144111, T261810) (duration: 01m 34s)
  • 21:46 bd808@deploy1001: Started deploy [striker/deploy@3c2090a]: Deploying r20200902 tag (T198114, T223610, T245804, T144111, T261810)
  • 21:10 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart wdqs-blazegraph.service"`
  • 21:10 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart nginx.service"`
  • 21:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:01 ryankemper: Restarted nginx on `wdqs2007`
  • 21:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:47 ryankemper: restarted blazegraph on `wdqs2001` as well
  • 20:46 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal and not P{wdqs2001.codfw.wmnet}' "sudo systemctl restart wdqs-blazegraph.service"` (restarted everything but 2001, will restart 2001 next)
  • 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:20 robh: scs-c1-eqiad firmware update complete and back online T238036
  • 19:14 robh: updating firmware on scs-c1-eqiad via T238036
  • 19:14 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Revert "Update T250887 mitigations" (duration: 00m 32s)
  • 18:58 herron: freeing some disk space on centrallog1001 with 'tune2fs -m 0 /dev/centrallog1001-vg/data'
  • 18:43 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622898 Install OAuthRateLimiter III: Install where enabled, ouch, forgot to rebase (duration: 00m 55s)
  • 18:40 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622898 Install OAuthRateLimiter III: Install where enabled (duration: 00m 55s)
  • 18:38 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka jumbo-eqiad (for consistency with main) - T261865
  • 18:37 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka main-codfw - T261865
  • 18:36 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:622897 Install OAuthRateLimiter extension II: Add flag to IS (duration: 00m 56s)
  • 18:34 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka main-eqiad - T261865
  • 18:33 ppchelko@deploy1001: Synchronized wmf-config/extension-list: (no justification provided) (duration: 00m 54s)
  • 18:32 ottomata: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka jumbo-eqiad (for consistency with main) - T261865
  • 18:28 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/DiscussionTools/: Backport Fix parsing localised digits in PHP discussion parser (duration: 00m 56s)
  • 18:19 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/DiscussionTools/: Backport Re-apply new reply API patches (again) (duration: 00m 58s)
  • 17:34 bstorm: re-enabled puppet on labsdb10[09-12]
  • 17:28 bstorm: disabled puppet on labsdb10[09-12]
  • 17:18 herron: restarted elasticsearch on logstash1012
  • 16:39 Pchelolo: creating oauth_ratelimit_client_tier table T258711
  • 15:55 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
  • 15:55 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
  • 15:55 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
  • 15:32 hnowlan: Temporarily disabling apache for configuration change T246945
  • 15:24 godog: prometheus codfw lvextend --resizefs --size +50G /dev/mapper/vg--ssd-prometheus--k8s
  • 15:19 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
  • 15:18 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
  • 15:18 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 15:17 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:16 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 15:15 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
  • 15:15 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main
  • 15:11 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:31 elukey: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka-main eqiad - T261865
  • 14:29 elukey: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka-main codfw - T261865
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12434 and previous config saved to /var/cache/conftool/dbconfig/20200902-141854-marostegui.json
  • 13:05 elukey: run kafka preferred-replica-election on kafka-main codfw
  • 12:07 XioNoX: move vrrp master from cr2-codfw to cr1-codfw
  • 11:52 duesen__: daniel@mwmaint2001:/srv/mediawiki/php-1.36.0-wmf.6$ mwscript findBadBlobs.php testwiki --mark T251778
  • 11:36 Urbanecm: EU B&C done
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 796b4fa: Add title for apiportalwiki (T246945) (duration: 00m 56s)
  • 11:34 Urbanecm: Fetched extra commits to deploy1001's staging dir; the commit messages explain it was an accident, continuing; cc Krinkle
  • 11:31 duesen__: Deployed second security fix for T260485
  • 11:07 XioNoX: repool cr1-eqiad
  • 10:58 XioNoX: cr1-eqiad:request chassis routing-engine master switch
  • 10:49 XioNoX: reboot cr1-eqiad:re0 (backup)
  • 10:45 jbond42: install apache updates on buster
  • 10:36 XioNoX: cr1-eqiad:request chassis routing-engine master switch
  • 10:35 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
  • 10:34 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
  • 10:32 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
  • 10:31 jbond42: install apache updates on jessie
  • 10:27 XioNoX: reboot cr1-eqiad:re1 (backup)
  • 10:18 XioNoX: move VRRP master from cr1 to cr2
  • 10:16 XioNoX: drain cr1-eqiad transit/transport/IX
  • 10:13 XioNoX: drain cr1-eqiad-pfw3-eqiad link
  • 10:04 XioNoX: repool cr2-eqiad
  • 09:55 XioNoX: cr2-eqiad:request chassis routing-engine master switch - T259621
  • 09:46 XioNoX: reboot cr2-eqiad:re0 (backup) - T259621
  • 09:28 XioNoX: cr2-eqiad:request chassis routing-engine master switch - T259621
  • 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:18 XioNoX: reboot cr2-eqiad:re1 (backup) - T259621
  • 09:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:13 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:13 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
  • 09:12 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:11 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:08 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
  • 09:07 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
  • 09:01 elukey: reimage kafka-jumbo1004 to Buster
  • 08:58 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1128 from s10 - T260324', diff saved to https://phabricator.wikimedia.org/P12432 and previous config saved to /var/cache/conftool/dbconfig/20200902-085705-marostegui.json
  • 08:52 XioNoX: deactivate cr2-eqiad transit/IX - T259621
  • 08:50 XioNoX: drain cr2-eqiad transport links - T259621
  • 08:20 XioNoX: activate Telia BGP in eqiad
  • 07:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:38 elukey: reimage kafka-jumbo1003 to buster
  • 07:28 marostegui: Reboot dbstore1003 for kernel upgrade - T261389
  • 07:12 XioNoX: configure cr2-eqiad:ae5 as single LACP link to Telia
  • 07:05 marostegui: Drop unused grants on m5 T261152
  • 07:02 elukey: reboot kafka-jumbo1002 to pick up new kernel settings
  • 07:00 XioNoX: deactivate Telia BGP in eqiad
  • 06:38 elukey: powercycle analytics1059 - CPU soft lockups on multiple CPUs
  • 06:30 elukey: reboot kafka-jumbo1001 to pick up new kernel settings
  • 06:30 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 06:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'production' .

2020-09-01

  • 22:39 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=sysop_itwiki Pierpao (T261722)
  • 17:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:36 ryankemper: wdqs [canary] rollback complete, tests passing now. Will need to dig into source of failure
  • 17:35 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@7920fbe]: 0.3.46 (duration: 03m 43s)
  • 17:35 ryankemper: `wdqs1003` (the canary instance) is failing tests now, going to rollback
  • 17:32 ryankemper@deploy1001: Started deploy [wdqs/wdqs@7920fbe]: 0.3.46
  • 17:30 ryankemper: Starting wdqs deploy
  • 15:56 chasemp: labsdb* puppet agent --test; sudo /usr/local/sbin/maintain-views --all-databases --table user --replace-all; sudo /usr/local/sbin/maintain-views --all-databases --table user_old --replace-all
  • 15:25 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 14:28 _joe_: restarting envoy on all eqiad jobrunners
  • 14:22 _joe_: restarted confd on mwmaint1002
  • 14:18 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 14:18 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 14:17 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:15 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2083 weight', diff saved to https://phabricator.wikimedia.org/P12429 and previous config saved to /var/cache/conftool/dbconfig/20200901-141521-marostegui.json
  • 14:15 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 14:14 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:07 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:07 rzl@cumin1001: MediaWiki read-only period ends at: 2020-09-01 14:07:36.305500
  • 14:07 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:04 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 14:02 rzl@cumin1001: MediaWiki read-only period starts at: 2020-09-01 14:02:04.851006
  • 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 13:58 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 13:58 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:51 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:50 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:45 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:44 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:40 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:39 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 13:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 13:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 13:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 10:37 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:48 XioNoX: reserve cr2-eqiad:xe-3/3/7 for new Telia port
  • 09:38 jayme: systemctl restart docker-reporter-releng-images.service on deneb to clear out alert because of temporary HTTP 504 from debmonitor
  • 09:01 moritzm: installing Java 8 sec updates on contint*
  • 08:51 moritzm: uploaded apache 2.4.10-10+deb8u16+wmf1 for jessie-wikimedia
  • 07:11 moritzm: installing 4.9.228 kernel on stretch systems (only installing the deb, reboots separately)
  • 07:05 moritzm: restarting jenkins on releases1002 to pick up Java security updates
  • 06:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:44 elukey: reimage kafka-jumbo1002 to Buster
  • 06:20 marostegui: Install query killers on db2137:3314 T243373
  • 01:17 chaomodus: updated the pynetbox package to 5.0.7 and uploaded to buster
  • 00:02 mutante: wb2-grrrri was not running and wikibugs had not posted any Gerrit updates for a while
  • 00:01 mutante: restarting wikibugs

2020-08-31

  • 23:38 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 (final) (duration: 00m 17s)
  • 23:38 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 (final)
  • 23:37 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox2001 (duration: 01m 12s)
  • 23:36 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox2001
  • 23:36 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox1001 (duration: 00m 58s)
  • 23:35 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox1001
  • 23:31 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next pt2 (duration: 00m 05s)
  • 23:31 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next pt2
  • 23:31 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next (duration: 00m 57s)
  • 23:30 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next
  • 23:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable (future) mw-reverted tag for all wikis except testwiki (T254074) (duration: 00m 57s)
  • 21:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:00 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:20 ryankemper: `sudo systemctl restart elasticsearch_6@production-search-psi-eqiad.service` on `elastic1052.eqiad.wmnet`
  • 18:38 Urbanecm: Morning B&C done
  • 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 16197aa: Add two domains to wgCopyUploadsDomains for commonswiki (T261562; T261575) (duration: 00m 54s)
  • 18:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bb28e9d: itwiki: Assign patrol right to autopatrolled instead of autoconfirmed (T261587) (duration: 00m 53s)
  • 18:23 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: a1b0d6e: b609cd5: CommonSettings.php: limit new Echo's `push-subscription-manager` group to Meta-Wiki (T261625) (duration: 00m 54s)
  • 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 846c544: wgEventStreams: Stream for MEP-iOS pilot (T260382) (duration: 00m 55s)
  • 17:21 volans: uploaded spicerack_0.0.42 to apt.wikimedia.org buster-wikimedia
  • 15:50 rzl@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
  • 15:49 ejegg: updated payments-wiki from ef7ebd08cb to be81063168
  • 15:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 15:33 rzl@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 15:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=99)
  • 15:32 rzl@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 14:58 ema: Traffic: depool eqiad from user traffic T243316
  • 14:38 moritzm: installing rake security updates on stretch
  • 14:33 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 14:21 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 14:20 rzl@cumin1001: Switching services apertium, termbox, search, api-gateway, ores, sessionstore, eventgate-main, graphoid, eventstreams, wikifeeds, wdqs, parsoid, eventgate-logging-external, wdqs-internal, echostore, mathoid, mobileapps, proton, restbase, kartotherian, recommendation-api, eventgate-analytics-external, restbase-async, citoid, schema, cxserver, eventgate-analytics, zotero: eqiad => codfw
  • 14:20 rzl@cumin1001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 14:18 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 14:13 rzl@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 14:12 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=99)
  • 14:11 rzl@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 13:41 andrewbogott: dropping many databases from m5, as per T261152
  • 13:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:07 marostegui: Failover m3 (phabricator) proxy from dbproxy1016 to dbproxy1020 - T261459
  • 13:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:54 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 12:54 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 12:53 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 12:53 oblivian@cumin2001: Switching services parsoid: eqiad => codfw
  • 12:53 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 12:53 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 12:48 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 12:45 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 12:45 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 12:44 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 12:44 oblivian@cumin2001: Switching services restbase-async: eqiad => codfw
  • 12:44 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 12:43 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 12:37 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 12:14 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 12:14 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 12:13 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 12:13 oblivian@cumin2001: Switching services restbase-async: eqiad => codfw
  • 12:13 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 12:10 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 12:05 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 11:58 elukey: reimage kafka-jumbo1001 to Buster
  • 11:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: 5d583d9: Disable MediaSearch A/B test (duration: 00m 55s)
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 81f88fd: Enable Signature button on Wikiproject for hywiki (T261550) (duration: 00m 54s)
  • 11:22 jbond42: removing old hiera version 1 and 3 backends
  • 11:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b74893f: Enable sitenotice on mobile for closed wikis (T261357) (duration: 00m 56s)
  • 11:02 volans: upgraded spicerack to 0.0.41 on cumin hosts
  • 10:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:51 elukey: executed /srv/phab/phabricator/bin/remove destroy @klausman on phab1001 (following https://wikitech.wikimedia.org/wiki/Phabricator#Delete_a_user) to clear the inconsistent state of a new account (wrong email address)
  • 08:43 moritzm: installing bind9 security updates on stretch/buster (client-side tools/libs only)
  • 07:53 volans: uploaded spicerack_0.0.41 to apt.wikimedia.org buster-wikimedia
  • 07:30 moritzm: installing squid security updates
  • 07:24 moritzm: installing openexr security updates on buster
  • 07:12 marostegui: Sanitize jawikivoyage on db2094:3325 and db1124:3325 T260482
  • 06:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:06 elukey: reimage kafka-jumbo1005 to Debian Buster
  • 05:21 marostegui: Reload haproxy on dbproxy1017 and dbproxy1021 to test db1128

2020-08-30

  • 16:13 herron: restarted eqiad v5 logstashes

2020-08-29

  • 18:05 Amir1: end of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T261451)
  • 17:45 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T261451)

2020-08-28

  • 21:53 ryankemper: `sudo systemctl reload nginx.service` on `cloudelastic100[5,6].wikimedia.org` to try to resolve certificate warning issues
  • 19:11 andrewbogott: rebooting cloudvirt1006. It's a spare, unused system but is showing a bus error and Icinga alerts; not worth saving if it needs saving
  • 17:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:39 mutante: shutting down mw2196
  • 17:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:40 rzl: switchdc live test complete
  • 16:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 16:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 16:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 16:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 16:33 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 16:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 16:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 16:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 16:29 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-08-28 16:29:24.432463
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 16:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 16:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 16:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 16:28 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-08-28 16:28:07.882663
  • 16:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 16:19 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 16:19 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 16:13 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 16:12 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 16:09 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 16:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 16:08 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 16:08 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 16:07 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 16:07 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 16:06 rzl: starting one more live test of the data center switchover automation, no production impact is expected but there will be some SAL noise
  • 14:22 moritzm: installing Java security updates on kafka/main and Logstash(5) clusters
  • 13:35 hashar@deploy1001: Finished deploy [integration/docroot@65ec92c]: noop, sync up for README.md (duration: 00m 07s)
  • 13:35 hashar@deploy1001: Started deploy [integration/docroot@65ec92c]: noop, sync up for README.md
  • 13:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:07 elukey: stop kafka on kafka-jumbo1006 and reimage to buster
  • 12:56 moritzm: installing debmonitor1002 T261492
  • 12:46 moritzm: installing debmonitor2002 T261492
  • 11:50 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:40 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:27 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 11:27 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:27 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 09:48 jayme: updated helm to 2.16.9-3 on chartmuseum*, contint*, deploy*
  • 09:19 jayme: imported helm_2.16.9-3 to buster-wikimedia, stretch-wikimedia, jessie-wikimedia
  • 08:22 kormat: enabling replication from db2112 to db1083 (s1) T243373
  • 07:41 jynus: restart backup2001,backup1002
  • 07:10 jynus: restart db2139
  • 07:07 marostegui: Warm up parsercache in codfw - T260042
  • 06:47 jynus: restart db2102
  • 06:28 jynus: restart db2100
  • 06:07 jynus: restart db2099
  • 05:50 jynus: restart db2098
  • 00:06 eileen: process-control config revision is dd541a25dc

2020-08-27

  • 23:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:48 eileen: civicrm revision changed from a942537984 to 3d501e71d9, config revision is dd541a25dc
  • 22:54 eileen: civicrm revision changed from 481ab742db to a942537984, config revision is e2ab4d7c1f
  • 22:28 tzatziki: removing one file for legal compliance
  • 22:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 22:18 volans: uploaded spicerack_0.0.40-1_amd64.deb to apt.wikimedia.org buster-wikimedia
  • 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:57 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:41 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:29 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:25 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:22 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:17 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 21:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:16 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 21:14 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:10 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 21:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:50 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw221[0-4].codfw.wmnet
  • 20:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:49 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw220[0-9].codfw.wmnet
  • 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw214[0-7].codfw.wmnet
  • 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:47 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw213[0-9].codfw.wmnet
  • 20:43 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Streams for testing MEP-based analytics instruments - T259714 (duration: 00m 55s)
  • 19:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:57 marxarelli: 1.36.0-wmf.6 promoted to all wikis (T257974). New errors appear to be related to T261345 but have been known since 1.36.0-wmf.5
  • 19:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=appserver,name=mw21[8-9][0-9]*.codfw.wmnet
  • 19:41 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.6
  • 19:22 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 11s)
  • 19:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:19 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:16 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating apiportalwiki (T246945)
  • 19:15 urbanecm@deploy1001: Synchronized dblists: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:14 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:13 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:11 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 18:54 mforns@deploy1001: Finished deploy [analytics/refinery@e85191b] (thin): Regular analytics weekly train THIN [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] (duration: 00m 08s)
  • 18:54 mforns@deploy1001: Started deploy [analytics/refinery@e85191b] (thin): Regular analytics weekly train THIN [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9]
  • 18:53 mforns@deploy1001: Finished deploy [analytics/refinery@e85191b]: Regular analytics weekly train [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] (duration: 10m 01s)
  • 18:43 mforns@deploy1001: Started deploy [analytics/refinery@e85191b]: Regular analytics weekly train [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9]
  • 18:43 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Assign all homepage users to variant A (duration: 01m 03s)
  • 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on ruwiki (T257490) (duration: 01m 03s)
  • 18:17 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=jobrunner,name=mw2250.codfw.wmnet,service=canary
  • 18:17 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=jobrunner,name=mw2249.codfw.wmnet,service=canary
  • 18:16 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 18:16 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 18:14 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=jobrunner,name=mw1318.eqiad.wmnet
  • 18:07 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw229[1-9].codfw.wmnet,cluster=api_appserver
  • 18:06 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2290.codfw.wmnet,cluster=api_appserver
  • 18:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw22[6-8][0-9].codfw.wmnet,cluster=api_appserver
  • 18:03 Urbanecm: Creating jawikivoyage is done (T260320)
  • 18:02 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 59s)
  • 18:02 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw225[0-9].codfw.wmnet,cluster=api_appserver
  • 18:00 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating jawikivoyage (T260320) (duration: 01m 02s)
  • 17:59 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw224[4-5].codfw.wmnet,service=canary
  • 17:59 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw224[4-5].codfw.wmnet
  • 17:59 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating jawikivoyage (T260320) (duration: 01m 03s)
  • 17:58 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating jawikivoyage (T260320)
  • 17:57 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw222[0-3].codfw.wmnet
  • 17:56 urbanecm@deploy1001: Synchronized dblists: Creating jawikivoyage (T260320) (duration: 00m 58s)
  • 17:56 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw221[5-9].codfw.wmnet,service=canary
  • 17:55 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw221[5-9].codfw.wmnet
  • 17:55 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating jawikivoyage (T260320) (duration: 01m 03s)
  • 17:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw221[0-4].codfw.wmnet
  • 17:54 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw221[0-4].codfw.wmnet
  • 17:54 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating jawikivoyage (T260320) (duration: 01m 07s)
  • 17:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw220[1-9].codfw.wmnet
  • 17:52 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw220[1-9].codfw.wmnet
  • 17:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2200.codfw.wmnet
  • 17:50 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2200.codfw.wmnet
  • 17:48 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw214[0-7].codfw.wmnet
  • 17:47 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw213[5-9].codfw.wmnet
  • 17:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw214[0-7].codfw.wmnet
  • 17:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw213[5-9].codfw.wmnet
  • 17:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw23[0-7][0-9].codfw.wmnet
  • 17:31 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[0-7].codfw.wmnet,service=canary
  • 17:30 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw227[0-7].codfw.wmnet
  • 17:29 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 17:29 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 17:18 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:17 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw226[8-9].codfw.wmnet
  • 17:13 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw225[4-8].codfw.wmnet
  • 17:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:11 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw224[0-2].codfw.wmnet
  • 17:04 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw223[2-9].codfw.wmnet
  • 17:01 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2231.codfw.wmnet
  • 16:59 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2230.codfw.wmnet
  • 16:54 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw222[4-9].codfw.wmnet
  • 16:49 mutante: re-weighted appservers and api appservers in eqiad - hardware type G = weight 25, all other types = weight 30 (T261159)
  • 16:48 mutante: depooling mw2187 - mw2199 - old codfw appservers of type A to be decom'ed, previously weight 10 (T260654)
  • 16:47 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw219[0-9].codfw.wmnet
  • 16:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw218[7-9].codfw.wmnet
  • 16:35 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1297.eqiad.wmnet
  • 16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:21 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw127[0-5].eqiad.wmnet
  • 16:19 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw126[1-5].eqiad.wmnet,service=canary
  • 16:14 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw126[1-9].eqiad.wmnet
  • 16:12 elukey: remove some old/stale terms from analytics-in4 on cr1/cr2-eqiad (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622746, https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622744)
  • 16:09 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw127[6-9].eqiad.wmnet,service=canary
  • 16:08 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw127[6-9].eqiad.wmnet
  • 16:06 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1290.eqiad.wmnet
  • 16:05 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw128[0-9].eqiad.wmnet
  • 15:52 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1290.eqiad.wmnet
  • 15:51 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw128[0-9].eqiad.wmnet
  • 15:43 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw127[7-9].eqiad.wmnet,service=canary
  • 15:43 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1276.eqiad.wmnet,service=canary
  • 15:41 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw127[6-9].eqiad.wmnet
  • 15:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1297.eqiad.wmnet
  • 15:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1269.eqiad.wmnet
  • 15:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1267.eqiad.wmnet
  • 14:48 moritzm: installing Java security updates on aqs, hadoop and kafka-jumbo
  • 14:44 moritzm: restarting tomcat on idp-test* hosts to pick up Java update
  • 14:42 elukey: add eventgate-related terms to analytics-in4 filter on cr1/cr2-eqiad (ref https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622705)
  • 14:37 moritzm: imported openjdk 8u265-b01-1~deb10u1 to buster-wikimedia (forward port of latest Java 8 security update)
  • 14:31 papaul: replacing msw-c5,c6,c7 and fmsw-c8
  • 13:58 kormat: disabling GTID on pc2007 (pc1), pc2008 (pc2), pc2009 (pc3) T243373
  • 13:56 kormat: disabling GTID on db2096 (x1), es2021 (es4), es2023 (es5) T243373
  • 13:54 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:53 kormat: disabling GTID on db2129 (s6), db2118 (s7), db2079 (s8) T243373
  • 13:52 kormat: disabling GTID on db2123 (s5) T243373
  • 13:52 kormat: disabling GTID on db2090 (s4)