Server Admin Log/Archive 41

From Wikitech

2020-07-31

  • 23:48 ejegg: updated payments-wiki from c365c136d2 to cd012f37f1
  • 22:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:03 mutante: wtp2019 - parsoid could not start after reimaging because /etc/parsoid/config.yaml was missing; that file is a symbolic link pointing deep into /srv/deployment/parsoid/deploy-cache/.. As in some earlier cases, manually deleted the deploy-cache dir and ran puppet again. T258775
  • 21:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2020.codfw.wmnet
  • 21:57 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wtp2019.codfw.wmnet
  • 21:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2019.codfw.wmnet
  • 21:36 mutante: [wtp2019:~] $ sudo rm -rf /srv/deployment/parsoid/deploy-cache
  • 21:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2018.codfw.wmnet
  • 20:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:55 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:11 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:39 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2017.codfw.wmnet
  • 18:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2016.codfw.wmnet
  • 17:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:45 mutante: rebooting / reinstalling OS on xhgui1001
  • 17:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:13 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:12 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:52 elukey: update cr1/cr2-eqiad's analytics filters (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/617649/)
  • 13:51 moritzm: installing cups security updates (client-side tools/libs only)
  • 13:20 moritzm: installing openjpeg2 security updates
  • 13:04 kormat: proudly uploaded version 0.1 of python3-wmfmariadbpy + wmfmariadbpy
  • 11:55 moritzm: installing mercurial security updates
  • 11:21 jynus: restart dbstore1004
  • 11:19 moritzm: installing ffmpeg security updates for jessie (standard version from security.debian.org, not the VP9-enabled component)
  • 11:16 moritzm: imported ffmpeg 3.2.15-0+deb9u1+wmf1 to component/vp9 for stretch-wikimedia T259336
  • 07:51 moritzm: updating lilypond on mw* servers
  • 07:50 moritzm: uploaded lilypond 2.19.81+really-2.18.2-13~bpo9+1+wmf1 to stretch-wikimedia T256877
  • 07:07 elukey: stop mysql replication on db1108; update port config for mysql instances and restart them; restart replication on instances
  • 06:32 elukey: roll restart of druid brokers on druid100[4-8] to pick up new changes
  • 06:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 05:59 moritzm: installing qemu updates on stretch
  • 04:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 04:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 03:55 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 03:53 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 03:14 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 03:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:57 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 02:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:47 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: disable lilypond execution again (duration: 01m 10s)
  • 00:07 catrope@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/Echo/modules/mobile/notificationsFilterOverlay.js: T258954 (duration: 01m 06s)
  • 00:06 catrope@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Echo/modules/mobile/notificationsFilterOverlay.js: T258954 (duration: 01m 10s)
  • 00:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2015.codfw.wmnet
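Cookbook runs in this log appear as a START line and an END line. The END line carries a status in parentheses (PASS, FAIL, or ERROR) and an exit code. The sketch below tallies END lines by cookbook, status, and exit code. It assumes only the line format visible in this archive. The function name `tally_cookbook_results` is hypothetical, and the sketch does not try to interpret what individual exit codes such as 99 or 97 mean.

```python
import re
from collections import Counter

# Matches END lines in the format used by this log, e.g.
# "20:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)"
END_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}) (?P<user>[\w.-]+)@(?P<host>[\w.-]+): "
    r"END \((?P<status>[A-Z]+)\) - Cookbook (?P<cookbook>[\w.-]+) "
    r"\(exit_code=(?P<code>\d+)\)$"
)


def tally_cookbook_results(lines):
    """Count (cookbook, status, exit_code) triples across END entries.

    START lines and non-cookbook entries are ignored.
    """
    counts = Counter()
    for line in lines:
        # Drop surrounding whitespace and the leading bullet character.
        entry = line.strip().lstrip("•").strip()
        m = END_RE.match(entry)
        if m:
            counts[(m["cookbook"], m["status"], int(m["code"]))] += 1
    return counts
```

Run over the 2020-07-31 list above, the tally would show how many sre.hosts.downtime runs ended in PASS and how many ended in FAIL with exit_code=99 before the reimaged wtp hosts were pooled again.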

2020-07-30

  • 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2014.codfw.wmnet
  • 22:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.1
  • 21:52 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
  • 21:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2013.codfw.wmnet
  • 21:41 mutante: revoking and re-signing puppet cert for xhgui2001.codfw.wmnet T259206
  • 21:40 catrope@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/GrowthExperiments/: T258609 (duration: 01m 06s)
  • 21:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:39 catrope@deploy1001: Synchronized php-1.36.0-wmf.2/skins/MinervaNeue/: T258939 (duration: 01m 08s)
  • 21:30 mutante: reinstalling xhgui2001
  • 21:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:10 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@e797cf0]: 0.3.42 (duration: 24m 41s)
  • 20:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2012.codfw.wmnet
  • 20:45 ryankemper@deploy1001: Started deploy [wdqs/wdqs@e797cf0]: 0.3.42
  • 20:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:08 mutante: [wtp2012:~] $ sudo rm -rf /srv/deployment/parsoid/deploy-cache
  • 19:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2011.codfw.wmnet
  • 19:13 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.1
  • 19:10 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
  • 19:01 mforns@deploy1001: Finished deploy [analytics/refinery@adb0d09]: Regular analytics weekly train [analytics/refinery@adb0d09b6584a7a26143623cf6173ae8983423e3] (duration: 10m 41s)
  • 18:59 mutante: imported twig (php-twig) into APT repo
  • 18:50 mforns@deploy1001: Started deploy [analytics/refinery@adb0d09]: Regular analytics weekly train [analytics/refinery@adb0d09b6584a7a26143623cf6173ae8983423e3]
  • 18:34 Urbanecm: Morning B&C done
  • 18:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/Kartographer/modules/box/Map.js: aa3dbd5: Disable panning and zooming until ready (T257872) (duration: 01m 06s)
  • 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 617516: Add import sources for yuewiktionary | https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/617516; 617518: Fix definition of yuewiktionary import sources | https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/617518 # T258913 (duration: 01m 06s)
  • 18:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 14ef2ec: sysop_itwiki: Set favicon to Wikimedia_logo_blue.svg (T259243; 2/2) (duration: 01m 06s)
  • 18:04 urbanecm@deploy1001: Synchronized static/favicon/wmf-blue.ico: 14ef2ec: sysop_itwiki: Set favicon to Wikimedia_logo_blue.svg (T259243; 1/2) (duration: 01m 06s)
  • 17:04 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d3ab874]: airflow: refinery_drop_hive_partitions: Fix kerberos token passing (duration: 00m 55s)
  • 17:03 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d3ab874]: airflow: refinery_drop_hive_partitions: Fix kerberos token passing
  • 16:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:44 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:44 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:38 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:30 moritzm: installing squid security updates
  • 14:04 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:47 vgutierrez: upgrade acme-chief to version 0.27 - T255249
  • 13:47 vgutierrez: upload acme-chief 0.27 to apt.wm.o (buster) - T255249
  • 13:46 moritzm: installing qemu security updates on Buster
  • 13:02 jayme: imported chartmuseum_0.12.0-3 to buster-wikimedia
  • 12:07 elukey: upgrade of the druid public cluster (serving AQS) from 0.12.3 to 0.19
  • 11:53 urbanecm@deploy1001: Synchronized static/favicon/: c08f774: Revert "sysop_itwiki: Set favicon to Wikimedia_logo_blue.svg" (T259243) (duration: 01m 06s)
  • 11:48 urbanecm@deploy1001: Synchronized static/favicon/wmf-blue.ico: 399e9c5: sysop_itwiki: Set favicon to Wikimedia_logo_blue.svg (T259243; 1/2) (duration: 01m 06s)
  • 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fc48441: Add import sources to sysop_itwiki (T259243) (duration: 01m 08s)
  • 11:44 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: fc5de15: ClosedWikiProvider: Do not run when $wmgUseCentralAuth is false (T259246) (duration: 01m 07s)
  • 11:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7aa0c23: sysop_itwiki: Add several pages to wgWhitelistRead (T259243) (duration: 01m 06s)
  • 11:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5ea4bc8: sysop_itwiki: Add WP as an alias for NS_PROJECT (T259243) (duration: 01m 08s)
  • 10:49 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.2 (duration: 01m 07s)
  • 10:48 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.2
  • 10:38 marostegui: Reload haproxy on dbproxy1013 and dbproxy1015
  • 08:43 godog: flip smokeping/librenms from netmon2001 to netmon1002 - T247967
  • 07:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:31 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:22 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P12129 and previous config saved to /var/cache/conftool/dbconfig/20200730-071633-marostegui.json
  • 06:57 elukey: upload druid_0.19.0-1 packages to buster-wikimedia
  • 05:26 marostegui: Deploy MCR schema change on labswiki (wikitech) T238966
  • 02:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2010.codfw.wmnet
  • 01:53 dpifke@deploy1001: Finished deploy [performance/arc-lamp@ad87f69]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/615302 (duration: 00m 05s)
  • 01:53 dpifke@deploy1001: Started deploy [performance/arc-lamp@ad87f69]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/615302
  • 01:37 eileen: civicrm revision changed from cc5d17fbaf to 150c3476c4, config revision is b6ece03513
  • 01:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2009.codfw.wmnet
  • 01:22 mutante: imported into apt.wikimedia.org for buster: php-slim, php-slim-views, php-perftools-xhgui-collector, php-pimple, php-psr-http-server-middleware, php-psr-http-server-handler, xhgui
  • 01:07 mholloway-shell@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/JsonConfig: Backport: Implement GetContentModels hook (T259126) (duration: 01m 07s)
  • 00:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:23 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime

2020-07-29

  • 23:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2008.codfw.wmnet
  • 23:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:09 Urbanecm: Run mwscript namespaceDupes.php --wiki=mswiktionary --fix (T255391)
  • 23:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 396a395: Add several extra namespaces for mswiktionary (T255391) (duration: 01m 07s)
  • 22:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2006.codfw.wmnet
  • 22:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2007.codfw.wmnet
  • 22:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:35 crusnov@deploy1001: Finished deploy [netbox/deploy@fde9dfe]: Test deploy of 2.8.8 to netbox-next pt2 (duration: 00m 05s)
  • 20:35 crusnov@deploy1001: Started deploy [netbox/deploy@fde9dfe]: Test deploy of 2.8.8 to netbox-next pt2
  • 20:35 crusnov@deploy1001: Finished deploy [netbox/deploy@fde9dfe]: Test deploy of 2.8.8 to netbox-next (duration: 01m 12s)
  • 20:34 crusnov@deploy1001: Started deploy [netbox/deploy@fde9dfe]: Test deploy of 2.8.8 to netbox-next
  • 20:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2004.codfw.wmnet
  • 19:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:44 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 19:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:41 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 19:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:29 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:27 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:20 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 19:19 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:18 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:04 qchris: Restarting Gerrit on gerrit2001 (gerrit-replica) to make security fix effective.
  • 19:04 qchris@deploy1001: Finished deploy [gerrit/gerrit@9275b30]: Gerrit to v3.2.3-1-g185bdc3a69 on gerrit2001 (duration: 00m 09s)
  • 19:03 qchris@deploy1001: Started deploy [gerrit/gerrit@9275b30]: Gerrit to v3.2.3-1-g185bdc3a69 on gerrit2001
  • 19:00 qchris: Restarting Gerrit on gerrit1001 to make security fix effective.
  • 19:00 qchris@deploy1001: Finished deploy [gerrit/gerrit@9275b30]: Gerrit to v3.2.3-1-g185bdc3a69 on gerrit1001 (duration: 00m 08s)
  • 19:00 qchris@deploy1001: Started deploy [gerrit/gerrit@9275b30]: Gerrit to v3.2.3-1-g185bdc3a69 on gerrit1001
  • 18:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:39 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:32 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:13 Urbanecm: Morning B&C window is done
  • 18:13 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/DiscussionTools/: 00ecec8: Revert new reply API for now (T252558) (duration: 01m 06s)
  • 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d54f041: Enable Translate extension at plwikimedia (T259087) (duration: 01m 08s)
  • 18:07 urbanecm@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: a237f5b: Move VisualEditor from beta to default on enwikiversity (T258992) (duration: 01m 06s)
  • 18:05 Urbanecm: Create tables for Translate extension in plwikimedia (T259087)
  • 18:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2003.codfw.wmnet
  • 17:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2002.codfw.wmnet
  • 17:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:45 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:16 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:02 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 617167: Revert "Set muswiki to read only" | https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/617167 (T259004) (duration: 01m 06s)
  • 15:44 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:33 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "group[0|1] wikis to 1.36.0-wmf.1"
  • 15:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 617152: Set muswiki to read only | https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/617152 (T259004) (duration: 01m 08s)
  • 15:10 jayme: imported docker-report_0.0.8-1 to buster-wikimedia
  • 14:49 moritzm: installing ruby-json security updates
  • 14:34 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:30 jbond42: install curl security update for jessie
  • 14:29 moritzm: installing exiv2 security updates
  • 14:27 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:55 volans: migrating *all* codfw mgmt DNS records to the autogenerated ones via Netbox - T233183
  • 13:50 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:45 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:29 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2001.codfw.wmnet
  • 13:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:05 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.2 (duration: 01m 07s)
  • 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.2
  • 13:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:58 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:56 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 12:49 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:48 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 12:44 moritzm: imported curl 7.38.0-4+deb8u16+wmf1 to apt.wikimedia.org (jessie-wikimedia) T259102
  • 12:30 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 21s)
  • 12:28 urbanecm@deploy1001: Synchronized langlist: Creating avkwiki (T257943) (duration: 01m 05s)
  • 12:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating avkwiki (T257943) (duration: 01m 03s)
  • 12:26 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating avkwiki (T257943) (duration: 01m 06s)
  • 12:24 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating avkwiki (T257943)
  • 12:15 urbanecm@deploy1001: Synchronized dblists: Creating avkwiki (T257943) (duration: 01m 06s)
  • 12:14 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating avkwiki (T257943) (duration: 01m 06s)
  • 12:12 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating avkwiki (T257943) (duration: 01m 05s)
  • 12:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:07 moritzm: rebooting idp2001 for kernel update
  • 11:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 252bb6c: Add Wikipedia wordmark for trwiki (T255489; sync 2/2) (duration: 01m 05s)
  • 11:39 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-tr.svg: 252bb6c: Add Wikipedia wordmark for trwiki (T255489; sync 1/2) (duration: 01m 06s)
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9f7e032: Fix overindentation (duration: 01m 08s)
  • 11:11 Lucas_WMDE: EU B&C window done
  • 11:09 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/%s\n' 'wuuwiki.png' 'wuuwiki-1.5x.png' 'wuuwiki-2x.png' | mwscript purgeList.php # T259005
  • 11:08 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/project-logos/: Config: Change the logo for Wu Wikipedia (T259005) (duration: 01m 08s)
  • 10:40 vgutierrez: rolling upgrade of ATS to version 8.0.8-1wm2
  • 10:21 tstarling@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Score/includes/Score.php: do not offer .ly downloads (duration: 01m 07s)
  • 10:19 tstarling@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Score/extension.json: do not offer .ly downloads (duration: 01m 20s)
  • 10:12 vgutierrez: upgrade ATS to version 8.0.8-1wm2 on cp3064 and cp3065
  • 09:44 vgutierrez: upgrade ATS to version 8.0.8-1wm2 on cp5006 and cp5012
  • 09:20 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:20 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:16 vgutierrez: upgrade ATS to version 8.0.8-1wm2 on cp4026 and cp4032
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1112', diff saved to https://phabricator.wikimedia.org/P12115 and previous config saved to /var/cache/conftool/dbconfig/20200729-091528-marostegui.json
  • 09:15 vgutierrez: upload trafficserver 8.0.8-1wm2 to apt.wm.o (buster)
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112', diff saved to https://phabricator.wikimedia.org/P12114 and previous config saved to /var/cache/conftool/dbconfig/20200729-091319-marostegui.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112', diff saved to https://phabricator.wikimedia.org/P12113 and previous config saved to /var/cache/conftool/dbconfig/20200729-091006-marostegui.json
  • 08:55 marostegui: The above was db1112
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1121', diff saved to https://phabricator.wikimedia.org/P12112 and previous config saved to /var/cache/conftool/dbconfig/20200729-085504-marostegui.json
  • 08:42 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp2001.codfw.wmnet
  • 08:26 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:24 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:05 marostegui: Deploy MCR schema change on db1121 (lag will show up on s4), also remove triggers on db1124:3314
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P12111 and previous config saved to /var/cache/conftool/dbconfig/20200729-080442-marostegui.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1141', diff saved to https://phabricator.wikimedia.org/P12110 and previous config saved to /var/cache/conftool/dbconfig/20200729-080318-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P12109 and previous config saved to /var/cache/conftool/dbconfig/20200729-075558-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P12108 and previous config saved to /var/cache/conftool/dbconfig/20200729-074828-marostegui.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P12107 and previous config saved to /var/cache/conftool/dbconfig/20200729-074414-marostegui.json
  • 06:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:26 XioNoX: standardize mr1-eqiad interfaces
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P12106 and previous config saved to /var/cache/conftool/dbconfig/20200729-062224-marostegui.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1078', diff saved to https://phabricator.wikimedia.org/P12105 and previous config saved to /var/cache/conftool/dbconfig/20200729-062009-marostegui.json
  • 06:16 XioNoX: standardize mr1-codfw interfaces
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P12104 and previous config saved to /var/cache/conftool/dbconfig/20200729-061450-marostegui.json
  • 06:05 XioNoX: standardize mr1-ulsfo interfaces
  • 06:01 legoktm: ssh doc1001.eqiad.wmnet sudo -u doc-uploader git -C /srv/docroot pull
  • 05:52 XioNoX: standardize mr1-eqsin interfaces
  • 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P12103 and previous config saved to /var/cache/conftool/dbconfig/20200729-050346-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P12102 and previous config saved to /var/cache/conftool/dbconfig/20200729-050247-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1142', diff saved to https://phabricator.wikimedia.org/P12101 and previous config saved to /var/cache/conftool/dbconfig/20200729-050204-marostegui.json
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P12100 and previous config saved to /var/cache/conftool/dbconfig/20200729-045859-marostegui.json
  • 02:19 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: re-enable lilypond in safe mode (duration: 01m 09s)
  • 01:47 tstarling@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/Score/includes/Score.php: work around firejail bug (duration: 01m 07s)
  • 01:45 tstarling@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Score/includes/Score.php: work around firejail bug (duration: 01m 08s)
  • 01:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1048.eqiad.wmnet
  • 01:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1047.eqiad.wmnet
  • 00:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1046.eqiad.wmnet
  • 00:48 ryankemper: sudo -E cumin -b 10 'A:wdqs-all' 'sudo run-puppet-agent'
  • 00:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime

2020-07-28

  • 23:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:37 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: reduce mlr window size on enwiki (duration: 01m 05s)
  • 23:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:34 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: cirrus: reduce mlr window size on enwiki (duration: 01m 06s)
  • 23:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove unused setting $wgGEHomepageSuggestedEditsNewAccountInitiatedPercentage (no-op) (duration: 01m 06s)
  • 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wtp1046.eqiad.wmnet
  • 22:19 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1044.eqiad.wmnet
  • 21:27 dancy@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 21:24 dancy@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 21:17 dancy@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 20:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:36 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:02 eileen: process-control config revision is b6ece03513
  • 19:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:25 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 19:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 19:24 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 19:23 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P12097 and previous config saved to /var/cache/conftool/dbconfig/20200728-191926-marostegui.json
  • 19:12 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1147', diff saved to https://phabricator.wikimedia.org/P12096 and previous config saved to /var/cache/conftool/dbconfig/20200728-191237-marostegui.json
  • 19:11 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@69bbbbb]: airflow: drop_old_data_daily: top_queries table renamed to fulltext_head_queries (duration: 00m 53s)
  • 19:11 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@69bbbbb]: airflow: drop_old_data_daily: top_queries table renamed to fulltext_head_queries
  • 19:09 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1147', diff saved to https://phabricator.wikimedia.org/P12095 and previous config saved to /var/cache/conftool/dbconfig/20200728-190933-marostegui.json
  • 19:06 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1147', diff saved to https://phabricator.wikimedia.org/P12094 and previous config saved to /var/cache/conftool/dbconfig/20200728-190517-marostegui.json
  • 19:03 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1147', diff saved to https://phabricator.wikimedia.org/P12093 and previous config saved to /var/cache/conftool/dbconfig/20200728-190137-marostegui.json
  • 18:35 cdanis: βœ”οΈ cdanis@lvs1015.eqiad.wmnet ~ πŸ•β˜• sudo ipvsadm -D -t 10.2.2.51:9283
  • 18:29 cdanis: ❌cdanis@lvs1016.eqiad.wmnet ~ πŸ•β˜• sudo ipvsadm -D -t 10.2.2.51:9283
  • 18:29 catrope@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/GrowthExperiments/extension.json: Fix reference to MentorChangeLogFormatter (T259041) (duration: 01m 05s)
  • 18:20 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: No-op sync for wmgUseWikimediaApiPortal and wmgUseWikimediaApiPortalOAuth (2 of 2) (duration: 00m 58s)
  • 18:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: No-op sync for wmgUseWikimediaApiPortal and wmgUseWikimediaApiPortalOAuth (1 of 2) (duration: 01m 05s)
  • 18:16 cdanis: primary pybal restart ✔️ cdanis@lvs1015.eqiad.wmnet ~ 🕑☕ sudo systemctl restart pybal.service
  • 18:14 cdanis: backup pybal restart: ✔️ cdanis@lvs1016.eqiad.wmnet ~ 🕑☕ sudo systemctl restart pybal.service
  • 18:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:05 catrope@deploy1001: Synchronized php-1.36.0-wmf.2/includes/libs/filebackend/SwiftFileBackend.php: Fix index error in SwiftFileBackend (T259023) (duration: 01m 07s)
  • 17:46 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 05s)
  • 17:46 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
  • 17:41 volans: run apt-get clean on wtp[1046,1048].eqiad.wmnet and wtp2001.codfw.wmnet to free ~2GB as their disks were 100% full - T258775
  • 17:33 XioNoX: standardize mr1-esams interfaces
  • 17:30 brennen@deploy1001: sync aborted: (no justification provided) (duration: 28m 53s)
  • 17:03 brennen: prior scap sync for https://gerrit.wikimedia.org/r/c/mediawiki/core/+/616842 (T259023)
  • 17:02 brennen@deploy1001: Started scap: (no justification provided)
  • 16:51 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@0982d4e]: convert_to_esbulk: repair variable ref before assign (duration: 04m 33s)
  • 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:47 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:47 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@0982d4e]: convert_to_esbulk: repair variable ref before assign
  • 16:45 XioNoX: remove mr1-codfw source NAT (not used)
  • 16:43 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1045.eqiad.wmnet
  • 16:39 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:36 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 16:33 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1035.eqiad.wmnet
  • 16:32 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1034.eqiad.wmnet
  • 16:31 XioNoX: mr1-eqiad# delete security nat source rule-set mgmt-to-untrust (unused, no matching ACL)
  • 16:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:21 hnowlan: imported envoyproxy 1.15.0-1 deb into component/envoy-future for buster-wikimedia
  • 16:11 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1042.eqiad.wmnet
  • 16:09 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1043.eqiad.wmnet
  • 15:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:50 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:48 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:45 jayme@cumin1001: conftool action : set/pooled=no; selector: name=wtp1035.*
  • 15:44 jayme@cumin1001: conftool action : set/pooled=no; selector: name=wtp1034.*
  • 15:35 ayounsi@deploy1001: Finished deploy [homer/deploy@5e999c8]: once more (duration: 03m 06s)
  • 15:32 ayounsi@deploy1001: Started deploy [homer/deploy@5e999c8]: once more
  • 15:32 ayounsi@deploy1001: Finished deploy [homer/deploy@5e999c8]: CR613642 (duration: 03m 38s)
  • 15:31 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1045.eqiad.wmnet
  • 15:30 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1041.eqiad.wmnet
  • 15:30 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1044.eqiad.wmnet
  • 15:29 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1039.eqiad.wmnet
  • 15:28 ayounsi@deploy1001: Started deploy [homer/deploy@5e999c8]: CR613642
  • 15:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:17 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:16 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 15:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:14 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 15:13 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 15:08 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR613642 (duration: 02m 14s)
  • 15:06 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR613642
  • 15:01 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR613642 (duration: 00m 11s)
  • 15:01 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR613642
  • 14:58 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1043.eqiad.wmnet
  • 14:58 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1040.eqiad.wmnet
  • 14:57 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 14:55 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1042.eqiad.wmnet
  • 14:54 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1038.eqiad.wmnet
  • 14:52 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 14:48 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 14:23 herron: bounced centrallog rsyslog services in codfw/eqiad
  • 14:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:15 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P12087 and previous config saved to /var/cache/conftool/dbconfig/20200728-140313-marostegui.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P12086 and previous config saved to /var/cache/conftool/dbconfig/20200728-140249-marostegui.json
  • 14:02 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1148', diff saved to https://phabricator.wikimedia.org/P12085 and previous config saved to /var/cache/conftool/dbconfig/20200728-140220-marostegui.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1148', diff saved to https://phabricator.wikimedia.org/P12084 and previous config saved to /var/cache/conftool/dbconfig/20200728-140207-marostegui.json
  • 14:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:58 moritzm: installing perl security updates
  • 13:56 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:56 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:55 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1041.eqiad.wmnet
  • 13:55 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1037.eqiad.wmnet
  • 13:50 godog: remove stale ipvs thanos-query service on port 80
  • 13:39 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1040.eqiad.wmnet
  • 13:38 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1036.eqiad.wmnet
  • 13:38 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1039.eqiad.wmnet
  • 13:37 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1035.eqiad.wmnet
  • 13:37 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1038.eqiad.wmnet
  • 13:36 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1034.eqiad.wmnet
  • 13:29 godog: roll-restart pybal on eqiad lvs low-traffic to change port for thanos-query
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075', diff saved to https://phabricator.wikimedia.org/P12083 and previous config saved to /var/cache/conftool/dbconfig/20200728-132520-marostegui.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 with less weight', diff saved to https://phabricator.wikimedia.org/P12082 and previous config saved to /var/cache/conftool/dbconfig/20200728-132023-marostegui.json
  • 13:09 godog: roll-restart pybal on lvs low-traffic to apply thanos-query changes
  • 13:04 XioNoX: standardize cr3-esams interfaces
  • 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.2
  • 12:41 XioNoX: standardize cr2-esams interfaces
  • 12:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:36 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:33 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075', diff saved to https://phabricator.wikimedia.org/P12081 and previous config saved to /var/cache/conftool/dbconfig/20200728-123201-marostegui.json
  • 12:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:26 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:26 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:24 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:17 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1037.eqiad.wmnet
  • 12:14 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1036.eqiad.wmnet
  • 12:08 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1035.eqiad.wmnet
  • 12:07 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1032.eqiad.wmnet
  • 12:07 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1033.eqiad.wmnet
  • 12:05 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1031.eqiad.wmnet
  • 12:04 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1034.eqiad.wmnet
  • 12:04 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: disabling lilypond rendering in Score again due to error running gs (duration: 01m 05s)
  • 11:56 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: re-enabling Score in safe mode (duration: 01m 04s)
  • 11:50 Urbanecm: EU B&C window done
  • 11:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1a56726: Add Turkish powered by MW and Wikimedia project icons (T257732) (duration: 00m 59s)
  • 11:46 urbanecm@deploy1001: Synchronized static/images/footer/: 1a56726: Add Turkish powered by MW and Wikimedia project icons (T257732) (duration: 01m 01s)
  • 11:43 urbanecm@deploy1001: Synchronized static/images: df9b9ac: Move footer logos to /static/images/footer (T257732) (duration: 01m 02s)
  • 11:38 marostegui: Deploy schema change on s3 codfw, this will generate lag on codfw T256682
  • 11:38 ema: A:cp-text varnish ban pt.wikiversity.org T256750
  • 11:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: df9b9ac: Move footer logos to /static/images/footer (T257732) (duration: 00m 58s)
  • 11:36 ema: A:cp-text varnish ban fr.wiktionary.org T256750
  • 11:35 urbanecm@deploy1001: Synchronized static/images/footer: df9b9ac: Move footer logos to /static/images/footer (T257732) (duration: 01m 05s)
  • 11:34 ema: A:cp-text varnish ban eu.wikipedia.org T256750
  • 11:32 ema: A:cp-text varnish ban he.wikipedia.org T256750
  • 11:30 marostegui: Deploy MCR change on db1143, db1148, db1146:3314
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12079 and previous config saved to /var/cache/conftool/dbconfig/20200728-113009-marostegui.json
  • 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 04c7ef9: Undeploy graphoid for phase 2 wikis (T258463) (duration: 01m 00s)
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1143', diff saved to https://phabricator.wikimedia.org/P12078 and previous config saved to /var/cache/conftool/dbconfig/20200728-112850-marostegui.json
  • 11:25 ema: A:cp-text varnish ban fa.wikipedia.org T256750
  • 11:21 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] use more neutral config var names (duration: 01m 06s)
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P12077 and previous config saved to /var/cache/conftool/dbconfig/20200728-112046-marostegui.json
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P12076 and previous config saved to /var/cache/conftool/dbconfig/20200728-111522-marostegui.json
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P12075 and previous config saved to /var/cache/conftool/dbconfig/20200728-111226-marostegui.json
  • 11:11 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:10 jdrewniak@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: gerrit:614890 desktop improvements by default for testing group (round 2) (T254227) (duration: 01m 06s)
  • 11:09 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:07 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 hashar@deploy1001: Finished deploy [integration/docroot@ba85bdf]: Catch up with HEAD and support DOCUMENT_ROOT being a symbolic link for T149924 (duration: 00m 06s)
  • 10:56 hashar@deploy1001: Started deploy [integration/docroot@ba85bdf]: Catch up with HEAD and support DOCUMENT_ROOT being a symbolic link for T149924
  • 10:55 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:53 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:50 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1033.eqiad.wmnet
  • 10:48 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1030.eqiad.wmnet
  • 10:48 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1029.eqiad.wmnet
  • 10:47 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1032.eqiad.wmnet
  • 10:47 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1028.eqiad.wmnet
  • 10:33 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1031.eqiad.wmnet
  • 10:32 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1027.eqiad.wmnet
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082', diff saved to https://phabricator.wikimedia.org/P12074 and previous config saved to /var/cache/conftool/dbconfig/20200728-102342-marostegui.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12072 and previous config saved to /var/cache/conftool/dbconfig/20200728-100412-marostegui.json
  • 09:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:55 XioNoX: standardize cr2-esams interfaces
  • 09:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:50 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:49 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:47 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:43 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:40 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:38 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:35 moritzm: imported libmysqlclient18 to component/cloudera T258768
  • 09:31 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1030.eqiad.wmnet
  • 09:28 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1029.eqiad.wmnet
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12070 and previous config saved to /var/cache/conftool/dbconfig/20200728-092606-marostegui.json
  • 09:24 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1028.eqiad.wmnet
  • 09:19 XioNoX: standardize cr3-eqsin interfaces
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12069 and previous config saved to /var/cache/conftool/dbconfig/20200728-091849-marostegui.json
  • 09:18 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1027.eqiad.wmnet
  • 09:10 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1026.eqiad.wmnet
  • 09:07 ema: cp3050: restart varnishmtail.service, stuck on "Condition(c->offset <= c->vtx->len) not true."
  • 08:39 XioNoX: standardize cr2-eqsin interfaces
  • 08:38 godog: temporary downgrade prometheus-snmp-exporter on netmon2001
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12067 and previous config saved to /var/cache/conftool/dbconfig/20200728-083336-marostegui.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P12066 and previous config saved to /var/cache/conftool/dbconfig/20200728-083209-marostegui.json
  • 08:20 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.2 (duration: 53m 11s)
  • 08:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:07 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:06 godog: failover librenms/smokeping to netmon2001 - T247967
  • 08:04 marostegui: Reduce labsdb1009 weight
  • 07:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:48 jayme: depooled wtp1026.eqiad.wmnet for reimage
  • 07:48 moritzm: switched superset to CAS
  • 07:47 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:46 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 07:43 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:31 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet
  • 07:27 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.2
  • 07:03 liw: 1.36.0-wmf.2 was branched at 04e863f for T257970
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12065 and previous config saved to /var/cache/conftool/dbconfig/20200728-051928-marostegui.json
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3314 and restore db1146:3314 original weight', diff saved to https://phabricator.wikimedia.org/P12064 and previous config saved to /var/cache/conftool/dbconfig/20200728-051813-marostegui.json
  • 02:17 eileen: process-control config revision is 6811ca294a - just delayed silverpop_daily a bit as clashing with dedupe
  • 00:18 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephmon1003.eqiad.wmnet
  • 00:17 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephmon1003.eqiad.wmnet

2020-07-27

  • 23:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ac8e5d0]: airflow: head queries report, managed variables, refinery-drop-hive-partitions support (duration: 00m 54s)
  • 23:48 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ac8e5d0]: airflow: head queries report, managed variables, refinery-drop-hive-partitions support
  • 23:28 mutante: otrs1001 - ran puppet (it was alerting in icinga that puppet failed, but it was neither disabled nor failing and changed nothing when it ran)
  • 21:31 sbassett@deploy1001: Synchronized wmf-config/CommonSettings.php: Deployed CentralNotice CSP config change for T258459 (duration: 00m 57s)
  • 21:10 sbassett: Deployed mitigations for T238075
  • 20:41 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/InterwikiSorting/: c5f6c97: Use LanguageLinksHook to sort interwiki links (T257625) (duration: 00m 59s)
  • 19:50 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 19:44 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 19:36 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 19:23 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 19:19 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 19:11 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 19:06 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 19:00 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 18:57 urbanecm@deploy1001: sync-file aborted: 3833b13: Move footer logos to /static/images/footer (T257732) (duration: 00m 04s)
  • 18:50 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: c6a9674: Move footer logos to wmg* variables (T257732) (duration: 00m 56s)
  • 18:50 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 00m 57s)
  • 18:49 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
  • 18:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c6a9674: Move footer logos to wmg* variables (T257732) (duration: 00m 57s)
  • 18:29 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable desktop web UI click tracking instrumentation on frwiki, hewiki, fawiki (T258058) (duration: 00m 56s)
  • 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove WPBSkinBlacklist (T254675) (duration: 00m 57s)
  • 17:42 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.1
  • 17:30 liw: promoting train to group2
  • 17:14 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 17:14 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:14 dpifke@deploy1001: Finished deploy [performance/arc-lamp@f14888b]: Deploying arclamp-compress-logs (T235456) (duration: 00m 05s)
  • 17:14 dpifke@deploy1001: Started deploy [performance/arc-lamp@f14888b]: Deploying arclamp-compress-logs (T235456)
  • 16:59 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephmon1002.eqiad.wmnet
  • 16:58 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephmon1002.eqiad.wmnet
  • 16:57 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephmon1002.eqiad.wmnet
  • 16:50 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephosd1003.eqiad.wmnet
  • 16:50 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephosd1002.eqiad.wmnet
  • 16:50 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephosd1001.eqiad.wmnet
  • 16:50 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephmon1003.eqiad.wmnet
  • 16:50 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephmon1002.eqiad.wmnet
  • 16:49 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephmon1001.eqiad.wmnet
  • 16:48 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephosd1003.wikimedia.org
  • 16:48 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephosd1002.wikimedia.org
  • 16:48 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephosd1001.wikimedia.org
  • 16:48 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephosd1001.eqiad.wmnet
  • 16:48 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephosd1002.eqiad.wmnet
  • 16:47 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephosd1003.eqiad.wmnet
  • 16:44 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cumin1001.eqiad.wmnet
  • 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2087:3316, db2087:3317 after on-site maintenance T258587', diff saved to https://phabricator.wikimedia.org/P12063 and previous config saved to /var/cache/conftool/dbconfig/20200727-163311-marostegui.json
  • 16:05 marostegui: Will show up on labsdb hosts for s5
  • 16:04 marostegui: Stop MySQL on db1082 for onsite maintenance - T258910
  • 15:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1146:3314 weight while db1144:3314 is depooled', diff saved to https://phabricator.wikimedia.org/P12060 and previous config saved to /var/cache/conftool/dbconfig/20200727-145010-marostegui.json
  • 14:48 marostegui: Deploy MCR change on db1144:3314
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12059 and previous config saved to /var/cache/conftool/dbconfig/20200727-144807-marostegui.json
  • 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1149', diff saved to https://phabricator.wikimedia.org/P12058 and previous config saved to /var/cache/conftool/dbconfig/20200727-144034-marostegui.json
  • 14:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:19 XioNoX: standardize cr1-codfw interfaces
  • 14:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:04 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:57 moritzm: upgrading idp2001 to CAS 6.1.7.1
  • 13:19 XioNoX: standardize some cr2-esams interfaces
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1089 in main traffic', diff saved to https://phabricator.wikimedia.org/P12057 and previous config saved to /var/cache/conftool/dbconfig/20200727-131123-marostegui.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 with normal weight and pool db1089 into vslow', diff saved to https://phabricator.wikimedia.org/P12056 and previous config saved to /var/cache/conftool/dbconfig/20200727-130954-marostegui.json
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12055 and previous config saved to /var/cache/conftool/dbconfig/20200727-130713-marostegui.json
  • 13:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 with less weight', diff saved to https://phabricator.wikimedia.org/P12054 and previous config saved to /var/cache/conftool/dbconfig/20200727-125824-marostegui.json
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12053 and previous config saved to /var/cache/conftool/dbconfig/20200727-125351-marostegui.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 with less weight', diff saved to https://phabricator.wikimedia.org/P12052 and previous config saved to /var/cache/conftool/dbconfig/20200727-125207-marostegui.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12051 and previous config saved to /var/cache/conftool/dbconfig/20200727-125045-marostegui.json
  • 12:41 marostegui: Compress innodb on db1106, this will generate lag on enwiki on labsdb hosts (wiki replicas) T254462
  • 12:38 moritzm: disable puppet on idp1001/2001
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 and pool db1105:3311 as vslow T254462', diff saved to https://phabricator.wikimedia.org/P12050 and previous config saved to /var/cache/conftool/dbconfig/20200727-123833-marostegui.json
  • 12:37 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
  • 12:37 akosiaris@cumin1001: conftool action : set/weight=0; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
  • 12:37 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
  • 12:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
  • 12:36 akosiaris@cumin1001: conftool action : set/weight=0; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
  • 12:31 XioNoX: standardize cr2-codfw interfaces
  • 12:28 volans@deploy1001: Finished deploy [debmonitor/deploy@25dbd20]: Release v0.2.7 (duration: 00m 27s)
  • 12:28 volans@deploy1001: Started deploy [debmonitor/deploy@25dbd20]: Release v0.2.7
  • 12:25 jbond42: upload new cas package to buster-wikimedia
  • 12:25 jbond42: upload new cas package
  • 12:23 ema: A:cp rolling varnish-frontend restart to actually discard old VCL still pointing at varnishcheck/check T255015 T236754
  • 12:21 moritzm: installing ruby-json security updates
  • 12:16 moritzm: installing batik security updates
  • 11:59 marostegui: Deploy MCR schema change on db1149
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12049 and previous config saved to /var/cache/conftool/dbconfig/20200727-115818-marostegui.json
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1138', diff saved to https://phabricator.wikimedia.org/P12048 and previous config saved to /var/cache/conftool/dbconfig/20200727-115739-marostegui.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1138', diff saved to https://phabricator.wikimedia.org/P12047 and previous config saved to /var/cache/conftool/dbconfig/20200727-115258-marostegui.json
  • 11:28 moritzm: installing an-tool1009 T258768
  • 10:54 ema: upload atskafka 0.10 to buster-wikimedia, upgrade cp3050 T254317
  • 10:46 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (616463) (duration: 01m 05s)
  • 10:45 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (616463) (duration: 01m 10s)
  • 10:33 XioNoX: make cr*-ulsfo interfaces netbox compliant
  • 08:39 XioNoX: push "Add 185.71.138.0/24 to wikimedia4" to all routers
  • 07:00 marostegui: Deploy schema change on s5 codfw T256682
  • 06:44 elukey: truncate big log file on an-launcher1002 that is filling up the /srv partition
  • 06:36 elukey: apt-get clean on netbox1001 to free some space
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12043 and previous config saved to /var/cache/conftool/dbconfig/20200727-051156-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316, db2087:3317 for on-site maintenance T258587', diff saved to https://phabricator.wikimedia.org/P12042 and previous config saved to /var/cache/conftool/dbconfig/20200727-050058-marostegui.json
  • 04:58 marostegui: Stop MySQL on db2087 for on-site maintenance T258587

2020-07-25

  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1096:3315 into s5 api afte db1082 crashed T258336', diff saved to https://phabricator.wikimedia.org/P12041 and previous config saved to /var/cache/conftool/dbconfig/20200725-124104-marostegui.json
  • 09:16 oblivian@cumin1001: dbctl commit (dc=all): 'Depool db1082 T258336', diff saved to https://phabricator.wikimedia.org/P12040 and previous config saved to /var/cache/conftool/dbconfig/20200725-091616-oblivian.json
  • 01:52 mutante: ganeti - also removing (unmounted) disk 2 (100G) from webperf1002. T257931
  • 00:46 mutante: ganeti - removing disk 3 (20G) from webperf1002. The disks are 0-indexed, so the ones actually mounted are 0 (50G) and 1 (300G) (T257931)
  • 00:42 dpifke: Manually compressing some more data on webperf1002, using arclamp-compress-logs from https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/615904.

2020-07-24

  • 23:00 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 20:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:57 dpifke: Manually gzipping some older ArcLamp data on webperf1002, to free up space and verify new compression support.
  • 19:55 dpifke@deploy1001: Finished deploy [performance/arc-lamp@772b4a3]: Deploy CLs 611465 and 613740 to add compression support to ArcLamp (duration: 00m 05s)
  • 19:55 dpifke@deploy1001: Started deploy [performance/arc-lamp@772b4a3]: Deploy CLs 611465 and 613740 to add compression support to ArcLamp
  • 16:55 Amir1: deployment done
  • 16:49 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Wikibase/repo/includes/RepoHooks.php: Prevent onTitleGetRestrictionTypes changing ns0 protections, Part II (duration: 01m 07s)
  • 16:47 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Wikibase/repo/includes/WikibaseRepo.php: Prevent onTitleGetRestrictionTypes changing ns0 protections, Part I (duration: 01m 06s)
  • 15:06 reedy@deploy1001: Finished scap: Score backports (duration: 36m 50s)
  • 14:30 reedy@deploy1001: Started scap: Score backports
  • 13:31 XioNoX: advertise 185.71.138.0/24 from AMS
  • 13:17 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 13:00 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.1/includes/import/ImportableOldRevisionImporter.php: Import: use master DB for loading slots. (T258666) (duration: 01m 07s)
  • 12:34 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 12:04 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 11:48 hnowlan: bootstrapped restbase-dev1004-b
  • 11:13 hnowlan: started bootstrap of restbase-dev1004-a
  • 10:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:35 hnowlan: started reimage of restbase-dev1004
  • 09:59 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 09:48 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 08:40 kormat: restarting mariadb on all sanitarium hosts T258711
  • 08:35 akosiaris: start nagios-nrpe-server on kubernetes2002
  • 07:44 elukey: depool wtp1025 - disk full
  • 06:30 tstarling@deploy1001: Started scap: for Score
  • 02:36 tstarling@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Score/includes/Score.php: removing superseded local patch for hard-coding lilypond version (duration: 01m 09s)
  • 01:19 ejegg: updated payments-wiki from 31a3de1130 to c365c136d2
  • 01:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 00:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 00:46 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 00:46 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 00:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 00:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 00:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 00:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 00:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 00:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 00:43 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 00:42 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 00:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 00:14 andrew@cumin1001: START - Cookbook sre.hosts.decommission

2020-07-23

  • 23:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:30 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:30 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:30 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:52 mutante: stashbot quadruple log test
  • 22:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:51 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'echostore' for release 'staging' .
  • 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:45 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@c99c626]: airflow: centralize installation specific airflow Variables (duration: 00m 34s)
  • 21:20 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@c99c626]: airflow: centralize installation specific airflow Variables
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:13 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 19:11 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 19:09 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 18:51 ryankemper: restarted blazegraph on codfw wdqs2001
  • 18:44 ryankemper: Restarted blazegraph on the following codfw wdqs nodes: 2007, 2003, and 2002
  • 18:39 Amir1: BACC is done
  • 18:29 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: Load WikibaseClient from extension.json file instead of php one (T257437 T256228 T88258) (duration: 01m 05s)
  • 18:21 mutante: testreduce1001 - rm -rf /srv/testreduce and run puppet to re-clone testreduce to it from the scandium branch (T257906)
  • 18:13 ryankemper: restarted blazegraph on 2001
  • 17:59 ryankemper: sudo -E cumin -b 10 'A:wdqs-all and not A:wdqs-test and not P{wdqs1003.eqiad.wmnet} and not P{wdqs2001.codfw.wmnet}' 'sudo systemctl restart wdqs-blazegraph.service'
  • 17:53 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin -b10 'wdqs*' "run-puppet-agent --unless-version 1a4ae81"
  • 17:52 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs.*,name=codfw
  • 17:35 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs.*,name=codfw
  • 17:22 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 16:57 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 16:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 15:36 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 05s)
  • 13:49 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=.*
  • 12:29 marostegui: Decrease labsdb1009 weight a bit, as it is lagging again.
  • 12:23 XioNoX: remove bogus lo0 IPs from cr3-knams
  • 12:21 Urbanecm: Staging at mwdebug1001 ended, run scap pull to clean changes
  • 12:17 Urbanecm: Staging at mwdebug1001 again
  • 12:02 Urbanecm: Staging at mwdebug1001 ended, run scap pull to clean changes
  • 12:00 Urbanecm: Staging at mwdebug1001
  • 11:49 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 745ff20: Log ClosedWikiProviders start with info level (T258695) (duration: 01m 05s)
  • 11:48 marostegui: Deploy MCR schema change on db1145:3314
  • 11:36 dcausse: European mid-day backport window done
  • 11:31 dcausse@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Wikibase: T258507: Fix bug that causes wrong prefixes in RDF output (duration: 01m 11s)
  • 11:18 akosiaris: depool scb in mobileapps/eqiad. T218733
  • 11:17 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=mobileapps,name=scb.*
  • 11:13 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T258474: [sdoc] fix entity source base URIs (duration: 01m 07s)
  • 10:27 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=mobileapps,name=scb.*
  • 10:27 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=mobileapps,name=scb*
  • 10:25 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=mobileapps,name=scb1002.*
  • 10:24 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=mobileapps,name=scb1001.*
  • 10:18 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:14 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:11 akosiaris: pool kubernetes in mobileapps/eqiad. T218733
  • 10:11 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=mobileapps,name=kubernetes.*
  • 10:06 volans@deploy1001: Finished deploy [debmonitor/deploy@16d0c45]: Release v0.2.6 (duration: 00m 36s)
  • 10:06 volans@deploy1001: Started deploy [debmonitor/deploy@16d0c45]: Release v0.2.6
  • 10:05 volans@deploy1001: Finished deploy [debmonitor/deploy@44aa1ee]: Release v0.2.6 (duration: 00m 14s)
  • 10:05 volans@deploy1001: Started deploy [debmonitor/deploy@44aa1ee]: Release v0.2.6
  • 10:04 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 09:51 akosiaris: prepare for pooling kubernetes mobileapps capacity in eqiad. T218733
  • 09:51 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=mobileapps,name=kubernetes.*
  • 09:46 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 09:40 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 09:38 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 09:27 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:27 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 09:25 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 09:24 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 09:20 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 09:19 akosiaris: lower replica count back to 80 for mobileapps. T218733
  • 09:19 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:19 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 09:02 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 08:59 marostegui: transfer --type=xtrabackup from db1117:3322 to db1107 T257540
  • 08:45 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:42 godog: test librenms poller from netmon2001
  • 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:40 XioNoX: remove pim-rp IPs from last routers - T257573
  • 08:40 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:29 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1107 from s1 T257540', diff saved to https://phabricator.wikimedia.org/P12025 and previous config saved to /var/cache/conftool/dbconfig/20200723-082647-marostegui.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to move it to m2 T257540', diff saved to https://phabricator.wikimedia.org/P12024 and previous config saved to /var/cache/conftool/dbconfig/20200723-081650-marostegui.json
  • 05:29 marostegui: Restore labsdb1009's original weight
  • 00:24 legoktm@deploy1001: Synchronized php-1.35.0-wmf.41/includes/: T258664: Revert "Add a new type of database to the installer from extension" (2/2) (duration: 01m 08s)
  • 00:22 legoktm@deploy1001: Synchronized php-1.35.0-wmf.41/includes/libs/rdbms/database/Database.php: T258664: Revert "Add a new type of database to the installer from extension" (duration: 01m 05s)
  • 00:20 legoktm@deploy1001: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 00:16 legoktm@deploy1001: Synchronized php-1.36.0-wmf.1/includes/: T258664: Revert "Add a new type of database to the installer from extension" (duration: 01m 09s)
  • 00:11 legoktm@deploy1001: scap failed: average error rate on 3/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)

2020-07-22

  • 22:07 cdanis: remove downtime on api.svc.codfw.wmnet T258614
  • 19:26 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.1 (duration: 01m 03s)
  • 19:25 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.1
  • 19:15 urbanecm@deploy1001: Finished scap: 9529cf8: b66ec91: OOUI backport; 93755a6: i18n changes for OAuth, removal of spam messages (duration: 42m 26s)
  • 19:14 ejegg: updated payments-wiki from bf91f8adff to 31a3de1130
  • 19:11 mutante: mw2335 - mw2339 - scap pull
  • 18:39 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw233[5-9].codfw.wmnet
  • 18:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw233[6-9].codfw.wmnet
  • 18:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw233[6-9].codfw.wmnet
  • 18:33 urbanecm@deploy1001: Started scap: 9529cf8: b66ec91: OOUI backport; 93755a6: i18n changes for OAuth, removal of spam messages
  • 18:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2335.codfw.wmnet
  • 18:28 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw233[5-9].codfw.wmnet
  • 18:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2339.codfw.wmnet
  • 17:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2338.codfw.wmnet
  • 17:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2337.codfw.wmnet
  • 17:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
  • 17:26 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2335.codfw.wmnet
  • 15:31 moritzm: updated stretch installer image to Stretch 9.13 release T258407
  • 15:27 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 15:27 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:52 XioNoX: add accept-data and remove bogus v6 IP from ulsfo sandbox vlan
  • 14:43 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=mobileapps,name=scb.*
  • 14:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:35 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:35 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:12 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:12 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:06 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:04 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 13:54 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 13:50 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:49 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:36 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:36 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:34 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 13:33 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 13:20 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:19 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:18 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:18 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:16 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:16 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 12:36 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
  • 12:32 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 12:28 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=mobileapps,name=scb.*
  • 12:20 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 12:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 12:17 akosiaris@cumin1001: conftool action : set/weight=0; selector: dc=codfw,service=mobileapps,name=scb.*
  • 12:05 ema: A:cp-text varnish ban ptwikiversity T256750
  • 12:01 ema: A:cp-text varnish ban frwiktionary T256750
  • 11:56 ema: A:cp-text varnish ban euwiki T256750
  • 11:54 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=mobileapps,name=scb.*
  • 11:54 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 11:54 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 11:52 Urbanecm: EU B&C window done
  • 11:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=mobileapps,name=scb.*
  • 11:49 ema: A:cp-text force puppet run to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/615446 T256750
  • 11:48 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 15s)
  • 11:42 jdrewniak@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable desktop improvements by default for testing group (round 1) (T254227) (duration: 01m 05s)
  • 11:30 jdrewniak@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable instrumentation for wikis in the desktop improvements testing group (T254228) (duration: 01m 04s)
  • 11:30 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 11:30 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 11:28 jdrewniak@deploy1001: Synchronized wmf-config/config: Config: Enable instrumentation for wikis in the desktop improvements testing group (T254228) (duration: 01m 05s)
  • 11:20 jdrewniak@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Config: Enable instrumentation for wikis in the desktop improvements testing group (T254228) (duration: 01m 05s)
  • 11:18 jdrewniak@deploy1001: Synchronized dblists/desktop-improvements.dblist: Config: Enable instrumentation for wikis in the desktop improvements testing group (T254228) (duration: 01m 18s)
  • 11:13 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 11:13 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 10:39 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 10:24 jbond42: upload prometheus-swagger-exporter_0.3-1+deb10u1 to apt1001 buster repo
  • 10:24 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 10:22 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 10:19 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 10:19 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 10:12 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=mobileapps,name=scb.*
  • 10:08 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=mobileapps,name=scb.*
  • 10:04 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:01 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 09:58 marostegui: Deploy MCR schema change on s4 codfw master (lag will appear on codfw) - T238966
  • 09:55 akosiaris: bump memory in codfw mobileapps by another 20% T218733
  • 09:55 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:55 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 09:52 godog: centrallog1001 lvextend /srv by 130G
  • 09:51 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:46 akosiaris: codfw mobileapps kubernetes traffic back to 96% T218733; scb pooled again.
  • 09:46 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=mobileapps,name=scb.*
  • 09:43 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 09:43 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 09:43 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:40 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 09:40 akosiaris: increase codfw mobileapps kubernetes traffic to 100% T218733
  • 09:40 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=mobileapps,name=scb.*
  • 09:34 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 09:27 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:27 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 09:25 akosiaris: bump memory limits for mobileapps by 25% T218733
  • 09:25 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 09:10 jayme: updated docker-report to 0.0.7-1 on deneb
  • 09:09 jayme: import docker-report 0.0.7-1 to buster-wikimedia
  • 09:06 gehel: restarting blazegraph on all wdqs nodes - new vocabulary
  • 08:48 dcausse: restarting blazegraph on wdqs1010 (testing new vocab)
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126', diff saved to https://phabricator.wikimedia.org/P12017 and previous config saved to /var/cache/conftool/dbconfig/20200722-084613-marostegui.json
  • 08:42 kormat@cumin1001: dbctl commit (dc=all): 'Increase es1020 to 100% pooled in es4, reduce es1021 to weight 0 T257284', diff saved to https://phabricator.wikimedia.org/P12016 and previous config saved to /var/cache/conftool/dbconfig/20200722-084159-kormat.json
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12015 and previous config saved to /var/cache/conftool/dbconfig/20200722-083926-marostegui.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1084 and db1107', diff saved to https://phabricator.wikimedia.org/P12014 and previous config saved to /var/cache/conftool/dbconfig/20200722-083535-marostegui.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12013 and previous config saved to /var/cache/conftool/dbconfig/20200722-083140-marostegui.json
  • 08:30 kart_: Updated cxserver to 2020-07-20-200559-production (T257674)
  • 08:28 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 08:25 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 08:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 and db1107', diff saved to https://phabricator.wikimedia.org/P12012 and previous config saved to /var/cache/conftool/dbconfig/20200722-082309-marostegui.json
  • 08:22 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12010 and previous config saved to /var/cache/conftool/dbconfig/20200722-082023-marostegui.json
  • 08:19 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 08:16 akosiaris: increase codfw mobileapps kubernetes traffic to 96% T218733. Take #2. Let's see if I can reproduce the weird increases in p99 latencies and figure out their cause
  • 08:15 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
  • 08:14 kormat@cumin1001: dbctl commit (dc=all): 'Increase es1020 to 75% pooled in es4, reduce es1021 to weight 25 T257284', diff saved to https://phabricator.wikimedia.org/P12009 and previous config saved to /var/cache/conftool/dbconfig/20200722-081457-kormat.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 and db1107', diff saved to https://phabricator.wikimedia.org/P12008 and previous config saved to /var/cache/conftool/dbconfig/20200722-081330-marostegui.json
  • 08:12 moritzm: Turnilo switched to CAS
  • 08:05 jayme: updated docker-report to 0.0.6-1 on deneb
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 and db1107', diff saved to https://phabricator.wikimedia.org/P12007 and previous config saved to /var/cache/conftool/dbconfig/20200722-075749-marostegui.json
  • 07:53 kormat@cumin1001: dbctl commit (dc=all): 'Increase es1020 to 50% pooled in es4 T257284', diff saved to https://phabricator.wikimedia.org/P12006 and previous config saved to /var/cache/conftool/dbconfig/20200722-075312-kormat.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1084 to s1, depooled T253217', diff saved to https://phabricator.wikimedia.org/P12005 and previous config saved to /var/cache/conftool/dbconfig/20200722-075040-marostegui.json
  • 07:49 jayme: import docker-report 0.0.6-1 to buster-wikimedia
  • 07:40 jynus: stop db1145 for hw maintenance T258249
  • 06:47 elukey: update analytics-in4/6 filters on cr1/cr2 eqiad (ref https://gerrit.wikimedia.org/r/c/operations/homer/public/+/614702)
  • 06:26 marostegui: Stop MySQL on db1107
  • 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to clone db1084', diff saved to https://phabricator.wikimedia.org/P12003 and previous config saved to /var/cache/conftool/dbconfig/20200722-060432-marostegui.json
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P12002 and previous config saved to /var/cache/conftool/dbconfig/20200722-051607-marostegui.json

2020-07-21

  • 23:37 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump cirrus MLR models to latest (duration: 01m 06s)
  • 23:13 Urbanecm: Evening backport window done
  • 23:12 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 7a50168: Updating UploadWizard template: PD-old-70-1923->PD-old-70-expired (T258523) (duration: 01m 06s)
  • 23:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7acc9d9: Enable $wgWatchlistExpiry on testwiki (T257506) (duration: 01m 08s)
  • 19:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.1
  • 19:02 catrope@deploy1001: Synchronized php-1.36.0-wmf.1/includes/Storage/PageUpdater.php: Fix handling of null edits (T257766) (duration: 01m 06s)
  • 19:01 catrope@deploy1001: Synchronized php-1.35.0-wmf.41/includes/Storage/PageUpdater.php: Fix handling of null edits (T257766) (duration: 01m 11s)
  • 18:33 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.1 (duration: 41m 22s)
  • 18:27 ejegg: restored new URL for TY page in payments-wiki settings
  • 18:22 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1] (thin): Redeploying to unbreak unique devices per domain monthly THIN [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 00m 07s)
  • 18:22 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1] (thin): Redeploying to unbreak unique devices per domain monthly THIN [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
  • 18:21 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - third try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 00m 12s)
  • 18:21 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - third try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
  • 18:17 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - second try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 00m 17s)
  • 18:16 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - second try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
  • 18:13 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 05m 32s)
  • 18:08 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
  • 17:52 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.1
  • 17:50 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:45 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:10 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.39 (duration: 16m 25s)
  • 16:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase, take 2 (duration: 04m 54s)
  • 16:27 ppchelko@deploy1001: Started deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase, take 2
  • 16:27 ppchelko@deploy1001: Finished deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase (duration: 10m 37s)
  • 16:21 longma: 1.36.0-wmf.1 was branched at 3a1faac for T257969
  • 16:16 ppchelko@deploy1001: Started deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase
  • 15:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:10 moritzm: draining restbase1027 for eventual reboot for kernel security update
  • 15:09 godog: poweroff ms-be1024 for bbu replacement - T257949
  • 15:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:01 vgutierrez: show a synthetic warning for traffic using ECDHE-RSA-AES128-SHA - T258405
  • 15:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:00 moritzm: draining restbase1026 for eventual reboot for kernel security update
  • 14:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:51 moritzm: draining restbase1025 for eventual reboot for kernel security update
  • 14:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=mobileapps,name=scb.*
  • 14:35 akosiaris: decrease codfw mobileapps kubernetes traffic to 72% T218733. Weird latency patterns appeared once 92% was reached. See https://grafana.wikimedia.org/d/5CmeRcnMz/mobileapps?panelId=34&fullscreen&orgId=1&from=1595338489749&to=1595342071227&var-dc=codfw%20prometheus%2Fk8s&var-service=mobileapps&var-container_name=All
  • 14:35 moritzm: draining restbase1024 for eventual reboot for kernel security update
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P11994 and previous config saved to /var/cache/conftool/dbconfig/20200721-143204-marostegui.json
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11993 and previous config saved to /var/cache/conftool/dbconfig/20200721-142634-marostegui.json
  • 14:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:24 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:19 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11992 and previous config saved to /var/cache/conftool/dbconfig/20200721-141813-marostegui.json
  • 14:16 moritzm: draining restbase1023 for eventual reboot for kernel security update
  • 14:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:03 moritzm: draining restbase1022 for eventual reboot for kernel security update
  • 14:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:55 moritzm: draining restbase1021 for eventual reboot for kernel security update
  • 13:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11991 and previous config saved to /var/cache/conftool/dbconfig/20200721-135028-marostegui.json
  • 13:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:46 moritzm: draining restbase1020 for eventual reboot for kernel security update
  • 13:42 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
  • 13:41 akosiaris: increase codfw mobileapps kubernetes traffic to 96% T218733
  • 13:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:15 Amir1: end of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T258472 T258473)
  • 13:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:03 moritzm: draining restbase1019 for eventual reboot for kernel security update
  • 13:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:55 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T258472 T258473)
  • 12:54 marostegui: Stop haproxy on dbproxy1012 - T255408
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087', diff saved to https://phabricator.wikimedia.org/P11988 and previous config saved to /var/cache/conftool/dbconfig/20200721-121302-marostegui.json
  • 12:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:25 Urbanecm: EU B&C window done
  • 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b96c7e: Enable botpasswords at checkuserwiki and stewardwiki (T258358, T258355) (duration: 00m 57s)
  • 11:11 Urbanecm: Create bot_passwords table at checkuserwiki (T258358)
  • 11:10 Urbanecm: Create bot_passwords table at stewardwiki (T258355)
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5d5bb37: Enable Vector opt in preference everywhere (T254228) (duration: 00m 57s)
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1085 T258360', diff saved to https://phabricator.wikimedia.org/P11987 and previous config saved to /var/cache/conftool/dbconfig/20200721-110854-marostegui.json
  • 11:00 effie: enable puppet on P:mediawiki::mcrouter_wancache - T247956
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085 T258360', diff saved to https://phabricator.wikimedia.org/P11986 and previous config saved to /var/cache/conftool/dbconfig/20200721-105852-marostegui.json
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085 T258360', diff saved to https://phabricator.wikimedia.org/P11985 and previous config saved to /var/cache/conftool/dbconfig/20200721-104546-marostegui.json
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P11984 and previous config saved to /var/cache/conftool/dbconfig/20200721-103430-marostegui.json
  • 10:20 effie: disable puppet on P:mediawiki::mcrouter_wancache - T247956
  • 10:13 effie: enable puppet on wtp*
  • 10:02 marostegui: Analyze revision table on db1119 T258480
  • 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 T258480', diff saved to https://phabricator.wikimedia.org/P11983 and previous config saved to /var/cache/conftool/dbconfig/20200721-100159-marostegui.json
  • 09:59 akosiaris: move all codfw mobileapps nodes (kubernetes and scb) to weight 10. Traffic level remains at 72.727272% flowing to kubernetes, the rest to scb T218733
  • 09:59 effie: disable puppet on wtp* to merge 613307
  • 09:58 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=mobileapps
  • 09:58 akosiaris: increase codfw mobileapps kubernetes traffic to 72.727272% T218733
  • 09:57 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
  • 09:44 elukey: add term 'idp' to analytics-in4/6 filters on cr1-eqiad and cr2-eqiad (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/615160)
  • 09:21 kormat@cumin1001: dbctl commit (dc=all): 'Re-pool es1020 at 25% in es4 T257284', diff saved to https://phabricator.wikimedia.org/P11982 and previous config saved to /var/cache/conftool/dbconfig/20200721-092126-kormat.json
  • 08:37 akosiaris: increase codfw mobileapps kubernetes traffic to 47% T218733
  • 08:34 akosiaris@cumin1001: conftool action : set/weight=3; selector: dc=codfw,service=mobileapps,name=scb.*
  • 08:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P11980 and previous config saved to /var/cache/conftool/dbconfig/20200721-080842-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11979 and previous config saved to /var/cache/conftool/dbconfig/20200721-075233-marostegui.json
  • 07:49 marostegui: Deploy schema change on db1087, lag will appear on s8 (wikidata) on labsdb hosts T256685
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 T256685', diff saved to https://phabricator.wikimedia.org/P11978 and previous config saved to /var/cache/conftool/dbconfig/20200721-074843-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11977 and previous config saved to /var/cache/conftool/dbconfig/20200721-073757-marostegui.json
  • 07:29 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Re-enable writes to es4 T257847 (duration: 00m 57s)
  • 07:22 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1020 from es4 T257847', diff saved to https://phabricator.wikimedia.org/P11976 and previous config saved to /var/cache/conftool/dbconfig/20200721-072251-kormat.json
  • 07:21 kormat@cumin1001: dbctl commit (dc=all): 'Promote es1021 to es4 master T257847', diff saved to https://phabricator.wikimedia.org/P11975 and previous config saved to /var/cache/conftool/dbconfig/20200721-072127-kormat.json
  • 07:13 kormat: killing James_F('s script) on mwmaint1002
  • 07:06 _joe_: systemctl reset-failed on deneb, the usual known issue with releng image reporting
  • 07:03 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Disable writes to es4 T257847 (duration: 01m 00s)
  • 06:59 kormat: Starting es4 failover from es1020 to es1021 T257847
  • 06:54 kormat@cumin1001: dbctl commit (dc=all): 'Set es1021 to weight 50 T257847', diff saved to https://phabricator.wikimedia.org/P11974 and previous config saved to /var/cache/conftool/dbconfig/20200721-065457-kormat.json
  • 06:54 marostegui: Pool db1119 into enwiki with MCR schema change done - T238966
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11973 and previous config saved to /var/cache/conftool/dbconfig/20200721-065430-marostegui.json
  • 06:27 _joe_: systemctl reset-failed on lists1001, a network interface had been failing for a month
  • 06:26 _joe_: enabling notifications for lists1001
  • 06:23 _joe_: systemctl reset-failed on both centrallogs
  • 02:43 eileen: civicrm revision changed from 7f1e7d8e38 to cc5d17fbaf, config revision is 23460676f6
  • 00:02 ryankemper: Began Elasticsearch reindex job on index `dewiki_content` across [`eqiad`, `codfw`, `cloudelastic`], on `rkemper@mwmaint1002` under tmux session `reindex`. Should complete in <24 hours
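The codfw mobileapps migration entries above (T218733) move traffic between kubernetes and scb by changing conftool weights. The logged percentages (25%, 47%, 72.727272%, 96%, 100%) follow from weighted balancing, where each backend gets traffic in proportion to its weight. The sketch below assumes 8 kubernetes pool entries (weight 1, later 10) and 3 scb hosts. These counts are inferred from the logged percentages and are not stated in the log, and `k8s_share` is a hypothetical helper, not a conftool or PyBal API.

```python
from fractions import Fraction

# ASSUMPTION: pool sizes inferred from the logged percentages, not read
# from conftool. Hypothetically 8 kubernetes entries and 3 scb hosts.
K8S_NODES = 8
SCB_NODES = 3


def k8s_share(k8s_weight: int, scb_weight: int,
              k8s_nodes: int = K8S_NODES, scb_nodes: int = SCB_NODES) -> Fraction:
    """Hypothetical helper: fraction of requests sent to kubernetes.

    Under weighted balancing each backend receives traffic proportional to
    its weight, so the kubernetes share is the summed kubernetes weight
    divided by the summed weight of every pooled backend.
    """
    k8s_total = k8s_nodes * k8s_weight
    scb_total = scb_nodes * scb_weight
    return Fraction(k8s_total, k8s_total + scb_total)


if __name__ == "__main__":
    # (kubernetes weight, scb weight) pairs corresponding to logged steps.
    for k8s_w, scb_w in [(1, 8), (1, 3), (1, 1), (10, 10), (10, 1), (10, 0)]:
        share = k8s_share(k8s_w, scb_w)
        print(f"k8s={k8s_w:>2} scb={scb_w:>2} -> {float(share):.2%} to kubernetes")
```

Under these assumed counts, scb weights of 8, 3 and 1 against a kubernetes weight of 1 give 25%, about 47% and 72.73% (8/11). Raising every node to weight 10 leaves the share at 8/11, which matches the "Traffic level remains at 72.727272%" entry. Dropping scb back to weight 1 yields about 96%, and scb weight 0 yields 100%.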

2020-07-20

  • 23:49 eileen: tools revision changed from b915d8efbd to 22550f38c5
  • 23:34 ejegg: updated fundraising CiviCRM from 8b09c87ce2 to 7f1e7d8e38
  • 23:12 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/ProofreadPage/ProofreadPage.namespaces.php: 03ed74f: Add ProofreadPage namespace translation for lij (T257672) (duration: 00m 57s)
  • 23:06 Urbanecm: run mwscript namespaceDupes.php --wiki=lijwikisource --fix (T257672)
  • 23:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 2147774: Add English aliases for WS-specific namespaces to lijwikisource (T257672) (duration: 00m 57s)
  • 22:59 ryankemper@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 613669: cirrussearch: Allow 2 dewiki->content shards/node | https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/613669 (duration: 00m 57s)
  • 21:53 eileen: tools revision changed from 40d52a0008 to b915d8efbd
  • 21:15 sbassett: Revised mitigation deployed for T257687
  • 20:07 eileen: tools revision changed from 711d671600 to 40d52a0008
  • 19:10 mforns@deploy1001: Finished deploy [analytics/refinery@af86a05] (thin): Regular analytics weekly train THIN [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2] (duration: 00m 07s)
  • 19:10 mforns@deploy1001: Started deploy [analytics/refinery@af86a05] (thin): Regular analytics weekly train THIN [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2]
  • 19:09 mforns@deploy1001: Finished deploy [analytics/refinery@af86a05]: Regular analytics weekly train [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2] (duration: 05m 46s)
  • 19:03 mforns@deploy1001: Started deploy [analytics/refinery@af86a05]: Regular analytics weekly train [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2]
  • 18:37 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: df2584f: Switch $wgUrlShortenerDomainsWhitelist --> $wgUrlShortenerAllowedDomains (T255491) (duration: 00m 57s)
  • 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: dfed472: Adding rollbacker group for arzwiki (T258100) (duration: 00m 57s)
  • 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ee7ac95: Change of rollbacker group settings at jawiki (T258339) (duration: 00m 57s)
  • 17:36 ejegg: updated payments-wiki settings to point TY page at new URL
  • 16:32 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@10afb4b]: airflow: Turn off catchup on cirrus_namespace_map (duration: 00m 25s)
  • 16:31 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@10afb4b]: airflow: Turn off catchup on cirrus_namespace_map
  • 16:27 akosiaris: increase codfw mobileapps kubernetes traffic to 25% T218733. Take #2
  • 16:27 akosiaris@cumin1001: conftool action : set/weight=8; selector: dc=codfw,service=mobileapps,name=scb.*
  • 15:59 elukey: restart airflow-webserver/scheduler to pick up TLS to mysql settings
  • 15:21 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:21 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:17 hnowlan: draining and restarting sessionstore2002
  • 15:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:13 jynus: dropping and recreating nagios@localhost users on all m1 servers
  • 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:09 hnowlan: draining and restarting sessionstore2001
  • 15:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:09 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:08 moritzm: draining restbase2023 for eventual reboot for kernel security update
  • 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:56 moritzm: draining restbase2022 for eventual reboot for kernel security update
  • 14:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:52 hnowlan: draining and restarting sessionstore1003
  • 14:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:52 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:51 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 14:51 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 14:49 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 14:49 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 14:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:47 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 14:47 moritzm: draining restbase2021 for eventual reboot for kernel security update
  • 14:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:36 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@ff49fdf]: Update mobileapps to 0bf7bafa (duration: 03m 50s)
  • 14:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:34 hnowlan: starting drain and restart of sessionstore hosts for new kernel
  • 14:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:32 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@ff49fdf]: Update mobileapps to 0bf7bafa
  • 14:26 moritzm: draining restbase2020 for eventual reboot for kernel security update
  • 14:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:23 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:14 moritzm: draining restbase2019 for eventual reboot for kernel security update
  • 14:08 ema: lvs101[34] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 T255015
  • 14:07 ema: lvs1016 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 T255015
  • 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:59 ema: lvs300[56] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 T255015
  • 13:57 ema: lvs3007 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 T255015
  • 13:50 ema: lvs500[12] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 T255015
  • 13:48 moritzm: draining restbase2018 for eventual reboot for kernel security update
  • 13:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:47 ema: lvs5003 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 T255015
  • 13:44 ema: lvs200[78] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 T255015
  • 13:42 ema: lvs2010 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 T255015
  • 13:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:31 ema: lvs400[56] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 T255015
  • 13:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:27 moritzm: draining restbase2017 for eventual reboot for kernel security update
  • 13:24 ema: lvs4007 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 T255015
  • 13:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:09 moritzm: draining restbase2016 for eventual reboot for kernel security update
  • 13:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:07 moritzm: reset broken ifup systemd states on puppetdb* hosts
  • 13:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:59 Urbanecm: creating arywiki (T257674), lijwikisource (T257672), sysop_itwiki (T256545) done
  • 12:59 moritzm: draining restbase2015 for eventual reboot for kernel security update
  • 12:56 Urbanecm: Create Daimona Eaytoy at sysop_itwiki (T256545)
  • 12:55 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 59s)
  • 12:50 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating sysop_itwiki (T256545) (duration: 00m 57s)
  • 12:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating sysop_itwiki (T256545) (duration: 00m 57s)
  • 12:48 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating sysop_itwiki (T256545)
  • 12:46 urbanecm@deploy1001: Synchronized dblists: Creating sysop_itwiki (T256545) (duration: 00m 57s)
  • 12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:40 moritzm: draining restbase2014 for eventual reboot for kernel security update
  • 12:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating lijwikisource (T257672) (duration: 00m 57s)
  • 12:32 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating lijwikisource (T257672)
  • 12:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:30 urbanecm@deploy1001: Synchronized dblists: Creating lijwikisource (T257672) (duration: 00m 56s)
  • 12:28 urbanecm@deploy1001: Synchronized dblists/rtl.dblist: Add arywiki to rtl.dblist (T257674) (duration: 00m 57s)
  • 12:27 moritzm: draining restbase2013 for eventual reboot for kernel security update
  • 12:27 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 12:21 urbanecm@deploy1001: Synchronized langlist: Creating arywiki (T257674) (duration: 00m 56s)
  • 12:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating arywiki (T257674) (duration: 00m 56s)
  • 12:19 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating arywiki (T257674) (duration: 00m 57s)
  • 12:17 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating arywiki (T257674)
  • 12:16 urbanecm@deploy1001: Synchronized dblists: Creating arywiki (T257674) (duration: 00m 57s)
  • 12:02 moritzm: installing qemu security updates on buster
  • 11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 946bf3d: Update brwikimedia logo and add upscaled versions (config) (T257925) (duration: 00m 57s)
  • 11:49 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 11:49 Urbanecm: Purge 'https://en.wikipedia.org/static/images/project-logos/bnwikimedia.png'
  • 11:46 urbanecm@deploy1001: Synchronized static/images/project-logos/: f7560b6: Update brwikimedia logo and add upscaled versions (T257925) (duration: 00m 56s)
  • 11:44 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 5b97a06: Set $wgUrlShortenerAllowedDomains for all wikis (T258134) (duration: 00m 57s)
  • 11:42 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c12f1de: Remove wgPopupsPageBlacklist config setting (T254676) (duration: 00m 57s)
  • 11:35 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript createAndPromote.php testwikidatawiki --custom-groups=interface-admin --force 'Lucas Werkmeister (WMDE)'
  • 11:34 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
  • 11:25 Urbanecm: mwscript namespaceDupes.php --wiki=kowikiquote --fix (T255031)
  • 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3719668: Add NamespaceAliases for kowikiquote (T255031) (duration: 00m 57s)
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bc5671a: Add media.farsnews.ir to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T253800) (duration: 00m 57s)
  • 11:18 Urbanecm: Run mwscript updateCollation.php --wiki=bswiktionary --previous-collation=uppercase in a tmux session at mwmaint1002 (T258346)
  • 11:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0c78478: Set $wgCategoryCollation to uca-bs-u-kn on Bosnian Wiktionary (T258346) (duration: 00m 58s)
  • 11:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 6830723: Convert ukwikisource ns:250 and ns:251 to have subpages (T255930) (duration: 00m 57s)
  • 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1c7a621: Create closer group at itwikinews (T257927) (duration: 00m 57s)
  • 10:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:48 moritzm: rebooting releases* hosts for kernel security update
  • 10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (614698) (duration: 00m 56s)
  • 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (614698) (duration: 00m 59s)
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114', diff saved to https://phabricator.wikimedia.org/P11962 and previous config saved to /var/cache/conftool/dbconfig/20200720-103058-marostegui.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P11961 and previous config saved to /var/cache/conftool/dbconfig/20200720-094609-marostegui.json
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P11960 and previous config saved to /var/cache/conftool/dbconfig/20200720-093154-marostegui.json
  • 09:25 godog: update compiler facts
  • 09:17 jayme: updating envoyproxy to 1.14.4-1 on all eqiad hosts
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P11959 and previous config saved to /var/cache/conftool/dbconfig/20200720-091119-marostegui.json
  • 09:04 jayme: updating envoyproxy to 1.14.4-1 on all codfw hosts
  • 07:54 moritzm: installing libopenmpt security updates
  • 07:51 jayme: updating envoyproxy to 1.14.4-1 on all non mw and restbase hosts
  • 07:29 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 - T255408
  • 07:19 marostegui: Drop non used reviewdb database - T255715
  • 06:55 elukey: restart matomo1002's mariadb to pick up new TLS settings
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P11958 and previous config saved to /var/cache/conftool/dbconfig/20200720-065438-marostegui.json
  • 06:15 tstarling@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Score/includes/Score.php: reverting Reedy's temporary patch for hardcoding the lilypond version (duration: 00m 57s)
  • 06:07 tstarling@deploy1001: Finished scap: fixing missing message from previous sync-dir (duration: 29m 57s)
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082 after a crash T258336', diff saved to https://phabricator.wikimedia.org/P11957 and previous config saved to /var/cache/conftool/dbconfig/20200720-055614-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 after a crash T258336', diff saved to https://phabricator.wikimedia.org/P11956 and previous config saved to /var/cache/conftool/dbconfig/20200720-054747-marostegui.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 after a crash T258336', diff saved to https://phabricator.wikimedia.org/P11955 and previous config saved to /var/cache/conftool/dbconfig/20200720-053816-marostegui.json
  • 05:37 tstarling@deploy1001: Started scap: fixing missing message from previous sync-dir
  • 05:30 tstarling@deploy1001: scap sync-l10n completed (1.35.0-wmf.41) (duration: 02m 44s)
  • 05:25 marostegui: Deploy MCR schema change on enwiki on db1119 - T238966
  • 05:24 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: disable lilypond with better error message (duration: 00m 57s)
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 after a crash T258336', diff saved to https://phabricator.wikimedia.org/P11953 and previous config saved to /var/cache/conftool/dbconfig/20200720-051846-marostegui.json
  • 05:18 tstarling@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Score: better error message for disabling of Score (duration: 01m 10s)

2020-07-19

  • 19:16 marostegui: Upgrade and reboot db1085 T258360
  • 18:57 marostegui: Start mysql on db1082 T258336
  • 18:51 marostegui: Upgrade and reboot db1082 T258336
  • 18:45 cdanis@cumin1001: dbctl commit (dc=all): 'db1085 also crashed', diff saved to https://phabricator.wikimedia.org/P11952 and previous config saved to /var/cache/conftool/dbconfig/20200719-184511-cdanis.json
  • 18:06 Urbanecm: Run mwscript emptyUserGroup.php --wiki=testwiki contestadmin (T256555)

2020-07-18

  • 21:41 shdubsh: restart logstash on logstash200[456]
  • 21:14 shdubsh: bounce logstash on logstash1007
  • 21:10 shdubsh: bounce logstash on logstash1008
  • 21:06 shdubsh: bounce logstash on logstash1009
  • 20:52 marostegui: Due to db1082 crash there will be replication lag on s5 on labsdb hosts - T258336
  • 20:37 cdanis@cumin1001: dbctl commit (dc=all): 'depool db1082, it crashed', diff saved to https://phabricator.wikimedia.org/P11951 and previous config saved to /var/cache/conftool/dbconfig/20200718-203704-cdanis.json
  • 00:13 dpifke: Performing one-time expiration of ArcLamp files older than 40 days (normal retention is 45 days), to solve disk space issue until either Ganeti issue is solved or compressed logfile support is merged.

2020-07-17

  • 21:16 dpifke: Removing MongoDB packages and data from webperf1002.
  • 17:39 dpifke@deploy1001: Finished deploy [performance/arc-lamp@a5d2fd3]: (no justification provided) (duration: 00m 05s)
  • 17:38 dpifke@deploy1001: Started deploy [performance/arc-lamp@a5d2fd3]: (no justification provided)
  • 13:53 akosiaris: powercycle kubernetes2002
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104', diff saved to https://phabricator.wikimedia.org/P11944 and previous config saved to /var/cache/conftool/dbconfig/20200717-122400-marostegui.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P11941 and previous config saved to /var/cache/conftool/dbconfig/20200717-120126-marostegui.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P11940 and previous config saved to /var/cache/conftool/dbconfig/20200717-115155-marostegui.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P11939 and previous config saved to /var/cache/conftool/dbconfig/20200717-113800-marostegui.json
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104', diff saved to https://phabricator.wikimedia.org/P11938 and previous config saved to /var/cache/conftool/dbconfig/20200717-113050-marostegui.json
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104', diff saved to https://phabricator.wikimedia.org/P11937 and previous config saved to /var/cache/conftool/dbconfig/20200717-112413-marostegui.json
  • 09:15 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1280.eqiad.wmnet
  • 09:12 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1280.eqiad.wmnet
  • 08:48 moritzm: imported prometheus-atlas-exporter 1.0+git20191204.ffafab7-2 to buster-wikimedia T247967
  • 08:29 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 08:05 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 07:54 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104', diff saved to https://phabricator.wikimedia.org/P11936 and previous config saved to /var/cache/conftool/dbconfig/20200717-075124-marostegui.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1111', diff saved to https://phabricator.wikimedia.org/P11935 and previous config saved to /var/cache/conftool/dbconfig/20200717-074335-marostegui.json
  • 07:34 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 07:34 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 07:33 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 07:33 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 07:32 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 07:30 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 06:30 XioNoX: rename msw1-codfw interface range
  • 06:28 XioNoX: rename msw1-eqiad interface range
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P11934 and previous config saved to /var/cache/conftool/dbconfig/20200717-044748-marostegui.json
  • 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092', diff saved to https://phabricator.wikimedia.org/P11933 and previous config saved to /var/cache/conftool/dbconfig/20200717-044658-marostegui.json

2020-07-16

  • 22:15 mutante: testreduce1001 manually git clone 'scandium' branch of integration/visualdiff into /srv/visualdiff (T257906)
  • 21:54 crusnov@deploy1001: Finished deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 3 (duration: 01m 49s)
  • 21:52 crusnov@deploy1001: Started deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 3
  • 21:42 crusnov@deploy1001: Finished deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 2 (duration: 01m 33s)
  • 21:41 crusnov@deploy1001: Started deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 2
  • 21:40 crusnov@deploy1001: Finished deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 (duration: 01m 01s)
  • 21:39 crusnov@deploy1001: Started deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7
  • 21:08 cstone: payments-wiki revision changed from 91852dbc9b to bf91f8adff
  • 20:32 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable client error logging on Catalan Wikipedia (T258073) (duration: 00m 57s)
  • 19:32 sbassett: Deployed mitigations for T257687
  • 19:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T248418 TimedMediaHandler: Make videojs the only player on all group0 (duration: 00m 57s)
  • 18:54 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:53 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:50 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:49 addshore: deployment windows finished with
  • 18:46 addshore@deploy1001: Synchronized wmf-config/extension-list: gerrit:611393 extension-list: Load WikibaseClient via JSON (duration: 00m 56s)
  • 18:36 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:613226 Wikibase: Always set wgWBRepoSettings idGeneratorSeparateDbConnection PT 2/2 (duration: 00m 56s)
  • 18:35 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: gerrit:613226 Wikibase: Always set wgWBRepoSettings idGeneratorSeparateDbConnection PT 1/2 (duration: 00m 56s)
  • 18:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:613165 T138104 Wikibase: stop setting wmgWikibaseTmpSerializeEmptyListsAsObjects (duration: 00m 57s)
  • 18:23 addshore@deploy1001: Synchronized wmf-config/config/incubatorwiki.yaml: gerrit:613199 T256957 Move VisualEditor from beta to default on incubatorwiki PT2/2 (duration: 00m 57s)
  • 18:22 addshore@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: gerrit:613199 T256957 Move VisualEditor from beta to default on incubatorwiki PT1/2 (duration: 00m 56s)
  • 18:20 addshore@deploy1001: Synchronized wmf-config/config/nlwikimedia.yaml: gerrit:613198 T256142 Move VisualEditor from beta to default on nlwikimedia PT2/2 (duration: 00m 57s)
  • 18:18 addshore@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: gerrit:613198 T256142 Move VisualEditor from beta to default on nlwikimedia PT1/2 (duration: 00m 56s)
  • 18:14 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: gerrit:613164 T138104 Wikibase: stop setting wgWBRepoSettings tmpSerializeEmptyListsAsObjects (duration: 00m 57s)
  • 18:12 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:613192 T246420 Enable limited-width layout for Modern Vector (duration: 00m 56s)
  • 18:08 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:612870 T246977 Disable affinity quicksurveys for the following wikis (duration: 00m 57s)
  • 18:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:54 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 17:53 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 17:50 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 17:50 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 17:49 herron@cumin1001: START - Cookbook sre.ganeti.makevm
  • 17:17 XioNoX: msw1-eqiad delete unused VC-ports
  • 17:05 XioNoX: msw1-codfw - replace member-range with list of individual interfaces
  • 16:45 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/: Backport: Re add OtherProjectsSidebarGenerator::buildProjectLinkSidebarFromItemId (T258184) (duration: 01m 02s)
  • 16:11 effie: reboot rdb1009 - T254990
  • 16:06 effie: Reboot rdb1010 - T254990
  • 15:51 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/: Backport: Revert "Revert "Removes OtherProjectsSidebar hook"" (T258184) (duration: 01m 02s)
  • 15:40 lucaswerkmeister-wmde@deploy1001: scap failed: average error rate on 7/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
  • 15:15 akosiaris: lower codfw mobileapps kubernetes traffic to 10% T218733. Will open up task for it
  • 15:15 akosiaris@cumin1001: conftool action : set/weight=24; selector: dc=codfw,service=mobileapps,name=scb.*
  • 15:07 XioNoX: repool eqsin - T257154
  • 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:00 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 14:54 XioNoX: load config on cr3-eqsin - T257154
  • 14:54 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/: Backport: Avoid trying to register wikibase.Site twice (T258065) (duration: 01m 03s)
  • 14:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 14:31 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 14:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:12 moritzm: rebooting webperf hosts in eqiad for kernel update
  • 14:09 XioNoX: upgrade junos on cr3-eqsin - T257154
  • 14:03 jayme: published image docker-registry.discovery.wmnet/envoy:1.14.4-1
  • 13:47 XioNoX: remove nonstop-bridging from asw1-eqsin
  • 13:36 XioNoX: power-off cr3-eqsin - T257154
  • 13:36 akosiaris: increase codfw mobileapps kubernetes traffic to 25% T218733
  • 13:35 akosiaris@cumin1001: conftool action : set/weight=8; selector: dc=codfw,service=mobileapps,name=scb.*
  • 13:30 XioNoX: deactivate BGP groups IX/Transit/PyBal on cr3-eqsin - T257154
  • 13:27 moritzm: installing an-tool1008
  • 13:23 XioNoX: depool eqsin for cr3 replacement - T257154
  • 13:13 volans@deploy1001: Finished deploy [homer/deploy@fcf4332]: Force deploy of the homer plugin (duration: 01m 27s)
  • 13:12 volans@deploy1001: Started deploy [homer/deploy@fcf4332]: Force deploy of the homer plugin
  • 13:04 kormat: restarting tendril to pick up new mariadb config T257816
  • 13:02 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.41
  • 13:02 akosiaris: increase codfw mobileapps kubernetes traffic to 10% T218733
  • 13:01 akosiaris@cumin1001: conftool action : set/weight=24; selector: dc=codfw,service=mobileapps,name=scb.*
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092', diff saved to https://phabricator.wikimedia.org/P11926 and previous config saved to /var/cache/conftool/dbconfig/20200716-125643-marostegui.json
  • 12:56 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR607011 (duration: 04m 32s)
  • 12:52 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR607011
  • 12:42 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR607011 (duration: 03m 42s)
  • 12:38 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR607011
  • 12:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:36 akosiaris@cumin1001: conftool action : set/weight=50; selector: dc=codfw,service=mobileapps,name=scb.*
  • 12:35 akosiaris: increase codfw mobileapps kubernetes traffic to 5% T218733
  • 12:35 akosiaris: increase codfw mobileapps kubernetes traffic to 5%
  • 12:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 12:22 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 12:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 12:12 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 12:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 12:12 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 12:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 12:08 jayme: updated envoyproxy to 1.14.4-1 on mw-canary and restbase-canary
  • 11:44 XioNoX: remove BGP to AS396253 in eqdfw (peer left the IX)
  • 11:26 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/UrlShortener/includes/UrlShortenerUtils.php: T258134 Fix config variables regex concatenation (duration: 01m 05s)
  • 11:23 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: T254315 gerrit:612670 Wikibase: remove wmgWikibaseLocalEntitySourceName (duration: 01m 05s)
  • 11:18 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T254315 T257266 gerrit:609988 Wikidata client wikis: Define entity sources configuration (take 3) (duration: 01m 08s)
  • 10:17 jbond42: upgrade to hiera5
  • 10:08 jbond42: disable puppet for hiera5 deployment
  • 09:37 jayme: updated envoyproxy to 1.14.4-1 on mw1325.eqiad.wmnet and restbase1026.eqiad.wmnet
  • 09:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 09:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 09:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 09:15 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 09:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 09:15 moritzm: rebooting flowspec1001
  • 08:52 jayme: updated envoyproxy to 1.14.4-1 on mwdebug1001.eqiad.wmnet
  • 08:41 moritzm: installing sqlite3 security updates
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2081', diff saved to https://phabricator.wikimedia.org/P11924 and previous config saved to /var/cache/conftool/dbconfig/20200716-083954-marostegui.json
  • 08:35 XioNoX: Remove PIM/IGMP related CR stanza (acls) - T257573
  • 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:26 moritzm: installing dbus security updates
  • 08:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:24 XioNoX: remove igmp-snooping from access switches - T257573
  • 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:15 moritzm: installing python-urllib3 security updates
  • 08:15 XioNoX: remove PIM config from eqord/eqdfw/knams routers - T257573
  • 08:14 XioNoX: remove PIM config from eqiad routers - T257573
  • 08:11 XioNoX: remove PIM config from esams routers - T257573
  • 08:09 XioNoX: remove PIM config from eqsin routers - T257573
  • 08:08 jbond42: update mail delivery for phabricator to use phabricator.discovery.wmnet cname
  • 08:07 XioNoX: remove PIM config from codfw routers - T257573
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P11923 and previous config saved to /var/cache/conftool/dbconfig/20200716-080613-marostegui.json
  • 08:03 XioNoX: remove PIM config from ulsfo routers - T257573
  • 07:41 jayme: imported envoyproxy_1.14.4-1 to stretch-wikimedia
  • 07:31 jayme: imported envoyproxy_1.14.4-1 to buster-wikimedia
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1131', diff saved to https://phabricator.wikimedia.org/P11922 and previous config saved to /var/cache/conftool/dbconfig/20200716-072838-marostegui.json
  • 07:25 marostegui: Drop database reviewdb-test T255715
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1131', diff saved to https://phabricator.wikimedia.org/P11921 and previous config saved to /var/cache/conftool/dbconfig/20200716-070331-marostegui.json
  • 06:40 XioNoX: remove peering with AS8403 in eqsin (peer left the IX)
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1131', diff saved to https://phabricator.wikimedia.org/P11920 and previous config saved to /var/cache/conftool/dbconfig/20200716-051342-marostegui.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1131', diff saved to https://phabricator.wikimedia.org/P11919 and previous config saved to /var/cache/conftool/dbconfig/20200716-051109-marostegui.json

2020-07-15

  • 23:54 eileen: tools revision changed from 7b6018a16e to 711d671600
  • 23:50 eileen: process-control config revision is 1fc4a9686d
  • 23:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 23:04 bd808: tools.admin Removed valhallasw from maintainers (T255697)
  • 23:02 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:58 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:29 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:29 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:27 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 22:21 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:21 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:10 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 18:16 brennen: restarting jenkins for upgrade
  • 18:00 mutante: DNS - new language 'avk' has been added - This language is called Kotava and is "a proposed international auxiliary language (IAL) that focuses especially on the principle of cultural neutrality". Learn more at https://en.wikipedia.org/wiki/Kotava
  • 17:32 mutante: puppetmaster - revoking cert for planet.discovery.wmnet, add planet.wikimedia.org, remove planet.svc records, remove specific and outdated hostnames (T257840)
  • 16:11 moritzm: uploaded jenkins 2.235.2 to thirdparty/ci for stretch/buster T257614
  • 15:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:20 moritzm: rebooting webperf* hosts for kernel update
  • 14:58 addshore@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/repo: gerrit:612723 Stop checking if WikibaseLib is loaded T258062 (already on mwmaint1002) (duration: 01m 08s)
  • 14:51 addshore: pulled https://gerrit.wikimedia.org/r/612723 onto mwmaint1002 ahead of syncing everywhere (and CI finishing)
  • 14:37 ema: A:cp: upgrade purged to 0.17 T257573
  • 14:30 ema: upload purged 0.17 to buster-wikimedia T257573
  • 14:28 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add exceptional wikitech VE/Parsoid config T241961 (duration: 01m 04s)
  • 14:26 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add exceptional wikitech VE/Parsoid config T241961 (duration: 01m 05s)
  • 14:25 gehel: repooling wdqs1006 - caught up on lag
  • 14:12 akosiaris: increase codfw mobileapps kubernetes traffic to 2% T218733
  • 14:10 akosiaris@cumin1001: conftool action : set/weight=132; selector: dc=codfw,service=mobileapps,name=scb.*
  • 13:58 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/UrlShortener/includes/UrlShortenerUtils.php: T258056 Add temporary fix to ensure array is passed to array_map() (duration: 01m 08s)
  • 13:54 akosiaris: pool kubernetes nodes for mobileapps in codfw
  • 13:53 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=mobileapps,name=kubernetes.*
  • 13:53 akosiaris@cumin1001: conftool action : set/weight=264; selector: dc=codfw,service=mobileapps,name=scb.*
  • 13:51 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=kubernetes.*
  • 13:04 jforrester@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.41 (duration: 01m 05s)
  • 13:03 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.41
  • 11:59 addshore: deploy window closed / done :)
  • 11:57 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:609987 Commons: Define entity sources configuration (take 2) T254315 (duration: 01m 03s)
  • 11:36 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:612668 Wikibase test: Client local entity sources are always testwikidata T254315 (duration: 01m 05s)
  • 11:27 addshore@deploy1001: Synchronized wmf-config: T254315 gerrit:612669 Wikidata test: Split client db lists. PT2/2 (duration: 01m 06s)
  • 11:26 addshore@deploy1001: Synchronized dblists/wikidataclient.dblist: T254315 gerrit:612669 Wikidata test: Split client db lists. PT1/2 (duration: 01m 05s)
  • 11:16 XioNoX: remove as-path prepending in esams
  • 11:11 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: LABS gerrit:612667 Wikibase labs: All client "local" entity sources are wikidata T254315 (duration: 01m 04s)
  • 11:08 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: gerrit:612666 Wikibase: Split localEntitySourceName config for repo and client T254315 (duration: 01m 16s)
  • 11:05 XioNoX: re-enable ping offload in esams
  • 11:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:56 XioNoX: disable ping offload in esams
  • 10:55 XioNoX: re-enable ping offload in codfw
  • 10:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:45 XioNoX: disable ping offload in codfw
  • 10:44 XioNoX: re-enable ping offload in eqiad
  • 10:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:31 XioNoX: disable ping offload in eqiad
  • 10:31 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 10:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 10:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 10:30 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11916 and previous config saved to /var/cache/conftool/dbconfig/20200715-102605-marostegui.json
  • 10:20 jayme: updating python3-docker-report to 0.0.5-1 on deneb
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11915 and previous config saved to /var/cache/conftool/dbconfig/20200715-100855-marostegui.json
  • 10:07 jayme: imported docker-report_0.0.5-1 to buster-wikimedia
  • 09:48 marostegui: Deploy schema change on s8 codfw master, lag will appear on codfw T256685
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11914 and previous config saved to /var/cache/conftool/dbconfig/20200715-094226-marostegui.json
  • 09:22 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:21 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 09:19 akosiaris: deploy mobileapps in kubernetes to talk HTTPS to the mw API
  • 09:10 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 09:10 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 09:07 akosiaris: Correction: deploy eventgate-analytics-external in staging, eqiad, codfw for switching to using discovery records and HTTPS for talking to the API
  • 09:06 akosiaris: deploy eventgate-analytics in staging, eqiad, codfw for switching to using discovery records and HTTPS for talking to the API
  • 09:06 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 09:06 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P11913 and previous config saved to /var/cache/conftool/dbconfig/20200715-090545-marostegui.json
  • 09:04 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 09:04 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11912 and previous config saved to /var/cache/conftool/dbconfig/20200715-085032-marostegui.json
  • 08:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:19 moritzm: piwik.wikimedia.org switched to CAS authentication
  • 08:19 elukey: move piwik.wikimedia.org to CAS (idp.wikimedia.org)
  • 07:29 XioNoX: delete deprecated AS3209 AMS-IX router
  • 06:59 dcausse: depooling wdqs1006 (high lag)
  • 06:09 marostegui: Stop replication on db1120 to avoid having 10.4 -> 10.1 replication for long T254871
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 for reimage T254871', diff saved to https://phabricator.wikimedia.org/P11911 and previous config saved to /var/cache/conftool/dbconfig/20200715-060649-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1103 to x1 master T254871', diff saved to https://phabricator.wikimedia.org/P11910 and previous config saved to /var/cache/conftool/dbconfig/20200715-060145-marostegui.json
  • 06:00 marostegui: Starting x1 failover from db1120 to db1103 - T254871
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 ', diff saved to https://phabricator.wikimedia.org/P11909 and previous config saved to /var/cache/conftool/dbconfig/20200715-052939-marostegui.json
  • 04:46 marostegui: Start x1 pre failover steps T254871
  • 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1103 weight to 0 before the switchover T254871', diff saved to https://phabricator.wikimedia.org/P11908 and previous config saved to /var/cache/conftool/dbconfig/20200715-044432-marostegui.json
  • 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1135', diff saved to https://phabricator.wikimedia.org/P11907 and previous config saved to /var/cache/conftool/dbconfig/20200715-044332-marostegui.json
  • 01:45 eileen: tools revision changed from a9e7dc1559 to 7b6018a16e
  • 00:26 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@8f6f660]: 0.3.41 (duration: 15m 10s)
  • 00:11 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8f6f660]: 0.3.41

2020-07-14

  • 19:52 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/vendor/wikimedia/parsoid/: T252448 T255190 Bump Parsoid to v0.12.0-a23 (duration: 01m 06s)
  • 18:13 ryankemper: all long-running elasticsearch reindex jobs are complete
  • 18:09 jforrester@deploy1001: Synchronized dblists/: T32405 T254287 Remove the mobilemainpagelegacy dblist (duration: 01m 04s)
  • 18:07 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: T32405 T254287 Stop loading the mobilemainpagelegacy dblist (duration: 01m 05s)
  • 18:05 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T32405 T254287 Stop varying wgMFSpecialCaseMainPage (duration: 01m 05s)
  • 15:56 elukey: upgrade spark2 on stat100x to 2.4.4-bin-hadoop2.6-3
  • 15:40 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:13 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:57 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/skins/Vector/includes/SkinVector.php: T257914 Restore div wrapper around print footer (duration: 01m 03s)
  • 14:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 14:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:48 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: Fix case of directory name (duration: 01m 05s)
  • 14:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:48 moritzm: rebooting apt1001 for kernel update
  • 14:42 jynus: stopping db1117:3322 (m2) replication temp. for otrs db cloning T257928
  • 14:40 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:26 oblivian@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 14:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:18 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:14 oblivian@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 14:13 andrewbogott: upgrading wikitech-static to mw 1.34.2
  • 14:11 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:42 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112', diff saved to https://phabricator.wikimedia.org/P11900 and previous config saved to /var/cache/conftool/dbconfig/20200714-132823-marostegui.json
  • 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11899 and previous config saved to /var/cache/conftool/dbconfig/20200714-132742-marostegui.json
  • 13:27 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
  • 13:24 jbond42: reboot dns1001
  • 13:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:22 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
  • 13:22 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1002.wikimedia.org
  • 13:18 jbond42: reboot dns1002
  • 13:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:18 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1002.wikimedia.org
  • 13:16 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
  • 13:13 jbond42: reboot dns2002
  • 13:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:13 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
  • 13:13 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org
  • 13:10 jbond42: reboot dns2001
  • 13:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org
  • 13:09 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
  • 13:06 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 13:01 jbond42: rebooting dns3002
  • 13:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:58 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 12:57 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: revert forcehttps after fixing T257887 (duration: 01m 02s)
  • 12:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
  • 12:24 jbond42: route ns0.wikimedia.org to codfw for reboot
  • 12:20 moritzm: installing xen security updates (client-side tools/libs)
  • 12:19 jbond42: re-enable puppet fleet
  • 12:07 jbond42: disable puppet fleet-wide to reboot puppetdbs
  • 12:07 jbond42: disable puppet to reboot puppetdbs
  • 12:01 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.41
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for query plan checks T238966 ', diff saved to https://phabricator.wikimedia.org/P11898 and previous config saved to /var/cache/conftool/dbconfig/20200714-113612-marostegui.json
  • 11:35 _joe_: restart pybal on lvs2009 T257887
  • 11:31 _joe_: restart pybal on lvs2010 T257887
  • 11:25 _joe_: restart pybal on lvs1015 T257887
  • 11:22 _joe_: restart pybal on lvs1016
  • 11:15 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 11:03 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:59 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 10:56 volans@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp2005.codfw.wmnet
  • 10:52 volans: powerdown wtp2005, hardware issue - T257903
  • 10:47 volans@cumin1001: conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet
  • 10:45 jiji@cumin1001: conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet,service=parsoid-php
  • 10:45 jiji@cumin1001: conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet,service=parsoid
  • 10:45 effie: depool wtp2005
  • 10:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:39 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
  • 10:39 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 10:32 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 10:18 oblivian@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 10:14 James_F: Running AbuseFilter's updateVarDumps for group1 T246539
  • 10:13 oblivian@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 10:10 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 10:10 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P11897 and previous config saved to /var/cache/conftool/dbconfig/20200714-094449-marostegui.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075', diff saved to https://phabricator.wikimedia.org/P11896 and previous config saved to /var/cache/conftool/dbconfig/20200714-094354-marostegui.json
  • 09:39 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: Add REL1_35 as a candidate release (duration: 01m 06s)
  • 09:05 jforrester@deploy1001: Finished scap: Re-re-start full scap to push out wmf.41 and switch testwikis to it T256669 (duration: 51m 41s)
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for PDU upgrade T257871', diff saved to https://phabricator.wikimedia.org/P11895 and previous config saved to /var/cache/conftool/dbconfig/20200714-084033-marostegui.json
  • 08:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:30 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:30 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:30 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:13 jforrester@deploy1001: Started scap: Re-re-start full scap to push out wmf.41 and switch testwikis to it T256669
  • 08:05 akosiaris: restart pybal on lvs2009
  • 08:03 _joe_: restart pybal on lvs1016
  • 08:02 akosiaris: restart pybal on lvs2007
  • 08:01 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: name=restbase2009.codfw.wmnet
  • 08:00 _joe_: restart pybal on lvs1015
  • 08:00 akosiaris: restart pybal on lvs2010 after merging https://gerrit.wikimedia.org/r/612487
  • 07:52 jforrester@deploy1001: sync aborted: Re-start full scap to push out wmf.41 and switch testwikis to it T256669 (duration: 02m 14s)
  • 07:50 jforrester@deploy1001: Started scap: Re-start full scap to push out wmf.41 and switch testwikis to it T256669
  • 07:48 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: revert forcehttps in an attempt to fix T257887 (duration: 01m 06s)
  • 07:32 oblivian@deploy1001: sync-file aborted: revert forcehttps in an attempt to fix T257887 (duration: 00m 20s)
  • 07:31 oblivian@deploy1001: Scap failed!: 7/9 canaries failed their endpoint checks(http://en.wikipedia.org)
  • 07:27 moritzm: installing libtasn1-6 security updates
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075', diff saved to https://phabricator.wikimedia.org/P11894 and previous config saved to /var/cache/conftool/dbconfig/20200714-071233-marostegui.json
  • 07:04 marostegui: Drop gerrit, gerritro, gerrittest users from m2 databases - T255715
  • 06:58 marostegui: Stop mysql on db1131 for HW maintenance
  • 06:56 oblivian@deploy2001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 06:54 jforrester@deploy1001: scap failed: RuntimeError Scap failed!: 9/9 canaries failed their endpoint checks(http://en.wikipedia.org) (duration: 24m 59s)
  • 06:54 jforrester@deploy1001: Scap failed!: 9/9 canaries failed their endpoint checks(http://en.wikipedia.org)
  • 06:53 oblivian@deploy2001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 06:53 marostegui: Deploy MCR schema change on s5 primary master T238966
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078', diff saved to https://phabricator.wikimedia.org/P11893 and previous config saved to /var/cache/conftool/dbconfig/20200714-065229-marostegui.json
  • 06:29 jforrester@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.41
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease a bit db1088 load', diff saved to https://phabricator.wikimedia.org/P11891 and previous config saved to /var/cache/conftool/dbconfig/20200714-051551-marostegui.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for HW maintenance', diff saved to https://phabricator.wikimedia.org/P11890 and previous config saved to /var/cache/conftool/dbconfig/20200714-050931-marostegui.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 from api', diff saved to https://phabricator.wikimedia.org/P11889 and previous config saved to /var/cache/conftool/dbconfig/20200714-050912-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1093 to s6 master and remove read-only from s6 T257253', diff saved to https://phabricator.wikimedia.org/P11888 and previous config saved to /var/cache/conftool/dbconfig/20200714-050157-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s6 as read-only for maintenance T257253', diff saved to https://phabricator.wikimedia.org/P11887 and previous config saved to /var/cache/conftool/dbconfig/20200714-050039-marostegui.json
  • 05:00 marostegui: Starting s6 failover from db1131 to db1093 - T257253
  • 04:59 James_F: 1.35.0-wmf.41 branched at 7d04152
  • 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P11886 and previous config saved to /var/cache/conftool/dbconfig/20200714-043907-marostegui.json
  • 04:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 in preparation for failover', diff saved to https://phabricator.wikimedia.org/P11885 and previous config saved to /var/cache/conftool/dbconfig/20200714-041548-marostegui.json
  • 04:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11884 and previous config saved to /var/cache/conftool/dbconfig/20200714-041440-marostegui.json
  • 01:23 ryankemper: Started long-running Elasticsearch reindex of `eqiad`, `codfw`, and `cloudelastic`. tmux session `reindex` under `ryankemper` on `mwmaint1002`
  • 01:20 cdanis: ❌cdanis@lvs1015.eqiad.wmnet ~ 🕤🍺 sudo systemctl restart pybal.service
  • 01:15 cdanis: βœ”οΈ cdanis@lvs1016.eqiad.wmnet ~ πŸ•˜πŸΊ sudo systemctl restart pybal.service
  • 01:14 cdanis: βœ”οΈ cdanis@lvs2009.codfw.wmnet ~ πŸ•˜πŸΊ sudo systemctl restart pybal.service
  • 01:01 cdanis: βœ”οΈ cdanis@lvs2010.codfw.wmnet ~ πŸ•˜πŸΊ sudo systemctl restart pybal.service

2020-07-13

  • 23:06 mutante: releases* delete /usr/local/sbin/sync-* scripts created by rsync::quickdatacopy and let puppet recreate the ones still needed
  • 22:27 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I80ca62643f5c (duration: 00m 58s)
  • 20:12 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@1edde21]: airflow: ship_to_es: Implement multi-index understanding (duration: 00m 29s)
  • 20:12 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1edde21]: airflow: ship_to_es: Implement multi-index understanding
  • 20:03 mutante: rsynced reprepro data from releases1001 to releases1002, releases2002
  • 19:50 eileen: disable target smart job; process-control config revision is b00e7680ca
  • 19:48 milimetric@deploy1001: Finished deploy [analytics/refinery@de0a1f1] (thin): Regular analytics weekly train THIN [analytics/refinery@de0a1f1] (duration: 00m 07s)
  • 19:47 milimetric@deploy1001: Started deploy [analytics/refinery@de0a1f1] (thin): Regular analytics weekly train THIN [analytics/refinery@de0a1f1]
  • 19:47 milimetric@deploy1001: Finished deploy [analytics/refinery@de0a1f1]: Regular analytics weekly train [analytics/refinery@de0a1f1] (duration: 06m 41s)
  • 19:41 milimetric@deploy1001: Started deploy [analytics/refinery@de0a1f1]: Regular analytics weekly train [analytics/refinery@de0a1f1]
  • 19:39 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:33 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I1a1212 (duration: 00m 57s)
  • 18:53 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T248343 Don't use the 'zeroconf' configuration for VisualEditor (duration: 00m 55s)
  • 18:43 dcausse: BACON done
  • 18:40 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T257745: Add rollbacker to elwiki (duration: 00m 56s)
  • 18:26 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T250810: Set proper language code for some wikis (duration: 00m 56s)
  • 18:18 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T256928: Scale largest shards to be closer to 30GB (duration: 00m 56s)
  • 16:17 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:17 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:56 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: Load WikibaseClient using extension registration in beta (T257435) (duration: 00m 55s)
  • 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P11882 and previous config saved to /var/cache/conftool/dbconfig/20200713-155240-marostegui.json
  • 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11881 and previous config saved to /var/cache/conftool/dbconfig/20200713-154847-marostegui.json
  • 15:39 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 15:35 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 15:30 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 14:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting DiscussionToolsEnableVisual, default value (duration: 00m 57s)
  • 14:17 moritzm: removing lilypond from production T257066
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11880 and previous config saved to /var/cache/conftool/dbconfig/20200713-133604-marostegui.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1082', diff saved to https://phabricator.wikimedia.org/P11879 and previous config saved to /var/cache/conftool/dbconfig/20200713-133535-marostegui.json
  • 13:05 kormat@cumin1001: dbctl commit (dc=all): 'Fully repool es1022, and set es1020 to zero weight T257284', diff saved to https://phabricator.wikimedia.org/P11878 and previous config saved to /var/cache/conftool/dbconfig/20200713-130532-kormat.json
  • 12:08 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling es1022 after reimaging T257284', diff saved to https://phabricator.wikimedia.org/P11873 and previous config saved to /var/cache/conftool/dbconfig/20200713-120818-kormat.json
  • 11:49 Urbanecm: Password reset for User:Alert5 (T257806)
  • 11:44 akosiaris: repool ganeti1007 T244530. Start emptying ganeti1008
  • 11:08 Urbanecm: EU B&C done
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 896c042: Enable SandboxLink extension in trwiki (T256782) (duration: 00m 56s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (612175) (duration: 00m 56s)
  • 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (612175) (duration: 00m 56s)
  • 09:42 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:58 ema: cp: rolling ats-backend-restart to apply SyslogIdentifier changes -> https://gerrit.wikimedia.org/r/c/operations/puppet/+/611311
  • 08:57 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T248343 Explicitly set visualeditor-enable to 0 when non-default (duration: 00m 57s)
  • 08:44 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1022 for reimaging T257284', diff saved to https://phabricator.wikimedia.org/P11871 and previous config saved to /var/cache/conftool/dbconfig/20200713-084449-kormat.json
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1093', diff saved to https://phabricator.wikimedia.org/P11870 and previous config saved to /var/cache/conftool/dbconfig/20200713-083902-marostegui.json
  • 08:34 kormat@cumin1001: dbctl commit (dc=all): 'Add weight to es1020, reduce weight on es1022 T257284', diff saved to https://phabricator.wikimedia.org/P11869 and previous config saved to /var/cache/conftool/dbconfig/20200713-083414-kormat.json
  • 08:20 kormat: reimaging es1022 T257284
  • 06:54 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 06:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:52 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:51 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 06:50 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:16 marostegui: Reverse gerrit password on m2 master - T255715
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1093', diff saved to https://phabricator.wikimedia.org/P11868 and previous config saved to /var/cache/conftool/dbconfig/20200713-060410-marostegui.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1093', diff saved to https://phabricator.wikimedia.org/P11867 and previous config saved to /var/cache/conftool/dbconfig/20200713-055422-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for upgrade', diff saved to https://phabricator.wikimedia.org/P11866 and previous config saved to /var/cache/conftool/dbconfig/20200713-054840-marostegui.json
  • 05:34 marostegui: Deploy schema change on s3 codfw master, lag will appear on codfw T253276
  • 05:30 marostegui: Stop replication on db1082 for schema change and triggers removal T238966
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P11865 and previous config saved to /var/cache/conftool/dbconfig/20200713-052928-marostegui.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for innodb compression', diff saved to https://phabricator.wikimedia.org/P11864 and previous config saved to /var/cache/conftool/dbconfig/20200713-051428-marostegui.json

2020-07-11

  • 19:16 qchris: Restarting Gerrit on gerrit1001 to switch to new gerrit.war and zuul plugin
  • 19:16 qchris@deploy1001: Finished deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit1001 (duration: 00m 07s)
  • 19:15 qchris@deploy1001: Started deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit1001
  • 19:08 qchris: Restarting Gerrit on gerrit2001 to switch to new gerrit.war and zuul plugin
  • 18:55 qchris@deploy1001: Finished deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit2001 (duration: 00m 10s)
  • 18:55 qchris@deploy1001: Started deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit2001

2020-07-10

  • 21:52 ryankemper: Started long-running reindex of Elasticsearch indices in `eqiad`, `codfw`, and `dewiki` on `mwmaint1002` under tmux session `reindex` for user `ryankemper`
  • 20:26 jgleeson: updated fundraising-tools from 08ba1f6177 to f8e424fe32
  • 19:02 mutante: removing firewall hole for gerrit -> mysql servers on dbproxy servers for misc db's
  • 18:44 mutante: kubernetes1004 - started nagios-nrpe-server
  • 17:57 ebernhardson: changed loginwiki password for Cindy-the-browser-test-bot; no email address was associated with the account to allow a normal reset.
  • 17:05 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I63fcea7737 (duration: 00m 57s)
  • 16:16 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
  • 15:57 milimetric@deploy1001: Finished deploy [analytics/refinery@4d40145] (thin): Update EventLogging refine whitelist (THIN) (duration: 00m 08s)
  • 15:56 milimetric@deploy1001: Started deploy [analytics/refinery@4d40145] (thin): Update EventLogging refine whitelist (THIN)
  • 15:44 milimetric@deploy1001: Finished deploy [analytics/refinery@4d40145]: Update EventLogging refine whitelist (duration: 15m 17s)
  • 15:30 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
  • 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 15:29 milimetric@deploy1001: Started deploy [analytics/refinery@4d40145]: Update EventLogging refine whitelist
  • 15:19 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
  • 14:39 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
  • 14:37 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 14:30 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 13:41 godog: bounce ms-be1037, not quite responsive
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110', diff saved to https://phabricator.wikimedia.org/P11860 and previous config saved to /var/cache/conftool/dbconfig/20200710-123604-marostegui.json
  • 12:20 reedy@deploy1001: Synchronized php-1.35.0-wmf.40/extensions/Score/: Make Score errors use a specific css class (duration: 00m 58s)
  • 10:21 kormat@cumin1001: dbctl commit (dc=all): 'Finish repooling es1021, and remove weight from es1010 T257284', diff saved to https://phabricator.wikimedia.org/P11859 and previous config saved to /var/cache/conftool/dbconfig/20200710-102147-kormat.json
  • 09:49 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling es1021 after reimage @ 50% T257284', diff saved to https://phabricator.wikimedia.org/P11858 and previous config saved to /var/cache/conftool/dbconfig/20200710-094954-kormat.json
  • 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P11857 and previous config saved to /var/cache/conftool/dbconfig/20200710-085157-marostegui.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P11856 and previous config saved to /var/cache/conftool/dbconfig/20200710-085112-marostegui.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1107', diff saved to https://phabricator.wikimedia.org/P11855 and previous config saved to /var/cache/conftool/dbconfig/20200710-085040-marostegui.json
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P11853 and previous config saved to /var/cache/conftool/dbconfig/20200710-082346-marostegui.json
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11852 and previous config saved to /var/cache/conftool/dbconfig/20200710-082329-marostegui.json
  • 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:22 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:22 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11851 and previous config saved to /var/cache/conftool/dbconfig/20200710-080912-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119', diff saved to https://phabricator.wikimedia.org/P11850 and previous config saved to /var/cache/conftool/dbconfig/20200710-080854-marostegui.json
  • 08:09 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1021 for reimaging T257284', diff saved to https://phabricator.wikimedia.org/P11849 and previous config saved to /var/cache/conftool/dbconfig/20200710-080843-kormat.json
  • 08:01 kormat@cumin1001: dbctl commit (dc=all): 'Reset es2020/es2021 to correct weights after master switch T257284', diff saved to https://phabricator.wikimedia.org/P11848 and previous config saved to /var/cache/conftool/dbconfig/20200710-080133-kormat.json
  • 08:00 moritzm: installing cron security updates on jessie (stretch/buster already fixed)
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P11847 and previous config saved to /var/cache/conftool/dbconfig/20200710-075608-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11846 and previous config saved to /var/cache/conftool/dbconfig/20200710-075500-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079', diff saved to https://phabricator.wikimedia.org/P11845 and previous config saved to /var/cache/conftool/dbconfig/20200710-075431-marostegui.json
  • 07:44 kormat: reimaging es1021 to buster T257284
  • 07:43 kormat@cumin1001: dbctl commit (dc=all): 'Add weight to es1020, reduce weight on es1021 T257284', diff saved to https://phabricator.wikimedia.org/P11844 and previous config saved to /var/cache/conftool/dbconfig/20200710-074326-kormat.json
  • 07:41 jbond@deploy1001: Finished deploy [librenms/librenms@0a88d64]: redeploy to [try and] fix php errors (duration: 00m 05s)
  • 07:41 jbond@deploy1001: Started deploy [librenms/librenms@0a88d64]: redeploy to [try and] fix php errors
  • 07:32 moritzm: installing e2fsprogs security updates on jessie (stretch/buster already fixed)
  • 07:15 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 07:14 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 07:13 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P11843 and previous config saved to /var/cache/conftool/dbconfig/20200710-065751-marostegui.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311', diff saved to https://phabricator.wikimedia.org/P11841 and previous config saved to /var/cache/conftool/dbconfig/20200710-063818-marostegui.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1134', diff saved to https://phabricator.wikimedia.org/P11840 and previous config saved to /var/cache/conftool/dbconfig/20200710-063746-marostegui.json
  • 06:35 marostegui: Compress InnoDB on db1124:3311 (Sanitarium - lag will appear on s1 on labsdb) - T254462
  • 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P11839 and previous config saved to /var/cache/conftool/dbconfig/20200710-044428-marostegui.json
  • 01:44 mutante: LDAP - adding coka to wmde and nda (T257038)
  • 00:47 Reedy: truncated labswiki.interwiki table (outdated and unnecessary)

2020-07-09

  • 23:10 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I2c2dea832 (duration: 00m 56s)
  • 21:52 tgr: all sessions have been invalidated due to T256395
  • 20:58 eileen: https://phabricator.wikimedia.org/T253152
  • 19:16 herron: upgraded eqiad elk7 cluster from 7.4.2 to 7.8.0 T234854
  • 19:05 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.40 refs T256668
  • 18:51 elukey: update spark2 to 2.4.4-bin-hadoop2.6-3 for buster-wikimedia
  • 18:44 mutante: stat1004, stat1006, stat1007 - upgrading git-review package from 1.25 to 1.27 so that it keeps working with new Gerrit 3.2 (T257609)
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9f2557f: Updating config for Readers Web affinity quicksurvey (T246977) (duration: 01m 06s)
  • 17:42 chaomodus: codfw frack management dns automation deployment complete T233183
  • 17:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:36 James_F: Synchronized wmf-config/CommonSettings.php: ExtensionDistribution: Drop REL1_33, EOL'ed T256087
  • 17:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 17:35 moritzm: rebooting moscovium for kernel update
  • 17:33 chaomodus: deploying frack codfw management dns automation
  • 17:32 crusnov@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:28 crusnov@cumin2001: START - Cookbook sre.dns.netbox
  • 17:27 moritzm: rebooting planet1002 (planet.wikimedia.org) for kernel update
  • 17:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 17:10 krinkle@deploy1001: Synchronized wmf-config/: Ia2f5ed (duration: 01m 04s)
  • 17:09 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Ia2f5ed (duration: 01m 05s)
  • 15:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:29 papaul: replacing msw-b1,b2,b3 and b4
  • 14:03 moritzm: installing libtirpc security updates
  • 13:45 moritzm: installing gnutls28 security updates
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089', diff saved to https://phabricator.wikimedia.org/P11831 and previous config saved to /var/cache/conftool/dbconfig/20200709-133134-marostegui.json
  • 13:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:29 moritzm: rebooting puppetboard1001 (puppetboard.wikimedia.org) for kernel update
  • 13:15 moritzm: installing ffmpeg security updates
  • 13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089', diff saved to https://phabricator.wikimedia.org/P11830 and previous config saved to /var/cache/conftool/dbconfig/20200709-131039-marostegui.json
  • 13:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:57 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:56 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:56 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:54 moritzm: rebooting install* servers for kernel security update
  • 12:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:38 moritzm: rebooting urldownloader1001/2001 for kernel update (failed over, these are now the inactive ones)
  • 12:23 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 12:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:22 moritzm: rebooting dbmonitor1001 / tendril.wikimedia.org for kernel update
  • 12:11 XioNoX: enable asw2-b-eqiad:ae3 (to cloudsw1-c8) - T251632
  • 11:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:50 moritzm: rebooting debmonitor1001 for kernel update
  • 11:42 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.40/extensions/Translate/tag/SpecialPageTranslation.php: 6541d3f: DeprecatablePropertyArray: Use MW_VERSION instead of array_key_exists (T257531) (duration: 01m 05s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3a7c1c3: Rename namespace on kn.wikipedia.org (T255337) (duration: 01m 04s)
  • 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0a3c1f9: Add *.oireachtas.ie to the wgCopyUploadsDomains whitelist for commonswiki (T256543) (duration: 01m 04s)
  • 11:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:10 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:10 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e6f442c: Enable Quicksurveys for Desktop Improvements Project (T246977) (duration: 01m 06s)
  • 11:01 vgutierrez: restart ats-tls on cp1085
  • 10:55 _joe_: restarting php7.2-fpm on mw1282, workers failing with SIGILL
  • 10:54 _joe_: depool mw1282
  • 10:54 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:34 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:23 _joe_: rolling restart the remaining restbases in eqiad, and all of codfw
  • 10:22 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 10:09 _joe_: restarting restbase on rb1020-22
  • 09:53 _joe_: restarting restbase on restbase1024,1023
  • 09:36 _joe_: restarting restbase on rb1026,1027 to switch to proton on k8s
  • 09:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:28 _joe_: restarting restbase on restbase1025 to pick up the switch to k8s of proton
  • 09:27 godog: bounce thanos-compact on thanos-fe2001
  • 09:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P11828 and previous config saved to /var/cache/conftool/dbconfig/20200709-085228-marostegui.json
  • 08:44 marostegui: Stop haproxy on dbproxy1017 before upgrading to buster - T255408
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1136', diff saved to https://phabricator.wikimedia.org/P11827 and previous config saved to /var/cache/conftool/dbconfig/20200709-082355-marostegui.json
  • 08:23 moritzm: imported osm2pgsql 0.96.0+ds-1~bpo9+1 to "main" component T256877
  • 08:22 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
  • 08:20 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 08:13 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 08:11 XioNoX: disable igmp snooping on msw1-codfw
  • 07:59 marostegui: Stop db1117:3322 to clone db1084, this will trigger haproxy alerts - T257540
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136', diff saved to https://phabricator.wikimedia.org/P11825 and previous config saved to /var/cache/conftool/dbconfig/20200709-075749-marostegui.json
  • 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P11824 and previous config saved to /var/cache/conftool/dbconfig/20200709-053905-marostegui.json
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1084 from dbctl', diff saved to https://phabricator.wikimedia.org/P11823 and previous config saved to /var/cache/conftool/dbconfig/20200709-053206-marostegui.json
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084', diff saved to https://phabricator.wikimedia.org/P11822 and previous config saved to /var/cache/conftool/dbconfig/20200709-051826-marostegui.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317', diff saved to https://phabricator.wikimedia.org/P11821 and previous config saved to /var/cache/conftool/dbconfig/20200709-051355-marostegui.json
  • 05:11 marostegui: Remove revision triggers from db2093:3315 T238966
  • 05:10 marostegui: Deploy schema change on s5 codfw, lag will be generated - T238966
  • 01:43 tzatziki: reset email for GseSro
  • 00:58 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 sudo cumin A:cp 'enable-puppet "cdanis deploying I6c1b646e T256395"'
  • 00:49 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 sudo cumin A:cp 'disable-puppet "cdanis deploying I6c1b646e T256395"'

2020-07-08

  • 21:56 mutante: deleting files from releases2001 that do not exist on releases1001 to make them mirrors; rsync with --delete and the command from the quickdatacopy class (T247652)
  • 21:55 mutante: rsyncing releases files from releases1001 to releases2002 and releases1002; deleting files from releases2002 that do not exist on releases1002 to make them mirrors (T247652)
  • 20:59 cstone: civicrm revision changed from d73ee2e73f to 8b09c87ce2
  • 20:27 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T256012)
  • 20:08 Amir1_: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T256012)
  • 19:18 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.40 refs T256668 (duration: 01m 04s)
  • 19:17 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.40 refs T256668
  • 18:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 091442c: Add *.nga.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T256518) (duration: 01m 04s)
  • 18:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 2e5943d: Add scan-bugs.org to $wgCopyUploadsDomains (T256569) (duration: 01m 04s)
  • 18:46 urbanecm@deploy1001: Synchronized static/images/project-logos/: f42cdf2: Change bnwiki logo (T255328) (duration: 01m 04s)
  • 18:27 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Cleanup: remove temporary wmgDisableHTCP variable gerrit:607596 T250781 IS.php (duration: 01m 01s)
  • 18:20 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable HTCP purging everywhere gerrit:607593 T250781 CS.php (duration: 01m 03s)
  • 18:18 ppchelko@deploy1001: Synchronized wmf-config/wikitech.php: Disable HTCP purging everywhere gerrit:607593 T250781 wikitech.php (duration: 01m 04s)
  • 18:17 ppchelko@deploy1001: Synchronized wmf-config/reverse-proxy.php: Disable HTCP purging everywhere gerrit:607593 T250781 reverse-proxy.php (duration: 01m 04s)
  • 18:11 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceDefault to refactor EventBus event stream config gerrit:610160 T229863, IS.php (duration: 01m 03s)
  • 18:04 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceDefault to refactor EventBus event stream config gerrit:610160 T229863 (duration: 01m 04s)
  • 17:34 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
  • 17:16 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
  • 17:16 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 17:08 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 16:57 _joe_: restarting restbase across the fleet to transition to using envoy
  • 16:40 _joe_: restarting restbase on restbase2010 to route calls to mediawiki, parsoid via envoy
  • 16:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:27 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:22 jgleeson: updated fundraising-tools from a244e0e85f --> f5b8528214
  • 15:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:12 moritzm: rebooting people1002 (people.wikimedia.org) for kernel security update
  • 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:46 moritzm: installing isc-dhcp security updates
  • 14:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
  • 14:31 moritzm: installing gdk-pixbuf security updates
  • 14:26 _joe_: repooling mw1346
  • 14:24 _joe_: php7adm /opcache-free on mw1346
  • 14:15 jbond42: switch icinga authentication to CAS SSO
  • 14:12 _joe_: depooling mw1346
  • 14:12 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
  • 14:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:04 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 14:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:04 moritzm: rebooting idp-test1001 for kernel update
  • 13:59 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.stop-cluster (exit_code=97)
  • 13:59 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
  • 13:31 jynus: replacing ssh key for ci_docroot at deploy1001
  • 13:31 moritzm: imported git 2.20.1-2+deb10u3~wmf1 for stretch-wikimedia component/git T257308
  • 13:10 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
  • 13:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 13:00 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 12:41 marostegui: Deploy schema change on s7 codfw, lag is expected
  • 12:17 xionox-tmp: rollout less frequent option-refresh-rate - T240658
  • 12:01 xionox-tmp: renumber eqiad NTT link - T254877
  • 11:42 awight: EU BACON complete
  • 11:41 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: Undeploy graphoid for phase 1 wikis (T257402) (duration: 01m 03s)
  • 11:31 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: Add nature.com to commonswiki wgCopyUploadsDomains (T254342) (duration: 01m 03s)
  • 11:29 moritzm: installing freetype security updates
  • 11:26 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [hiwikibooks] Translate sitename for hi.wikibooks (T256587) (duration: 01m 03s)
  • 11:19 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [arwiki] Grant 'patrolmarks' to all (T257106) (duration: 01m 04s)
  • 11:18 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 11:18 moritzm: installing libgcrypt20 security updates
  • 11:16 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 11:07 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: Provision WMDE TeWü survey for prototype 1 (T257306), file 2/2 (duration: 01m 03s)
  • 11:06 awight@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BACON: Provision WMDE TeWü survey for prototype 1 (T257306), file 1/2 (duration: 01m 16s)
  • 11:05 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P11818 and previous config saved to /var/cache/conftool/dbconfig/20200708-110546-marostegui.json
  • 10:51 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:51 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:50 akosiaris: apply calico egress policies
  • 10:50 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:45 moritzm: installing json-c security updates
  • 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P11817 and previous config saved to /var/cache/conftool/dbconfig/20200708-102553-marostegui.json
  • 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1084', diff saved to https://phabricator.wikimedia.org/P11816 and previous config saved to /var/cache/conftool/dbconfig/20200708-102500-marostegui.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084', diff saved to https://phabricator.wikimedia.org/P11815 and previous config saved to /var/cache/conftool/dbconfig/20200708-101313-marostegui.json
  • 09:58 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 09:56 kormat@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:50 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1149', diff saved to https://phabricator.wikimedia.org/P11814 and previous config saved to /var/cache/conftool/dbconfig/20200708-094539-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149', diff saved to https://phabricator.wikimedia.org/P11813 and previous config saved to /var/cache/conftool/dbconfig/20200708-092650-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P11812 and previous config saved to /var/cache/conftool/dbconfig/20200708-092627-marostegui.json
  • 09:24 xionox-tmp: renumber eqord NTT link - T254877
  • 09:18 xionox-tmp: remove eqord-eqiad tunnel - T254877
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P11811 and previous config saved to /var/cache/conftool/dbconfig/20200708-091557-marostegui.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1147', diff saved to https://phabricator.wikimedia.org/P11810 and previous config saved to /var/cache/conftool/dbconfig/20200708-085745-marostegui.json
  • 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:54 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 08:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P11809 and previous config saved to /var/cache/conftool/dbconfig/20200708-085024-marostegui.json
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074', diff saved to https://phabricator.wikimedia.org/P11808 and previous config saved to /var/cache/conftool/dbconfig/20200708-084227-marostegui.json
  • 08:40 moritzm: upgrading docker on remaining buster hosts
  • 08:38 hashar: Upgraded docker.io on contint1001 and contint2001
  • 08:28 marostegui: Remove dbproxy1003 grants from misc hosts T231280
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11807 and previous config saved to /var/cache/conftool/dbconfig/20200708-082624-marostegui.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11806 and previous config saved to /var/cache/conftool/dbconfig/20200708-082040-marostegui.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11805 and previous config saved to /var/cache/conftool/dbconfig/20200708-081647-marostegui.json
  • 08:15 kormat@cumin1001: dbctl commit (dc=all): 'Depool es2020 for reimaging T257284', diff saved to https://phabricator.wikimedia.org/P11804 and previous config saved to /var/cache/conftool/dbconfig/20200708-081519-kormat.json
  • 08:00 marostegui: Failover m1 from db1097 to db1080 - T256717
  • 07:57 kormat: reimaging es2020 to buster T257284
  • 07:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11803 and previous config saved to /var/cache/conftool/dbconfig/20200708-074939-marostegui.json
  • 07:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:48 jynus: stop bacula-director on backup1001 in preparation for m1 switchover T256717
  • 07:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:47 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 07:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:45 moritzm: installing PHP 7.3 security updates
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143', diff saved to https://phabricator.wikimedia.org/P11802 and previous config saved to /var/cache/conftool/dbconfig/20200708-073548-marostegui.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P11801 and previous config saved to /var/cache/conftool/dbconfig/20200708-073037-marostegui.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1142', diff saved to https://phabricator.wikimedia.org/P11800 and previous config saved to /var/cache/conftool/dbconfig/20200708-073011-marostegui.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P11799 and previous config saved to /var/cache/conftool/dbconfig/20200708-072431-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1141', diff saved to https://phabricator.wikimedia.org/P11798 and previous config saved to /var/cache/conftool/dbconfig/20200708-070921-marostegui.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P11797 and previous config saved to /var/cache/conftool/dbconfig/20200708-070432-marostegui.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1138', diff saved to https://phabricator.wikimedia.org/P11796 and previous config saved to /var/cache/conftool/dbconfig/20200708-070403-marostegui.json
  • 06:47 marostegui: start topology changes on m1 T256717
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P11795 and previous config saved to /var/cache/conftool/dbconfig/20200708-064354-marostegui.json
  • 06:36 marostegui: Deploy schema change on s2 primary master db1122 T238966
  • 06:18 _joe_: rolling restart of restbase to pick up the proton url change
  • 03:36 andrew@deploy1001: Finished deploy [horizon/deploy@505819d]: further fixes for proxy editing --bug 610130 (duration: 03m 44s)
  • 03:32 andrew@deploy1001: Started deploy [horizon/deploy@505819d]: further fixes for proxy editing --bug 610130

2020-07-07

  • 22:41 mutante: new Wikimedia Annual Report 2019 now available on annual.wikimedia.org
  • 21:29 andrew@deploy1001: Finished deploy [horizon/deploy@fce8183]: further fixes for proxy editing --bug 610130 (duration: 03m 35s)
  • 21:25 andrew@deploy1001: Started deploy [horizon/deploy@fce8183]: further fixes for proxy editing --bug 610130
  • 21:10 andrew@deploy1001: Finished deploy [horizon/deploy@abcd051]: further fixes for proxy editing --bug 610130 (duration: 03m 26s)
  • 21:07 andrew@deploy1001: Started deploy [horizon/deploy@abcd051]: further fixes for proxy editing --bug 610130
  • 20:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@05b8bd5]: Remove restbase2009, take 2 (duration: 09m 15s)
  • 20:32 ppchelko@deploy1001: Started deploy [restbase/deploy@05b8bd5]: Remove restbase2009, take 2
  • 20:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@05b8bd5]: Remove restbase2009 (duration: 14m 28s)
  • 20:24 mutante: kubernetes1003 - starting nagios-nrpe-server
  • 20:23 mutante: kubernetes1001 - starting nagios-nrpe-server
  • 20:17 ppchelko@deploy1001: Started deploy [restbase/deploy@05b8bd5]: Remove restbase2009
  • 19:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:27 mutante: destroying VM gerrit1002 - decom cookbook
  • 19:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:18 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.40 refs T256668
  • 19:04 mutante: contint2001 - move /var/lib/zuul/.ssh/known_hosts to root and run puppet to recreate it
  • 18:38 andrew@deploy1001: Finished deploy [horizon/deploy@eaa056e]: fix for proxy editing --bug 610130 (duration: 03m 18s)
  • 18:35 andrew@deploy1001: Started deploy [horizon/deploy@eaa056e]: fix for proxy editing --bug 610130
  • 18:27 andrew@deploy1001: Finished deploy [horizon/deploy@a39e86c]: update proxy UI to support editing existing proxies (duration: 03m 26s)
  • 18:23 andrew@deploy1001: Started deploy [horizon/deploy@a39e86c]: update proxy UI to support editing existing proxies
  • 18:10 krinkle@deploy1001: Synchronized w/: remove untracked test cookie file (duration: 01m 04s)
  • 18:08 krinkle@deploy1001: Synchronized php-1.35.0-wmf.40/includes/Revision/RevisionStore.php: I8f986daeab4 (duration: 01m 05s)
  • 17:59 herron: imported (logstash|kibana|elasticsearch)-oss-7.8.0 into buster-wikimedia thirdparty/elastic78
  • 17:54 hnowlan: finished removing restbase2009 from cassandra pool
  • 17:06 hnowlan: removed restbase2009-b from cassandra pool, removing restbase2009-c
  • 16:40 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.40/extensions/Wikibase: Backport: Revert "Don’t load $wgWBClientSettings in WikibaseClient.php" (T257296) (duration: 01m 10s)
  • 15:49 hnowlan: running nodetool removenode for restbase2009-a
  • 15:38 hnowlan@deploy1001: Started restart [restbase/deploy@05b8bd5]: Restarting restbase after removal of restbase2009
  • 15:27 elukey: root-tmux on cumin1001 - cumin 'c:profile::mediawiki::mcrouter_wancache' '/usr/local/sbin/restart-mcrouter' -b 2 -s 5 - roll restart of mw-mcrouter to pick up new settings - T255511
  • 15:13 hnowlan@deploy1001: Started restart [restbase/deploy@05b8bd5]: Restarting restbase after removal of restbase2009
  • 15:12 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 15:12 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 15:09 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 15:09 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 15:06 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 15:04 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:04 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:02 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 15:02 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 15:01 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 14:58 hashar@deploy1001: Finished deploy [integration/docroot@708d3eb]: Second deployment to ensure everything works fine. Thank you jynus (duration: 00m 04s)
  • 14:58 hashar@deploy1001: Started deploy [integration/docroot@708d3eb]: Second deployment to ensure everything works fine. Thank you jynus
  • 14:53 _joe_: restarted restbase on restbase2022 after removing restbase2009 from the cassandra seeds
  • 14:48 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:47 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:38 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:38 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:30 papaul: replacing msw-a5,a6,a7 and a8
  • 14:30 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 14:24 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:24 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:20 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:20 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:16 hashar@deploy1001: Finished deploy [integration/docroot@708d3eb]: (no justification provided) (duration: 00m 09s)
  • 14:16 hashar@deploy1001: Started deploy [integration/docroot@708d3eb]: (no justification provided)
  • 13:38 _joe_: rolling restart of restbase to pick up using envoy
  • 13:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:29 XioNoX: cr2-eqiad> request vmhost snapshot routing-engine both - T257153
  • 13:24 XioNoX: cr1-eqiad> request vmhost snapshot routing-engine both - T257153
  • 13:15 kormat@cumin1001: dbctl commit (dc=all): 'Promote es2021 to es4 master T257284', diff saved to https://phabricator.wikimedia.org/P11789 and previous config saved to /var/cache/conftool/dbconfig/20200707-131524-kormat.json
  • 12:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:44 kormat: starting (codfw) es5 failover from es2020 to es2021 T257284
  • 12:30 kormat@cumin1001: dbctl commit (dc=all): 'Set es2021 to weight 50 T257284', diff saved to https://phabricator.wikimedia.org/P11787 and previous config saved to /var/cache/conftool/dbconfig/20200707-123003-kormat.json
  • 12:12 jforrester@deploy1001: Finished scap: Full scap and testwikis to 1.35.0-wmf.40 for T256668 (duration: 33m 09s)
  • 12:01 marostegui: Deploy schema change on labswiki (wikitech) master - T253276
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1082', diff saved to https://phabricator.wikimedia.org/P11786 and previous config saved to /var/cache/conftool/dbconfig/20200707-115838-marostegui.json
  • 11:39 jforrester@deploy1001: Started scap: Full scap and testwikis to 1.35.0-wmf.40 for T256668
  • 11:38 jforrester@deploy1001: scap failed: LockFailedError Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "jforrester"; reason is "testwikis wikis to 1.35.0-wmf.40" (duration: 00m 00s)
  • 11:33 moritzm: installing PHP 7.0 security updates
  • 11:29 marostegui: Deploy schema change on db1082, this will create lag on s5 labs
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P11784 and previous config saved to /var/cache/conftool/dbconfig/20200707-112926-marostegui.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11783 and previous config saved to /var/cache/conftool/dbconfig/20200707-112830-marostegui.json
  • 11:26 godog: test bumping logstash7 batch size to 256
  • 11:17 moritzm: prune PHP 7.0 packages from mwdebug1001/2001/2002
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P11782 and previous config saved to /var/cache/conftool/dbconfig/20200707-110506-marostegui.json
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110', diff saved to https://phabricator.wikimedia.org/P11781 and previous config saved to /var/cache/conftool/dbconfig/20200707-110412-marostegui.json
  • 10:57 moritzm: prune PHP 7.0 packages from mw2190-mw2214
  • 10:46 jforrester@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.40
  • 10:44 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.38 (duration: 17m 23s)
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P11780 and previous config saved to /var/cache/conftool/dbconfig/20200707-103255-marostegui.json
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P11779 and previous config saved to /var/cache/conftool/dbconfig/20200707-102757-marostegui.json
  • 10:26 moritzm: prune PHP 7.0 packages from mw2135-mw2147
  • 10:12 addshore@deploy1001: Synchronized wmf-config/config/testcommonswiki.yaml: gerrit:609985 Make testcommonswiki a testwikidata client T257266 PT2/2 (duration: 00m 55s)
  • 10:11 addshore@deploy1001: sync-file aborted: gerrit:609985 Make testcommonswiki a testwikidata client T257266 PT1/2 (duration: 00m 00s)
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315', diff saved to https://phabricator.wikimedia.org/P11778 and previous config saved to /var/cache/conftool/dbconfig/20200707-101043-marostegui.json
  • 10:10 addshore@deploy1001: Synchronized dblists/wikidataclient-test.dblist: gerrit:609985 Make testcommonswiki a testwikidata client T257266 PT1/2 (duration: 00m 56s)
  • 10:08 addshore@deploy1001: sync-file aborted: gerrit:609985 Make testcommonswiki a testwikidata client T257266 PT1/2 (duration: 00m 36s)
  • 10:06 elukey: decommission archiva1001
  • 10:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11777 and previous config saved to /var/cache/conftool/dbconfig/20200707-100328-marostegui.json
  • 10:03 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:03 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:03 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11776 and previous config saved to /var/cache/conftool/dbconfig/20200707-095443-marostegui.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P11775 and previous config saved to /var/cache/conftool/dbconfig/20200707-095428-marostegui.json
  • 09:42 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:609971 T257266 Enable sitelinks to testcommons from test wikidata sites (duration: 00m 56s)
  • 09:40 kormat@cumin1001: dbctl commit (dc=all): 'Repool es2021 after reimaging T257284', diff saved to https://phabricator.wikimedia.org/P11774 and previous config saved to /var/cache/conftool/dbconfig/20200707-094017-kormat.json
  • 09:37 addshore@deploy1001: Synchronized wmf-config: gerrit:609986 T257266 T241975 Wikibase: Remove config option wmgUseEntitySourceBasedFederation (take2) (duration: 00m 57s)
  • 09:36 _joe_: errata: restbase2010, not 2009
  • 09:36 _joe_: applying the new configuration using the service proxy to restbase2009 too
  • 09:34 godog: bounce logstash on logstash1023
  • 09:33 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: gerrit:609645 T257266 T241975 Wikibase: stop using wmgUseEntitySourceBasedFederation (take2) (duration: 00m 59s)
  • 09:33 _joe_: depooling restbase1025 while we fix the troubled relationship between envoy and proton
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P11773 and previous config saved to /var/cache/conftool/dbconfig/20200707-093345-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es1024 as it is the current master T255755', diff saved to https://phabricator.wikimedia.org/P11772 and previous config saved to /var/cache/conftool/dbconfig/20200707-092635-marostegui.json
  • 09:24 James_F: 1.35.0-wmf.40 was branched at 88ecd6d for T256668
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1023 after reimage T255755', diff saved to https://phabricator.wikimedia.org/P11771 and previous config saved to /var/cache/conftool/dbconfig/20200707-092357-marostegui.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1023 after reimage T255755', diff saved to https://phabricator.wikimedia.org/P11770 and previous config saved to /var/cache/conftool/dbconfig/20200707-091015-marostegui.json
  • 08:33 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1023 after reimage T255755', diff saved to https://phabricator.wikimedia.org/P11769 and previous config saved to /var/cache/conftool/dbconfig/20200707-083144-marostegui.json
  • 08:30 kormat@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:26 XioNoX: cr2-codfw> request vmhost snapshot routing-engine both - T257153
  • 08:22 XioNoX: cr2-eqsin> request vmhost snapshot - T257153
  • 08:19 XioNoX: cr2-eqord> request vmhost snapshot - T257153
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1023 after reimage T255755', diff saved to https://phabricator.wikimedia.org/P11768 and previous config saved to /var/cache/conftool/dbconfig/20200707-081909-marostegui.json
  • 08:18 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.change-distro (exit_code=97)
  • 08:17 XioNoX: cr2-eqdfw> request vmhost snapshot - T257153
  • 08:15 XioNoX: cr3-knams> request vmhost snapshot - T257153
  • 08:15 hashar: upgrading and restarting CI Jenkins on contint2001 # T256978
  • 08:12 XioNoX: cr4-ulsfo> request vmhost snapshot - T257153
  • 08:09 kormat@cumin1001: dbctl commit (dc=all): 'Depool es2021 for reimaging T257284', diff saved to https://phabricator.wikimedia.org/P11767 and previous config saved to /var/cache/conftool/dbconfig/20200707-080914-kormat.json
  • 07:50 marostegui: Stop MySQL on db1074 to deploy schema change and remove triggers - T238966
  • 07:45 _joe_: restarting restbase again on rb1025
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for schema change', diff saved to https://phabricator.wikimedia.org/P11766 and previous config saved to /var/cache/conftool/dbconfig/20200707-074435-marostegui.json
  • 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079 and db1136 T257216', diff saved to https://phabricator.wikimedia.org/P11765 and previous config saved to /var/cache/conftool/dbconfig/20200707-073918-marostegui.json
  • 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:31 _joe_: restarting restbase on restbase1025, reaching proton via envoy for now
  • 07:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Commons: Define entity sources configuration" (T256906, T256907, T256909, T254315, T257266) (forgot to git rebase so the last sync was a no-op) (duration: 00m 56s)
  • 07:27 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
  • 07:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Commons: Define entity sources configuration" (T256906, T256907, T256909, T254315, T257266) (duration: 00m 53s)
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 and give more main weight to db1136 T257216', diff saved to https://phabricator.wikimedia.org/P11764 and previous config saved to /var/cache/conftool/dbconfig/20200707-072703-marostegui.json
  • 07:24 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: Config: Revert "Wikidata client wikis: Define entity sources configuration (take 2)" (T254315, T257266) (duration: 00m 56s)
  • 07:24 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 07:23 lucaswerkmeister-wmde@deploy1001: Synchronized dblists/wikidataclient.dblist: Config: Revert "Wikidata client wikis: Define entity sources configuration (take 2)" (T254315, T257266) (duration: 00m 56s)
  • 07:19 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Revert "Wikibase: stop using wmgUseEntitySourceBasedFederation" (T241975, T257266) (duration: 00m 55s)
  • 07:16 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 07:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Wikibase: Remove config option wmgUseEntitySourceBasedFederation" (T241975, T257266) (duration: 00m 57s)
  • 07:10 _joe_: restart restbase on restbase1025 to pick up the switch to https for cxserver
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 and give more main weight to db1136 T257216', diff saved to https://phabricator.wikimedia.org/P11762 and previous config saved to /var/cache/conftool/dbconfig/20200707-063737-marostegui.json
  • 06:29 marostegui: Reimage es1023 to Buster T255755
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1136 some weight back into main traffic T257216', diff saved to https://phabricator.wikimedia.org/P11761 and previous config saved to /var/cache/conftool/dbconfig/20200707-062008-marostegui.json
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 T257216', diff saved to https://phabricator.wikimedia.org/P11760 and previous config saved to /var/cache/conftool/dbconfig/20200707-061849-marostegui.json
  • 05:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Enable es5 writes T255755 (duration: 00m 56s)
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1023 entirely T255755', diff saved to https://phabricator.wikimedia.org/P11759 and previous config saved to /var/cache/conftool/dbconfig/20200707-051620-marostegui.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1024 to es5 master T255755', diff saved to https://phabricator.wikimedia.org/P11758 and previous config saved to /var/cache/conftool/dbconfig/20200707-051236-marostegui.json
  • 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Disable es5 writes T255755 (duration: 00m 56s)
  • 05:01 marostegui: "Starting es failover from es1023 to es1024 - https://phabricator.wikimedia.org/T255755"
  • 01:05 ejegg: turned on debug logging for Adyen SmashPig
  • 00:22 cstone: civicrm revision changed from a48caf0f37 to d73ee2e73f

2020-07-06

  • 23:32 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable sidebar instrumentation on test wikipedia (duration: 00m 56s)
  • 23:32 eileen: process-control config revision is 3fe6753e56
  • 23:22 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change some zh canonical namespaces. Don't index NS_USER on hywiki (duration: 00m 58s)
  • 22:59 eileen: tools revision changed from e974147f27 to 73557b8038
  • 22:14 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@65502b2]: 0.3.40 (duration: 18m 58s)
  • 21:55 ryankemper@deploy1001: Started deploy [wdqs/wdqs@65502b2]: 0.3.40
  • 21:52 hashar: Upgraded Jenkins on releases1002 and releases2002 # T256978
  • 21:41 mutante: upgrading jenkins on releases1001 and releases2001 (T256980)
  • 21:37 mutante: importing jenkins 2.235.1 into APT repo for both stretch and buster T256980
  • 20:08 eileen: tools revision is e974147f27
  • 19:41 qchris: Enabling puppet on gerrit1002 again to catch up with puppetmaster.
  • 18:56 addshore: backport / deploy window done
  • 18:55 addshore@deploy1001: Synchronized wmf-config: gerrit:569263 T241975 Wikibase: Remove config option wmgUseEntitySourceBasedFederation (duration: 00m 58s)
  • 18:54 addshore@deploy1001: sync-file aborted: gerrit:569263 (duration: 00m 00s)
  • 18:51 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: gerrit:608944 T241975 Wikibase: stop using wmgUseEntitySourceBasedFederation (duration: 00m 56s)
  • 18:47 addshore@deploy1001: Synchronized dblists/wikidataclient.dblist: T254315 Wikidata client wikis: Define entity sources configuration (take 2) gerrit:608839 (duration: 00m 56s)
  • 18:45 addshore@deploy1001: Synchronized wmf-config: T254315 Wikidata client wikis: Define entity sources configuration (take 2) gerrit:608839 (duration: 00m 58s)
  • 18:38 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T256906 T256907 T256909 T254315 gerrit:569260 Commons: Define entity sources configuration (duration: 00m 56s)
  • 18:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: adffbe6: Enable validation of new signatures (T248632) (duration: 00m 57s)
  • 18:24 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 8878c60: Add `abusefilter-view` as a default right for the CU log user (T255506) (duration: 00m 55s)
  • 18:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1398171: Add arbcom group to plwiki (T256572) (duration: 00m 56s)
  • 18:08 andrew@deploy1001: Finished deploy [horizon/deploy@bb176c2]: update proxy UI to support multiple pre-set domains (duration: 03m 39s)
  • 18:04 andrew@deploy1001: Started deploy [horizon/deploy@bb176c2]: update proxy UI to support multiple pre-set domains
  • 17:54 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate SearchSatisfaction from EventLogging to EventGate on all wikis - T249261 - take 2 (duration: 00m 56s)
  • 17:50 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate SearchSatisfaction from EventLogging to EventGate on all wikis - T249261 (duration: 00m 56s)
  • 16:09 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate SearchSatisfaction from EventLogging to EventGate on group1 - T249261 (duration: 00m 58s)
  • 15:02 jynus: removing old snapshots for x1 on dbprov[12]002
  • 14:50 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:46 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:44 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 14:42 moritzm: installing PHP 7.0 security updates
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074', diff saved to https://phabricator.wikimedia.org/P11753 and previous config saved to /var/cache/conftool/dbconfig/20200706-143754-marostegui.json
  • 14:36 godog: reboot ms-be2025 for hw raid software upgrade - T257214
  • 14:28 godog: powercycle ms-be2025, no ssh available - T257214
  • 14:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:09 marostegui: Stop MySQL and poweroff db1079 T257216
  • 14:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:02 jynus@cumin1001: dbctl commit (dc=all): 'depool db1136 from main traffic as it is the only s7 api host right now', diff saved to https://phabricator.wikimedia.org/P11752 and previous config saved to /var/cache/conftool/dbconfig/20200706-140217-jynus.json
  • 13:56 marostegui: Downtime and reboot db1079 after BBU crash
  • 13:54 jynus@cumin1001: dbctl commit (dc=all): 'depool db1079', diff saved to https://phabricator.wikimedia.org/P11751 and previous config saved to /var/cache/conftool/dbconfig/20200706-135430-jynus.json
  • 13:30 marostegui: Deploy schema change on s5 codfw master T253276
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce es1024 weight in preparation for tomorrow's switchover T255755', diff saved to https://phabricator.wikimedia.org/P11750 and previous config saved to /var/cache/conftool/dbconfig/20200706-132634-marostegui.json
  • 13:03 elukey: force umount/mount of /mnt/hdfs on an-airflow1001 to unblock dpkg checks (fuse misbehaving, all checks hanging)
  • 12:53 elukey: kill hanging lsof processes on an-airflow to reduce cpu load
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P11748 and previous config saved to /var/cache/conftool/dbconfig/20200706-124237-marostegui.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P11747 and previous config saved to /var/cache/conftool/dbconfig/20200706-124105-marostegui.json
  • 11:17 Urbanecm: EU B&C window was done
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5d971dc: GrowthExperiments: Remove overrides to welcome survey privacy policy URL (T252572) (duration: 00m 56s)
  • 11:12 marostegui: Deploy schema changes on db1129
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P11746 and previous config saved to /var/cache/conftool/dbconfig/20200706-111221-marostegui.json
  • 11:09 marostegui: Compress InnoDB on db1107 T254462
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: f4b5001: Add arxiv.org to commonswiki wgCopyUploadsDomains (T257036) (duration: 00m 56s)
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 T254462', diff saved to https://phabricator.wikimedia.org/P11745 and previous config saved to /var/cache/conftool/dbconfig/20200706-110723-marostegui.json
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076', diff saved to https://phabricator.wikimedia.org/P11744 and previous config saved to /var/cache/conftool/dbconfig/20200706-110544-marostegui.json
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3bc1b46: Remove "Create a book" link from sidebar on Finnish Wikipedia (T257073) (duration: 00m 56s)
  • 10:52 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (609762) (duration: 00m 57s)
  • 10:51 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (609762) (duration: 00m 56s)
  • 10:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:28 moritzm: rebooting idp1001 for kernel update
  • 09:35 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 58s)
  • 08:51 XioNoX: cr1-codfw> request vmhost snapshot routing-engine both - T257153
  • 08:44 XioNoX: cr3-ulsfo> request vmhost snapshot - T257153
  • 08:24 kormat: restarting all mariadb instances on sanitarium hosts T256545
  • 08:09 elukey: roll restart aqs on aqs100[4-9] to pick up new druid settings
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076', diff saved to https://phabricator.wikimedia.org/P11742 and previous config saved to /var/cache/conftool/dbconfig/20200706-080509-marostegui.json
  • 07:58 qchris: Disable puppet on gerrit1002 (gerrit-test) to deploy Gerrit UI updates there to gather more feedback
  • 07:51 elukey: enable binlog on matomo's database on matomo1002
  • 07:46 XioNoX: repool eqsin - T257154
  • 07:11 XioNoX: reboot cr3-eqsin - T257154
  • 06:55 XioNoX: depool eqsin for cr3-eqsin reboot/investigation - T257154
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1089', diff saved to https://phabricator.wikimedia.org/P11740 and previous config saved to /var/cache/conftool/dbconfig/20200706-065437-marostegui.json
  • 06:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
  • 06:22 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
  • 06:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 06:14 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 05:45 kart_: Updated cxserver to 2020-07-01-044435-production (T254143)
  • 05:40 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:36 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:32 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P11739 and previous config saved to /var/cache/conftool/dbconfig/20200706-051333-marostegui.json
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P11738 and previous config saved to /var/cache/conftool/dbconfig/20200706-050347-marostegui.json
  • 04:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P11737 and previous config saved to /var/cache/conftool/dbconfig/20200706-044908-marostegui.json

2020-07-05

  • 21:50 qchris: Restarting gerrit on gerrit1001 to pick up new war and jars.
  • 21:50 qchris@deploy1001: Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1001 (duration: 00m 07s)
  • 21:50 qchris@deploy1001: Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1001
  • 21:46 qchris: Restarting gerrit on gerrit2001 to pick up new war and jars.
  • 21:45 qchris@deploy1001: Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit2001 (duration: 00m 10s)
  • 21:45 qchris@deploy1001: Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit2001
  • 21:32 qchris: Restarting gerrit on gerrit1002 to pick up new wars and jars.
  • 21:32 qchris@deploy1001: Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13 and zuul plugin to master-0-g7accc67 (duration: 00m 08s)
  • 21:32 qchris@deploy1001: Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13 and zuul plugin to master-0-g7accc67
  • 21:20 qchris: Enable puppet on gerrit1002 (gerrit-test) again to let it catch up again
  • 16:01 gehel: restart elastic-psi on elastic1052 (high GC rate)
  • 15:56 gehel: restart blazegraph + updater on wdqs1007 and depool to allow catching up on lag

2020-07-04

  • 19:23 qchris@deploy1001: Finished deploy [gerrit/gerrit@b78914b]: Bump gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1002 (duration: 00m 08s)
  • 19:23 qchris@deploy1001: Started deploy [gerrit/gerrit@b78914b]: Bump gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1002
  • 14:05 qchris: Disable puppet on gerrit1002 (gerrit-test) to deploy Gerrit UI updates there to gather feedback
  • 12:42 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 24s)
  • 02:28 reedy@deploy1001: Synchronized php-1.35.0-wmf.39/extensions/Score/includes/Score.php: Short circuit lilypond version check to allow usage of cached files T257066 (duration: 00m 55s)

2020-07-03

  • 21:49 reedy@deploy1001: Synchronized php-1.35.0-wmf.39/extensions/Score/: Sync maintenance script (duration: 00m 58s)
  • 18:47 cdanis: ✔️ cdanis@an-coord1001.eqiad.wmnet ~ 🕒☕ sudo systemctl restart hive-server2.service
  • 16:51 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: Ifa929b2ad4 (duration: 00m 57s)
  • 16:02 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Rename wgRestrictionMethod to wgShellRestrictionMethod (duration: 00m 58s)
  • 15:46 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 15:43 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 15:43 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1118 weight to spread load mode evenly', diff saved to https://phabricator.wikimedia.org/P11730 and previous config saved to /var/cache/conftool/dbconfig/20200703-154337-jynus.json
  • 15:40 jayme@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:38 jayme@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:09 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 15:02 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 14:11 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.stop-cluster (exit_code=99)
  • 14:11 _joe_: restarted php-fpm on wtp1033, stuck in sigill
  • 13:59 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 12:41 hashar: Restarting Zuul / CI
  • 11:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:29 moritzm: rebooting urldownloader standby hosts for kernel updates (1002/2002)
  • 10:59 moritzm: installing json-c security updates on jessie
  • 10:51 moritzm: installing ruby-json security updates
  • 10:25 moritzm: installing nss security updates on jessie
  • 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:15 elukey: notebook1004 renamed to an-scheduler1001
  • 10:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:07 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:56 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:43 moritzm: rebooting netflow* hosts for kernel security update
  • 08:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:04 jayme: authdns-update for chartmuseum - T256970
  • 08:03 elukey@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 07:55 moritzm: installing mutt security updates for jessie (stretch/buster already fixed)
  • 07:44 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 07:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:39 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:47 moritzm: installing php5 security updates
  • 06:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:09 moritzm: rebooting mw1390-mw1419 for kernel security updates
  • 05:46 XioNoX: remove chassis redundancy failover from fasw-c-eqiad for consistency with all other VCs
  • 05:33 XioNoX: remove chassis redundancy failover from fasw-c-codfw for consistency with all other VCs

2020-07-02

  • 23:22 jhuneidi@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 23:16 jhuneidi@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 22:03 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 21:56 mutante: gerrit1001 (prod gerrit) - restarting gerrit service
  • 21:52 maryum: frwikibooks reindex successful, continuing on with remainder of French wikis
  • 21:32 mutante: gerrit - deleted gerrit db_pass from prod private repo, running puppet
  • 21:25 mutante: gerrit2001 - restarted gerrit
  • 21:14 mutante: gerrit1002 restarted gerrit
  • 20:20 maryum: reindexing frwikibooks to test https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/604221
  • 19:52 mutante: gerrit2001 - restarting gerrit after removing db_pass from config
  • 16:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:07 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:42 moritzm: rebooting mw1370-mw1389 for kernel security updates
  • 14:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:03 kormat: stopped mariadb@s8 on dbstore1005 for data restoration T256966
  • 12:43 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:31 moritzm: rebooting mw1349-mw1369 for kernel security updates
  • 12:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:27 vgutierrez: rolling restart of esams load balancers to catch up on kernel upgrades
  • 12:12 XioNoX: pre-configure asw2-b-eqiad<->cloudsw1-c8-eqiad - T251632
  • 12:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:33 vgutierrez: rolling restart of codfw load balancers to catch up on kernel upgrades
  • 11:18 akosiaris: proactively restart docker-registry on registry1001, registry1002 to force CA refresh
  • 11:16 akosiaris: restart docker-registry on registry2002 for CA refresh
  • 11:14 _joe_: restarting docker-registry on registry2001
  • 10:34 godog: move "cluster overview" dashboard to Thanos - T256954
  • 09:35 XioNoX: advertise codfw prefixes from eqord
  • 09:28 jayme: imported chartmuseum_0.12.0-2 to buster-wikimedia - T253843
  • 09:07 addshore: addshore@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki testwikidatawiki --force --custom-groups oversight "DCausse_(WMF)" # T256949
  • 09:07 addshore: addshore@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki testwikidatawiki --force --custom-groups oversight "Addshore" # T256949
  • 08:59 XioNoX: deploy flex flow for MX204s - T248394
  • 05:52 _joe_: removing all tags for envoy-tls-local-proxy
  • 05:46 _joe_: upload docker-report 0.0.4 on buster-wikimedia T242604
  • 04:32 eileen: process-control config revision is b4655897b5
  • 03:17 eileen: process-control config revision is 12fe6b5151
  • 03:15 eileen: tools revision changed from 4ea8567819 to e974147f27
  • 02:32 eileen: tools revision changed from e38f7a83d4 to 4ea8567819
  • 00:53 eileen: tools revision changed from 806e2b4412 to e38f7a83d4

2020-07-01

  • 23:53 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set $wgForceUIAsContentMsg for zhwikibooks, zhwikinews, zhwikiquote, zhwikisource, zhwikiversity, zhwiktionary (T256521) (duration: 00m 55s)
  • 23:35 ejegg: updated fundraising CiviCRM from 391d0fdf75 to a48caf0f37
  • 23:32 catrope@deploy1001: Synchronized static/images/project-logos/: Change Simplified Chinese logo for zhwiki (T256839) (duration: 00m 55s)
  • 23:18 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: Ibb42db7fd1ee (duration: 00m 55s)
  • 23:00 bstorm: set a short downtime on labstore1006/7 to prevent alert while disabling direct systemd monitoring
  • 22:37 krinkle@deploy1001: Synchronized php-1.35.0-wmf.39/includes/Title.php: I8d5bad (duration: 01m 00s)
  • 21:00 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:58 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:56 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:56 Krinkle: krinkle@deploy1001 Ran `scap deploy --init` for /srv/deployment/performance/arc-lamp
  • 20:55 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@d7476f5]: Update mobileapps to 953fc41a (duration: 04m 08s)
  • 20:51 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@d7476f5]: Update mobileapps to 953fc41a
  • 20:27 eileen: tools revision changed from 6f38c14fe3 to 806e2b4412
  • 20:11 eileen: tools revision changed from aab96444df to 6f38c14fe3
  • 19:23 twentyafterfour: 1.35.0-wmf.39 is now deployed to group2 wikis, everything appears to be normal. refs T254176
  • 19:18 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.39 refs T254176
  • 18:44 addshore@deploy1001: Synchronized wmf-config: REVERT T254315 Wikidata client wikis: Define entity sources configuration gerrit:569259 (duration: 01m 04s)
  • 18:41 addshore@deploy1001: sync-file aborted: T254315 Wikidata client wikis: Define entity sources configuration gerrit:569259 (duration: 00m 38s)
  • 18:38 joal@deploy1001: Finished deploy [analytics/refinery@8b7bddf] (thin): Regular analytics weekly train THIN [analytics/refinery@8b7bddf] (duration: 02m 19s)
  • 18:36 joal@deploy1001: Started deploy [analytics/refinery@8b7bddf] (thin): Regular analytics weekly train THIN [analytics/refinery@8b7bddf]
  • 18:35 joal@deploy1001: Finished deploy [analytics/refinery@8b7bddf]: Regular analytics weekly train [analytics/refinery@8b7bddf] (duration: 08m 09s)
  • 18:27 joal@deploy1001: Started deploy [analytics/refinery@8b7bddf]: Regular analytics weekly train [analytics/refinery@8b7bddf]
  • 18:25 joal@deploy1001: Finished deploy [analytics/refinery@114bfed]: Regular analytics weekly train [analytics/refinery@114bfed] (duration: 03m 41s)
  • 18:21 joal@deploy1001: Started deploy [analytics/refinery@114bfed]: Regular analytics weekly train [analytics/refinery@114bfed]
  • 18:18 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable kafka purges on wikitech gerrit:607590 IS-labs.php (duration: 01m 03s)
  • 18:07 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy MediaModeration on all production wikis gerrit:608753 (duration: 01m 07s)
  • 17:14 XioNoX: set flex-flow-sizing to cr2-eqsin - T248394
  • 16:57 XioNoX: restart cr2-eqsin for software upgrade - T243080
  • 16:00 XioNoX: updating eqsin LVS BGP neighbors IPs - T255766
  • 15:16 XioNoX: re0.cr1-eqsin> request system power-off both-routing-engines - T255766
  • 15:15 XioNoX: disable BGP to pybal on cr1-eqsin - T255766
  • 15:13 XioNoX: disable cr1-eqsin transit/peering BGP - T255766
  • 15:09 XioNoX: bump eqsin-codfw ospf link cost - T255766
  • 15:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:03 XioNoX: move vrrp master to cr2-eqsin - T255766
  • 15:00 XioNoX: depool eqsin for routers work - T255766
  • 14:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:37 hashar: contint1001 stopped zuul-merger for a test. started it again
  • 13:35 hashar: Restarting zuul-merger on contint2001 # T252310
  • 13:30 hashar@deploy1001: Finished deploy [zuul/deploy@00f69b3]: (no justification provided) (duration: 00m 08s)
  • 13:30 hashar@deploy1001: Started deploy [zuul/deploy@00f69b3]: (no justification provided)
  • 13:29 hashar@deploy1001: Finished deploy [zuul/deploy@00f69b3]: (no justification provided) (duration: 00m 32s)
  • 13:28 hashar@deploy1001: Started deploy [zuul/deploy@00f69b3]: (no justification provided)
  • 13:16 hashar@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.39 (duration: 01m 04s)
  • 13:15 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.39
  • 13:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:08 cdanis: ✔️ cdanis@netflow2001.codfw.wmnet ~ 🕘☕ sudo apt remove valgrind libc6-dbg
  • 13:03 cdanis: T256790 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘☕ sudo cumin 'netflow[3-5]001*' 'systemctl restart nfacctd'
  • 12:58 cdanis: T256790 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘☕ sudo debdeploy deploy -u 2020-07-01-pmacct.yaml -s netflow
  • 12:55 cdanis: T256790 ✔️ cdanis@apt1001.wikimedia.org ~ 🕘☕ sudo -E reprepro -C main include buster-wikimedia pmacct_1.7.2-3+wmf1_amd64.changes
  • 12:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:47 ema: A:cp upgrade librdkafka1 to 0.11.6-1.1wmf1 and restart purged, varnishkafka T256444
  • 11:46 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T254315 Wikidata: Define entity sources configuration gerrit:569258 (duration: 01m 06s)
  • 11:32 Lucas_WMDE: EU B&C window done
  • 11:24 lucaswerkmeister-wmde@deploy1001: Synchronized w/touch.php: Config: Fully set MW_NO_SESSION for browser metadata endpoints, 4/4 (duration: 01m 06s)
  • 11:22 lucaswerkmeister-wmde@deploy1001: Synchronized w/robots.php: Config: Fully set MW_NO_SESSION for browser metadata endpoints, 3/4 (duration: 01m 03s)
  • 11:21 lucaswerkmeister-wmde@deploy1001: Synchronized w/favicon.php: Config: Fully set MW_NO_SESSION for browser metadata endpoints, 2/4 (duration: 01m 04s)
  • 11:19 lucaswerkmeister-wmde@deploy1001: Synchronized w/extract2.php: Config: Fully set MW_NO_SESSION for browser metadata endpoints, 1/4 (duration: 01m 16s)
  • 11:07 Amir1: Changing datatype of several properties with mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php (T255241)
  • 11:07 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 11:02 ema: restbase2009 depooled T256863
  • 11:02 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2009.codfw.wmnet
  • 10:50 ema: power on restbase2009
  • 10:45 jayme: draining and docker restart (one at a time) kubernetes[1001-1004].eqiad.wmnet - T256786
  • 10:34 ema: power-cycle restbase2009
  • 10:17 XioNoX: renumber NTT transit links - T254877
  • 10:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:09 jayme: draining and docker restart (one at a time) kubernetes[2001-2004].codfw.wmnet
  • 09:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:46 jayme: cordoning kubernetes[2001-2004].codfw.wmnet,kubernetes[1001-1004].eqiad.wmnet - T256786
  • 09:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:23 jayme: restarting dockerd on kubestage1002.eqiad.wmnet - T256786
  • 09:15 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:53 jayme: draining kubernetes staging node kubestage1001.eqiad.wmnet - T256786
  • 08:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:29 XioNoX: disable BGP to nfacct in eqiad - T256790
  • 08:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:08 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 08:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:01 vgutierrez: rolling restart of esams cache nodes to catch up on kernel upgrades
  • 07:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:39 ema: cp2041: restart purged, varnishkafka after librdkafka1 upgrade to 0.11.6-1.1wmf1 T256444
  • 05:47 _joe_: restarting nfacctd on netflow1001, it's segfaulting
  • 04:01 krinkle@deploy1001: Synchronized php-1.35.0-wmf.39/maintenance/findBadBlobs.php: I47c11190b665 (duration: 01m 08s)
  • 00:14 krinkle@deploy1001: Synchronized private/PrivateSettings.php: T254795 - Set $wmgXhguiDBuser and $wmgXhguiDBpasswor (duration: 01m 06s)

2020-06-30

  • 21:48 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:46 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:45 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:43 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:42 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:40 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:40 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:38 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:38 crusnov@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 21:38 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 19:19 hashar@deploy1001: rebuilt and synchronized wikiversions files: group 0 wikis to 1.35.0-wmf.39 # T254176
  • 18:31 cdanis: T256790 ✔️ cdanis@netflow2001.codfw.wmnet ~ 🕝☕ sudo apt install valgrind
  • 18:27 tgr: Morning deploys done
  • 18:23 tgr@deploy1001: Synchronized php-1.35.0-wmf.39/extensions/ElectronPdfService/src/ElectronPdfServiceHooks.php: Backport: Hotfix: "Undefined index: print" (T256761) (duration: 01m 05s)
  • 18:11 shdubsh: restart varnishmtail,atsmtail,ncredirmtail on ncredir,cp hosts in codfw and eqsin
  • 18:05 cdanis: installing libc6-dbg on netflow2001 T256790
  • 17:40 mdholloway: mobileapps deployments on k8s failing with timeouts; filed T256786
  • 17:37 cdanis: ✔️ cdanis@netflow2001.codfw.wmnet ~ 🕜☕ sudo systemctl restart nfacctd
  • 17:33 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:18 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:17 papaul: unplugging msw-c3 power to relocate port on PDU
  • 17:09 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@f9df1af]: Update mobileapps to 5c7611b9 (duration: 03m 33s)
  • 17:05 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@f9df1af]: Update mobileapps to 5c7611b9
  • 16:57 cdanis: T256444 restarted purged on cp2030 and repooling
  • 16:48 cdanis: T256444 ✔️ cdanis@cp2030.codfw.wmnet ~ 🕐☕ sudo depool
  • 15:54 otto@deploy1001: Finished deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for T256370 - take 3 (duration: 00m 03s)
  • 15:54 otto@deploy1001: Started deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for T256370 - take 3
  • 15:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:16 otto@deploy1001: Finished deploy [analytics/refinery@1112749]: roll back to 1112749 on an-launcher1002, git-fat not pulling artifacts (duration: 01m 21s)
  • 15:14 otto@deploy1001: Started deploy [analytics/refinery@1112749]: roll back to 1112749 on an-launcher1002, git-fat not pulling artifacts
  • 15:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:10 moritzm: rebooting mwdebug* hosts for kernel security update
  • 15:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:59 moritzm: rebooting failoid hosts for kernel update
  • 14:49 otto@deploy1001: Finished deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for T256370 - take 3 (duration: 00m 03s)
  • 14:49 otto@deploy1001: Started deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for T256370 - take 3
  • 14:47 otto@deploy1001: Finished deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for T256370 - take 2 (duration: 00m 03s)
  • 14:47 otto@deploy1001: Started deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for T256370 - take 2
  • 14:44 hashar: Train blocked on Flow being broken: T256761 # T254176
  • 14:38 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.35.0-wmf.39" - T256759
  • 14:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:25 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.39
  • 14:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:15 moritzm: rebooting miscweb servers for kernel security update
  • 14:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:10 otto@deploy1001: Finished deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for T256370 (duration: 01m 56s)
  • 14:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:09 hashar@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.39 (duration: 62m 30s)
  • 14:08 otto@deploy1001: Started deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for T256370
  • 14:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:37 moritzm: rebooting LDAP replicas for kernel security update
  • 13:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:07 hashar@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.39
  • 12:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:33 awight: EU BACON cooked
  • 11:32 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: Configure TeWü survey on dewiki (take 2) (T253112) (duration: 00m 58s)
  • 11:32 jayme: restarted docker-reporter-base-images and docker-reporter-releng-images on deneb - T253396
  • 11:31 jayme: pushed a scratch docker image as docker-registry.discovery.wmnet/envoy-tls-local-proxy:dontuseme - T253396
  • 11:28 awight@deploy1001: Synchronized php-1.35.0-wmf.38/extensions/QuickSurveys: BACON: Embedded surveys are hidden when no element is available (T256627) (duration: 00m 56s)
  • 11:26 awight@deploy1001: Synchronized php-1.35.0-wmf.38/extensions/FileImporter: BACON: Set Status error if permission check returns false. (T256428) (duration: 00m 58s)
  • 11:13 ema: deneb: systemctl restart docker-reporter-base-images.service
  • 10:59 ema: upload librdkafka 0.11.6-1.1wmf1 to buster-wikimedia https://phabricator.wikimedia.org/P11703 T256444
  • 10:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076', diff saved to https://phabricator.wikimedia.org/P11710 and previous config saved to /var/cache/conftool/dbconfig/20200630-105254-marostegui.json
  • 10:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:41 ema: cp2040: restart purged and varnishkafka to use updated librdkafka1 T256444
  • 10:38 ema: cp2040: upgrade librdkafka1 to 0.11.6-1.1wmf1 https://phabricator.wikimedia.org/P11703 T256444
  • 10:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:30 hashar@deploy1001: Synchronized php-1.35.0-wmf.39/includes/specials/SpecialUndelete.php: Remove another use of PageArchive::getRevision - T249982 T254176 (duration: 00m 56s)
  • 10:09 marostegui: Deploy schema change on db1076
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076', diff saved to https://phabricator.wikimedia.org/P11708 and previous config saved to /var/cache/conftool/dbconfig/20200630-100912-marostegui.json
  • 10:04 vgutierrez: rolling restart of eqiad cache nodes to catch up on kernel upgrades
  • 10:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:02 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 07s)
  • 10:02 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
  • 09:47 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.37 (duration: 02m 20s)
  • 09:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:21 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.36 (duration: 28m 11s)
  • 08:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:53 hashar@deploy1001: clean aborted: Pruned MediaWiki: 1.35.0-wmf.36 (duration: 00m 00s)
  • 08:51 hashar: Applied security patches to wmf/1.35.0-wmf.39 # T254176
  • 08:51 vgutierrez: rolling restart of codfw cp nodes after "re-formatting" nvme devices - T256655
  • 08:23 vgutierrez: repool cp3053 - T256632
  • 08:10 hashar: 1.35.0-wmf.39 was branched at e169e3d T254176
  • 08:05 marostegui: Stop MySQL on db1117:3322 to clone db1080 (this will trigger haproxy alerts) - T256717
  • 08:05 vgutierrez: powercycle cp3053 (unresponsive after reboot) - T256632
  • 08:01 jbond42: disable puppet to restart puppetmasters front ends
  • 07:42 vgutierrez: reboot cp3053 - T256632
  • 05:51 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 05:13 marostegui: Deploy schema change on s8 codfw - T256680
  • 04:58 marostegui: remove pl_from index from db1141, db1121, db1148 - T256684
  • 04:57 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 04:56 marostegui: Remove plfrom from db1096:3316 and db1098:3316 - T256684

2020-06-29

  • 23:28 eileen: civicrm revision changed from 52a32f2d66 to 391d0fdf75, config revision is f1b4bdb7b7
  • 22:00 sbassett: Deployed patch for T256171
  • 21:56 sbassett: Deployed patch for T255918
  • 20:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3315 T256679', diff saved to https://phabricator.wikimedia.org/P11699 and previous config saved to /var/cache/conftool/dbconfig/20200629-200002-marostegui.json
  • 19:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 T256679', diff saved to https://phabricator.wikimedia.org/P11698 and previous config saved to /var/cache/conftool/dbconfig/20200629-194327-marostegui.json
  • 18:55 shdubsh: test mtail rc35+wmf2 on cp5001 - T255776
  • 18:15 Urbanecm: Morning B&C done
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c86fcd4: Add HTTP proxy to MediaModeration (T247943) (duration: 00m 58s)
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: aeb7b52: Setup rollbacker and mover on lijwiki (T256109) (duration: 02m 05s)
  • 17:30 sukhe: LDAP - added datn to groups wmde, nda - T254442
  • 15:43 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:43 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:37 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P11696 and previous config saved to /var/cache/conftool/dbconfig/20200629-153140-marostegui.json
  • 15:20 gehel: repool wdqs1004 - caught up on lag
  • 14:50 hnowlan@deploy1001: Finished deploy [restbase/deploy@900bcf6]: Redeploy to fix transient error in gom wiktionary deploy (duration: 00m 06s)
  • 14:50 hnowlan@deploy1001: Started deploy [restbase/deploy@900bcf6]: Redeploy to fix transient error in gom wiktionary deploy
  • 14:48 hnowlan@deploy1001: Finished deploy [restbase/deploy@900bcf6]: Enable gom wiktionary (duration: 13m 40s)
  • 14:34 hnowlan@deploy1001: Started deploy [restbase/deploy@900bcf6]: Enable gom wiktionary
  • 14:33 hnowlan@deploy1001: Finished deploy [restbase/deploy@900bcf6]: Enable gom wiktionary (duration: 17m 49s)
  • 14:28 ema: A:cp rolling purged upgrade to 0.16 T256479
  • 14:22 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add "E" as an alias of EntitySchema namespace on wikidata (T245529) (duration: 00m 57s)
  • 14:20 ema: upload purged 0.16 to apt.wm.org T256479
  • 14:16 hnowlan@deploy1001: Started deploy [restbase/deploy@900bcf6]: Enable gom wiktionary
  • 14:14 hnowlan@deploy1001: Finished deploy [restbase/deploy@ce5177e]: Enable gom wiktionary (duration: 20m 44s)
  • 14:02 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Fix 'closed-labs' reading as 'closed' for static config (duration: 00m 56s)
  • 13:54 jforrester@deploy1001: Synchronized dblists/: Drop nonbetafeatures dblist, unused (duration: 00m 57s)
  • 13:54 hnowlan@deploy1001: Started deploy [restbase/deploy@ce5177e]: Enable gom wiktionary
  • 13:50 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Drop 'nonbetafeatures' dblist from production reads (duration: 00m 56s)
  • 13:49 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch uses from nonbetafeatures to lockeddown (duration: 00m 57s)
  • 13:47 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Add 'lockeddown' dblist to production reads (duration: 00m 57s)
  • 13:43 jforrester@deploy1001: Synchronized dblists/lockeddown.dblist: Add lockeddown dblist (unused as yet) (duration: 00m 59s)
  • 13:35 vgutierrez: depool cp3053 due to nvme hardware issues
  • 13:02 XioNoX: test pfw3-codfw uplinks failover
  • 13:00 elukey: move archiva.wikimedia.org to archiva1002 (new buster vm); create archiva-old.wikimedia.org pointing to archiva1001
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P11693 and previous config saved to /var/cache/conftool/dbconfig/20200629-125824-marostegui.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1085', diff saved to https://phabricator.wikimedia.org/P11692 and previous config saved to /var/cache/conftool/dbconfig/20200629-125630-marostegui.json
  • 12:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:32 jayme: deleted all tags for docker-registry.wikimedia.org/envoy-tls-local-proxy from docker registry - T253396
  • 12:20 marostegui: Stop MySQL on db2096 (codfw x1 master) for reimage T254871
  • 12:03 cdanis: re-pool eqiad T256512
  • 11:59 cdanis: deployed I132075ee on cr1-eqiad T256512
  • 11:58 cdanis: deployed I132075ee on cr2-eqiad T256512
  • 11:41 cdanis: depool eqiad T256512
  • 11:15 awight: EU BACON cooked
  • 11:08 marostegui: Deploy schema change on db1095:3312 (lag will show up)
  • 10:41 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (608284) (duration: 00m 57s)
  • 10:41 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (608284) (duration: 00m 58s)
  • 10:29 gehel: restart blazegraph on wdqs1004 + depool to catch up on lag
  • 09:59 ema: cp2040: upgrade purged to 0.16 T256479
  • 09:59 jbond42: switch idp to memcached
  • 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:45 marostegui: Deploy schema change on dbstore1004:3312
  • 09:11 jbond42: deploying shellcheck CI https://gerrit.wikimedia.org/r/c/operations/puppet/+/602693
  • 08:59 marostegui: Compress InnoDB on db1089 (this will cause lag and will take a few days) - T254462
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for InnoDB compression T254462', diff saved to https://phabricator.wikimedia.org/P11690 and previous config saved to /var/cache/conftool/dbconfig/20200629-085854-marostegui.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool db1135 into s1 T253217', diff saved to https://phabricator.wikimedia.org/P11688 and previous config saved to /var/cache/conftool/dbconfig/20200629-084827-marostegui.json
  • 08:40 ema: cp2034: restart purged T256444
  • 08:36 ema: cp4025: restart purged T256444
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1135 into s1 T253217', diff saved to https://phabricator.wikimedia.org/P11687 and previous config saved to /var/cache/conftool/dbconfig/20200629-083631-marostegui.json
  • 08:33 ema: cp1087, cp2033, cp2037, cp2039: repool after spending (way) more than 24h depooled T256444
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1135 into s1 T253217', diff saved to https://phabricator.wikimedia.org/P11686 and previous config saved to /var/cache/conftool/dbconfig/20200629-082635-marostegui.json
  • 08:24 marostegui: Deploy schema change on s2 codfw (lag will show up) T253276
  • 08:04 XioNoX: add term selected-paths to policy BGP_IXP_in on all routers
  • 08:03 godog: prometheus eqiad -- lvextend --resizefs --size +200G vg-ssd/prometheus-ops
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1135 into s1 T253217', diff saved to https://phabricator.wikimedia.org/P11685 and previous config saved to /var/cache/conftool/dbconfig/20200629-080253-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1135 (depooled) to s1 T253217', diff saved to https://phabricator.wikimedia.org/P11684 and previous config saved to /var/cache/conftool/dbconfig/20200629-074611-marostegui.json
  • 07:16 XioNoX: push new pfw firewall rules - T256170
  • 07:13 marostegui: Deploy schema change on db1085 with replication to labs T253276
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P11683 and previous config saved to /var/cache/conftool/dbconfig/20200629-071236-marostegui.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1080 from MW', diff saved to https://phabricator.wikimedia.org/P11682 and previous config saved to /var/cache/conftool/dbconfig/20200629-065335-marostegui.json
  • 06:50 elukey: execute gnt-instance remove an-launcher1001.eqiad.wmnet on ganeti1011 - T256363
  • 06:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:46 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:45 marostegui: Deploy MCR schema change on db1090:3312
  • 06:35 elukey: force puppet run on ores* to overcome celery OOMs on some nodes
  • 04:57 marostegui: Stop MySQL on db1080 to clone db1135 T253217
  • 04:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime

2020-06-28

  • 21:43 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: no-op I56eb4a802 (duration: 00m 58s)
  • 21:38 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta-only I56eb4a802 (duration: 01m 00s)

2020-06-27

  • 20:22 qchris: Gerrit upgrade done.
  • 19:49 mutante: removed 2620:0:861:3:208:80:154:136 from /etc/network/interfaces on gerrit1001, rebooting
  • 19:27 mutante: rebooting gerrit1001 one more time
  • 19:24 mutante: restarted ferm on gerrit1001
  • 19:19 mutante: rebooting gerrit1001 one more time
  • 19:05 mutante: rebooting gerrit1001
  • 18:58 mutante: rebooting gerrit2001
  • 18:49 hashar: Enabling beta cluster update job (gerrit maintenance) https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/
  • 18:35 qchris@deploy1001: Finished deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit2001 (duration: 00m 10s)
  • 18:34 qchris@deploy1001: Started deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit2001
  • 18:27 qchris@deploy1001: Finished deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit1001 (duration: 00m 08s)
  • 18:27 qchris@deploy1001: Started deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit1001
  • 17:25 hashar: Disabled beta cluster update job (gerrit maintenance) https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/
  • 17:19 qchris: Stopping gerrit on gerrit1001 for the Gerrit upgrade
  • 17:14 qchris: Duplicating reviewdb changes so we get a cheap and quick rollback
  • 17:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:11 qchris: Disabling puppet on gerrit1001 for Gerrit upgrades + data migrations
  • 17:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:07 qchris: Starting Gerrit upgrade to v3.2.2-98-g98d827eaa3
  • 15:44 qchris@deploy1001: Finished deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit1002 (gerrit-test) (duration: 00m 08s)
  • 15:44 qchris@deploy1001: Started deploy [gerrit/gerrit@da40615]: Gerrit to v3.2.2-98-g98d827eaa3 on gerrit1002 (gerrit-test)
  • 13:03 qchris@deploy1001: Finished deploy [gerrit/gerrit@460e439]: Gerrit to v3.2.2-97-gcaf5020db1 on gerrit1002 (gerrit-test) (duration: 00m 08s)
  • 13:03 qchris@deploy1001: Started deploy [gerrit/gerrit@460e439]: Gerrit to v3.2.2-97-gcaf5020db1 on gerrit1002 (gerrit-test)

2020-06-26

  • 18:42 robh: all ulsfo onsite work completed as of 30 minutes ago
  • 17:52 robh: msw2-ulsfo work done, all mgmt items confirmed back online and icinga alerts cleared, moving onto msw1-ulsfo (rack 22) and will lose all mgmt in that rack for next 10-20 minutes T256300
  • 17:11 robh: msw work in ulsfo via T256300
  • 10:24 ema: pool cp5006 T256449
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1085', diff saved to https://phabricator.wikimedia.org/P11677 and previous config saved to /var/cache/conftool/dbconfig/20200626-102248-marostegui.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1093', diff saved to https://phabricator.wikimedia.org/P11676 and previous config saved to /var/cache/conftool/dbconfig/20200626-102201-marostegui.json
  • 10:03 ema: cp2039: restart purged T256444
  • 09:57 ema: cp2037: restart purged T256444
  • 09:55 ema: cp1087: restart purged T256444
  • 09:46 ema: cp2033: restart purged T256444
  • 09:38 akosiaris: move the sessionstore eqiad pods back to the dedicated sessionstore nodes
  • 09:37 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 09:35 akosiaris: move the sessionstore codfw pods back to the dedicated sessionstore nodes
  • 09:35 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P11675 and previous config saved to /var/cache/conftool/dbconfig/20200626-090813-marostegui.json
  • 08:58 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:56 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1088', diff saved to https://phabricator.wikimedia.org/P11674 and previous config saved to /var/cache/conftool/dbconfig/20200626-083319-marostegui.json
  • 08:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P11673 and previous config saved to /var/cache/conftool/dbconfig/20200626-082242-marostegui.json
  • 08:20 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 08:20 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 08:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes.*.wmnet
  • 08:04 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes.*.wmnet
  • 08:04 akosiaris: pool all new kubernetes nodes in LVS T252185 T256236
  • 07:57 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:44 volans: force rebooted cp5006 that is unresponsive (after having depooled it) - T256449
  • 07:42 volans@cumin1001: conftool action : set/pooled=no; selector: name=cp5006.eqsin.wmnet
  • 06:40 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: add cache-cookies log channel (duration: 00m 59s)
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088:3312, db2104', diff saved to https://phabricator.wikimedia.org/P11672 and previous config saved to /var/cache/conftool/dbconfig/20200626-051328-marostegui.json
  • 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 04:01 cdanis: re-enable puppet on cps
  • 03:54 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕛🍺 sudo cumin A:cp 'disable-puppet "I39e1c68a is broken"'
  • 03:54 cdanis: https://gerrit.wikimedia.org/r/c/operations/puppet/+/607917
  • 02:52 tstarling@deploy1001: Synchronized private/PrivateSettings.php: updating wgAuthenticationTokenVersion per my wikitech-l post (duration: 00m 57s)
  • 02:19 cdanis: three more hosts not processing purges for multiple days ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥🍺 sudo cumin 'cp2033*,cp2037*,cp2039*' 'depool'
  • 02:17 cdanis: depooling cp1087 which has not been processing purges for 11.415 days
  • 01:53 cdanis: I6cc5f3e6 has been deployed to all cp text nodes T256395
  • 01:41 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 sudo cumin A:cp 'enable-puppet "cdanis deploying I6cc5f3e6 T256395"'
  • 01:13 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 sudo cumin A:cp 'disable-puppet "cdanis deploying I6cc5f3e6 T256395"'
  • 00:41 eileen: tools revision changed from c96813eda4 to aab96444df
  • 00:38 tstarling@deploy1001: Synchronized w/T256395-cookie-test.php: (no justification provided) (duration: 00m 56s)
  • 00:36 tstarling@deploy1001: Synchronized w/T256395-cookie-test.php: (no justification provided) (duration: 00m 58s)

2020-06-25

  • 23:37 mutante: puppetmaster - signing certs and initial puppet run for logstash1030/logstash1031 - no prod role yet
  • 22:25 mutante: puppetmaster - signing certs and initial run for logstash2030/2031 - no prod role yet
  • 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 19:30 dcausse: repooling wdqs1007.eqiad.wmnet
  • 19:05 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.38
  • 18:58 mutante: LDAP - added qchris to archiva-deployers (T256404)
  • 17:37 mutante: mwmaint1002 - restarted apache2 to add server_headers snippet for T255629 - but not working as expected yet
  • 16:40 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 16:31 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:31 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:28 krinkle@deploy1001: Synchronized wmf-config/logging.php: Ia6ef7617d378 (duration: 01m 02s)
  • 16:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 16:16 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 16:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:15 Krinkle: I've deleted a "saved object" visualisation in logstash called "Production Errors & Deployments" which seemed to be corrupt and redirect random logstash dashboards to a management page. Backed up at https://phabricator.wikimedia.org/P11666 (NDA)
  • 16:15 moritzm: installing libxml2 security updates
  • 16:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 16:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:06 moritzm: installing 4.9.210-1+deb9u1~deb8u1 on jessie hosts (fixed kernel for recent cacheoutattack CPU leaks)
  • 16:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 16:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:55 krinkle@deploy1001: Synchronized wmf-config/logging.php: I4c519f (duration: 01m 05s)
  • 15:54 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:53 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:51 vgutierrez: upgrade ATS in eqiad to version 8.0.8
  • 15:42 ppchelko@deploy1001: Finished deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters T256358, more groups (duration: 05m 09s)
  • 15:37 ppchelko@deploy1001: Started deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters T256358, more groups
  • 15:37 ppchelko@deploy1001: Finished deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters T256358, more groups (duration: 03m 38s)
  • 15:33 ppchelko@deploy1001: Started deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters T256358, more groups
  • 15:33 ppchelko@deploy1001: Finished deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters T256358, more groups (duration: 03m 24s)
  • 15:30 vgutierrez: upgrade ATS in codfw to version 8.0.8
  • 15:30 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:30 ppchelko@deploy1001: Started deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters T256358, more groups
  • 15:29 ppchelko@deploy1001: Finished deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters T256358, take 2 (duration: 06m 38s)
  • 15:29 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:25 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: structured logging for xff log, stop logging jobrunner requests (duration: 01m 05s)
  • 15:23 ppchelko@deploy1001: Started deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters T256358, take 2
  • 15:20 ppchelko@deploy1001: Finished deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters T256358 (duration: 01m 37s)
  • 15:19 ppchelko@deploy1001: Started deploy [restbase/deploy@821e96b]: Only emit vary: accept-language for feeds when it matters T256358
  • 14:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:48 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:43 vgutierrez: upgrade ATS in esams to version 8.0.8
  • 14:29 papaul: replacing mr1-codfw
  • 14:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:19 vgutierrez: upgrade ATS in eqsin to version 8.0.8
  • 14:19 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:05 marostegui: Stop MySQL on db2104 and db2088:3312
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104', diff saved to https://phabricator.wikimedia.org/P11664 and previous config saved to /var/cache/conftool/dbconfig/20200625-140519-marostegui.json
  • 14:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:04 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db2088:3312', diff saved to https://phabricator.wikimedia.org/P11663 and previous config saved to /var/cache/conftool/dbconfig/20200625-140421-marostegui.json
  • 13:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:57 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: T254301 Remove OAuthReplaceMessage hook subscriber (duration: 01m 05s)
  • 13:56 vgutierrez: upgrade ATS in ulsfo to version 8.0.8
  • 13:51 vgutierrez: upload trafficserver 8.0.8 to apt.wm.o (buster)
  • 13:51 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Replace PasswordNotInLargeBlacklist with PasswordNotInCommonList (duration: 01m 05s)
  • 13:49 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Replace PasswordNotInLargeBlacklist with PasswordNotInCommonList (duration: 01m 06s)
  • 13:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:28 godog: bounce logstash on logstash1007
  • 13:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:13 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:02 moritzm: installing 4.9.210-1+deb9u1~deb8u1 on jessie hosts (fixed kernel for recent cacheoutattack CPU leaks)
  • 12:55 elukey: rename notebook1003 to an-launcher1002 - T256363
  • 12:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 12:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:44 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:42 moritzm: installing libmspack security updates
  • 12:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:32 moritzm: installing libssh2 security updates
  • 12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:26 moritzm: installing libjpeg-turbo security updates
  • 12:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:55 moritzm: installing python3.4 security updates
  • 11:55 awight: EU BACON is cooked
  • 11:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:50 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: Enable QuickSurveys on metawiki (T253112) (duration: 01m 05s)
  • 11:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:38 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: Enable WMDE Tech Wishes survey configuration (T253112) (duration: 01m 09s)
  • 11:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:27 moritzm: rolling reboot of ms-be[1044-1059].eqiad.wmnet
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:56 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:45 moritzm: rolling reboot of ms-be[2044-2056]
  • 10:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:17 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:17 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:13 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 10:04 akosiaris: poweroff kubestagetcd1004 and ganeti1005 for T244530
  • 10:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 09:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:57 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 09:57 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 09:37 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:34 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:28 akosiaris: schedule downtime for eqiad wikifeeds as it's flapping too much without yet knowing why. T256358
  • 09:28 godog: extend lv on thanos-fe2001 and restart thanos-compact
  • 09:21 vgutierrez: rolling restart of ncredir instances to catch up on kernel updates
  • 09:13 joal@deploy1001: Finished deploy [analytics/refinery@4aba370] (thin): Analytics fix over weekly train THIN [analytics/refinery@4aba370] (duration: 00m 10s)
  • 09:13 joal@deploy1001: Started deploy [analytics/refinery@4aba370] (thin): Analytics fix over weekly train THIN [analytics/refinery@4aba370]
  • 09:13 joal@deploy1001: Finished deploy [analytics/refinery@4aba370]: Analytics fix over weekly train [analytics/refinery@4aba370] (duration: 16m 27s)
  • 09:01 vgutierrez: restarting acme-chief instances to catch up on kernel updates
  • 08:56 joal@deploy1001: Started deploy [analytics/refinery@4aba370]: Analytics fix over weekly train [analytics/refinery@4aba370]
  • 08:42 hashar: releases2002: restarted bacula-fd to take into account the puppet provided configuration # T247652
  • 08:14 jynus: restarting bacula-dir on backup1001
  • 08:09 akosiaris: restart etherpad-lite on etherpad1002
  • 08:03 marostegui: Failover m1 from db1135 to db1097 - T254556
  • 07:52 jynus: stop bacula-director on backup1001 for db maintenance T254556
  • 07:49 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 07:49 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 07:49 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 07:49 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 07:49 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 07:48 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 07:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 07:47 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm
  • 07:36 elukey: reboot an-launcher1001 for kernel upgrades
  • 07:18 elukey: reboot kafkamon* vms for kernel upgrades
  • 07:08 marostegui: Start pre switchover steps on m1 T254556
  • 06:40 elukey: reboot matomo1002 for kernel upgrades
  • 06:35 elukey: reboot archiva1002 (new vm, not yet in service) for kernel upgrades
  • 06:34 elukey: reboot archiva for kernel upgrades
  • 06:31 elukey: force puppet run on ores1003/1005 to restore celery (killed by the oom)
  • 06:24 elukey: reboot an-tool* vms for kernel upgrades
  • 06:23 elukey: reboot analytics-tool1004 for kernel upgrades (Superset host)
  • 06:22 elukey: reboot analytics-tool1001 for kernel upgrades
  • 06:19 elukey: execute ip addr flush ens5 on an-airflow1001 to clear RTNETLINK answers: File exists (error from ifup@ens5.service)
  • 06:03 elukey: reboot an-airflow1001 for kernel upgrades
  • 04:26 marostegui: Remove triggers from db2095:3312 - T238966
  • 04:25 marostegui: Deploy schema change on s2 codfw - T238966
  • 00:48 twentyafterfour: restart php-fpm on phab1001 to fix T256343
  • 00:12 twentyafterfour: phabricator updated, all seems normal
  • 00:11 twentyafterfour: updating phabricator to release/2020-06-25/1, momentary (<1 minute) downtime expected.

2020-06-24

  • 23:44 mutante: releases2002 - systemctl stop jenkins, kill 15244 (rogue jenkins process), start jenkins with systemctl start jenkins (T247652)
  • 23:43 mutante: releases1002 - kill rogue jenkins process, start jenkins with systemctl start jenkins (T247652)
  • 23:02 mutante: releases1002/2002 - disabling puppet, removing failing cron job to pull deployment_charts (because /srv/deployment-charts does not exist yet)
  • 21:45 shdubsh: install mtail 3.0.0~rc35+wmf2 on logstash1007 - T255776
  • 20:42 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.38 (duration: 01m 06s)
  • 20:41 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.38
  • 20:41 brennen: train 1.35.0-wmf.38: attempting to roll forward to group1 after php-fpm restart on mw1287 (T256305, T254175)
  • 20:32 cdanis: restarting php-fpm on mw1287 T256305
  • 20:32 bsitzmann@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:30 bsitzmann@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:28 bsitzmann@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:14 halfak@deploy1001: Finished deploy [ores/deploy@1b87365]: T254505 (duration: 14m 08s)
  • 20:09 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@80c763d]: Update mobileapps to a413db4f (duration: 03m 37s)
  • 20:06 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@80c763d]: Update mobileapps to a413db4f
  • 20:00 halfak@deploy1001: Started deploy [ores/deploy@1b87365]: T254505
  • 19:38 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert Migrate SearchSatisfaction from EventLogging to EventGate on group1 - T249261 (duration: 01m 06s)
  • 19:17 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.35.0-wmf.37
  • 19:11 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.38 (duration: 01m 04s)
  • 19:10 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.38
  • 19:01 brennen: train 1.35.0-wmf.38: finished triage meeting, clear to proceed to group 1 (T254175)
  • 18:53 joal@deploy1001: Finished deploy [analytics/refinery@1112749] (thin): Regular analytics weekly train THIN [analytics/refinery@1112749] (duration: 00m 09s)
  • 18:53 joal@deploy1001: Started deploy [analytics/refinery@1112749] (thin): Regular analytics weekly train THIN [analytics/refinery@1112749]
  • 18:53 joal@deploy1001: Finished deploy [analytics/refinery@1112749]: Regular analytics weekly train [analytics/refinery@1112749] (duration: 05m 50s)
  • 18:49 Urbanecm: Morning B&C deploy window is done
  • 18:48 cstone: payments-wiki revision changed from 28ad76dcd7 to 91852dbc9b
  • 18:47 Urbanecm: mwscript namespaceDupes.php --wiki=guwiki --fix (T255358)
  • 18:47 joal@deploy1001: Started deploy [analytics/refinery@1112749]: Regular analytics weekly train [analytics/refinery@1112749]
  • 18:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 2a1dfc5: Set namespace aliases for guwiki (T255358) (duration: 01m 05s)
  • 18:42 Urbanecm: mwscript namespaceDupes.php --wiki=banwiki --add-prefix=T255941 --fix (T255941)
  • 18:41 Urbanecm: Run mwscript namespaceDupes.php --wiki=banwiki --fix (T255941)
  • 18:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c6d6c85: Set WP as a NS_PROJECT alias for banwiki (T255941) (duration: 01m 06s)
  • 18:38 Urbanecm: Run mwscript namespaceDupes.php dewiktionary --fix (T256242)
  • 18:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 2b93e0f: Define Rekonstruktion NS for dewiktionary (T256242) (duration: 01m 05s)
  • 18:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: dea9214: Revert "IS: Cleanup some redundant rows." (T256279) (duration: 01m 05s)
  • 18:25 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventBus: Emit kafka purges for everything gerrit:607298 (duration: 01m 05s)
  • 18:19 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MediaModeration on group0 gerrit:607327 (duration: 01m 04s)
  • 18:08 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable click tracking in Vector on beta cluster gerrit:607136 IS.php (duration: 01m 05s)
  • 18:06 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable click tracking in Vector on beta cluster gerrit:607136 IS-labs.php (duration: 01m 07s)
  • 17:31 elukey: update archiva-ci user's password in Jenkins credentials plugin
  • 16:56 elukey: update archiva-deploy user's password in Jenkins credentials plugin
  • 16:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@5f08f32]: Release PCS endpoints updates, feeds timed out, redo (duration: 05m 11s)
  • 16:41 ppchelko@deploy1001: Started deploy [restbase/deploy@5f08f32]: Release PCS endpoints updates, feeds timed out, redo
  • 16:40 ppchelko@deploy1001: Finished deploy [restbase/deploy@5f08f32]: Release PCS endpoints updates, take 2 (duration: 14m 11s)
  • 16:34 brennen@deploy1001: Finished scap: (no justification provided) (duration: 60m 22s)
  • 16:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:26 ppchelko@deploy1001: Started deploy [restbase/deploy@5f08f32]: Release PCS endpoints updates, take 2
  • 16:17 elukey: reimage db1108 to debian Buster - T234826
  • 15:53 ppchelko@deploy1001: Finished deploy [restbase/deploy@386b736]: Revert (duration: 27m 21s)
  • 15:38 brennen: previous scap sync for T256151 - gerrit:607379 and gerrit:607380
  • 15:36 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1088 @ 100% into s6 T255927', diff saved to https://phabricator.wikimedia.org/P11652 and previous config saved to /var/cache/conftool/dbconfig/20200624-153604-kormat.json
  • 15:34 brennen@deploy1001: Started scap: (no justification provided)
  • 15:25 ppchelko@deploy1001: Started deploy [restbase/deploy@386b736]: Revert
  • 15:24 ppchelko@deploy1001: deploy aborted: Release updates to PCS endpoints (duration: 05m 04s)
  • 15:20 jayme: rolling restart of swift-proxy on thanos-fe[2001-2003].codfw.wmnet,thanos-fe[1001-1003].eqiad.wmnet - T256020
  • 15:19 ppchelko@deploy1001: Started deploy [restbase/deploy@9686627]: Release updates to PCS endpoints
  • 15:06 brennen: merging backports and running a full scap sync for UBN at T256151
  • 15:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:57 moritzm: rebooting deneb for kernel update
  • 14:57 ema: rmlist teampractices T255525
  • 14:42 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate SearchSatisfaction from EventLogging to EventGate on group0 - T249261 (duration: 01m 06s)
  • 13:28 nikerabbit@deploy1001: Synchronized wmf-config/CommonSettings.php: [config] 603167 Remove TranslationNotifications user settings 1/2 (2nd attempt, now with correct file) (duration: 01m 06s)
  • 13:23 marostegui: Deploy schema change on s6 eqiad primary master - T238966
  • 12:59 jbond42: update metamonitoring to use icinga-extmon.wikimedia.org
  • 12:23 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1005.eqiad.wmnet
  • 12:23 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1006.eqiad.wmnet
  • 12:19 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1006.eqiad.wmnet
  • 12:19 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1005.eqiad.wmnet
  • 12:19 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2005.codfw.wmnet
  • 12:19 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2006.codfw.wmnet
  • 12:17 akosiaris: depool/drain/reboot/pool kubernetes1005,6 for CPU capacity increase T256236
  • 12:14 akosiaris: reboot kubernetes2005,6 for CPU capacity increase T256236
  • 12:11 akosiaris: depool kubernetes2005,kubernetes2006 for CPU capacity increase T256236
  • 12:10 akosiaris: depool kubernetes2005,kubernetes2006 for CPU capacity increase
  • 12:05 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2006.codfw.wmnet
  • 12:05 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2005.codfw.wmnet
  • 12:04 awight: EU vegan BACON cooked
  • 12:03 awight@deploy1001: Synchronized php-1.35.0-wmf.38/extensions/GrowthExperiments: BACON: Help panel home screen menu item fixes (T255254) (duration: 01m 06s)
  • 11:40 nikerabbit@deploy1001: Synchronized private/PrivateSettings.php: Remove TranslationNotifications user settings 3/2 (duration: 01m 06s)
  • 11:35 nikerabbit@deploy1001: Synchronized private/readme.php: [config] 607414 Remove TranslationNotifications user settings 2/2 (duration: 01m 04s)
  • 11:28 nikerabbit@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [config] 603167 Remove TranslationNotifications user settings 1/2 (duration: 01m 03s)
  • 11:09 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: BACON: TwoColConflict: Talk page small deployment CommonSettings.php (T254458) (duration: 01m 17s)
  • 10:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:38 marostegui: Stop haproxy on dbproxy1003 T256216
  • 10:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:01 volans: Production management IP allocation must be done from Netbox from now on, see https://wikitech.wikimedia.org/wiki/DNS/Netbox#Cutoff_dates
  • 09:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:53 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1088 @ 75% into s6 T255927', diff saved to https://phabricator.wikimedia.org/P11648 and previous config saved to /var/cache/conftool/dbconfig/20200624-095338-kormat.json
  • 09:50 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:36 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1088 @ 50% into s6 T255927', diff saved to https://phabricator.wikimedia.org/P11647 and previous config saved to /var/cache/conftool/dbconfig/20200624-093624-kormat.json
  • 09:13 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:10 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:40 moritzm: prune remaining nginx packages on mw* servers T255565
  • 08:31 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1088 @ 20% into s6 T255927', diff saved to https://phabricator.wikimedia.org/P11645 and previous config saved to /var/cache/conftool/dbconfig/20200624-083120-kormat.json
  • 08:06 moritzm: re-enable puppet in eqiad
  • 08:04 marostegui@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:04 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:00 moritzm: disable puppet in eqiad to unblock puppetdb1002 VM migration
  • 07:22 gehel: restarting blazegraph on wdqs1007
  • 06:53 moritzm: draining ganeti1009 for eventual reboot
  • 06:28 XioNoX: enable peering BGP sessions on AMS-IX - T253970
  • 05:59 XioNoX: disable peering BGP sessions on AMS-IX - T253970
  • 05:34 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:33 marostegui@cumin2001: START - Cookbook sre.hosts.decommission
  • 05:14 marostegui: Remove grants from dbproxy1008 - T231280 T255406
  • 05:03 marostegui: Remove revision triggers from db1125:3316
  • 05:02 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1085 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P11643 and previous config saved to /var/cache/conftool/dbconfig/20200624-050235-marostegui.json
  • 04:53 marostegui: Reload haproxy on dbproxy1012 and dbproxy1014
  • 00:35 ejegg: restarted fundraising jobs on main CiviCRM box
  • 00:33 ejegg: updated Fundraising CiviCRM from f01b036128 to 52a32f2d66

2020-06-23

  • 23:16 wkandek: releases1002 is back after being moved to row D (T255590)
  • 23:11 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:35 ejegg: disabled fundraising jobs on civi1001 for testing on civi2001
  • 22:24 wkandek@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:13 AndyRussG: updated payments-wiki from 5fd4eb1519 to 28ad76dcd7
  • 22:06 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 21:23 wkandek@cumin1001: START - Cookbook sre.ganeti.makevm
  • 21:23 dzahn@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 21:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 21:22 wkandek@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 21:22 wkandek@cumin1001: START - Cookbook sre.ganeti.makevm
  • 21:22 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 21:22 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 21:15 wkandek@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 21:14 wkandek@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate TemplateWizard from EventLogging to EventGate on all wikis - take 2 - T238230 (duration: 01m 06s)
  • 19:16 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate TemplateWizard from EventLogging to EventGate on all wikis - T238230 (duration: 01m 05s)
  • 19:06 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.38
  • 18:55 mutante: gerrit1001 (prod) - restarting gerrit service to verify config changes
  • 18:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate TemplateWizard from EventLogging to EventGate on group0 - T238230 (duration: 01m 06s)
  • 18:24 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T254925 T246489 (duration: 01m 06s)
  • 18:04 brennen@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.38 (duration: 85m 53s)
  • 16:39 brennen@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.38
  • 16:01 brennen: 1.35.0-wmf.38 was branched at a35f7318 for https://phabricator.wikimedia.org/T254175
  • 15:47 moritzm: prune nginx packages on mwdebug hosts T255565
  • 15:37 moritzm: prune nginx packages on mw1380-mw1412 T255565
  • 15:28 moritzm: installing libvpx security updates
  • 15:27 mutante: removing ganeti VM xhgui1001 from eqiad row_A, will recreate in another row for rebalancing VMs between rows (T180761 T238098)
  • 15:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:18 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:12 mutante: removing ganeti VM releases1002 in eqiad row_A - will recreate in another row to re-balance (T255590)
  • 15:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:56 moritzm: failover ganeti master in eqiad to ganeti1011
  • 14:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:48 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: T250887 (duration: 00m 58s)
  • 14:08 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@db7fd80]: Update recommendation-api to 7e00177 (duration: 03m 13s)
  • 14:05 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@db7fd80]: Update recommendation-api to 7e00177
  • 13:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:54 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:34 moritzm: draining ganeti1012 for eventual reboot
  • 13:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:56 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:54 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:45 moritzm: draining ganeti1011 for eventual reboot
  • 12:45 marostegui: Deploy schema change on s6 codfw master (lag will appear on codfw) - T253276
  • 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:56 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:35 awight: EU BACON cooked
  • 11:34 awight@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/TwoColConflict/: BACON: Fix broken copy link in JS mode (T253724) (duration: 00m 57s)
  • 11:07 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: test commons: Use the database name in the Wikibase entity source config (duration: 00m 59s)
  • 11:04 moritzm: draining ganeti1008 for eventual reboot
  • 10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:38 moritzm: temporarily shut down xhgui1001/releases1002 to reshuffle Ganeti instances for reboots
  • 10:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:22 kormat: reimaging db1088 to buster T250666
  • 10:03 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:01 jynus@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:48 jbond42: add new CI check for cloud yaml data https://gerrit.wikimedia.org/r/c/operations/puppet/+/606444/
  • 09:46 jynus: stopping and reimaging db2101 into buster T254871
  • 09:32 marostegui: Reload haproxy on dbproxy1012 and dbproxy1014 to test db1097 as secondary for 24h T254556
  • 08:46 ema: mwmaint1002: add uid=abban,ou=people,dc=wikimedia,dc=org to group 'nda' T255775
  • 08:38 XioNoX: re-enable peering BGP sessions on AMS-IX - T253970
  • 08:03 moritzm: draining ganeti1007 for eventual reboot
  • 07:58 XioNoX: restart scs-a8-eqiad - T256101
  • 07:51 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:49 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:42 marostegui: Deploy schema change on db1088
  • 07:30 marostegui: Reimage db2133 (m2 codfw master) to Buster (this will trigger haproxy IRC alert) T250666
  • 07:01 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool db1118', diff saved to https://phabricator.wikimedia.org/P11637 and previous config saved to /var/cache/conftool/dbconfig/20200623-070120-marostegui.json
  • 06:06 XioNoX: disable peering BGP sessions on AMS-IX - T253970
  • 05:24 marostegui: Compress InnoDB on db1080 T254462
  • 05:23 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1080 for InnoDB compression', diff saved to https://phabricator.wikimedia.org/P11636 and previous config saved to /var/cache/conftool/dbconfig/20200623-052350-marostegui.json
  • 05:22 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1118', diff saved to https://phabricator.wikimedia.org/P11635 and previous config saved to /var/cache/conftool/dbconfig/20200623-052254-marostegui.json
  • 05:12 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1118', diff saved to https://phabricator.wikimedia.org/P11634 and previous config saved to /var/cache/conftool/dbconfig/20200623-051159-marostegui.json
  • 05:03 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1118', diff saved to https://phabricator.wikimedia.org/P11633 and previous config saved to /var/cache/conftool/dbconfig/20200623-050314-marostegui.json

2020-06-22

  • 23:41 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: touch for T247330 (duration: 00m 56s)
  • 23:36 catrope@deploy1001: Synchronized dblists/: Close trwikinews (T247330) (duration: 00m 58s)
  • 23:28 RoanKattouw: Synchronized wmf-config/InitialiseSettings.php: Create rollbacker group on elwiktionary (T255569) (typoed the task number before)
  • 23:26 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create rollbacker group on elwiktionary (T225569) (duration: 00m 56s)
  • 23:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add localized sitename for bewikibooks (T253962) (duration: 00m 57s)
  • 23:16 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add domains to wgCopyUploadsDomains (T255336, T255363, T255386, T255313) (duration: 01m 01s)
  • 22:39 bstorm_: downtimed labstore1005 to prevent an alert during puppet merge T253353
  • 22:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:35 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 22:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@f2002c8]: bump glent jar to 0.2.2 (duration: 00m 56s)
  • 22:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@f2002c8]: bump glent jar to 0.2.2
  • 22:12 volans: cleanup interfaces and addresses in Netbox for offline servers - T233183
  • 21:59 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6e7f9f7]: bump glent jar to 0.2.2 (duration: 00m 18s)
  • 21:58 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6e7f9f7]: bump glent jar to 0.2.2
  • 17:19 mutante: gerrit1002 - let puppet remove [database] section from config; restart gerrit another time
  • 17:14 mutante: gerrit1002 (gerrit-test): re-enabled puppet, restarted gerrit service
  • 16:58 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:49 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:48 moritzm: installing mutt security updates
  • 14:47 Amir1: creating shnwiktionary is done
  • 14:44 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 58s)
  • 14:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:41 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Creating shnwiktionary (T253029) (duration: 00m 56s)
  • 14:40 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating shnwiktionary (T253029) (duration: 00m 56s)
  • 14:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:37 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Creating shnwiktionary (T253029)
  • 14:36 ladsgroup@deploy1001: Synchronized dblists: Creating shnwiktionary (T253029) (duration: 00m 58s)
  • 14:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:59 moritzm: re-enabling Puppet in codfw
  • 13:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:51 moritzm: disable Puppet in codfw to reduce puppetdb2002 memory activity, unblocking the migration of the Ganeti instance for a reboot
  • 13:19 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump eventlogging_Test schema version to 1.1.0 to pick up client_dt and set wgEventLoggingServiceUri for all wikis - T238230 (duration: 00m 58s)
  • 13:11 marostegui: Stop MySQL on db2078 instances
  • 12:53 vgutierrez: upgrade to trafficserver 8.0.8~rc0-1wm1 on cp5006 and cp5012
  • 12:45 moritzm: draining ganeti2007 for eventual reboot
  • 12:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:31 akosiaris: failover logstash2023 from ganeti2007->ganeti2023 for migration_downtime change to apply
  • 12:26 volans@deploy1001: Finished deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin1001 now on buster (duration: 01m 25s)
  • 12:24 volans@deploy1001: Started deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin1001 now on buster
  • 12:22 volans@deploy1001: Finished deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin1001 now on buster (duration: 00m 03s)
  • 12:22 volans@deploy1001: Started deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin1001 now on buster
  • 11:53 Urbanecm: EU B&C window done
  • 11:50 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/VisualEditor/modules/: Backport: 0a08066: Revert "Allow generic params to be passed to getWikitextFragment" (T255785) (duration: 00m 58s)
  • 11:45 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P11627 and previous config saved to /var/cache/conftool/dbconfig/20200622-114554-marostegui.json
  • 11:40 moritzm: draining ganeti2008 for eventual reboot
  • 11:37 volans@deploy1001: Finished deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin1001 now on buster (duration: 00m 28s)
  • 11:37 volans@deploy1001: Started deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin1001 now on buster
  • 11:34 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P11625 and previous config saved to /var/cache/conftool/dbconfig/20200622-113401-marostegui.json
  • 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 74e8295: IS: Cleanup some redundant rows (duration: 00m 56s)
  • 11:29 Urbanecm: Run namespaceDupes.php for zh* projects (T165593)
  • 11:24 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P11623 and previous config saved to /var/cache/conftool/dbconfig/20200622-112451-marostegui.json
  • 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: db952ba: Add zh-hans and zh-hant translation of Module and Module_talk aliases for all Zh Projects (T165593) (duration: 00m 56s)
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1301fd4: Add import sources for gomwiktionary (T255098) (duration: 00m 57s)
  • 11:08 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P11622 and previous config saved to /var/cache/conftool/dbconfig/20200622-110806-marostegui.json
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: defa81e: Disable NS_USER(_TALK) search engine indexing on trwiki (T255538) (duration: 00m 58s)
  • 10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (606985) (duration: 00m 56s)
  • 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (606985) (duration: 01m 12s)
  • 09:58 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:56 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:33 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1094 for reimage', diff saved to https://phabricator.wikimedia.org/P11621 and previous config saved to /var/cache/conftool/dbconfig/20200622-093323-marostegui.json
  • 09:31 godog: roll-restart logstash in codfw/eqiad to apply configuration change
  • 08:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:33 moritzm: reimaging cumin1001 to buster T245114
  • 08:13 godog: extend prometheus codfw ops filesystem to 1TB
  • 08:02 vgutierrez: upgrade to trafficserver 8.0.8~rc0-1wm1 on cp4026 and cp4032
  • 08:02 vgutierrez: upload trafficserver 8.0.8~rc0-1wm1 to apt.wm.o (buster)
  • 07:33 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:30 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:16 marostegui: Reimage db1117 (irc haproxy alerts will be triggered)
  • 06:26 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:24 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:06 marostegui: Stop MySQL on dbstore1005 for reimage to Buster - T254870
  • 05:58 marostegui: Compress InnoDB on db1118 T254462
  • 05:51 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:49 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 05:43 marostegui: Stop haproxy on dbproxy1008 - T255406
  • 05:33 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1118 for reimage and InnoDB compression', diff saved to https://phabricator.wikimedia.org/P11617 and previous config saved to /var/cache/conftool/dbconfig/20200622-053334-marostegui.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1134', diff saved to https://phabricator.wikimedia.org/P11616 and previous config saved to /var/cache/conftool/dbconfig/20200622-053104-marostegui.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P11615 and previous config saved to /var/cache/conftool/dbconfig/20200622-051730-marostegui.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P11614 and previous config saved to /var/cache/conftool/dbconfig/20200622-051720-marostegui.json
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P11613 and previous config saved to /var/cache/conftool/dbconfig/20200622-050259-marostegui.json
  • 04:50 marostegui: Deploy schema change on s3 primary master with a big sleep between wikis - T250066
  • 04:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P11612 and previous config saved to /var/cache/conftool/dbconfig/20200622-044853-marostegui.json

2020-06-20

  • 22:56 cdanis@cumin2001: dbctl commit (dc=all): 'db1088 seems to have crashed', diff saved to https://phabricator.wikimedia.org/P11611 and previous config saved to /var/cache/conftool/dbconfig/20200620-225624-cdanis.json
  • 07:42 elukey: powercycle an-worker1093 - kernel "BUG: soft lockup" on CPU shown in mgmt console
  • 07:36 elukey: powercycle an-worker1091 - kernel "BUG: soft lockup" on CPU shown in mgmt console

2020-06-19

  • 18:10 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump eventlogging_Test schema version to 1.1.0 to pick up client_dt - T238230 (duration: 00m 59s)
  • 16:07 mutante: ganeti4003 - rebooting install4001 - trying to bootstrap OS install from install2003
  • 15:47 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 15:28 godog: roll-restart kibana to apply new settings
  • 13:01 moritzm: installing cups security updates (client side libs/tools)
  • 12:31 qchris: Disabling puppet on gerrit1002 (test instance) to do some more testing
  • 12:14 godog: delete march indices from logstash 5 eqiad to free up space
  • 12:12 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:10 marostegui@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:08 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:07 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:06 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:05 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:39 marostegui: Reimage db2116 db2119 db2130
  • 10:55 moritzm: installing mesa security updates
  • 10:49 godog: close april logstash indices on logstash 5 eqiad
  • 10:45 moritzm: installing tomcat8 security updates
  • 10:38 jayme: imported chartmuseum_0.12.0-1 to buster-wikimedia
  • 10:24 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1093', diff saved to https://phabricator.wikimedia.org/P11604 and previous config saved to /var/cache/conftool/dbconfig/20200619-102447-marostegui.json
  • 10:21 godog: start closing logstash indices for 2020.03 in elastic 5 eqiad
  • 09:22 godog: restart elasticsearch on logstash1010
  • 09:14 apergos: rsync of dumps output files as root from dumpsdata1003 to labstore1007 to catch up, with --bwlimit=160000 (up from 80000)
  • 08:45 volans: backup netbox and run one-time script to reserve first IPs on all infra prefixes on Netbox - T233183
  • 08:45 godog: roll restart elasticsearch_5@production-logstash-eqiad
  • 08:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:15 godog: roll-restart logstash elk5 for "JVM GC Old generation-s runs" alert
  • 08:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:59 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1093', diff saved to https://phabricator.wikimedia.org/P11601 and previous config saved to /var/cache/conftool/dbconfig/20200619-075907-marostegui.json
  • 07:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:44 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P11600 and previous config saved to /var/cache/conftool/dbconfig/20200619-074420-marostegui.json
  • 07:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:02 moritzm: rebooting ganeti nodes in eqiad for kernel security updates
  • 06:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 06:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 06:47 moritzm: force reinstall of memcached 1.6 deb packages to ensure that the override is used in addition to the unmodified systemd unit from the deb T233933
  • 06:39 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:36 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:20 marostegui: Stop mysql on db2132 to reimage m1 codfw master - T254556
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2075 db2111', diff saved to https://phabricator.wikimedia.org/P11599 and previous config saved to /var/cache/conftool/dbconfig/20200619-061922-marostegui.json
  • 06:05 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:02 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:01 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:00 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112', diff saved to https://phabricator.wikimedia.org/P11598 and previous config saved to /var/cache/conftool/dbconfig/20200619-055430-marostegui.json
  • 05:41 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db2075 and db2111 for reimage', diff saved to https://phabricator.wikimedia.org/P11597 and previous config saved to /var/cache/conftool/dbconfig/20200619-054118-marostegui.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2108', diff saved to https://phabricator.wikimedia.org/P11596 and previous config saved to /var/cache/conftool/dbconfig/20200619-053402-marostegui.json
  • 05:25 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:23 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 for reimage', diff saved to https://phabricator.wikimedia.org/P11595 and previous config saved to /var/cache/conftool/dbconfig/20200619-044440-marostegui.json
  • 04:39 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P11594 and previous config saved to /var/cache/conftool/dbconfig/20200619-043956-marostegui.json
  • 04:35 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P11593 and previous config saved to /var/cache/conftool/dbconfig/20200619-043554-marostegui.json

2020-06-18

  • 22:30 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - SearchSatisfaction on all wikis - T249261 (duration: 00m 56s)
  • 21:14 volans: start check-homer-diff.service on cumin2001 after merging the fix r/606526
  • 20:17 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - SearchSatisfaction on all wikis - T249261 (duration: 00m 57s)
  • 19:44 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - SearchSatisfaction on group1 wikis - T249261 (duration: 00m 57s)
  • 18:53 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw2339.codfw.wmnet
  • 18:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:16 wkandek@cumin1001: conftool action : set/pooled=no; selector: name=mw2339.codfw.wmnet
  • 17:14 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw2339.codfw.wmnet
  • 17:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2339.codfw.wmnet
  • 16:51 maryum: reindex suspended until deployment of code
  • 16:49 hnowlan: Shut off non-dockerised deployment-prep instance of changeprop
  • 16:15 maryum: reindexing French wiki in Elasticsearch
  • 15:37 Reedy: created bot_passwords tables on officewiki and otrs_wikiwiki T254925 T246489
  • 15:34 moritzm: installing harfbuzz security updates
  • 15:23 moritzm: installing Ruby 2.1 security updates
  • 15:15 moritzm: installing python-django security updates (packaged buster version)
  • 15:04 moritzm: installing bind updates on jessie (client side tools/libs)
  • 14:19 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1078', diff saved to https://phabricator.wikimedia.org/P11591 and previous config saved to /var/cache/conftool/dbconfig/20200618-141941-marostegui.json
  • 14:14 moritzm: failover ganeti master in codfw to ganeti2021
  • 14:03 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1078 for schema change', diff saved to https://phabricator.wikimedia.org/P11590 and previous config saved to /var/cache/conftool/dbconfig/20200618-140352-marostegui.json
  • 14:02 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1075', diff saved to https://phabricator.wikimedia.org/P11589 and previous config saved to /var/cache/conftool/dbconfig/20200618-140203-marostegui.json
  • 13:53 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:53 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:52 akosiaris: restart logstash2005 to apply an increased ganeti migration_downtime of 10k
  • 13:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:52 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1075 for schema change', diff saved to https://phabricator.wikimedia.org/P11586 and previous config saved to /var/cache/conftool/dbconfig/20200618-125216-marostegui.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es5 master as es1024 is fully repooled now', diff saved to https://phabricator.wikimedia.org/P11585 and previous config saved to /var/cache/conftool/dbconfig/20200618-124801-marostegui.json
  • 12:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:05 kormat: reimaging db1077 for final test T251768
  • 11:51 jbond@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: (no justification provided) (duration: 01m 00s)
  • 11:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2076', diff saved to https://phabricator.wikimedia.org/P11583 and previous config saved to /var/cache/conftool/dbconfig/20200618-094001-marostegui.json
  • 09:39 akosiaris: update wikifeeds to latest chart version in codfw
  • 09:39 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 09:38 marostegui@cumin2001: dbctl commit (dc=all): 'Repool es2022', diff saved to https://phabricator.wikimedia.org/P11582 and previous config saved to /var/cache/conftool/dbconfig/20200618-093803-marostegui.json
  • 09:38 akosiaris: uncordon kubernetes20{07..14} and kubernetes10{07..14}. Nodes are now fully put in rotation and ready to receive production traffic
  • 09:34 marostegui: Deploy schema change on s3 codfw master (this will create lag on codfw) - T250066
  • 09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:30 godog: temp stop logstash on elk7 to test 8 pipeline workers - T255243
  • 09:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 09:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:09 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 09:06 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:59 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool es1025', diff saved to https://phabricator.wikimedia.org/P11581 and previous config saved to /var/cache/conftool/dbconfig/20200618-085927-marostegui.json
  • 08:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:50 ayounsi@cumin2001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
  • 08:49 ayounsi@cumin2001: START - Cookbook sre.network.prepare-upgrade
  • 08:49 ayounsi@cumin2001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 08:49 ayounsi@cumin2001: START - Cookbook sre.network.prepare-upgrade
  • 08:49 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool es1025', diff saved to https://phabricator.wikimedia.org/P11580 and previous config saved to /var/cache/conftool/dbconfig/20200618-084929-marostegui.json
  • 08:47 marostegui@cumin2001: dbctl commit (dc=all): 'Depool es2022 for reimage', diff saved to https://phabricator.wikimedia.org/P11578 and previous config saved to /var/cache/conftool/dbconfig/20200618-084720-marostegui.json
  • 08:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:37 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool es1025', diff saved to https://phabricator.wikimedia.org/P11577 and previous config saved to /var/cache/conftool/dbconfig/20200618-083749-marostegui.json
  • 08:25 elukey: change archiva-ci password in archiva
  • 08:24 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool es1025', diff saved to https://phabricator.wikimedia.org/P11576 and previous config saved to /var/cache/conftool/dbconfig/20200618-082432-marostegui.json
  • 08:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:10 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:08 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:41 marostegui: Reimage es1025
  • 07:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:34 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db1136', diff saved to https://phabricator.wikimedia.org/P11574 and previous config saved to /var/cache/conftool/dbconfig/20200618-073414-marostegui.json
  • 07:33 ayounsi@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:25 ayounsi@cumin2001: START - Cookbook sre.dns.netbox
  • 07:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:22 moritzm: rolling reboot of ganeti servers in codfw
  • 07:10 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 07:07 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 04:50 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1136', diff saved to https://phabricator.wikimedia.org/P11573 and previous config saved to /var/cache/conftool/dbconfig/20200618-045047-marostegui.json

2020-06-17

  • 23:25 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0e7079d: Install DiscussionTools on all wikis (attempt 2) (T252264; T253943) (duration: 00m 56s)
  • 23:23 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/DiscussionTools/includes/Hooks.php: ff01083: Use $wgLocaltimezone global instead of request context (T255704) (duration: 00m 57s)
  • 23:21 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/DiscussionTools/includes/Hooks.php: 4551d29: Use $wgLocaltimezone global instead of request context (T252264; T253943; T255704) (duration: 00m 58s)
  • 23:01 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@79fb82f]: 0.3.39 (duration: 14m 38s)
  • 22:47 ryankemper@deploy1001: Started deploy [wdqs/wdqs@79fb82f]: 0.3.39
  • 21:01 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:32 hashar: Fixed up zuul-merger on contint1001 after a faulty hotfix
  • 20:08 hashar: Stopped zuul-merger on contint1001
  • 19:21 marostegui: Deploy schema change on s6 codfw master T238966
  • 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094', diff saved to https://phabricator.wikimedia.org/P11572 and previous config saved to /var/cache/conftool/dbconfig/20200617-191723-marostegui.json
  • 19:11 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 19:08 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:05 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 18:57 milimetric@deploy1001: Finished deploy [analytics/refinery@6640d6f] (thin): Quick fix for data quality bundles (THIN) (duration: 00m 10s)
  • 18:57 milimetric@deploy1001: Started deploy [analytics/refinery@6640d6f] (thin): Quick fix for data quality bundles (THIN)
  • 18:52 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:44 milimetric@deploy1001: Finished deploy [analytics/refinery@6640d6f]: Quick fix for data quality bundles (duration: 27m 55s)
  • 18:41 Urbanecm: Morning B&C window done
  • 18:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 96153f9: Add temporary logging for mediamoderation (T247943) (duration: 00m 56s)
  • 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: REVERT: ae76450: Install DiscussionTools on all wikis (T252264; T253943) (duration: 00m 34s)
  • 18:22 urbanecm@deploy1001: scap failed: average error rate on 3/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
  • 18:21 urbanecm@deploy1001: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
  • 18:16 milimetric@deploy1001: Started deploy [analytics/refinery@6640d6f]: Quick fix for data quality bundles
  • 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c9f6452: Set DiscussionToolsEnableVisual to true by default (T251654) (duration: 00m 56s)
  • 18:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:04 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - SearchSatisfaction on group0 wikis - T249261 (duration: 00m 56s)
  • 16:00 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1094', diff saved to https://phabricator.wikimedia.org/P11571 and previous config saved to /var/cache/conftool/dbconfig/20200617-160013-marostegui.json
  • 15:28 godog: temp bump logstash7 workers to 8 and temp stop logstash - T255243
  • 15:17 jforrester@deploy1001: Synchronized private/PrivateSettings.php: T247943 Add API key and recipient config for MediaModeration (duration: 00m 55s)
  • 15:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2338.codfw.wmnet
  • 15:11 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw233[5-9].codfw.wmnet
  • 15:11 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T247943 Install MediaModeration extension - III: Install where enabled (duration: 00m 56s)
  • 15:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2335.codfw.wmnet
  • 15:09 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
  • 15:09 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2337.codfw.wmnet
  • 15:09 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2339.codfw.wmnet
  • 15:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw233[5-9].codfw.wmnet
  • 14:58 jforrester@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/GrowthExperiments/modules/help/ext.growthExperiments.HelpPanelProcessDialog.js: T255607 Fix help panel sizing logic (duration: 00m 56s)
  • 14:54 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:52 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:49 mdholloway: rolled back recommendation-api deployment due to canary endpoint check failure (T255683)
  • 14:44 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@c39d567]: Update recommendation-api to db97742 (duration: 01m 16s)
  • 14:43 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@c39d567]: Update recommendation-api to db97742
  • 14:30 akosiaris: redrain kubernetes1007-14
  • 14:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:27 mutante: disabling puppet on icinga to avoid alert spam when adding new appservers
  • 14:25 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:22 akosiaris: uncordon kubernetes10{07..14} again
  • 14:13 mutante: generating new mcrouter certs for mw2335 - mw2339 (T247021)
  • 14:02 mutante: rebooting mw2335 through mw2339 (not in service)
  • 13:51 XioNoX: cleanup msw1-codfw interfaces
  • 13:44 akosiaris: redrain kubernetes1007-14
  • 13:37 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:35 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - SearchSatisfaction on testwiki version 1.1.0 - T249261 (duration: 00m 58s)
  • 13:30 moritzm: upgrade remaining parsoid nodes to PHP 7.2.31
  • 13:21 jbond42: re-enable puppet on C:memcached nodes
  • 13:04 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:04 marostegui: The above db1129 depool was meant to be a repool, wrong commit message
  • 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.37
  • 13:03 jbond42: disable puppet on C:memcache to deploy a new change
  • 13:02 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P11567 and previous config saved to /var/cache/conftool/dbconfig/20200617-130236-marostegui.json
  • 13:02 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:00 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:00 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:00 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:00 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 13:00 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:00 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:59 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:59 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:59 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:59 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:59 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:59 akosiaris@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:59 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:59 akosiaris@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:54 hnowlan: upgraded cpjobqueue to newer container image, rolled back
  • 12:40 marostegui@cumin2001: dbctl commit (dc=all): 'Add db2091 to s8 T253217', diff saved to https://phabricator.wikimedia.org/P11566 and previous config saved to /var/cache/conftool/dbconfig/20200617-124034-marostegui.json
  • 12:32 hnowlan: Removed remaining changeprop systemd components from scb
  • 12:06 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db2076 to remove triggers from sanitarium T238966', diff saved to https://phabricator.wikimedia.org/P11565 and previous config saved to /var/cache/conftool/dbconfig/20200617-120622-marostegui.json
  • 11:59 Amir1: not today, just EU noon
  • 11:59 Amir1: B&C is done for today
  • 11:58 ladsgroup@deploy1001: Synchronized wmf-config/config/trwikisource.yaml: Change sidebar upload link destination for tr.wikisource (T253490) (duration: 01m 03s)
  • 11:55 ladsgroup@deploy1001: Synchronized dblists/commonsuploads.dblist: Change sidebar upload link destination for tr.wikisource (T253490) (duration: 01m 04s)
  • 11:48 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 11:47 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add extended-confirmed group and restriction level for rowiki (T254471) (duration: 01m 04s)
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1025 for reimage, give weight to es1023 (es5 master)', diff saved to https://phabricator.wikimedia.org/P11563 and previous config saved to /var/cache/conftool/dbconfig/20200617-113026-marostegui.json
  • 11:23 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/GrowthExperiments/extension.json: Fix NewcomerTask schema (T255597) (duration: 01m 04s)
  • 11:18 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/GrowthExperiments/extension.json: Fix NewcomerTask schema (T255597) (duration: 01m 06s)
  • 11:07 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set hiwiktionary timezone to Asia/Kolkata (T255531) (duration: 01m 05s)
  • 10:48 marostegui@cumin2001: dbctl commit (dc=all): 'Remove db2091 from dbctl in s2 and s4', diff saved to https://phabricator.wikimedia.org/P11562 and previous config saved to /var/cache/conftool/dbconfig/20200617-104816-marostegui.json
  • 10:40 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:38 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:31 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.37 (duration: 01m 04s)
  • 10:30 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.37
  • 09:44 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:42 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:40 hnowlan: killing stale changeprop instances running on scb hosts
  • 09:16 jforrester@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/Flow/: T255608 Revert 'Hooks: Use PageMoveComplete instead of TitleMoveCompleting' (duration: 01m 05s)
  • 09:15 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool db1113:3315, db1113:3316', diff saved to https://phabricator.wikimedia.org/P11558 and previous config saved to /var/cache/conftool/dbconfig/20200617-091509-marostegui.json
  • 09:11 jforrester@deploy1001: Synchronized php-1.35.0-wmf.37/includes/HookContainer/DeprecatedHooks.php: T255608 Revert 'Hard deprecate the hook' (duration: 01m 05s)
  • 09:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T247943 Install MediaModeration extension - II: Add flag to IS (duration: 01m 05s)
  • 08:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:52 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:49 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:47 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1113:3315, db1113:3316', diff saved to https://phabricator.wikimedia.org/P11557 and previous config saved to /var/cache/conftool/dbconfig/20200617-084751-marostegui.json
  • 08:44 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1113:3315, db1113:3316', diff saved to https://phabricator.wikimedia.org/P11556 and previous config saved to /var/cache/conftool/dbconfig/20200617-084402-marostegui.json
  • 08:43 jforrester@deploy1001: Synchronized php-1.35.0-wmf.37/includes/EditPage.php: T255177 T255614 Do not return internal edit status from EditPage (duration: 01m 08s)
  • 08:31 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1113:3315, db1113:3316', diff saved to https://phabricator.wikimedia.org/P11554 and previous config saved to /var/cache/conftool/dbconfig/20200617-083120-marostegui.json
  • 08:30 godog: start logstash on logstash7 - T255243
  • 08:29 moritzm: prune nginx from remaining mw* servers in codfw T255565
  • 08:23 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:20 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:10 godog: stop logstash temporarily on logstash7 hosts to test increased es shards - T255243
  • 08:05 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1113:3315 db1113:3316', diff saved to https://phabricator.wikimedia.org/P11553 and previous config saved to /var/cache/conftool/dbconfig/20200617-080511-marostegui.json
  • 07:53 elukey: reboot kafka-jumbo1009 for kernel upgrades
  • 06:40 elukey: reboot krb1001 for kernel upgrades
  • 06:24 elukey: reboot an-master100[1,2] for kernel upgrades
  • 06:23 XioNoX: set lacp active on cr2-esams:ae2 - T253970
  • 06:15 tstarling@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: test fast stale mode on testwiki T250248 (duration: 01m 17s)
  • 06:03 elukey: reboot an-conf100[1-3] for kernel upgrades
  • 05:45 elukey: reboot stat1007/8 for kernel upgrades
  • 05:45 elukey: clean up old systemd timer config on an-coord1001 (came up after the last reboot)
  • 05:42 volker-e@deploy1001: Finished deploy [design/style-guide@37c67dd]: Deploy design/style-guide: (duration: 00m 05s)
  • 05:42 volker-e@deploy1001: Started deploy [design/style-guide@37c67dd]: Deploy design/style-guide:
  • 05:34 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool db1090:3312, db1090:3317', diff saved to https://phabricator.wikimedia.org/P11552 and previous config saved to /var/cache/conftool/dbconfig/20200617-053421-marostegui.json
  • 05:29 marostegui: Deploy schema change on s7 codfw (lag will appear) - T250066
  • 05:28 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1090:3312, db1090:3317', diff saved to https://phabricator.wikimedia.org/P11551 and previous config saved to /var/cache/conftool/dbconfig/20200617-052809-marostegui.json
  • 05:22 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1090:3312, db1090:3317', diff saved to https://phabricator.wikimedia.org/P11550 and previous config saved to /var/cache/conftool/dbconfig/20200617-052202-marostegui.json
  • 05:19 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1090:3312, db1090:3317', diff saved to https://phabricator.wikimedia.org/P11549 and previous config saved to /var/cache/conftool/dbconfig/20200617-051916-marostegui.json
  • 05:10 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:08 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 for reimage', diff saved to https://phabricator.wikimedia.org/P11548 and previous config saved to /var/cache/conftool/dbconfig/20200617-045105-marostegui.json
  • 04:44 marostegui: Reload pt-kill on labsdb analytics host to pick up new config
  • 04:38 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P11547 and previous config saved to /var/cache/conftool/dbconfig/20200617-043826-marostegui.json
  • 01:43 shdubsh: restart elasticsearch on logstash1011

2020-06-16

  • 23:43 crusnov@deploy1001: Finished deploy [netbox/deploy@5251cf1]: Deploying Netbox to netbox-dev T253140 (duration: 00m 05s)
  • 23:43 crusnov@deploy1001: Started deploy [netbox/deploy@5251cf1]: Deploying Netbox to netbox-dev T253140
  • 23:35 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: update ML models for ko and zh, drop ja (duration: 01m 00s)
  • 23:34 ebernhardson@deploy1001: sync-file aborted: cirrus: update ML models for ko and zh, drop ja (duration: 00m 04s)
  • 22:40 krinkle@deploy1001: Synchronized src/Noc/: (no justification provided) (duration: 01m 04s)
  • 22:31 krinkle@deploy1001: Synchronized docroot/noc: (no justification provided) (duration: 01m 05s)
  • 21:12 krinkle@deploy1001: Synchronized php-1.35.0-wmf.37/extensions/WikimediaEvents/modules/: I67794c (duration: 01m 04s)
  • 20:42 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.35.0-wmf.37
  • 20:41 foks: reset email and pw for CactusJack
  • 20:32 brennen: rolling 1.35.0-wmf.37 back to group0
  • 20:29 mutante: signing puppet cert requests for releases1002 and releases2002 - T255590
  • 19:24 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.37 (duration: 01m 04s)
  • 19:23 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.37
  • 19:18 otto@deploy1001: Started deploy [analytics/refinery@8b8ce6e]: deploying refinery source 0.0.127 for eventlogging -> eventgate migration - T249261
  • 19:15 brennen@deploy1001: Synchronized php-1.35.0-wmf.37/skins/Vector/resources/skins.vector.styles/: Restore Watchlist star (duration: 01m 05s)
  • 19:03 brennen: CORRECTION: holding _1.35.0-wmf.37_ deploy to group1 for a few minutes while merging & testing fix for T255574
  • 19:01 brennen: holding 1.35.0-wmf.27 deploy to group1 for a few minutes while merging & testing fix for T255574
  • 18:59 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:52 qchris: Turning on puppet again on gerrit1002 to avoid having it lag too far behind.
  • 18:32 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:18 mutante: mw2293 - scap pull (because Icinga reports mismatched MW versions)
  • 18:01 crusnov@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 17:55 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 17:52 crusnov@cumin2001: START - Cookbook sre.ganeti.makevm
  • 17:44 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@f4f5d7b]: airflow: adjust glent legal cutoff (duration: 01m 35s)
  • 17:42 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@f4f5d7b]: airflow: adjust glent legal cutoff
  • 17:32 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 17:03 herron: performing rolling reboots of kafka-main hosts for security updates T254990
  • 16:27 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 16:26 hnowlan: Updating changeprop to new container version with updated dependencies
  • 16:07 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 16:04 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 16:02 elukey: reboot kafka-jumbo1008 for kernel upgrades
  • 15:58 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 15:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076', diff saved to https://phabricator.wikimedia.org/P11543 and previous config saved to /var/cache/conftool/dbconfig/20200616-154924-marostegui.json
  • 15:45 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@7d4458c]: Reduce glent maximum yarn resource usage to reasonable levels (duration: 00m 41s)
  • 15:44 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@7d4458c]: Reduce glent maximum yarn resource usage to reasonable levels
  • 15:26 milimetric@deploy1001: Finished deploy [analytics/refinery@c652f62] (thin): Regular analytics weekly THIN train [analytics/refinery@c652f62] (duration: 00m 08s)
  • 15:25 milimetric@deploy1001: Started deploy [analytics/refinery@c652f62] (thin): Regular analytics weekly THIN train [analytics/refinery@c652f62]
  • 15:23 milimetric@deploy1001: Finished deploy [analytics/refinery@c652f62]: Regular analytics weekly train [analytics/refinery@c652f62] (duration: 07m 56s)
  • 15:20 elukey: reboot kafka-jumbo1007 for kernel upgrades
  • 15:15 moritzm: upgrading intel-microcode on jessie hosts
  • 15:15 milimetric@deploy1001: Started deploy [analytics/refinery@c652f62]: Regular analytics weekly train [analytics/refinery@c652f62]
  • 15:06 elukey: reboot an-coord1001 for kernel upgrades
  • 14:49 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:45 moritzm: rebooting scandium for kernel security update
  • 14:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:43 cdanis: repool eqiad T243080
  • 14:40 papaul: power off ms-be2018 for BBU replacement
  • 14:33 cdanis: eqiad router upgrades completed! 🎉 T243080
  • 14:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:31 elukey: reboot druid100[7,8] for kernel upgrades
  • 14:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076', diff saved to https://phabricator.wikimedia.org/P11541 and previous config saved to /var/cache/conftool/dbconfig/20200616-141540-marostegui.json
  • 14:14 cdanis: T243080 cdanis@re1.cr2-eqiad> request chassis routing-engine master switch
  • 14:11 moritzm: removing stray nginx packages from mw canaries (mw1261-mw1265 and mw1276-mw1283) T255565
  • 14:06 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:03 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:03 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 14:03 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:56 cdanis: T243080 cdanis@re0.cr2-eqiad> request chassis routing-engine master switch
  • 13:50 cdanis: cr2-eqiad: rebooting RE1 [backup] with new junos version T243080
  • 13:39 cdanis: cr2-eqiad: disable transit/peering BGP & bump fr MED T243080
  • 13:32 marostegui@cumin2001: dbctl commit (dc=all): 'Repool db2092 T254462', diff saved to https://phabricator.wikimedia.org/P11535 and previous config saved to /var/cache/conftool/dbconfig/20200616-133241-marostegui.json
  • 13:17 XioNoX: pfw3-eqiad rollback MED to cr1 to 0 - T243080
  • 13:12 XioNoX: add graceful-switchover to cr1-eqiad
  • 13:09 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.37
  • 13:06 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:03 cdanis: T243080 cdanis@re1.cr1-eqiad> request chassis routing-engine master switch
  • 13:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:01 moritzm: rebooting mw2291-mw2334
  • 12:54 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 12:51 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 12:47 jbond42: upload new memcache package with TLS to component/memcached16 in buster-wikimedia
  • 12:42 XioNoX: pfw3-eqiad set MED to cr1 to 300 - T243080
  • 12:38 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 12:31 cdanis: T243080 cr1-eqiad: request chassis routing-engine master switch
  • 12:31 cdanis: cr1-eqiad: request chassis routing-engine master switch
  • 12:25 cdanis: cr1-eqiad: rebooting RE1 [backup] with new junos version T243080
  • 12:15 cdanis: cdanis@re0.cr1-eqiad# commit confirmed 2 comment "force VRRP failover T243080"
  • 12:14 cdanis: disable transit/peering & increase frack MED on cr1-eqiad T243080
  • 12:09 hnowlan@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:48 cdanis: depooling eqiad for router upgrade T243080
  • 11:42 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:42 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:42 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:42 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:42 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:42 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:42 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:40 hnowlan: roll-restarting restbase201[0-2] for cert updates
  • 11:40 hnowlan@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 11:39 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:39 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:38 hnowlan@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:35 elukey: reboot an-druid100[1,2] for kernel upgrades
  • 11:27 hnowlan: roll-restart restbase2009 for cert update
  • 11:26 hnowlan@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 11:21 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:18 jforrester@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: T32405 T254731 Drop mobile special casing of main page for simplewiki, itwikisource, vecwikisource (duration: 01m 05s)
  • 11:15 moritzm: updating perf on stretch hosts
  • 11:14 marostegui: Deploy MCR schema change on db2087:3316
  • 11:09 moritzm: updating perf on buster
  • 11:02 moritzm: rebooting mw2350-mw2376
  • 11:01 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgActorTableSchemaMigrationStage, no longer read in core (duration: 01m 05s)
  • 10:52 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgTagStatisticsNewTable, no longer read in core (duration: 01m 04s)
  • 10:51 hnowlan: roll-restarting restbase101[6-8].eqiad.wmnet for cert updates
  • 10:50 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 10:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgChangeTagsSchemaMigrationStage, no longer read in core (duration: 01m 06s)
  • 10:26 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgCommentTableSchemaMigrationStage, no longer read in core (duration: 01m 07s)
  • 09:54 volans: restarting netbox to pickup modified customscripts
  • 09:14 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-swift,name=eqiad
  • 08:53 godog: roll restart prometheus eqiad ops to enable thanos upload
  • 08:48 marostegui: Upgrade db2132
  • 08:44 marostegui@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:42 marostegui@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:39 liw@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.37 (duration: 59m 05s)
  • 08:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:19 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:19 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:19 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:09 volans@deploy1001: Finished deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin2001 now on buster (take 3bis) (duration: 00m 12s)
  • 08:09 volans@deploy1001: Started deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin2001 now on buster (take 3bis)
  • 08:09 volans@deploy1001: Finished deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin2001 now on buster (take 3) (duration: 01m 37s)
  • 08:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:07 volans@deploy1001: Started deploy [homer/deploy@e9acec8]: Release v0.2.3 on cumin2001 now on buster (take 3)
  • 07:59 volans@deploy1001: Finished deploy [homer/deploy@85e92b8]: Release v0.2.3 on cumin2001 now on buster (take 2) (duration: 00m 57s)
  • 07:58 volans@deploy1001: Started deploy [homer/deploy@85e92b8]: Release v0.2.3 on cumin2001 now on buster (take 2)
  • 07:49 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:49 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:40 liw@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.37
  • 07:37 liw@deploy1001: Pruned MediaWiki: 1.35.0-wmf.35 (duration: 01m 47s)
  • 07:31 liw@deploy1001: Pruned MediaWiki: 1.35.0-wmf.34 (duration: 11m 52s)
  • 07:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 07:08 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 07:07 liw: 1.35.0-wmf.37 was branched at f856960 for T254174
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P11526 and previous config saved to /var/cache/conftool/dbconfig/20200616-070651-marostegui.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P11525 and previous config saved to /var/cache/conftool/dbconfig/20200616-070450-marostegui.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1084', diff saved to https://phabricator.wikimedia.org/P11524 and previous config saved to /var/cache/conftool/dbconfig/20200616-070429-marostegui.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084', diff saved to https://phabricator.wikimedia.org/P11523 and previous config saved to /var/cache/conftool/dbconfig/20200616-070209-marostegui.json
  • 06:57 marostegui: Compress InnoDB on db1134 T254462
  • 06:56 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1134 for InnoDB compression T254462', diff saved to https://phabricator.wikimedia.org/P11522 and previous config saved to /var/cache/conftool/dbconfig/20200616-065600-marostegui.json
  • 06:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1093', diff saved to https://phabricator.wikimedia.org/P11521 and previous config saved to /var/cache/conftool/dbconfig/20200616-065412-marostegui.json
  • 06:40 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 06:25 elukey: roll restart memcached on mc-gp* (gutter pools) to pick up new slab size distribution setting - T252391
  • 06:04 hashar: Restarted Zuul scheduler and merger on contint2001 a couple hotfixes # T252310 T255424
  • 05:54 volker-e@deploy1001: Finished deploy [design/style-guide@37c67dd]: Deploy design/style-guide: (duration: 00m 05s)
  • 05:54 volker-e@deploy1001: Started deploy [design/style-guide@37c67dd]: Deploy design/style-guide:
  • 05:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11520 and previous config saved to /var/cache/conftool/dbconfig/20200616-045958-marostegui.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11519 and previous config saved to /var/cache/conftool/dbconfig/20200616-045744-marostegui.json
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1147', diff saved to https://phabricator.wikimedia.org/P11518 and previous config saved to /var/cache/conftool/dbconfig/20200616-045636-marostegui.json
  • 04:55 marostegui: Deploy schema change on db1147
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P11517 and previous config saved to /var/cache/conftool/dbconfig/20200616-045451-marostegui.json
  • 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1149', diff saved to https://phabricator.wikimedia.org/P11516 and previous config saved to /var/cache/conftool/dbconfig/20200616-044612-marostegui.json
  • 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149', diff saved to https://phabricator.wikimedia.org/P11515 and previous config saved to /var/cache/conftool/dbconfig/20200616-044409-marostegui.json
  • 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143', diff saved to https://phabricator.wikimedia.org/P11514 and previous config saved to /var/cache/conftool/dbconfig/20200616-044326-marostegui.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P11513 and previous config saved to /var/cache/conftool/dbconfig/20200616-044126-marostegui.json
  • 04:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1138', diff saved to https://phabricator.wikimedia.org/P11512 and previous config saved to /var/cache/conftool/dbconfig/20200616-044036-marostegui.json
  • 04:37 marostegui: Deploy schema change on db1138
  • 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P11511 and previous config saved to /var/cache/conftool/dbconfig/20200616-043748-marostegui.json
  • 00:28 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: limit HTTP client timeout T245170 (duration: 00m 56s)
  • 00:25 tstarling@deploy1001: Synchronized wmf-config/set-time-limit.php: expose excimer timeout as a global variable T245170 (duration: 00m 56s)
  • 00:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@17212bb]: airflow: migrate leven-dist to edit-dist (duration: 00m 45s)
  • 00:16 volker-e@deploy1001: Finished deploy [design/style-guide@37c67dd]: Deploy design/style-guide: (duration: 00m 04s)
  • 00:16 volker-e@deploy1001: Started deploy [design/style-guide@37c67dd]: Deploy design/style-guide:
  • 00:16 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@17212bb]: airflow: migrate leven-dist to edit-dist

2020-06-15

  • 23:56 tstarling@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: reducing connect timeout per T105378 (duration: 01m 00s)
  • 23:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@eb0ac12]: Ship templatad table names in HivePartitionRangeSensor (duration: 00m 49s)
  • 23:30 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@eb0ac12]: Ship templatad table names in HivePartitionRangeSensor
  • 22:58 krinkle@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: If7e1613cbcf8 (duration: 00m 56s)
  • 22:57 krinkle@deploy1001: Synchronized wmf-config/profiler.php: If7e1613cbcf8 (duration: 00m 59s)
  • 22:02 bstorm_: downtimed puppet alerts for testing some changes on labstore1004/5
  • 20:59 ebernhardson@deploy1001: Finished deploy [search/airflow@62a024b]: Add pydruid to airflow (duration: 00m 50s)
  • 20:58 ebernhardson@deploy1001: Started deploy [search/airflow@62a024b]: Add pydruid to airflow
  • 20:55 shdubsh: update mtail to 3.0.0~rc35 on the rest of the hosts - eqiad and esams
  • 20:44 shdubsh: update mtail to 3.0.0~rc35 on cp nodes in eqiad and esams
  • 20:30 shdubsh: update mtail to 3.0.0~rc35 on wtp in eqiad
  • 19:35 shdubsh: update mtail to 3.0.0~rc35 on mw in eqiad
  • 18:50 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@41186c8]: port glent from oozie to airflow (duration: 00m 39s)
  • 18:50 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@41186c8]: port glent from oozie to airflow
  • 18:28 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:605584 T254315 test wikidata: Use the database name in the Wikibase entity source config (duration: 00m 58s)
  • 17:56 krinkle@deploy1001: Synchronized wmf-config: I7721f4 (duration: 00m 58s)
  • 17:55 krinkle@deploy1001: Synchronized wmf-config/ProductionServices.php: I7721f4 (duration: 00m 57s)
  • 17:52 krinkle@deploy1001: Synchronized lib/: I7721f4 (duration: 00m 58s)
  • 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1142', diff saved to https://phabricator.wikimedia.org/P11504 and previous config saved to /var/cache/conftool/dbconfig/20200615-153825-marostegui.json
  • 15:37 marostegui: Deploy schema change on db1142
  • 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P11503 and previous config saved to /var/cache/conftool/dbconfig/20200615-153630-marostegui.json
  • 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1141', diff saved to https://phabricator.wikimedia.org/P11502 and previous config saved to /var/cache/conftool/dbconfig/20200615-153546-marostegui.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P11501 and previous config saved to /var/cache/conftool/dbconfig/20200615-153344-marostegui.json
  • 15:16 moritzm: upgrading wtp1025-wtp1027 to PHP 7.2.31
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P11499 and previous config saved to /var/cache/conftool/dbconfig/20200615-150908-marostegui.json
  • 15:07 marostegui: Deploy schema change on db1121 (and labs)
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P11498 and previous config saved to /var/cache/conftool/dbconfig/20200615-150639-marostegui.json
  • 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11497 and previous config saved to /var/cache/conftool/dbconfig/20200615-150148-marostegui.json
  • 15:00 marostegui: Deploy schema change on db1144:3314
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11496 and previous config saved to /var/cache/conftool/dbconfig/20200615-145914-marostegui.json
  • 14:55 XioNoX: delete VCP from msw1-codfw
  • 14:24 marostegui: Deploy schema change on db2107 (s2 codfw master) - T250066
  • 14:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:09 elukey@cumin2001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 13:54 marostegui: Deploy schema change on db1100 (s5 master) - T250066
  • 13:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:49 marostegui: Upgrade db2133
  • 13:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:38 elukey@cumin2001: START - Cookbook sre.hadoop.roll-restart-workers
  • 13:31 volans@deploy1001: Finished deploy [homer/deploy@ac7a4c6]: Release v0.2.3 on cumin2001 now on buster (duration: 01m 15s)
  • 13:30 moritzm: rolling reboot on the ganeti cluster in esams (for kernel security updates and to pick up the network changes to provide instances with a public IP)
  • 13:30 volans@deploy1001: Started deploy [homer/deploy@ac7a4c6]: Release v0.2.3 on cumin2001 now on buster
  • 13:26 hashar: Started zuul-merger on contint1001 with newer virtualenv # T255424
  • 13:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:21 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-query,name=eqiad
  • 13:20 hashar: Stopping zuul-merger on contint1001 to rebuild the virtualenv # T255424
  • 13:19 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091:3312, db2091:3314 - T253217', diff saved to https://phabricator.wikimedia.org/P11495 and previous config saved to /var/cache/conftool/dbconfig/20200615-125856-marostegui.json
  • 12:58 vgutierrez: upgrade acme-chief to version 0.26
  • 12:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:46 vgutierrez: upload acme-chief 0.26 to apt.wm.o (buster) - T255249
  • 12:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:34 moritzm: rolling reboot on the ganeti cluster in eqsin (for security updates and to pick up the network changes to provide instances with a public IP)
  • 12:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:11 marostegui: Upgrade db2134
  • 12:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:57 moritzm: reimaging sretest1002 to validate the reimage script on Buster
  • 11:43 marostegui: Reimage dbproxy2003 which points to m3-master.codfw.wmnet (not in use) - T255408
  • 11:40 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Switch on guidance feature (T239181) (duration: 00m 57s)
  • 11:10 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:10 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:07 hnowlan: regenerated certificates for restbase2009, restbase101[678], restbase201[012]. Did not roll-restart yet
  • 11:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 11:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:54 moritzm: imported python-phabricator 0.7.0-2~wmf2 to apt.wikimedia.org/buster-wikimedia T245114
  • 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (605553) (duration: 00m 58s)
  • 10:38 hnowlan: regenerated restbase2009's cassandra certificates
  • 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (605553) (duration: 00m 58s)
  • 10:16 jmm@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
  • 10:16 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:12 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T254820 [enwikivoyage] Undeploy the Listings extension (duration: 01m 00s)
  • 10:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:50 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:46 godog: run logstash benchmark on logstash1023
  • 09:42 volans: deploying esams mgmt DNS records automatically generated by Netbox ( operations/dns/+/604136/ ) - T233183
  • 09:41 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:35 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:29 elukey: update analytics-in4/6 filters on cr1-cr2 eqiad to update the Druid term (new nodes added)
  • 09:21 jbond42: offlining puppetmaster1003 and 2003 for reboot
  • 09:17 XioNoX: reduce ae device-count from 10 to 3 on asw2-a/b/c-eqiad
  • 09:14 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:11 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:55 marostegui: Deploy schema change on db2123 (s5 codfw master) - T250066
  • 08:50 kart_: Updated cxserver to 2020-06-10-044445-production (T246319, T254959)
  • 08:46 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 08:42 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 08:39 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 08:34 moritzm: reimaging cumin2001 T245114
  • 08:22 marostegui: Switchover m3-master from dbproxy1008 to dbproxy1016 - T202367
  • 08:17 marostegui: Deploy schema change on db1131 (s6 master) - T250066
  • 08:09 moritzm: installing libexif security updates
  • 07:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:46 XioNoX: standardize ae device-count on all routers
  • 07:36 XioNoX: push new pfw firewall policies - T255185
  • 07:28 marostegui: Deploy schema change on db1093
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P11492 and previous config saved to /var/cache/conftool/dbconfig/20200615-072835-marostegui.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092', diff saved to https://phabricator.wikimedia.org/P11491 and previous config saved to /var/cache/conftool/dbconfig/20200615-072742-marostegui.json
  • 06:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime

2020-06-14

  • 13:51 qchris: Disabling puppet on gerrit1002 (test instance) to do some more upgrade testing

2020-06-13

  • 21:12 qchris: Enabling puppet on gerrit1002 (test instance). Done with testing for today.
  • 12:51 herron: restarted logstash service on logstash1007, logstash1009
  • 12:34 qchris: Disabling puppet on gerrit1002 (test instance) to do some more upgrade testing
  • 12:33 godog: bounce logstash on logstash1008, GC death

2020-06-12

  • 17:44 herron: restarting logstash1011 elasticsearch instance
  • 16:49 elukey: restart php-fpm and pool mw1384 - T255282
  • 16:33 elukey: (correct) depool again mw1384 - investigation will follow up in a task
  • 16:32 elukey: depool again mw1348 - investigation will follow up in a task
  • 15:49 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:44 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:40 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:40 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:36 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:27 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:25 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:24 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:24 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:24 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:22 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:51 elukey: repool mw1384 as test
  • 14:31 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 14:30 akosiaris: bump cpu limits for changeprop by another 50%
  • 14:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:36 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:34 akosiaris: update changeprop in eqiad+codfw for higher CPU limits
  • 13:34 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1088 after schema change', diff saved to https://phabricator.wikimedia.org/P11483 and previous config saved to /var/cache/conftool/dbconfig/20200612-131205-marostegui.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P11482 and previous config saved to /var/cache/conftool/dbconfig/20200612-124015-marostegui.json
  • 12:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 11:52 filippo@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 11:23 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:19 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:15 moritzm: failover ganeti master in ulsfo to ganeti4003
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2080 and db2084 into s8 T253217', diff saved to https://phabricator.wikimedia.org/P11481 and previous config saved to /var/cache/conftool/dbconfig/20200612-111422-marostegui.json
  • 11:11 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:07 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:02 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:58 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:39 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:36 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:33 moritzm: rolling restart of the ulsfo ganeti cluster
  • 10:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 10:02 filippo@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • 10:01 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:01 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:01 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:01 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Include db2084 in dbctl, depooled', diff saved to https://phabricator.wikimedia.org/P11480 and previous config saved to /var/cache/conftool/dbconfig/20200612-095855-marostegui.json
  • 09:58 godog: roll-restart thanos-fe / thanos-be for microcode updates
  • 08:51 elukey: restart gerrit on gerrit1001
  • 08:48 elukey: update cr1/cr2 analytics filters for T252767 and T252675
  • 08:44 marostegui: Compress InnoDB on db2092 - T254462
  • 08:36 marostegui: Clone db2084 from db2080
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080 to clone db2084', diff saved to https://phabricator.wikimedia.org/P11478 and previous config saved to /var/cache/conftool/dbconfig/20200612-083231-marostegui.json
  • 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2084 from s4 and s5', diff saved to https://phabricator.wikimedia.org/P11477 and previous config saved to /var/cache/conftool/dbconfig/20200612-081455-marostegui.json
  • 07:56 elukey: depool mw1384
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 from s4 and s5', diff saved to https://phabricator.wikimedia.org/P11476 and previous config saved to /var/cache/conftool/dbconfig/20200612-075202-marostegui.json
  • 07:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:08 marostegui: Reimage db2086
  • 07:07 elukey: depool/scap pull/pool mw1384
  • 07:05 moritzm: installing intel-microcode security updates (regressions have been sorted out)
  • 05:42 moritzm: installing stretch kernel security updates (no reboots yet)
  • 05:40 moritzm: installing buster kernel security updates (no reboots yet)
  • 04:54 marostegui: Deploy schema change on s6 codfw - T250066
  • 01:02 ejegg: updated payments-wiki from aceddff8b5 to 5fd4eb1519
  • 00:10 Amir1: BACON is done

2020-06-11

  • 23:54 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/Wikibase: Fix entity id lookup for interwiki special page links (T255078) (duration: 00m 38s)
  • 23:51 ladsgroup@deploy1001: scap failed: average error rate on 3/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
  • 23:43 ladsgroup@deploy1001: Synchronized wmf-config/extension-list: Remove ContributionTracking extension (T255216), Part III (duration: 00m 57s)
  • 23:42 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove ContributionTracking extension (T255216), Part II (duration: 00m 58s)
  • 23:38 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove ContributionTracking extension (T255216), Part I (duration: 00m 59s)
  • 23:37 Reedy: create cn_notice_regions on metawiki and testwiki T252596
  • 20:34 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:59 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.36
  • 19:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:33 akosiaris: apply emergency sessionstore fixes in codfw as well
  • 19:32 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 19:25 gilles@deploy1001: Finished deploy [performance/asoranking@0a096c4]: T252424 (duration: 00m 47s)
  • 19:19 gilles@deploy1001: Started deploy [performance/asoranking@0a096c4]: T252424
  • 19:12 akosiaris: repool eqiad for sessionstore
  • 19:12 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore
  • 19:10 akosiaris: remove the podaffinity restrictions for sessionstore in eqiad
  • 19:10 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 19:07 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 18:08 ppchelko@deploy1001: Synchronized wmf-config/reverse-proxy-staging.php: Beta: Switch from HTCP purging to kafka purging gerrit:603530, reverse-proxy-staging.php (duration: 01m 06s)
  • 18:06 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Beta: Switch from HTCP purging to kafka purging gerrit:603530, IS-labs.php (duration: 01m 06s)
  • 17:29 mbsantos@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:26 mbsantos@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:22 mbsantos@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:19 mbsantos@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:12 bstorm_: reboot for stretch upgrade on labstore1004 T224582
  • 16:49 bstorm_: doing stretch upgrade for labstore1004 T224582
  • 16:36 bstorm_: rebooting labstore1004 for upgrades T224582
  • 16:12 bstorm_: downtimed labstore1005 for upgrades on 1004 since that will alert as well T224582
  • 16:10 bstorm_: downtimed labstore1004 for upgrades T224582
  • 15:50 cstone: SmashPig revision changed from b9de3c7aac to 2246685626
  • 15:34 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:31 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:25 moritzm: installing buster kernel security updates (no reboots yet)
  • 15:04 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 15:04 mforns@deploy1001: Finished deploy [analytics/refinery@c969b56]: Regular analytics weekly train [analytics/refinery@c969b56afae1b2532e07f0ff699c2ce161360966] (duration: 01m 39s)
  • 15:04 root@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 15:04 root@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 15:02 mforns@deploy1001: Started deploy [analytics/refinery@c969b56]: Regular analytics weekly train [analytics/refinery@c969b56afae1b2532e07f0ff699c2ce161360966]
  • 15:02 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:56 herron: bounced elasticsearch on logstash1012
  • 14:41 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:40 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:37 herron: enabled VO incident resolution notification in global settings
  • 14:34 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:31 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:30 godog: bounce logstash on logstash1009, apparent GC death spiral
  • 14:03 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 14:03 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:03 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 14:03 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:35 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-query,name=eqiad
  • 13:35 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-swift,name=eqiad
  • 12:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 12:36 elukey: updated pcc facts
  • 12:28 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 12:28 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 12:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:15 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 12:15 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 12:04 jforrester@deploy1001: Synchronized php-1.35.0-wmf.36/includes/title/NamespaceInfo.php: T253098 NamespaceInfo::makeValidNamespace: Don't throw for -1 or -2 (duration: 01m 06s)
  • 12:03 marostegui: Reimage es2023 (es5 codfw master)
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2075 T254139', diff saved to https://phabricator.wikimedia.org/P11469 and previous config saved to /var/cache/conftool/dbconfig/20200611-115430-marostegui.json
  • 11:46 marostegui: Deploy schema change on s6 codfw - T250066
  • 11:44 volans@deploy1001: Finished deploy [homer/deploy@df83901]: Release v0.2.3 (duration: 00m 25s)
  • 11:44 volans@deploy1001: Started deploy [homer/deploy@df83901]: Release v0.2.3
  • 11:36 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 11:36 matthiasmullie: EU BACON done
  • 11:35 mlitn@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/GrowthExperiments: Help panel: Update guidance behavior rules (duration: 01m 06s)
  • 11:34 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 11:34 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 11:28 kartik@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/ContentTranslation/modules/tools/mw.cx.tools.IssueTrackingTool.js: Backport: 604587|IssueTrackingTool: Fix js error in getCurrentNodeId method (T254965) (duration: 01m 07s)
  • 11:08 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 11:04 mlitn@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/MachineVision: $aliases should be an array of strings, not AliasGroup objects (duration: 01m 07s)
  • 10:47 moritzm: repooling mw1318,mw2139,mw2145,mw2147,mw2221,mw2219,mw2250,mw2350 (these were depooled, but seem all fine in Icinga and were probably just forgotten)
  • 10:41 filippo@cumin1001: conftool action : set/pooled=yes; selector: cluster=thanos,service=thanos-swift
  • 10:40 filippo@cumin1001: conftool action : set/pooled=yes; selector: cluster=thanos,service=thanos-query
  • 10:37 moritzm: installing buster kernel security updates (no reboots yet, on hold for regression-free microcode update)
  • 10:32 godog: roll-restart pybal in eqiad lvs low-traffic
  • 10:21 mutante: restarting gerrit on gerrit-replica (gerrit2001) - java.lang.OutOfMemoryError: Java heap space
  • 10:21 Urbanecm: Run scap pull at mwdebug1001 to revert temporary changes
  • 10:14 Urbanecm: Applying temporary changes on mwdebug1001
  • 09:58 moritzm: upgrading netmon* to PHP 7.2.31
  • 09:55 marostegui: Upgrade es2025
  • 09:54 moritzm: upgrading mwmaint* to PHP 7.2.31
  • 09:46 moritzm: upgrading labweb* to PHP 7.2.31
  • 09:36 elukey: switch piwik.wikimedia.org from matomo1001 to matomo1002 (new buster node)
  • 09:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:48 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 08:48 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 08:42 moritzm: imported memcached 1.6.6-1~wmf10u1
  • 08:39 marostegui: Reimage es2024 to buster
  • 08:30 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:30 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:25 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:25 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:24 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:24 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:24 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:23 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 08:23 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 08:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:01 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 08:01 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 07:59 moritzm: upgrading remaining job runners in eqiad to PHP 7.2.31
  • 07:59 hashar: Restarted Zuul on contint2001 for config change # T253263
  • 07:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 07:34 moritzm: upgrading remaining app servers in eqiad to PHP 7.2.31
  • 07:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:07 marostegui: Stop MySQL on dbstore1003 for reimage - T254870
  • 06:38 XioNoX: make asw2-esams interfaces Homer like - T250429
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1127 T253217', diff saved to https://phabricator.wikimedia.org/P11467 and previous config saved to /var/cache/conftool/dbconfig/20200611-055536-marostegui.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127 T253217', diff saved to https://phabricator.wikimedia.org/P11466 and previous config saved to /var/cache/conftool/dbconfig/20200611-052535-marostegui.json
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127 T253217', diff saved to https://phabricator.wikimedia.org/P11465 and previous config saved to /var/cache/conftool/dbconfig/20200611-050446-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078', diff saved to https://phabricator.wikimedia.org/P11464 and previous config saved to /var/cache/conftool/dbconfig/20200611-050200-marostegui.json
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P11463 and previous config saved to /var/cache/conftool/dbconfig/20200611-045426-marostegui.json
  • 04:50 marostegui: Deploy schema change on testwiki - T254371
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1084 and slowly repool db1127 T253217', diff saved to https://phabricator.wikimedia.org/P11462 and previous config saved to /var/cache/conftool/dbconfig/20200611-044725-marostegui.json
  • 03:13 shdubsh: removing WDQS-Streaming-Updater-POC metrics on graphite1004 - T255044
  • 02:43 tstarling@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/Wikibase/lib/includes/Store/EntityLinkTargetEntityIdLookup.php: investigate UBN T255078 (duration: 01m 07s)

2020-06-10

  • 23:55 catrope@deploy1001: Synchronized php-1.35.0-wmf.36/includes/skins/SkinTemplate.php: T255073 (duration: 01m 07s)
  • 22:14 eileen: civicrm revision changed from 80a0d22350 to f01b036128, config revision is a26d023633
  • 21:23 akosiaris: increase memory/cpu limits for proton
  • 21:23 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:11 mbsantos@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:08 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 21:06 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:45 mbsantos@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 20:33 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:15 mbsantos@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:04 mbsantos@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 19:46 herron: bouncing elasticsearch on logstash1011
  • 19:01 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use EventRelayerNull for wikitech, gerrit:604469 (duration: 01m 05s)
  • 18:54 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/VisualEditor/: 8958860: Make VisualEditorDisableForAnons only hide the tabs, not disable the editor (T253941) (duration: 01m 07s)
  • 18:32 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.35/extensions/VisualEditor/: 5f4c609: Make VisualEditorDisableForAnons only hide the tabs, not disable the editor (T253941) (duration: 01m 14s)
  • 16:40 godog: EDIT: in esams
  • 16:39 godog: restart prometheus@ops in eqiad
  • 16:31 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable HTCP purges everywhere, gerrit:603655 (duration: 01m 05s)
  • 16:27 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 16:27 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 16:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 16:13 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:13 ema: correction: restart purged on all *cache_upload* hosts to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604430/ T250781 T133821
  • 16:12 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 16:12 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 16:12 ema: restart purged on all cache hosts to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604430/ T250781 T133821
  • 16:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:06 ema: cp3051: restart purged to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604430/ T250781 T133821
  • 16:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:45 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:38 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 15:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:36 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Send kafka purges everywhere, gerrit:603654 (duration: 01m 05s)
  • 15:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:32 ema: remaining-cp (non-ulsfo): rolling ats-tls-restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604305/ T255015
  • 15:29 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: Make kafka purges config more robust, gerrit:603649, CS.php (duration: 01m 05s)
  • 15:27 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make kafka purges config more robust, gerrit:603649, IS.php (duration: 01m 08s)
  • 15:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:19 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:08 godog: roll-restart prometheus k8s to enable thanos upload
  • 15:02 ema: A:cp-ulsfo: rolling ats-tls-restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604305/ T255015
  • 14:43 ema: A:cp rolling systemctl restart trafficserver
  • 14:28 ema: systemctl restart trafficserver for instances critical in icinga
  • 14:21 ema: cp3056: ats-backend-restart
  • 14:09 ema: A:cp rolling ats-be/ats-tls restarts to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604305/ T255015
  • 14:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094 into s7', diff saved to https://phabricator.wikimedia.org/P11458 and previous config saved to /var/cache/conftool/dbconfig/20200610-135753-marostegui.json
  • 13:50 ema: cp3050: ats-tls-restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604305/ T255015
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094 into s7', diff saved to https://phabricator.wikimedia.org/P11457 and previous config saved to /var/cache/conftool/dbconfig/20200610-135039-marostegui.json
  • 13:40 ema: cp3050: ats-backend-restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/604305/ T255015
  • 13:36 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 13:06 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.36 (duration: 01m 04s)
  • 13:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.36
  • 12:33 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 12:32 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 12:32 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 12:13 akosiaris: pool thumbor2002, thumbor2001. T251570
  • 12:12 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor2002.codfw.wmnet
  • 12:12 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor2001.codfw.wmnet
  • 11:50 marostegui: Deploy schema change on commonswiki codfw T255003
  • 11:41 moritzm: upgrading remaining app servers in codfw to PHP 7.2.31
  • 11:38 marostegui: Deploy schema change on testcommonswiki T255003
  • 11:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 52091b8: Grant cswiki accountcreators tboverride-account and override-antispoof (T254927) (duration: 01m 06s)
  • 11:13 moritzm: upgrading remaining job runners in codfw to PHP 7.2.31
  • 11:02 marostegui: Stop MySQL on db1094 to clone db1127
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 moving to clone db1127 T253217', diff saved to https://phabricator.wikimedia.org/P11453 and previous config saved to /var/cache/conftool/dbconfig/20200610-110204-marostegui.json
  • 10:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 moving it to s7 T253217', diff saved to https://phabricator.wikimedia.org/P11452 and previous config saved to /var/cache/conftool/dbconfig/20200610-103742-marostegui.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1103,db1137 into x1', diff saved to https://phabricator.wikimedia.org/P11451 and previous config saved to /var/cache/conftool/dbconfig/20200610-102805-marostegui.json
  • 10:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T254036 Undeploy CollaborationKit: IV – Drop flag to load (duration: 01m 05s)
  • 10:23 jayme: T254581 re-enabled puppet on all mw, api and jobrunner servers
  • 10:20 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T254036 Undeploy CollaborationKit: III – Drop ability to load (duration: 01m 05s)
  • 10:16 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T254036 Undeploy CollaborationKit: II – Disable on Test Wikipedia (duration: 01m 37s)
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1103,db1137 into x1', diff saved to https://phabricator.wikimedia.org/P11450 and previous config saved to /var/cache/conftool/dbconfig/20200610-101407-marostegui.json
  • 10:12 moritzm: upgrading remaining API servers in codfw to PHP 7.2.31
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1103,db1137 into x1', diff saved to https://phabricator.wikimedia.org/P11449 and previous config saved to /var/cache/conftool/dbconfig/20200610-100834-marostegui.json
  • 10:03 jynus: cloning reviewdb into reviewdb-test at db1132 with replication enabled T254516
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1103 into x1', diff saved to https://phabricator.wikimedia.org/P11448 and previous config saved to /var/cache/conftool/dbconfig/20200610-100306-marostegui.json
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1137 into x1', diff saved to https://phabricator.wikimedia.org/P11447 and previous config saved to /var/cache/conftool/dbconfig/20200610-100037-marostegui.json
  • 09:35 volans: imported 0.0.38-1+deb10u1 into buster-wikimedia APT - T245114
  • 09:35 marostegui: Stop mysql on db1127 to clone db1103
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 for cloning db1103 - T253217', diff saved to https://phabricator.wikimedia.org/P11443 and previous config saved to /var/cache/conftool/dbconfig/20200610-093440-marostegui.json
  • 09:31 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 09:31 godog: configure thanos-be1* HDDs as raid0 - T252186
  • 09:26 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1103 to dbctl, depooled T253217', diff saved to https://phabricator.wikimedia.org/P11442 and previous config saved to /var/cache/conftool/dbconfig/20200610-092603-marostegui.json
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1103:3312 and db1103:3314', diff saved to https://phabricator.wikimedia.org/P11441 and previous config saved to /var/cache/conftool/dbconfig/20200610-092406-marostegui.json
  • 09:14 jayme: T254581 disabling puppet on all mw, api and jobrunner servers to move termbox envoy config to TLS
  • 09:08 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:50 XioNoX: make asw1-eqsin interfaces Homer like - T250429
  • 08:45 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 08:45 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 08:45 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 08:17 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 08:15 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:13 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 07:53 kormat: reimaging db1077 T252027
  • 07:36 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 07:36 XioNoX: make asw2-ulsfo interfaces Homer like - T250429
  • 07:33 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 07:31 moritzm: upgrade mw1298-mw1309 (job runners) to PHP 7.2.31
  • 07:26 XioNoX: trunk public vlan to esams ganeti hosts - T254157
  • 07:16 XioNoX: trunk public vlan to eqsin ganeti hosts - T254157
  • 07:15 moritzm: upgrade remaining API servers in eqiad to PHP 7.2.31
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103 for reimage - T253217', diff saved to https://phabricator.wikimedia.org/P11439 and previous config saved to /var/cache/conftool/dbconfig/20200610-070822-marostegui.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2113 after on-site maintenance T251570', diff saved to https://phabricator.wikimedia.org/P11438 and previous config saved to /var/cache/conftool/dbconfig/20200610-070508-marostegui.json
  • 06:53 XioNoX: trunk public vlan to ulsfo ganeti hosts - T254157
  • 05:10 marostegui: Deploy schema change on s3 master with 2 minutes sleep between wikis - T206103

2020-06-09

  • 23:18 Reedy: run namespaceDupes.php --fix for hiwikibooks T254012
  • 23:10 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T254706 T254012 T241893 (duration: 01m 06s)
  • 23:03 Reedy: created wikilove_log on slwiki T254706
  • 20:00 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.32 (duration: 05m 11s)
  • 19:51 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.36
  • 19:42 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.36 (duration: 57m 47s)
  • 19:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:45 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.36
  • 18:41 jforrester@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/TimedMediaHandler/includes/TimedMediaHandler.php: T254824 Avoid undefined index error (duration: 00m 57s)
  • 18:36 volans: migrated mgmt DNS records in eqsin to the Netbox-generated records - T233183
  • 18:13 jforrester@deploy1001: Synchronized php-1.35.0-wmf.36/extensions/CheckUser/: T234921 T254912 Use UserGroupManagerFactory with correct domain to fetch groups (duration: 02m 26s)
  • 18:12 volans: uploaded cumin_4.0.0rc1-1_amd64.deb to apt.wikimedia.org buster-wikimedia
  • 16:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:06 longma: cutting the branch for 1.35.0-wmf.36 T254173
  • 15:26 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:26 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:25 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:25 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:06 volans: forcing a debmonitor GC to verify the fix of T254865
  • 14:59 mutante: gerrit2001 - delete gerrit logfiles older than 30 days, crons are now enabled to keep doing it in the future
  • 14:55 volans@deploy1001: Finished deploy [debmonitor/deploy@44aa1ee]: Release v0.2.5 (duration: 00m 43s)
  • 14:54 volans@deploy1001: Started deploy [debmonitor/deploy@44aa1ee]: Release v0.2.5
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2131 after reimage', diff saved to https://phabricator.wikimedia.org/P11436 and previous config saved to /var/cache/conftool/dbconfig/20200609-144929-marostegui.json
  • 14:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:34 moritzm: rebooting auth1002
  • 14:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:00 elukey: update release repository's settings on Archiva - T254849
  • 14:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131 for reimage', diff saved to https://phabricator.wikimedia.org/P11434 and previous config saved to /var/cache/conftool/dbconfig/20200609-123817-marostegui.json
  • 12:22 kormat: reimaging sretest1002 T252027
  • 12:18 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:16 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:14 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1141 into s4 T252512', diff saved to https://phabricator.wikimedia.org/P11433 and previous config saved to /var/cache/conftool/dbconfig/20200609-120009-marostegui.json
  • 11:50 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141 into s4 T252512', diff saved to https://phabricator.wikimedia.org/P11432 and previous config saved to /var/cache/conftool/dbconfig/20200609-115016-marostegui.json
  • 11:46 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1148 into s4 T252512', diff saved to https://phabricator.wikimedia.org/P11431 and previous config saved to /var/cache/conftool/dbconfig/20200609-114615-marostegui.json
  • 11:44 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141 into s4 T252512', diff saved to https://phabricator.wikimedia.org/P11430 and previous config saved to /var/cache/conftool/dbconfig/20200609-113818-marostegui.json
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1148 into s4 T252512', diff saved to https://phabricator.wikimedia.org/P11429 and previous config saved to /var/cache/conftool/dbconfig/20200609-113702-marostegui.json
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141 into s4 T252512', diff saved to https://phabricator.wikimedia.org/P11428 and previous config saved to /var/cache/conftool/dbconfig/20200609-113056-marostegui.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1148 into s4 T252512', diff saved to https://phabricator.wikimedia.org/P11427 and previous config saved to /var/cache/conftool/dbconfig/20200609-112701-marostegui.json
  • 11:15 ladsgroup@deploy1001: Synchronized langlist: Add be-tarask to langlist (T111853) (duration: 00m 57s)
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1148 into s4 T252512', diff saved to https://phabricator.wikimedia.org/P11426 and previous config saved to /var/cache/conftool/dbconfig/20200609-111443-marostegui.json
  • 10:49 elukey: update pcc facts
  • 10:48 moritzm: imported tqdm 4.23.4-1+wmf1 to buster-wikimedia/component/spicerack
  • 10:35 volans: installed spicerack 0.0.38 on cumin[12]001
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1141 depooled to s4 T252512', diff saved to https://phabricator.wikimedia.org/P11425 and previous config saved to /var/cache/conftool/dbconfig/20200609-103252-marostegui.json
  • 10:27 volans: uploaded spicerack_0.0.38-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 10:14 jayme: restarting pybal on lvs1015 and lvs2009 for T254581
  • 10:12 XioNoX: "Re-order some BGP transit neighbors terms"
  • 10:07 marostegui: Deploy schema change on s7 T206103
  • 10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:00 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:57 jayme: restarting pybal on lvs1016 and lvs2010 for T254581
  • 09:57 akosiaris: correction: depool and set as inactive thumbor200{1,2} for T251570
  • 09:57 akosiaris: depool and set as inactive thumber200{1,2} for T251750
  • 09:56 vgutierrez: disable parent proxies on ats-tls
  • 09:55 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor2001.codfw.wmnet
  • 09:55 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor2002.codfw.wmnet
  • 09:41 marostegui: Compress InnoDB on db2072 T254462
  • 09:34 marostegui: Stop MySQL on db1148 to clone db1141 - T252512
  • 09:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 to clone db1141 - T252512', diff saved to https://phabricator.wikimedia.org/P11423 and previous config saved to /var/cache/conftool/dbconfig/20200609-092915-marostegui.json
  • 09:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:01 moritzm: rolling restart of cassandra on maps* to pick up Java security updates
  • 08:39 moritzm: upgrading snapshot servers to PHP 7.2.31
  • 08:28 moritzm: upgrading deployment servers to PHP 7.2.31
  • 08:01 marostegui: stop m1 on db1117 to clone db1097 (this will trigger an haproxy irc alert) - T254556
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1097 from config', diff saved to https://phabricator.wikimedia.org/P11421 and previous config saved to /var/cache/conftool/dbconfig/20200609-073635-marostegui.json
  • 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:30 moritzm: upgrading mw1390-mw1413 to PHP 7.2.31
  • 07:11 ema: deployment-cache-text06: stop vhtcpd, start purged T254844
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314, db1097:3315 T253217', diff saved to https://phabricator.wikimedia.org/P11420 and previous config saved to /var/cache/conftool/dbconfig/20200609-070917-marostegui.json
  • 06:53 marostegui: Stop MySQL on db2113 for maintenance - T251570
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2113 for on-site maintenance T251570', diff saved to https://phabricator.wikimedia.org/P11419 and previous config saved to /var/cache/conftool/dbconfig/20200609-065125-marostegui.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool db1091 into s1 T253217', diff saved to https://phabricator.wikimedia.org/P11418 and previous config saved to /var/cache/conftool/dbconfig/20200609-064829-marostegui.json
  • 06:40 marostegui: Deploy schema change on s2 T206103
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1091 into s1 T253217', diff saved to https://phabricator.wikimedia.org/P11417 and previous config saved to /var/cache/conftool/dbconfig/20200609-063344-marostegui.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1091 into s1 T253217', diff saved to https://phabricator.wikimedia.org/P11416 and previous config saved to /var/cache/conftool/dbconfig/20200609-061916-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1091 into s1 T253217', diff saved to https://phabricator.wikimedia.org/P11415 and previous config saved to /var/cache/conftool/dbconfig/20200609-055128-marostegui.json
  • 05:32 marostegui: Switch dbproxy1018 from "master" service to "replicas" - T249188
  • 01:02 eileen: civicrm revision changed from 4a19db672f to 80a0d22350, config revision is 386b9bc457
  • 00:39 ejegg: updated payments-wiki from c1d14a5db7 to aceddff8b5
  • 00:30 shdubsh: restart elasticsearch on logstash1010
  • 00:24 eileen: civicrm revision changed from be4c5a4951 to 4a19db672f, config revision is 386b9bc457

2020-06-08

  • 23:49 krinkle@deploy1001: Synchronized wmf-config/logging.php: If99192 (duration: 00m 57s)
  • 23:35 krinkle@deploy1001: Synchronized wmf-config/logging.php: I8c22a1a8fc402 (duration: 00m 58s)
  • 23:32 foks: removing one file for legal compliance
  • 23:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:02 ryankemper@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 22:58 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 22:53 ryankemper@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
  • 22:53 shdubsh: update mtail to 3.0.0~rc35 on mw and wtp hosts codfw
  • 22:49 eileen: civicrm revision changed from 11b0e7c7e5 to be4c5a4951, config revision is 386b9bc457
  • 22:49 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 20:52 Amir1: applying the sql alter table on ipblocks on labswiki (T251188)
  • 20:27 RoanKattouw: Running initUserPreference.php -s growthexperiments-homepage-enable -t growthexperiments-help-panel-tog-help-panel on wikis that have GrowthExperiments installed (T240920)
  • 18:56 Urbanecm: Morning SWAT config/backport window done
  • 18:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 1630a10: Set wgProofreadPagePageJoiner to __PAGEJOIN__ for zhwikisource (T205826) (duration: 00m 58s)
  • 18:55 urbanecm@deploy1001: sync-file aborted: SWAT: 1630a10: Set wgProofreadPagePageJoiner to __PAGEJOIN__ for zhwikisource (duration: 00m 00s)
  • 18:51 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0e85203: Enable subpages in Page namespace on napwikisource (T252755) (duration: 00m 58s)
  • 18:44 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: End GrowthExperiments homepage A/B test (T254413) (duration: 00m 57s)
  • 18:23 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable HTCP purges for testwiki (T250781) (part 2) (duration: 00m 56s)
  • 18:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable HTCP purges for testwiki (T250781) (part 1) (duration: 00m 59s)
  • 17:50 elukey: restart prometheus burrow exporter for kafka main on kafkamon1001 - T254498
  • 17:43 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.35/resources/src/mediawiki.misc-authed-curate/rollback.js: Fix: Diff pages show rollback confirmation prompt if there is the "Mark as patrolled" link (T254538) (duration: 00m 59s)
  • 17:14 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 16:55 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 16:44 liw: testing upcoming Scap release on beta
  • 15:29 hnowlan: Migrated all cpjobqueue jobs from scb to Kubernetes
  • 15:29 hnowlan@deploy1001: Finished deploy [cpjobqueue/deploy@07d8c32]: Disabling jobs migrated to k8s (duration: 04m 34s)
  • 15:28 jynus@cumin2001: dbctl commit (dc=all): 'depool db2075 for mw maintenance T254139', diff saved to https://phabricator.wikimedia.org/P11411 and previous config saved to /var/cache/conftool/dbconfig/20200608-152811-jynus.json
  • 15:24 hnowlan@deploy1001: Started deploy [cpjobqueue/deploy@07d8c32]: Disabling jobs migrated to k8s
  • 15:12 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.35/extensions/Wikibase/client/includes/Store/Sql/DirectSqlStore.php: Wrap WAN-cached PropertyInfoLookup with an APCu cache, Part III out of III (T254536) (duration: 00m 57s)
  • 15:10 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.35/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: Wrap WAN-cached PropertyInfoLookup with an APCu cache, Part II out of III (T254536) (duration: 00m 57s)
  • 15:09 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.35/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: Wrap WAN-cached PropertyInfoLookup with an APCu cache, Part I out of III (T254536) (duration: 00m 59s)
  • 15:05 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:53 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo cumin A:mw-canary 'enable-puppet "cdanis deploying I25ab44c1 T252605"'
  • 14:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:48 papaul: powering down ms-be2016 for BBU replacement
  • 14:47 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo cumin A:mw-canary 'disable-puppet "cdanis deploying I25ab44c1 T252605"'
  • 14:41 moritzm: upgrading mw API servers in codfw to PHP 7.2.31
  • 14:00 jbond42: updating puppet-merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/602738/4
  • 13:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:58 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:50 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update mitigations for T250887 (duration: 00m 57s)
  • 13:41 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 12:23 XioNoX: repool codfw - T243080
  • 12:18 XioNoX: rollback cr2-codfw vrrp/ospf/bgp changes - T243080
  • 12:18 marostegui: Compress InnoDB on db2094:3311 T254462
  • 12:09 XioNoX: cr2-codfw> request chassis routing-engine master switch - T243080
  • 12:05 XioNoX: reboot cr2-codfw:re0 (backup) - T243080
  • 11:53 XioNoX: cr2-codfw> request chassis routing-engine master switch - T243080
  • 11:53 moritzm: restarting dnsdist on malmok
  • 11:53 marostegui: Deploy schema change on s3 - T251188
  • 11:49 XioNoX: reboot cr2-codfw:re1 (backup) - T243080
  • 11:45 moritzm: restarting slapd on ldap-corp* for GnuTLS security update
  • 11:43 moritzm: rolling restart of Apache on Kibana/7 host to pick up GnuTLS security update
  • 11:41 XioNoX: de-pref cr2-codfw OSPF - T243080
  • 11:39 XioNoX: deactivate cr2-codfw transit/peering - T243080
  • 11:38 XioNoX: fail vrrp master from cr2 to cr1 - T243080
  • 11:32 XioNoX: cr1-codfw set OSPF metrics back to normal - T243080
  • 11:30 XioNoX: cr1-codfw re-enable transit/peering - T243080
  • 11:29 XioNoX: cr1-codfw add graceful-restart - T243080
  • 11:28 XioNoX: cr1-codfw add graceful-switchover - T243080
  • 11:18 Lucas_WMDE: EU SWAT done
  • 11:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove Wikibase idBlacklist setting (T254686), part 2 (duration: 00m 56s)
  • 11:15 XioNoX: cr1-codfw> request chassis routing-engine master switch - T243080
  • 11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Remove Wikibase idBlacklist setting (T254686), part 1 (duration: 00m 56s)
  • 11:11 XioNoX: reboot cr1-codfw:re0 (backup) - T243080
  • 11:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable GrowthExperiments guidance everywhere behind feature flag (T253794) (duration: 00m 57s)
  • 11:05 marostegui: Install events on es1 T254689
  • 11:05 XioNoX: install Junos on cr1-codfw:re0 (backup) - T243080
  • 10:56 XioNoX: do cr1-codfw RE mastership switch - T243080
  • 10:53 XioNoX: reboot cr1-codfw:re1 (backup) - T243080
  • 10:46 XioNoX: install Junos on cr1-codfw:re1 (backup) - T243080
  • 10:43 XioNoX: deactivate cr1-codfw transit/peering - T243080
  • 10:41 XioNoX: bump all cr1-codfw OSPF metrics - T243080
  • 10:41 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (603408) (duration: 00m 57s)
  • 10:40 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (603408) (duration: 01m 09s)
  • 10:39 XioNoX: depool codfw - T243080
  • 09:46 moritzm: installing gnutls28 security updates on buster (older releases not affected)
  • 09:32 qchris: Turning on puppet on gerrit1002 again to avoid starting to lag too far behind
  • 08:17 XioNoX: push T250136 to eqsin - T250136
  • 08:09 XioNoX: push T250136 to eqiad - T250136
  • 08:07 moritzm: upgrading mw1349-mw1383 to PHP 7.2.31
  • 08:07 mutante: stat1006 moved broken jupyter-dedcode-singleuser.service out of /run/systemd/transient. systemctl reset-failed
  • 08:02 XioNoX: push T250136 to codfw - T250136
  • 07:58 XioNoX: push T250136 to eqord/eqdfw - T250136
  • 07:58 mutante: stat1006 bash[40607]: /bin/bash: line 0: exec: jupyterhub-singleuser: not found
  • 07:57 mutante: ran puppet on all stat* hosts for an access request (dcipoletti was added) - stat1006 systemd state broke right after, jupyter-dedcode-singleuser.service failed
  • 07:46 XioNoX: push T250136 to esams/knams - T250136
  • 07:42 XioNoX: cr4-ulsfo protocols bgp group Transit4 family inet any -> unicast - T250136
  • 07:39 XioNoX: cr3-ulsfo protocols bgp group Transit4 family inet any -> unicast - T250136
  • 07:37 moritzm: installing nodejs security updates
  • 07:05 marostegui: Stop MySQL on labsdb1012 to clone labsdb1011 T249188
  • 05:22 marostegui: Upgrade db1077 to 10.4.13 to test events memory leak
  • 04:45 _joe_: de-firewalling mc1029
  • 04:27 _joe_: firewalling off memcached on mc1029

2020-06-05

  • 16:45 elukey@deploy1001: Finished deploy [analytics/turnilo/deploy@f7e4f78]: Upgrade to 1.24.0 (duration: 00m 11s)
  • 16:45 elukey@deploy1001: Started deploy [analytics/turnilo/deploy@f7e4f78]: Upgrade to 1.24.0
  • 16:29 bd808: Testing stashbot following hard restart of service. It was having LDAP connection failure problems.
  • 16:00 AndyRussG: Turned off Fundraising job recurring_smashpig_charge
  • 15:54 cdanis: enabling & rerunning puppet on netflow* T254574
  • 15:39 cdanis: disabling puppet on netflow* and trying I6598d8f8 on netflow3001 first T254574
  • 15:39 cdanis: disabling puppet on netflow* and trying I6598d8f8 on netflow3001 first
  • 13:33 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 13:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 13:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:18 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 13:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:55 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Hotfix for be-tarask interwiki link being broken (T111853) (duration: 01m 00s)
  • 12:41 mutante: rebooting gerrit1002 to add more vCPUs, after [ganeti1009:~] $ sudo gnt-instance modify -B vcpus=8 gerrit1002.wikimedia.org T239151
  • 12:20 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 12:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 12:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 12:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 12:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 12:17 akosiaris: update blubberoid changeprop changeprop-jobqueue citoid cxserver wikifeeds zotero in staging to latest charts
  • 12:17 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 12:17 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 12:17 akosiaris: fix typo in ganeti2016 /etc/network/interfaces and reboot
  • 11:28 akosiaris: master-failover from ganeti2001 to ganeti2019 for ganeti01.svc.codfw.wmnet
  • 11:25 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:25 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:25 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:14 mutante: running puppet on all ganeti nodes
  • 11:05 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 14s)
  • 10:32 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 10:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:02 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 09:49 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 09:46 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:25 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 09:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:44 akosiaris: reimage ganeti2016 for stretch
  • 08:42 akosiaris: migrate mx2001.wikimedia.org to new ganeti nodes
  • 08:40 akosiaris: migrate acrab to new ganeti nodes
  • 08:38 akosiaris: failover master IP from ganeti1003 to ganeti1009
  • 08:37 akosiaris: empty ganeti100{1,2,3,4}. Move all VMs to new ganeti nodes
  • 08:28 akosiaris: migrate seaborgium.wikimedia.org to new ganeti nodes
  • 08:27 akosiaris: migrate etherpad1002 to new ganeti nodes
  • 08:11 marostegui: Upgrade db2075 to 10.1.45
  • 07:52 vgutierrez: rolling restart of ats-tls - T249335
  • 07:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 06:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:17 elukey@cumin1001: START - Cookbook sre.hosts.downtime

2020-06-04

  • 23:45 catrope@deploy1001: Synchronized wmf-config/mc.php: Set coalesceKeys=non-global for WANCache on enwiki (duration: 00m 59s)
  • 23:29 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Minerva site notices on Wikivoyage wikis (T254391) (duration: 00m 58s)
  • 23:19 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set guwiki timezone to Asia/Kolkata (T253827) (duration: 00m 57s)
  • 23:17 catrope@deploy1001: Synchronized static/images/: Change logo for zhwiki (T254467) (duration: 01m 00s)
  • 22:56 ryankemper: re-enabled puppet on `cloudelastic1006`. All `cloudelastic` instances now have puppet enabled and are in sync
  • 20:56 ryankemper: enabled puppet on `cloudelastic1005` in order to kick off a puppet run and verify that this new node joins the ES cluster properly
  • 20:39 ryankemper: disabled puppet on `cloudelastic100[5,6]` which are two racked nodes that we are now bringing into service. Will re-enable after successful puppet-merge / elasticsearch cluster join
  • 20:38 ryankemper: disabled puppet on `cloudelastic100[5,6]` which are two racked nodes that we are now bringing into service. Will re-enable after successful puppet-merge / elasticsearch cluster join
  • 19:04 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.35
  • 15:12 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=druid1004.eqiad.wmnet
  • 15:11 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:10 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 15:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:07 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:06 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:37 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:36 moritzm: installing libexif security updates on jessie
  • 14:08 moritzm: installing clamav security updates on mendelevium (ticket.wikimedia.org)
  • 14:00 qchris: Stopping puppet on gerrit1002 (gerrit-test) to run tests for Gerrit upgrade
  • 13:41 moritzm: bounced ferm on ms-be1023
  • 13:35 moritzm: installing exim security updates on jessie (stretch/buster already done)
  • 12:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c06e720: Revert "wgNamespaceRobotPolicies: thwiki: Add 100 NS to noindex" (T253574) (duration: 01m 06s)
  • 12:18 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:14 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 12:02 moritzm: upgrading mw1276 to PHP 7.2.31
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11396 and previous config saved to /var/cache/conftool/dbconfig/20200604-115933-marostegui.json
  • 11:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ec07467: wgNamespaceRobotPolicies: thwiki: Add 100 NS to noindex (T253574) (duration: 01m 15s)
  • 11:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 338cb90: 1ade16f: Change $wgNamespaceRobotPolicies on Thai wikis (T253578; T253577; T253576; T253575; T253574) (duration: 01m 07s)
  • 11:46 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11395 and previous config saved to /var/cache/conftool/dbconfig/20200604-114149-marostegui.json
  • 11:29 marostegui: Compress InnoDB on db1091 before pooling it as new slave on s1 - T254462
  • 11:21 hashar@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [metawiki] Add `centralauth-rename` to WMF OIT staff - T254372 (duration: 01m 08s)
  • 11:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:04 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:59 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 10:53 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:53 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=druid1004.eqiad.wmnet
  • 10:46 marostegui: Deploy schema change on s3 (only testwiki) eqiad - T238966
  • 10:42 marostegui: Deploy schema change on s3 (only testwiki) codfw - T238966
  • 10:41 jbond42: deployed new version of puppet-merge; revert is https://gerrit.wikimedia.org/r/c/operations/puppet/+/602329
  • 09:57 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 09:56 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:56 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:55 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:55 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:54 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 09:53 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:51 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:50 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:50 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:50 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:50 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:50 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:48 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:46 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:42 jmm@cumin2001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
  • 09:42 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 09:41 jmm@cumin2001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
  • 09:41 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 09:26 moritzm: rolling restart of cassandra on maps* to pick up Java security updates
  • 09:09 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:08 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:04 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:04 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:03 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:03 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:03 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:03 moritzm: deploying Java security updates on Elasticsearch nodes
  • 09:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:59 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:59 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:59 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:58 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:50 marostegui: Repool labsdb1009 after running maintain-views T252219
  • 08:42 moritzm: restarting archiva to pick up Java security updates
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to clone db1091 on s1 T253217', diff saved to https://phabricator.wikimedia.org/P11392 and previous config saved to /var/cache/conftool/dbconfig/20200604-081545-marostegui.json
  • 08:14 marostegui: Run sudo /usr/local/sbin/maintain-views --all-databases --replace-all on labsdb1009 - T252219
  • 07:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:45 marostegui: Depool labsdb1009 - T252219
  • 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:33 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=labweb,service=labweb-ssl
  • 07:32 oblivian@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=cloudceph,service=cloudceph
  • 06:52 mutante: mwmaint1002 started mediawiki_job_cirrus_build_completion_indices_eqiad.service
  • 06:06 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: name=logstash200.*
  • 06:05 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: name=logstash100.*
  • 06:04 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: cluster=eventschemas,service=eventschemas
  • 06:02 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: dc=codfw,cluster=elasticsearch,service=elasticsearch.*
  • 06:01 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: dc=codfw,cluster=elasticsearch,service=elasticsearch
  • 06:00 _joe_: fixing weights of cp2040 T245594
  • 05:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:36 reedy@deploy1001: Synchronized php-1.35.0-wmf.35/includes/specials/SpecialUserrights.php: T254417 T251534 (duration: 01m 06s)

2020-06-03

  • 23:08 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: T249834 (duration: 01m 06s)
  • 23:06 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: T249834 (duration: 01m 06s)
  • 22:22 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 21:54 jforrester@deploy1001: rebuilt and synchronized wikiversions files: Re-rolling group1 to 1.35.0-wmf.35 for T253023
  • 21:49 jforrester@deploy1001: Synchronized php-1.35.0-wmf.35/extensions/EventStreamConfig/includes/ApiStreamConfigs.php: T254390 ApiStreamConfigs: If the 'constraints' parameter is unset, don't explode (duration: 01m 06s)
  • 21:43 cstone: civicrm revision changed from 63508b01b9 to 11b0e7c7e5
  • 21:16 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 21:15 ryankemper: The previously run `_cluster/reroute?retry_failed=true` command worked as intended: the two shards in question have recovered and we're back to green cluster status. We're now in a known state and ready to proceed with the eqiad rolling upgrade
  • 21:13 ryankemper: Ran `curl -X POST "https://localhost:9243/_cluster/reroute?pretty&retry_failed=true&explain=true" -H 'Content-Type: application/json' -d '{}' --insecure` via the ssh tunnel `ssh bast4002.wikimedia.org -L 9243:search.svc.eqiad.wmnet:9243 -L 9443:search.svc.eqiad.wmnet:9443 -L 9643:search.svc.eqiad.wmnet:9643`, two unassigned shards are now initializing
  • 21:05 ryankemper: Elasticsearch eqiad was in yellow cluster status before starting the above cookbook run (therefore the run was a no-op until I Ctrl+C'd it), going to try unsticking the two unassigned shards via `/_cluster/reroute?retry_failed=true`
  • 21:03 ryankemper@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 20:58 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 20:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
  • 20:49 eileen: civicrm revision changed from eb156dffa4 to 63508b01b9, config revision is 95dcdb0a8a
  • 20:47 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 20:19 gehel: elasticsearch cluster restart stopped
  • 20:18 ryankemper@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
  • 19:35 ppchelko@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 19:35 ppchelko@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 19:33 ppchelko@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 19:32 ppchelko@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 19:30 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
  • 19:29 ppchelko@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 19:29 ppchelko@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 19:20 jforrester@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to wmf.34 T253023
  • 19:16 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 19:15 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 19:14 jforrester@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.35 (duration: 01m 05s)
  • 19:13 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.35
  • 19:05 jforrester@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: T32405 Stop special casing the main page on another 47 projects (duration: 01m 08s)
  • 19:01 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 601843 Enable talk pages on Swedish Minerva (duration: 01m 08s)
  • 18:59 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 18:56 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 18:55 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 601842 - Disable growth survey (duration: 01m 06s)
  • 18:49 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: gerrit 596277 Use AddFooterLink hook for code of conduct and contact links (duration: 01m 05s)
  • 18:34 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 599150 - enable kafka purges for group0 (duration: 01m 06s)
  • 18:19 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: gerrit 570396 - enable kask-session everywhere. CS.php (duration: 01m 05s)
  • 18:14 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit 570396 - enable kask-session everywhere. IS.php (duration: 01m 06s)
  • 17:15 ejegg: updated payments-wiki from e46114d8b1 to c1d14a5db7
  • 17:08 elukey: ganeti: gnt-instance reboot an-launcher1001 to get new memory settings - T254125
  • 15:21 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 15:19 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 15:12 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:50 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 in pc3 after reimaging T252182 (duration: 01m 06s)
  • 14:47 moritzm: updated grafana on cloudmetrics* to 6.7.4
  • 14:26 kormat: stopping replication on pc1010
  • 14:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:16 gehel: cleaning commonsrdf-dumps cron entry manually on snapshot1008
  • 14:00 hashar: Restarted CI Jenkins for plugin update
  • 13:59 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Replace pc1009 with pc1010 reimaging T252182 (duration: 01m 06s)
  • 13:47 kormat: reimaging *pc1009 (promise) to buster T252182
  • 13:44 kormat: reimaging pc1007 to buster, wish me luck T252182
  • 13:20 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 13:13 kormat@deploy1001: Synchronized wmf-config/db-codfw.php: Put pc2009 back into pc3 after reimaging T252182 (duration: 01m 05s)
  • 13:01 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1080', diff saved to https://phabricator.wikimedia.org/P11385 and previous config saved to /var/cache/conftool/dbconfig/20200603-120136-marostegui.json
  • 11:57 moritzm: updating linux-libc-dev on stretch and buster hosts
  • 11:56 XioNoX: configure management-instance on cr1/2-eqiad - T247073
  • 11:51 XioNoX: configure management-instance on cr2-codfw - T247073
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2124 after MCR schema change', diff saved to https://phabricator.wikimedia.org/P11384 and previous config saved to /var/cache/conftool/dbconfig/20200603-114409-marostegui.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P11383 and previous config saved to /var/cache/conftool/dbconfig/20200603-114351-marostegui.json
  • 11:31 Lucas_WMDE: EU SWAT done
  • 11:30 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php eswiki --fix | tee T254077.fix
  • 11:29 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php eswiki | tee T254077.dry-run
  • 11:27 moritzm: installing rubygems-integration updates for Buster
  • 11:26 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [eswiki] Normalize talk namespaces for Anexo, Portal and Wikiproyecto (T254077) (duration: 01m 03s)
  • 11:25 moritzm: install brltty updates on Buster
  • 11:23 XioNoX: configure management-instance on cr1-codfw - T247073
  • 11:19 XioNoX: configure management-instance on cr1-eqsin - T247073
  • 11:15 moritzm: installing python-oslo.utils security updates
  • 11:12 XioNoX: remove unused logical-systems from all MX204 routers - T247073
  • 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124', diff saved to https://phabricator.wikimedia.org/P11382 and previous config saved to /var/cache/conftool/dbconfig/20200603-111055-marostegui.json
  • 11:08 marostegui: Add rev_id to revision table on db2124 - T238966
  • 11:05 moritzm: installing pango updates for buster
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 - will be reimaged and moved to s1 T252512', diff saved to https://phabricator.wikimedia.org/P11381 and previous config saved to /var/cache/conftool/dbconfig/20200603-104251-marostegui.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P11380 and previous config saved to /var/cache/conftool/dbconfig/20200603-101426-marostegui.json
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P11378 and previous config saved to /var/cache/conftool/dbconfig/20200603-093810-marostegui.json
  • 09:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:10 moritzm: upgrading mw1262-1265 to PHP 7.2.31
  • 09:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:04 marostegui: Reimage db1080
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for reimage', diff saved to https://phabricator.wikimedia.org/P11376 and previous config saved to /var/cache/conftool/dbconfig/20200603-090143-marostegui.json
  • 08:42 kormat@deploy1001: Synchronized wmf-config/db-codfw.php: Replace pc2009 with pc2010 while reimaging (duration: 01m 16s)
  • 08:19 moritzm: upgrading mw1261 to PHP 7.2.31
  • 08:17 XioNoX: re-add ae2 physical interfaces to external group - T253970
  • 08:09 moritzm: upgrading remaining mwdebug* servers to PHP 7.2.31
  • 08:08 kormat: reimaging pc2009 to buster T252182
  • 08:08 XioNoX: remove ae2 physical interfaces from external group - T253970
  • 07:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:45 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:44 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw218[0-6].codfw.wmnet
  • 07:36 mutante: depooling mw2180 - mw2186
  • 07:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw218[0-6].codfw.wmnet
  • 07:35 moritzm: imported PHP 7.2.31 to apt.wikimedia.org/component/php72
  • 07:33 ema: cp: upgrade purged to 0.15
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2071 after cloning it from db2130 to restore all the schema changes applied', diff saved to https://phabricator.wikimedia.org/P11375 and previous config saved to /var/cache/conftool/dbconfig/20200603-072841-marostegui.json
  • 07:15 XioNoX: repool esams - T254021
  • 07:09 XioNoX: re-activate peering/transit BGP on cr2-esams - T254021
  • 07:00 XioNoX: re0.cr2-esams> request system reboot both-routing-engines - T254021
  • 06:56 XioNoX: deactivate peering/transit BGP on cr2-esams - T244497
  • 06:54 XioNoX: failover vrrp to cr3-esams - T244497
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1138 T253808', diff saved to https://phabricator.wikimedia.org/P11374 and previous config saved to /var/cache/conftool/dbconfig/20200603-063752-marostegui.json
  • 06:18 XioNoX: cr3-esams> request chassis routing-engine master switch - T244497
  • 06:11 XioNoX: cr3-esams> request vmhost reboot re1 (backup re) - T244497
  • 06:08 XioNoX: re-activate transit BGP to cr3-knams - T254021
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1138 T253808', diff saved to https://phabricator.wikimedia.org/P11373 and previous config saved to /var/cache/conftool/dbconfig/20200603-060124-marostegui.json
  • 05:58 XioNoX: reboot cr3-knams - T254021
  • 05:51 XioNoX: deactivate transit BGP on cr3-knams - T254021
  • 05:48 XioNoX: depool esams - T254021
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2130 to clone db2071', diff saved to https://phabricator.wikimedia.org/P11371 and previous config saved to /var/cache/conftool/dbconfig/20200603-054117-marostegui.json
  • 05:40 marostegui: Stop MySQL on db2130 to clone db2071
  • 05:38 XioNoX: deactivate graceful-switchover on cr3-esams - T254021
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1138 T253808', diff saved to https://phabricator.wikimedia.org/P11370 and previous config saved to /var/cache/conftool/dbconfig/20200603-053748-marostegui.json
  • 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:14 XioNoX: turn cr1-codfw:fpc0 online - T254110
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1138 T253808', diff saved to https://phabricator.wikimedia.org/P11369 and previous config saved to /var/cache/conftool/dbconfig/20200603-050911-marostegui.json
  • 01:00 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: wgLocalVirtualHosts (duration: 01m 06s)
  • 00:59 krinkle@deploy1001: Synchronized wmf-config/mc.php: Ic27b60 (duration: 01m 11s)

2020-06-02

  • 23:58 ejegg: updated fundraising CiviCRM from 657c4b9455 to eb156dffa4
  • 23:55 ejegg: updated payments-wiki from 1942a537ef to e46114d8b1
  • 22:48 cstone: civicrm revision changed from d1cd99166f to 657c4b9455
  • 21:48 reedy@deploy1001: Synchronized wmf-config/interwiki-labs.php: laaaaabs (duration: 01m 05s)
  • 21:23 reedy@deploy1001: Synchronized multiversion/MWMultiVersion.php: beta apiportalwiki T254185 (duration: 01m 06s)
  • 21:21 reedy@deploy1001: Synchronized wmf-config/config/apiportalwiki.yaml: beta apiportalwiki T254185 (duration: 01m 05s)
  • 21:20 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta apiportalwiki T254185 (duration: 01m 05s)
  • 21:19 reedy@deploy1001: Synchronized wikiversions-labs.json: beta apiportalwiki T254185 (duration: 01m 05s)
  • 21:17 reedy@deploy1001: Synchronized dblists/all-labs.dblist: beta apiportalwiki T254185 (duration: 01m 06s)
  • 21:12 cdanis: repooled wtp1032 T254258
  • 20:24 reedy@deploy1001: Synchronized composer.lock: Update (duration: 01m 06s)
  • 20:02 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.35
  • 19:59 jforrester@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.35 (duration: 93m 52s)
  • 18:25 jforrester@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.35
  • 18:22 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.31 (duration: 19m 59s)
  • 18:05 cdanis: fixing g+w permissions of deploy1001 /srv/mediawiki-staging/php-*/.git/objects/*
  • 17:20 James_F: 1.35.0-wmf.35 was branched at 8d70150 for T253023
  • 16:55 cstone: SmashPig revision changed from 44690f761c to b9de3c7aac
  • 15:57 ejegg: updated payments-wiki from d11efeb1cf to 1942a537ef
  • 15:50 cdanis: thumbor1003 and thumbor1004 blipped, no obvious explanation, logs gathered at P11365 P11366 P11367
  • 15:49 XioNoX: push frack fw rules - T254260
  • 15:48 mutante: contint1001 - rm -rf /mnt/docker (T224591)
  • 15:45 mutante: contint1001 - restarting docker after changing data-root path (T224591)
  • 15:37 cdanis@cumin1001: conftool action : set/pooled=no; selector: name=wtp1032.*
  • 15:35 cdanis: power cycling wtp1032 which is bootlooping? https://phabricator.wikimedia.org/P11364
  • 15:31 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 15:24 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor100[34].*
  • 15:23 XioNoX: repool codfw - T254216
  • 15:19 XioNoX: rollback ospf changes - T254216
  • 15:09 hnowlan@deploy1001: Finished deploy [cpjobqueue/deploy@8a53ff1]: (no justification provided) (duration: 02m 33s)
  • 15:07 XioNoX: reboot cr1-codfw:fpc5 - T254216
  • 15:06 hnowlan@deploy1001: Started deploy [cpjobqueue/deploy@8a53ff1]: (no justification provided)
  • 15:05 hnowlan: shifting all high traffic cpjobqueue rules to k8s
  • 14:57 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:57 XioNoX: depref ulsfo-codfw link - T254216
  • 14:51 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:50 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:49 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:49 XioNoX: prefer eqsin-ulsfo tunnel - T254216
  • 14:47 cdanis@cumin1001: conftool action : set/pooled=no; selector: name=thumbor100[34].*
  • 14:38 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:31 XioNoX: depool codfw - T254216
  • 14:09 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 13:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:28 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:19 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 13:18 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:18 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 13:18 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:05 cdanis@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/ContribsPager.php: revert contribs limit to 5000 T234450 (duration: 00m 57s)
  • 13:04 cdanis@deploy1001: Synchronized php-1.35.0-wmf.32/includes/specials/pagers/ContribsPager.php: revert contribs limit to 5000 T234450 (duration: 00m 57s)
  • 13:03 cdanis@deploy1001: Synchronized php-1.35.0-wmf.34/includes/specials/pagers/ContribsPager.php: revert contribs limit to 5000 T234450 (duration: 00m 58s)
  • 12:59 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:56 cdanis@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: 5debc3223 limit per-user Special:Contributions concurrency to 2 T234450 (duration: 00m 58s)
  • 12:50 kormat@cumin1001: dbctl commit (dc=all): 'Pool db2140 into s4 T252985', diff saved to https://phabricator.wikimedia.org/P11363 and previous config saved to /var/cache/conftool/dbconfig/20200602-125012-kormat.json
  • 12:39 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 12:31 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw217[3-9].codfw.wmnet
  • 12:30 kormat@cumin1001: dbctl commit (dc=all): 'Repool db2110, copy to db2140 complete T252985', diff saved to https://phabricator.wikimedia.org/P11362 and previous config saved to /var/cache/conftool/dbconfig/20200602-123020-kormat.json
  • 12:28 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw217[3-9].codfw.wmnet
  • 11:10 kart_: Finished EU Mid-day SWAT.
  • 11:08 mutante: contint1001 - common issue after reinstalls again - a2dismod mpm_event ; systemctl restart apache2 ; puppet agent -tv (T196968) https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206
  • 11:07 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 601174|Create URL campaign for African languages for COVID-19 translation project (T253305) (duration: 01m 00s)
  • 11:01 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 10:48 mutante: LDAP - added uid=lulu to group nda (T254121)
  • 10:29 akosiaris: switch over ores1XXX hosts to redis::misc from oresrdb hosts. T254226
  • 10:12 jynus: disable non-global root login to gerrit2001 T254162
  • 10:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1121, db1148 T252512', diff saved to https://phabricator.wikimedia.org/P11361 and previous config saved to /var/cache/conftool/dbconfig/20200602-101150-marostegui.json
  • 10:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:09 akosiaris: switch over ores2XXX hosts to redis::misc from oresrdb hosts. T254226
  • 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1121, db1148 T252512', diff saved to https://phabricator.wikimedia.org/P11360 and previous config saved to /var/cache/conftool/dbconfig/20200602-100246-marostegui.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1121, db1148 T252512', diff saved to https://phabricator.wikimedia.org/P11359 and previous config saved to /var/cache/conftool/dbconfig/20200602-095321-marostegui.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P11358 and previous config saved to /var/cache/conftool/dbconfig/20200602-094914-marostegui.json
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1121, db1148 T252512', diff saved to https://phabricator.wikimedia.org/P11357 and previous config saved to /var/cache/conftool/dbconfig/20200602-094441-marostegui.json
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1148 to dbctl depooled T252512', diff saved to https://phabricator.wikimedia.org/P11356 and previous config saved to /var/cache/conftool/dbconfig/20200602-093841-marostegui.json
  • 08:59 ema: upload purged 0.15 to buster-wikimedia
  • 08:09 mutante: re-imaging contint1001 with buster
  • 07:43 marostegui: Stop MySQL on db1121
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 to clone db1148', diff saved to https://phabricator.wikimedia.org/P11353 and previous config saved to /var/cache/conftool/dbconfig/20200602-074027-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079 after data check', diff saved to https://phabricator.wikimedia.org/P11351 and previous config saved to /var/cache/conftool/dbconfig/20200602-073245-marostegui.json
  • 07:22 marostegui: Stop slave on db1079 for data check
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for data check', diff saved to https://phabricator.wikimedia.org/P11350 and previous config saved to /var/cache/conftool/dbconfig/20200602-072214-marostegui.json
  • 07:06 marostegui: Stop MySQL and poweroff on db1138 for on-site maintenance - T253808
  • 05:01 marostegui: Stop mysql on db1141 to save a binary backup - T249188
  • 01:03 krinkle@deploy1001: Synchronized wmf-config/mc.php: I06897bcc92c5 (duration: 00m 59s)

2020-06-01

  • 20:14 shdubsh: downgrade mtail to rc5 in ulsfo -- T254192
  • 20:12 XioNoX: enable IX4/6 on cr4-ulsfo - T237575
  • 19:57 XioNoX: disable IX4/6 on cr4-ulsfo - T237575
  • 19:55 XioNoX: fail vrrp over cr3-ulsfo - T237575
  • 19:44 shdubsh: restart atsmtail in eqsin
  • 18:21 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable kask-transition for all wikis (duration: 01m 00s)
  • 17:59 XioNoX: offline cr1-codfw:fpc0 - T254110
  • 17:47 XioNoX: turn online cr1-codfw:fpc0 - T254110
  • 17:46 shdubsh: update mtail in ulsfo caching hosts. restarting atsmtail and varnishmtail
  • 17:31 mutante: backup1001 - queued job 42 - gerrit backup after renaming of the file set and addition of LFS data (T254155, T254162) it is incremental, the full one already ran
  • 16:49 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging - fix searchsatisfaction schema URI - testwiki only - T249261 (duration: 00m 59s)
  • 16:48 otto@deploy1001: sync-file aborted: EventLogging - fix searchsatisfaction schema URI - testwiki only - T249261 (duration: 00m 02s)
  • 16:39 bstorm_: running view updates on db1141 T252219
  • 14:53 elukey: ganeti: increase memory available for an-launcher1001 from 8g to 12g - T254125
  • 14:44 volans: deploying ulsfo mgmt DNS records automatically generated by Netbox ( operations/dns/+/585545/ ) - T233183
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1142, db1147 T252512', diff saved to https://phabricator.wikimedia.org/P11345 and previous config saved to /var/cache/conftool/dbconfig/20200601-120000-marostegui.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1142, db1147 T252512', diff saved to https://phabricator.wikimedia.org/P11344 and previous config saved to /var/cache/conftool/dbconfig/20200601-114440-marostegui.json
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1142, db1147 T252512', diff saved to https://phabricator.wikimedia.org/P11343 and previous config saved to /var/cache/conftool/dbconfig/20200601-113032-marostegui.json
  • 10:49 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (601328) (duration: 00m 59s)
  • 10:48 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (601328) (duration: 01m 03s)
  • 09:37 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:30 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:26 jynus: reenabling puppet on all db/es/pc hosts after deploy of gerrit:599596
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1142, db1147 T252512', diff saved to https://phabricator.wikimedia.org/P11342 and previous config saved to /var/cache/conftool/dbconfig/20200601-092220-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1147 to dbctl, depooled T252512', diff saved to https://phabricator.wikimedia.org/P11341 and previous config saved to /var/cache/conftool/dbconfig/20200601-091809-marostegui.json
  • 09:06 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:05 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:05 XioNoX: offline cr1-codfw:fpc0 - T254110
  • 09:05 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:04 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:03 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:58 godog: prometheus eqiad lvextend --resizefs --size +100G vg-ssd/prometheus-ops
  • 08:43 mutante: deneb - apt-get remove --purge apt-listchanges (package was in status "rc" causing DPKG alert, should be removed but config was not purged)
  • 08:41 mutante: deneb - apt-get remove python3-debconf (package was in status "ri" causing DPKG icinga alert. ri means it should be removed but is not)
  • 08:33 XioNoX: restart cr1-codfw:fpc0 - T254110
  • 08:22 mutante: mw1331 re-enabled puppet (SAL told me about an experiment a little while ago)
  • 08:19 jynus: disabling puppet on all db/es/pc hosts for deploy of gerrit:599596
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 to clone db1147 T252512', diff saved to https://phabricator.wikimedia.org/P11339 and previous config saved to /var/cache/conftool/dbconfig/20200601-070519-marostegui.json
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool enwiki db2071 slave to test new index - T238966', diff saved to https://phabricator.wikimedia.org/P11338 and previous config saved to /var/cache/conftool/dbconfig/20200601-050354-marostegui.json
  • 04:54 marostegui: Drop testreduce_0715 from m5 master T245408
  • 04:44 marostegui: Depool db1141 from Analytics role - T249188

2020-05-31

  • 09:56 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Vox Golf' 'Colonel Chicken' (T254068)

2020-05-29

  • 22:32 bstorm_: updated views on labsdb1010 T252219
  • 20:55 bstorm_: updating views on labsdb1011 T252219
  • 19:27 ryankemper: Successfully finished a rolling restart of the `cloudelastic` clusters (chi, psi, omega) as part of elasticsearch plugins upgrade. Host and service checks re-enabled.
  • 17:28 bstorm_: updating views on labsdb1009 T252219
  • 16:50 ryankemper: Performing a rolling restart of the `cloudelastic` clusters (chi, psi, omega) as part of elasticsearch plugins upgrade. Host and service checks disabled.
  • 16:00 bstorm_: Updating views on labsdb1012 T252219
  • 15:59 ryankemper: Concluded rolling restart of the `relforge` clusters as part of elasticsearch plugins upgrade. Both hosts `relforge1001` and `relforge1002` are back up. Downtime lifted.
  • 15:29 ryankemper: Performing a rolling restart of the `relforge` clusters as part of elasticsearch plugins upgrade
  • 14:59 cdanis: disabling puppet on netflow* to deploy Ic71e96f0 T253128
  • 14:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 14:41 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:41 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 14:35 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:35 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 14:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:24 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:15 mdholloway: ran extensions/MachineVision/maintenance/removeBlacklistedSuggestions.php on commonswiki (T253821)
  • 12:49 hnowlan: reimaging restbase2009 after disk replacement
  • 12:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:15 godog: roll-restart to upgrade thanos to 0.13.0rc0 - T252186 T233956
  • 11:32 moritzm: installing cups security updates (client-side libs/tools)
  • 11:01 ema: upload prometheus-rdkafka-exporter 0.2 to buster-wikimedia T253551
  • 10:53 moritzm: updating mwdebug2002 to 7.2.31
  • 10:02 marostegui: Compress InnoDB on db1138 T232446
  • 08:30 godog: update swift uid/gid on thanos hosts - T123918
  • 08:04 mutante: phabricator - restarted apache2 - back for me now
  • 08:03 XioNoX: add new AMS-IX link to LACP bundle
  • 08:01 mutante: phabricator - broken due to "PhabricatorRepositoryMirrorEngine::pushToGitRepository" starting git process that uses 100% CPU, stopped phd service
  • 07:56 mutante: phabricator - killed pid 25070 (git) which used 100% of CPU, restarted phd service
  • 07:25 moritzm: updating perf on buster systems to new version from 10.4 point release
  • 07:15 moritzm: installing el-api update from latest Buster point release
  • 07:12 moritzm: installing xdg-utils update from latest Buster point release
  • 07:11 mutante: mw1293 (canary jobrunner) replace apache2.conf with version from mwdebug1001, restart apache, to debug for T190111
  • 07:00 moritzm: installing rake security updates
  • 06:36 mutante: deneb - systemctl start docker-reporter-releng-images
  • 05:20 marostegui: Deploy schema change on db1138 (no longer s4 master) - T250055
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1081 to s4 master and remove read-only from s4 T253808', diff saved to https://phabricator.wikimedia.org/P11334 and previous config saved to /var/cache/conftool/dbconfig/20200529-050224-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance T253808', diff saved to https://phabricator.wikimedia.org/P11333 and previous config saved to /var/cache/conftool/dbconfig/20200529-050153-marostegui.json
  • 05:00 marostegui: Starting s4 failover from db1138 to db1081 - T253808
  • 04:25 marostegui: Start topology changes in s4 - T253808

2020-05-28

  • 23:48 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/skins/Vector/resources/skins.vector.styles/Menu.less: T253912 Hotfix: Cannot rename emptyPortlet to empty-portlet yet (duration: 00m 59s)
  • 22:41 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/extensions/WikibaseMediaInfo/src/Services/FilePageLookup.php: T253792 Follow-up 1827c7a: Ensure inNamespace() is called only on Title object (duration: 00m 58s)
  • 22:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T253821 Update MachineVision block list for 2020-05-27 (duration: 00m 57s)
  • 22:09 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Move one CheckUser right change next to the other (duration: 00m 57s)
  • 22:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove version wrapper around wgOverrideUcfirstCharacters; always true (duration: 00m 59s)
  • 21:48 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.34
  • 21:26 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/includes/filerepo/FileRepo.php: T253922 Mark two FileRepo functions public (duration: 01m 07s)
  • 21:12 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/includes/specials/SpecialUserrights.php: T253909 Restore visibility (previously implicitely public) (duration: 01m 06s)
  • 20:38 jforrester@deploy1001: Synchronized php-1.35.0-wmf.32/skins/Vector/resources/skins.vector.styles: T253905 HOTFIX: Do not apply p-personal absolute positioning to all menus (duration: 01m 07s)
  • 20:22 shdubsh: restart varnishmtail and atsmtail eqsin
  • 20:11 shdubsh: restart ncredirmtail on ncredir5001
  • 19:20 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: roll back the train due to T253905
  • 19:20 twentyafterfour: group2 back to wmf.32 due to T253905
  • 19:20 milimetric@deploy1001: Finished deploy [analytics/refinery@f6d73c8] (thin): Hotfix #2 today (thin): forgot jars [analytics/refinery@f6d73c8] (duration: 00m 09s)
  • 19:20 milimetric@deploy1001: Started deploy [analytics/refinery@f6d73c8] (thin): Hotfix #2 today (thin): forgot jars [analytics/refinery@f6d73c8]
  • 19:17 milimetric@deploy1001: Finished deploy [analytics/refinery@f6d73c8]: Hotfix #2 today: forgot jars [analytics/refinery@f6d73c8] (duration: 16m 54s)
  • 19:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.34 refs T253022
  • 19:01 shdubsh: restart varnishmtail and atsmtail on cp5001.eqsin.wmnet
  • 19:00 milimetric@deploy1001: Started deploy [analytics/refinery@f6d73c8]: Hotfix #2 today: forgot jars [analytics/refinery@f6d73c8]
  • 17:03 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.34 refs T253022 (duration: 01m 06s)
  • 17:02 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.34 refs T253022
  • 16:32 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/extensions/Wikibase: T253804 Use ThrowingEntityTermStoreWriter when writers shouldn't be called (duration: 01m 15s)
  • 15:37 milimetric@deploy1001: Finished deploy [analytics/refinery@203d182] (thin): Three hotfixes (THIN) [analytics/refinery@203d182] (duration: 00m 10s)
  • 15:37 milimetric@deploy1001: Started deploy [analytics/refinery@203d182] (thin): Three hotfixes (THIN) [analytics/refinery@203d182]
  • 15:05 milimetric@deploy1001: Finished deploy [analytics/refinery@203d182]: Three hotfixes [analytics/refinery@203d182] (duration: 25m 59s)
  • 15:02 moritzm: installing exim4 security updates on jessie (stretch/buster already fixed)
  • 14:39 milimetric@deploy1001: Started deploy [analytics/refinery@203d182]: Three hotfixes [analytics/refinery@203d182]
  • 14:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:01 ema: atskafka 0.8 uploaded to buster-wikimedia T253551
  • 13:49 godog: roll-restart prometheus k8s-staging to enable thanos upload - T252186
  • 13:36 hashar: Restarting CI Jenkins for plugin rollback
  • 11:49 moritzm: installing unbound security updates
  • 11:03 kormat@cumin1001: dbctl commit (dc=all): 'Add db2138 to s2+s4 T252985', diff saved to https://phabricator.wikimedia.org/P11330 and previous config saved to /var/cache/conftool/dbconfig/20200528-110333-kormat.json
  • 10:36 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 10:34 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 10:30 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 10:02 mutante: gerrit1002 (test server) - chown -R gerrit2:gerrit2 /var/lib/gerrit/review_site ; restarted gerrit service, now the service is not in restart loop anymore, gerrit-ssh is listening too, just not accepting publickey (T239151)
  • 09:51 XioNoX: failover VRRP in ulsfo
  • 09:41 XioNoX: re-activate peering/transit on cr2-eqdfw - T243080
  • 09:35 mutante: restarting gerrit on gerrit1002 after fixing db_pass to the readonly one (T243800)
  • 09:33 XioNoX: restart cr2-eqdfw for upgrade - T243080
  • 09:30 XioNoX: deactivate peering/transit on cr2-eqdfw - T243080
  • 09:25 _joe_: updating ACLs on all etcd servers
  • 09:22 XioNoX: install new Junos on cr2-eqdfw - T243080
  • 09:16 XioNoX: rollback cr2-eqord ospf/bgp - T243080
  • 09:07 XioNoX: restart cr2-eqord for upgrade - T243080
  • 09:05 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 08:50 _joe_: upgrading etcd ACLs (adding new users) to conf1004
  • 08:50 XioNoX: install new Junos on cr2-eqord - T243080
  • 08:46 XioNoX: deactivate peering/transit on cr2-eqord - T243080
  • 08:45 XioNoX: de-pref all OSPF links to cr2-eqord - T243080
  • 08:13 marostegui: Pool db1141 into labsdb analytics role - T249188
  • 07:33 gilles@deploy1001: Synchronized static/images: T252108 Deploying optimised static PNGs (duration: 01m 39s)
  • 07:31 gilles@deploy1001: Synchronized static/apple-touch: T252108 Deploying optimised static PNGs (duration: 01m 12s)
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1081 from API and set its weight to 0 on main traffic - preparation for tomorrow's failover T253808', diff saved to https://phabricator.wikimedia.org/P11329 and previous config saved to /var/cache/conftool/dbconfig/20200528-063037-marostegui.json
  • 04:44 marostegui: Run check_private data on db1141 - T249188
  • 04:22 marostegui: Stop MySQL on db1141 - T249188

2020-05-27

  • 23:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add autoreviewrestore right to rollbacker group on hiwiki (T252986) (duration: 01m 05s)
  • 23:16 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add thwiki Draft namespace to wmgExemptFromUserRobotsControlExtra and enable VE there (T252959) (duration: 01m 06s)
  • 22:58 gehel@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 22:02 crusnov@deploy1001: Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part4) (duration: 00m 10s)
  • 22:02 crusnov@deploy1001: Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part4)
  • 22:01 crusnov@deploy1001: Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part3) (duration: 01m 29s)
  • 22:00 crusnov@deploy1001: Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part3)
  • 22:00 crusnov@deploy1001: deploy aborted: Netbox Upgrade to 2.8.4 (part2) (duration: 01m 31s)
  • 21:58 crusnov@deploy1001: Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part2)
  • 21:58 crusnov@deploy1001: Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.1 (part1) (duration: 01m 01s)
  • 21:57 crusnov@deploy1001: Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.1 (part1)
  • 20:43 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 20:28 marostegui: Decrease innodb poolsize on s4 master and restart mysql
  • 20:11 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@9dc827f]: Update mobileapps to b3b9214c (T253648) (duration: 03m 31s)
  • 20:08 mbsantos@deploy1001: Started deploy [mobileapps/deploy@9dc827f]: Update mobileapps to b3b9214c (T253648)
  • 20:04 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.32 refs T253022 (duration: 01m 04s)
  • 20:03 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.32 refs T253022
  • 20:00 gehel@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 19:56 twentyafterfour@deploy1001: scap failed: average error rate on 4/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
  • 19:46 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/includes/parser/CoreParserFunctions.php: T253725 Partially revert 'Fix impedance mismatch with Parser::getRevisionRecordObject()' (duration: 01m 05s)
  • 19:12 joal@deploy1001: Finished deploy [analytics/refinery@8a3dcb3]: Analytics regular weekly train (an-launcher1001 only) [8a3dcb3] (duration: 06m 07s)
  • 19:09 jforrester@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: T32405 Stop special casing the main page on mobile for twelve wikis (duration: 01m 05s)
  • 19:06 joal@deploy1001: Started deploy [analytics/refinery@8a3dcb3]: Analytics regular weekly train (an-launcher1001 only) [8a3dcb3]
  • 19:03 joal@deploy1001: Finished deploy [analytics/refinery@8a3dcb3] (thin): Analytics regular weekly train THIN [8a3dcb3] (duration: 00m 08s)
  • 19:03 joal@deploy1001: Started deploy [analytics/refinery@8a3dcb3] (thin): Analytics regular weekly train THIN [8a3dcb3]
  • 19:03 joal@deploy1001: Finished deploy [analytics/refinery@8a3dcb3]: Analytics regular weekly train [8a3dcb3] (duration: 21m 20s)
  • 18:41 joal@deploy1001: Started deploy [analytics/refinery@8a3dcb3]: Analytics regular weekly train [8a3dcb3]
  • 18:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable DiscussionTools as beta on mediawiki.org, part II T251208 (duration: 01m 05s)
  • 17:56 jayme: updated tiller to 2.16.7-wmf1 for all services in kubernetes cluster: eqiad
  • 17:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable DiscussionTools as beta on mediawiki.org T251208 (duration: 01m 05s)
  • 17:42 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 17:40 gehel: repool maps2003
  • 17:32 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.34/extensions/Translate/: Deploy https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Translate/+/599027/ to wmf.34 refs T253748 and T253022 (duration: 01m 07s)
  • 16:55 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:53 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 16:26 hnowlan@deploy1001: Finished deploy [cpjobqueue/deploy@c8c653e]: Disabling ThumbnailRender as a test of k8s cpjobqueue (duration: 01m 57s)
  • 16:24 hnowlan@deploy1001: Started deploy [cpjobqueue/deploy@c8c653e]: Disabling ThumbnailRender as a test of k8s cpjobqueue
  • 16:10 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:09 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:06 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:52 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop defining wmgUsePerformanceInspector, unread T253689 (duration: 01m 04s)
  • 15:52 gehel@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 15:52 jayme: updated tiller to 2.16.7-wmf1 for all services in kubernetes cluster: codfw
  • 15:49 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop loading PerformanceInspector on any wiki T253689 (duration: 01m 06s)
  • 15:18 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:02 godog: eqiad-prod: decom ms-be101[678] - T252008
  • 14:58 jayme: updated tiller to 2.16.7-wmf1 for all services in cluster: staging
  • 14:58 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:56 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:56 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:54 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:52 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 14:51 cdanis: cumin1001: upgrading python3-conftool and python3-conftool-dbctl
  • 14:50 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:50 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:46 cdanis: cumin2001: upgrading python3-conftool and python3-conftool-dbctl
  • 14:43 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:43 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:40 cdanis: reprepro: upload conftool_1.3.1-1{,+deb10u1} to {stretch,buster}-wikimedia
  • 14:36 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:36 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:32 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:32 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:30 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 14:16 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1146:3312, db1146:3314 and db1103:3312, db1103:3314', diff saved to https://phabricator.wikimedia.org/P11318 and previous config saved to /var/cache/conftool/dbconfig/20200527-141635-marostegui.json
  • 14:13 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:13 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:07 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:07 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:04 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1146:3312, db1146:3314 and db1103:3312, db1103:3314', diff saved to https://phabricator.wikimedia.org/P11317 and previous config saved to /var/cache/conftool/dbconfig/20200527-140442-marostegui.json
  • 14:03 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:03 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:58 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:51 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:51 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:48 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 13:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1146:3312, db1146:3314 and db1103:3312, db1103:3314', diff saved to https://phabricator.wikimedia.org/P11316 and previous config saved to /var/cache/conftool/dbconfig/20200527-134704-marostegui.json
  • 13:45 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 13:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:36 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 13:34 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 13:34 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 13:21 gehel: repool maps2004 / depool maps2003
  • 13:21 ema: cp: upgrade purged to 0.14
  • 13:19 marostegui: Kill /usr/local/bin/mwscriptwikiset updateSpecialPages.php s8.dblist --override --only=Fewestrevisions T238199
  • 13:16 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1146:3312, db1146:3314 and db1103:3312, db1103:3314', diff saved to https://phabricator.wikimedia.org/P11313 and previous config saved to /var/cache/conftool/dbconfig/20200527-131515-marostegui.json
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1146:3312 and db1146:3314 to dbctl T252512', diff saved to https://phabricator.wikimedia.org/P11312 and previous config saved to /var/cache/conftool/dbconfig/20200527-130820-marostegui.json
  • 13:06 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:34 Urbanecm: EU SWAT done
  • 11:29 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.34/extensions/GrowthExperiments/: SWAT: 983eda5: Mentorship dialog: Swap panel to ask-help on open (T253692) (duration: 01m 06s)
  • 11:18 ema: cp2027: upgrade purged to 0.14
  • 11:17 ema: purged 0.14 uploaded to buster-wikimedia
  • 11:09 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 598678|Enable ContentTranslation in Galician Wikipedia as a default tool (T250355) (duration: 01m 18s)
  • 10:15 hashar: contint2001: starting zuul
  • 10:15 hashar: contint2001: started jenkins
  • 10:03 mutante: contint2001 - find /var/lib/jenkins -user statsite -exec chown -h jenkins:jenkins {} \;
  • 10:02 mutante: repeated rsync of /var/lib/jenkins with -p ; find /var/lib/jenkins -group bacula -user statsite -exec chown -h jenkins:jenkins {} \;
  • 09:55 hashar: contint2001: starting jenkins
  • 09:54 hashar: contint1001 / contint2001 : deleted obsolete files /var/lib/jenkins/.git and /var/lib/jenkins/jobs/_shared/
  • 09:52 mutante: contint2001 - find /var/lib/jenkins -user statsite -exec chown -h jenkins:jenkins {} \;
  • 09:51 godog: roll restart prometheus on the fleet to apply I0e2fe8af
  • 09:49 mutante: contint2001 - find /var/lib/jenkins -group bacula -user statsite -exec chown jenkins:jenkins {} \;
  • 09:48 hashar: contint2001: unmasked jenkins and started it
  • 09:42 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
  • 09:42 mutante: switching CI backend from contint1001 to contint2001
  • 09:40 mutante: repeated rsync -avp --delete /var/lib/zuul/ rsync://contint2001.wikimedia.org/ci--var-lib-zuul-
  • 09:40 hashar: contint1001: masked jenkins and zuul
  • 09:39 mutante: repeated rsync -avp --delete /var/lib/jenkins/ rsync://contint2001.wikimedia.org/ci--var-lib-jenkins-
  • 09:39 hashar: Stopping Zuul and Jenkins CI for scheduled maintenance # T224591
  • 09:35 filippo@cumin1001: conftool action : set/pooled=no; selector: name=prometheus2003.codfw.wmnet
  • 08:52 hashar: contint1001: find /srv/jenkins/builds/operations-puppet-wmf-style-guide -type f -name '*.tmp' -delete # T253729
  • 08:48 marostegui: Stop MySQL on db1103
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 db1103:3314 to clone db1146 T252512', diff saved to https://phabricator.wikimedia.org/P11308 and previous config saved to /var/cache/conftool/dbconfig/20200527-084713-marostegui.json
  • 08:46 arturo: removing more old packages in labstore1006 (all packages in 'rc' state)
  • 08:43 arturo: running apt-get autoremove on labstore1006
  • 08:42 jynus: starting db2097 db instances again T252492
  • 08:11 jayme: updated admin tiller (namespace: kube-system) to 2.16.7-wmf1 in clusters: staging, codfw, eqiad
  • 08:08 hashar: contint1001 / contint2001 : deleted unused /var/lib/zuul/git (the real one is /srv/zuul/git )
  • 08:02 mutante: contint2001 - chown root:root /var/lib/zuul/git
  • 07:54 XioNoX: test new bird conf on dns4001 - T253666
  • 07:45 hashar: contint2001 also fixing symlink permissions: sudo find /var/lib/jenkins -not -user jenkins -exec chown -h jenkins:jenkins {} +
  • 07:35 mutante: contint2001 - find /var/lib/jenkins -group bacula -user jenkins -exec chown jenkins:jenkins {} \;
  • 07:30 mutante: contint2001 - find /var/lib/jenkins -user statsite -exec chown jenkins {} \;
  • 07:26 mutante: contint2001 - chown -R zuul:zuul /var/lib/zuul/
  • 07:26 mutante: contint1001:~# rsync -avpz --delete /srv/jenkins/ rsync://contint2001.wikimedia.org/ci--srv-/jenkins/
  • 07:25 mutante: contint1001:~# rsync -avp --delete /var/lib/jenkins/ rsync://contint2001.wikimedia.org/ci--var-lib-jenkins-
  • 07:25 mutante: contint1001:~# rsync -avp --delete /var/lib/zuul/ rsync://contint2001.wikimedia.org/ci--var-lib-zuul-
  • 07:18 moritzm: installing bind security updates (only client-side tools/libraries in use)
  • 07:04 elukey: matomo upgraded to 3.13.5 on matomo1001 - T252741
  • 06:57 elukey: update matomo on stretch-wikimedia to 3.13.5
  • 06:10 elukey@deploy1001: Finished deploy [analytics/superset/deploy@369a2dd]: Upgrade Superset to 0.36 - second attempt (duration: 00m 57s)
  • 06:09 elukey@deploy1001: Started deploy [analytics/superset/deploy@369a2dd]: Upgrade Superset to 0.36 - second attempt
  • 05:17 marostegui: Remove tmp_3 key from enwiki.recentchanges on db1099:3311 - T206103
  • 04:41 _joe_: cassandra cannot start on restbase2009, one of the disks has failed.
  • 04:39 _joe_: restarting cassandra instances on restbase2009, has a broken disk
  • 04:20 marostegui: Depool labsdb1011 - T249188

2020-05-26

  • 21:34 krinkle@deploy1001: Synchronized wmf-config/mc.php: I0fb124b3593 (duration: 01m 05s)
  • 21:30 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I2714e2ae26404 (duration: 01m 06s)
  • 21:18 krinkle@deploy1001: Synchronized wmf-config/profiler.php: Ib0bf8d97b10b, T253674 (duration: 01m 06s)
  • 20:29 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.34 refs T253022
  • 20:08 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.34 refs T253022 (duration: 70m 02s)
  • 18:58 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.34 refs T253022
  • 18:07 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.30 (duration: 20m 45s)
  • 18:02 bblack: cr[12]-eqiad: re-route ns0.wikimedia.org to authdns1001 - T241770
  • 18:02 ejegg: restarted fundraising jobs: recurring charge, audit processing, deduplication
  • 17:57 moritzm: installing bind security updates for stretch (only client-side tools/libraries in use)
  • 17:47 cdanis: netflow3001: disabling puppet and testing some pmacct/librdkafka config tweaks T253128
  • 17:16 James_F: 1.35.0-wmf.34 was branched at b5012a1 for T253022
  • 16:45 moritzm: installing jsp-api bugfix update from Buster point release
  • 15:22 akosiaris: sync kubernetes eqiad namespaces configuration with helmfile
  • 15:15 akosiaris: sync kubernetes codfw namespaces configuration with helmfile
  • 15:08 arturo: delete/re-import docker/containerd.io packages in the right version in buster-wikimedia/thirdparty/kubeadm-k8s-1-{15,16} (T250866)
  • 15:08 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add lazy-loading to Wikimedia Foundation powered-by icon T239377 (duration: 00m 57s)
  • 15:01 jforrester@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: Drop enwiki mobile mainpage special casing T32405 (duration: 00m 59s)
  • 14:58 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:57 akosiaris: sync staging namespaces configuration
  • 14:57 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:57 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 14:57 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 14:56 jforrester@deploy1001: Synchronized docroot/noc/: Clear out symlink to mobile.php, now removed (duration: 00m 55s)
  • 14:56 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:54 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 14:53 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Move mobile.php into CommonSettings.php (duration: 00m 57s)
  • 14:44 arturo: upgrade packages in buster-wikimedia/thirdparty/kubeadm-k8s-1-16 (T246122)
  • 14:44 jforrester@deploy1001: Synchronized docroot/noc/: Clear out symlink to mobile-labs.php, now removed (duration: 00m 58s)
  • 14:43 moritzm: installing rails security updates
  • 14:41 jforrester@deploy1001: Synchronized wmf-config/mobile.php: Don't try to load mobile-labs.php from mobile.php (duration: 00m 57s)
  • 14:38 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings.php: Move uncondition/no-sideeffect includes up (duration: 00m 57s)
  • 14:35 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Clean up MWMultiVersion check in CommonSettings.php (duration: 00m 59s)
  • 14:33 XioNoX: test bgp med on dns4002
  • 14:31 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SpecialVersionVersionUrl: Don't use confusing local variable name (duration: 00m 58s)
  • 14:30 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: Remove EOL REL1_32 (duration: 00m 58s)
  • 13:54 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.32
  • 12:43 godog: swift eqiad-prod: decom ms-be101[678] - T252008
  • 12:21 XioNoX: repool ulsfo - T243080
  • 12:11 XioNoX: cr4-ulsfo re-activate transit/ix/4/6 - T243080
  • 12:03 XioNoX: cr4-ulsfo> request vmhost reboot - T243080
  • 12:01 XioNoX: cr4-ulsfo deactivate transit/ix/4/6 - T243080
  • 11:49 XioNoX: cr3-ulsfo> request vmhost reboot - T243080
  • 11:42 XioNoX: cr4-ulsfo> request vmhost software add ... - T243080
  • 11:28 XioNoX: cr3-ulsfo> request vmhost software add ... - T243080
  • 11:27 awight: nnwiki updateCollation.php script has finished.
  • 11:26 XioNoX: depool ulsfo for routers upgrade - T243080
  • 11:16 awight: EU SWAT done (pending a maintenance script to updateCollation)
  • 11:14 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add 'deletedtext' permission to researcher group (T253420) (duration: 01m 06s)
  • 11:06 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [nnwiki] Change category collation to (T253559) (duration: 01m 10s)
  • 10:46 marostegui: Stop tendril's event scheduler
  • 10:18 jynus: stop db2097 for hw maintenance T252492
  • 09:48 vgutierrez: rolling upgrade to ats 8.0.7-1wm11
  • 09:41 _joe_: all jobrunners converted to use envoy for TLS termination
  • 09:38 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw131[0-1].eqiad.wmnet
  • 09:38 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw133[4-8].eqiad.wmnet
  • 09:37 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw130[0-9].eqiad.wmnet
  • 09:37 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw130[0-3].eqiad.wmnet
  • 09:36 oblivian@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=mw129[3-9].eqiad.wmnet
  • 09:31 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw130[0-3].eqiad.wmnet
  • 09:27 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw130[4-7].eqiad.wmnet
  • 09:22 gehel: repool wdqs1007, caught up on lag
  • 09:09 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw13(0[89]|1[01]).eqiad.wmnet
  • 09:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:02 mutante: decom'ing people1001 - replaced by people1002
  • 09:01 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:01 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw13(1|3)8.eqiad.wmnet
  • 08:57 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw133[4-7].eqiad.wmnet
  • 08:55 _joe_: progressively converting jobrunners to envoy
  • 08:41 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw1337.eqiad.wmnet
  • 07:20 moritzm: installing libssh security updates
  • 07:03 vgutierrez: upgrade to ats 8.0.7-1wm11 on cp3064 and cp3065
  • 06:49 marostegui: Deploy schema change on s3 directly on the master with 1 minute sleep in between wikis T253342
  • 06:47 marostegui: Deploy schema change on s1 directly on the master T253342
  • 06:44 marostegui: Deploy schema change on s4 directly on the master T253342
  • 06:35 XioNoX: reboot scs-ulsfo - T253609
  • 06:29 marostegui: Deploy schema change on s7 directly on the master T253342
  • 06:24 marostegui: Deploy schema change on s8 directly on the master T253342
  • 06:01 marostegui: Deploy schema change on s2 directly on the master T253342
  • 04:35 marostegui: Repool labsdb1011 - T249188
  • 04:14 marostegui: Stop slaves and stop mysql on labsdb1011 T249188
  • 03:55 tstarling@deploy1001: Synchronized php-1.35.0-wmf.31/includes/export/XmlDumpWriter.php: T253468 (duration: 01m 06s)
  • 03:53 tstarling@deploy1001: Synchronized php-1.35.0-wmf.32/includes/export/XmlDumpWriter.php: T253468 (duration: 01m 07s)
  • 03:20 tstarling@deploy1001: Synchronized php-1.35.0-wmf.32/includes/specials/SpecialChangeContentModel.php: for UBN T252963 (duration: 01m 07s)
  • 03:18 tstarling@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 32s)

2020-05-25

  • 23:34 ejegg: re-enabled fundraising queue consumers and job runners, except audits, dedupe, and recurring
  • 21:38 eileen: civicrm revision changed from 5428c5c449 to d1cd99166f, config revision is 6b05d6bb25
  • 21:18 eileen: civicrm revision is 7380e0e8ce, config revision is 6b05d6bb25
  • 21:01 ejegg: updated fundraising CiviCRM from 737d88a5ee to 7380e0e8ce
  • 17:17 ejegg: updated fundraising CiviCRM from 6b1d5902dd to 737d88a5ee
  • 17:09 ejegg: enabled contribution tracking queue on payments-wiki
  • 16:24 ejegg: updated standalone SmashPig from 2702b04329 to 44690f761c
  • 16:17 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 16:16 XioNoX: enable IX4/6 BGP group on cr4-ulsfo - T237575
  • 16:00 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 15:55 XioNoX: disable IX4/6 BGP group on cr4-ulsfo - T237575
  • 15:17 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 15:15 ejegg: updated payments-wiki from 3c465cb11c to d11efeb1cf, put it into maintenance mode
  • 15:15 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:39 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:06 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:00 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 13:46 _joe_: uploaded doxygen 1.8.17-1 to wikimedia-buster component/ci
  • 13:43 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-swift
  • 13:40 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 13:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 13:10 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 13:09 vgutierrez: upgrade ATS to version 8.0.7-1wm11 on cp4026 and cp4032
  • 12:52 godog: roll-restart pybal in low-traffic codfw
  • 12:44 ema: upload atskafka 0.7 to buster-wikimedia, upgrade cp3050 T253551
  • 12:37 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 12:30 marostegui: Deploy schema change on s5 directly on the master T253342
  • 12:14 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:09 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:01 _joe_: converting the remaining appservers to use envoy for TLS termination
  • 11:57 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 11:54 marostegui: Install a new tendril_purge_global_status_log event on db1115 (tendril) T252331
  • 11:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 11:51 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:48 marostegui: Stop event scheduler on db1115 (tendril) - T252331
  • 11:46 moritzm: uploaded CAS 6.1.5-1 to apt.wikimedia.org T233947
  • 11:36 _joe_: switch mw[1349-1355,1364-1373].eqiad.wmnet to envoy
  • 11:27 marostegui: Extend /srv 1100G on db213[6-9] T252985
  • 11:23 marostegui: Extend /srv 1100G on db114[1-9] T252512
  • 11:21 marostegui: Extend db1141's (temporary labsdb test host) /srv 1TB extra - T249188
  • 11:09 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 11:09 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 11:01 ema: upload prometheus-rdkafka-exporter to buster-wikimedia T253197
  • 10:34 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (598439) (duration: 01m 05s)
  • 10:33 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (598439) (duration: 01m 06s)
  • 10:20 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 09:56 _joe_: transition done
  • 09:49 _joe_: depooled mw1337, it was getting all the traffic meant for the jobrunners
  • 09:45 vgutierrez: upload trafficserver 8.0.7-1wm10 to apt.wm.o (buster)
  • 09:42 _joe_: converting mw1319-1333 to use envoy for TLS termination
  • 09:17 _joe_: migrated mw1337 to use envoy for TLS termination T247389
  • 09:10 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 09:04 godog: turn on sni by default for check_http --ssl icinga invocations - T253292
  • 08:52 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 08:39 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 08:21 filippo@cumin1001: conftool action : set/pooled=yes:weight=100; selector: service=thanos-swift
  • 08:05 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 07:36 moritzm: installed linux-image-amd64 on labstore1005 (current meta package for kernels following the Stretch update) T224582
  • 07:36 moritzm: installed linux-image-amd64 on labstore (current meta package for kernels following the Stretch update) T224582
  • 07:02 marostegui: Stop event scheduler on tendril T252331
  • 05:11 marostegui: Deploy schema change on s6, directly on the master - T253342
  • 04:54 marostegui: Depool labsdb1011 - T249188
  • 04:11 kart_: Updated cxserver to 2020-05-22-083137-production (T246317, T252871)
  • 04:07 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 04:04 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 04:02 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .

2020-05-24

  • 17:36 gehel: restarting elasticsearch psi on elastic1052
  • 16:44 gehel: depool wdqs1007 to catch up on lag
  • 16:43 gehel: restart blazegraph on wdqs1007

2020-05-23

  • 19:04 krinkle@deploy1001: Synchronized php-1.35.0-wmf.31/includes/filerepo/file/LocalFile.php: I0f7e885997d60 (duration: 01m 06s)
  • 18:58 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/includes/filerepo/file/LocalFile.php: I0f7e885997d60 (duration: 01m 08s)
  • 18:06 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/includes/filerepo/: I31a9bb6672 (duration: 01m 06s)
  • 18:05 krinkle@deploy1001: Synchronized php-1.35.0-wmf.31/includes/filerepo/: I31a9bb6672 (duration: 01m 10s)
  • 15:44 krinkle@deploy1001: Synchronized wmf-config/mc.php: I5ad8fe - Disable coalesceKeys on commonswiki (duration: 01m 09s)
  • 14:58 Krinkle: scap-pull to reset state on mwdebug1002
  • 14:50 Krinkle: Testing mc.php changes on mwdebug1002
  • 08:04 elukey: powercycle an-presto1004 - unresponsive, racadm getsel shows CPU overheating alerts

2020-05-22

  • 22:42 krinkle@deploy1001: Synchronized php-1.35.0-wmf.31/includes/filerepo/: Ie19613ef7643a (duration: 01m 06s)
  • 22:40 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/includes/filerepo/: Ie19613ef7643a (duration: 01m 08s)
  • 15:58 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:58 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:57 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:53 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:47 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:45 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:45 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:25 cdanis: fixing prometheus-nic-firmware-textfile.service wherever it is broken T253374
  • 15:25 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:24 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 15:06 marostegui: Decrease tendril_purge_global_status_log_5m row retention time from 2 days to 1 day T252331
  • 15:01 kormat@cumin1001: dbctl commit (dc=all): 'Pool db2137 into s4+s5 T252985', diff saved to https://phabricator.wikimedia.org/P11292 and previous config saved to /var/cache/conftool/dbconfig/20200522-150120-kormat.json
  • 14:53 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/maintenance/blockUsers.php: (no justification provided) (duration: 01m 08s)
  • 14:51 reedy@deploy1001: Synchronized php-1.35.0-wmf.32/maintenance/blockUsers.php: (no justification provided) (duration: 01m 09s)
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1144:331[45] and db1097:331[45]', diff saved to https://phabricator.wikimedia.org/P11290 and previous config saved to /var/cache/conftool/dbconfig/20200522-143541-marostegui.json
  • 14:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 14:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1144:331[45] and db1097:331[45]', diff saved to https://phabricator.wikimedia.org/P11289 and previous config saved to /var/cache/conftool/dbconfig/20200522-141513-marostegui.json
  • 14:13 sukhe: upload dnsdist_1.4.0-1~deb10u1 to apt.wm.o (buster) - T252132
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1144:331[45] and db1097:331[45]', diff saved to https://phabricator.wikimedia.org/P11288 and previous config saved to /var/cache/conftool/dbconfig/20200522-140847-marostegui.json
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1144:331[45] and db1097:331[45]', diff saved to https://phabricator.wikimedia.org/P11286 and previous config saved to /var/cache/conftool/dbconfig/20200522-131452-marostegui.json
  • 13:10 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 13:10 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 13:09 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 13:08 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1144:3314 and db1144:3315 to the list of hosts', diff saved to https://phabricator.wikimedia.org/P11284 and previous config saved to /var/cache/conftool/dbconfig/20200522-130707-marostegui.json
  • 12:56 vgutierrez: depool cp4032 for some ats tests
  • 12:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 12:04 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 12:03 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:48 marostegui: Stop MySQL on db1097:3314, db1097:3315 to clone db1144 - T252512
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314, db1097:3315 - T252512', diff saved to https://phabricator.wikimedia.org/P11281 and previous config saved to /var/cache/conftool/dbconfig/20200522-104437-marostegui.json
  • 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:32 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:10 marostegui: Stop event_scheduler on db1115 - T252331
  • 10:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:05 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 10:05 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:00 jbond42: update pdns-recursor on dns recursors
  • 09:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:41 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:22 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 09:09 elukey@deploy1001: Finished deploy [analytics/superset/deploy@be203c8]: Rollback superset to 0.35.2 (duration: 00m 43s)
  • 09:09 elukey@deploy1001: Started deploy [analytics/superset/deploy@be203c8]: Rollback superset to 0.35.2
  • 08:41 vgutierrez: reverting hugepages experiment on cp2041
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1149 and db1081', diff saved to https://phabricator.wikimedia.org/P11278 and previous config saved to /var/cache/conftool/dbconfig/20200522-082700-marostegui.json
  • 08:18 elukey@deploy1001: Finished deploy [analytics/superset/deploy@59ba01d]: Upgrade Superset to 0.36 (duration: 01m 01s)
  • 08:17 elukey@deploy1001: Started deploy [analytics/superset/deploy@59ba01d]: Upgrade Superset to 0.36
  • 08:13 vgutierrez: test hugepages allocator on ATS in cp2041
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1149 and db1081', diff saved to https://phabricator.wikimedia.org/P11277 and previous config saved to /var/cache/conftool/dbconfig/20200522-080629-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1149 and db1081', diff saved to https://phabricator.wikimedia.org/P11276 and previous config saved to /var/cache/conftool/dbconfig/20200522-074853-marostegui.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1149 and db1081', diff saved to https://phabricator.wikimedia.org/P11275 and previous config saved to /var/cache/conftool/dbconfig/20200522-072000-marostegui.json
  • 07:07 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=druid1008.eqiad.wmnet
  • 07:04 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=druid1007.eqiad.wmnet
  • 07:04 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=druid1007.eqiad.wmnet
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1081 - T252512', diff saved to https://phabricator.wikimedia.org/P11272 and previous config saved to /var/cache/conftool/dbconfig/20200522-043418-marostegui.json

2020-05-21

  • 23:58 ejegg: updated civicrm from b658fd8233 to 6b1d5902dd
  • 23:54 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/includes/content/ContentHandlerFactory.php: If578893f5689 (duration: 01m 06s)
  • 23:47 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/extensions/LiquidThreads/classes/Thread.php: If3418cba06e (duration: 01m 07s)
  • 23:41 krinkle@deploy1001: Synchronized wmf-config/mc.php: I222457729a5b (duration: 01m 08s)
  • 21:46 eileen: civicrm revision changed from ed4c9522ac to b658fd8233, config revision is 9babae3954
  • 21:10 foks: removing two files for legal compliance
  • 20:44 bstorm_: labstore1005 is now running stretch and drbd devices are resyncing after several reboots and some significant effort T224582
  • 18:24 twentyafterfour: restarting phabricator on phab1001 to deploy https://phabricator.wikimedia.org/rPHEX2687d08786a9dadcbaa96709de991f471f239830
  • 17:24 bblack: anycast experiment done, all back to normal
  • 17:20 bblack: anycast experimentation commencing in ulsfo (test route withdrawal)...
  • 17:04 bstorm_: starting labstore1005 upgrades T224582
  • 16:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:04 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Update mitigations for T250887 (duration: 01m 08s)
  • 15:48 andrewbogott: rebuilding cloudnet1003.eqiad.wmnet with Debian Buster for T253124
  • 15:22 XioNoX: Add BGP between cr1/2-eqiad and authdns1001 - T253196
  • 15:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:08 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:07 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:59 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw217[0-2].codfw.wmnet
  • 14:59 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw216[0-9].codfw.wmnet
  • 14:58 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw215[8-9].codfw.wmnet
  • 14:50 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:47 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:44 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
  • 14:33 akosiaris: upload helmfile 0.109.0 to apt.wikimedia.org/buster-wikimedia and stretch-wikimedia, component main
  • 13:51 vgutierrez: depool cp4032 for some ats tests
  • 13:22 mutante: cloudnet1004 - reboot to test PXE boot
  • 12:44 andrewbogott: reimaging cloudnet1004.eqiad.wmnet for T253124
  • 12:29 elukey: roll restart druid-public cluster (druid100[4-6], backend for the AQS API) to apply new settings + openjdk upgrade - T252771
  • 12:13 mutante: depooled mw2158 through mw2172 to make room again in C3 as planned (T247018)
  • 12:12 marostegui: Repool labsdb1011 into the analytics role 🀞 - T249188
  • 12:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw217[0-2].codfw.wmnet
  • 12:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw216[0-9].codfw.wmnet
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1143 and db1091', diff saved to https://phabricator.wikimedia.org/P11270 and previous config saved to /var/cache/conftool/dbconfig/20200521-120555-marostegui.json
  • 12:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw215[8-9].codfw.wmnet
  • 11:18 hnowlan: Removed changeprop from scb hosts
  • 11:04 vgutierrez: rolling restart of ncredir servers for kernel update
  • 10:17 vgutierrez: restart of acme-chief servers for kernel update
  • 10:13 jbond42: deploy CI for puppet private repo
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143 and db1091', diff saved to https://phabricator.wikimedia.org/P11268 and previous config saved to /var/cache/conftool/dbconfig/20200521-101100-marostegui.json
  • 10:07 mutante: replaced backend of people.wikimedia.org - people1001 will be inaccessible, replaced with people1002 on buster. all home dirs have been synced over, there should be no difference except you have to use people1002 now for uploads (T247649)
  • 10:06 godog: test adding --sni to check_http -S on icinga2001 - T253292
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143 and db1091', diff saved to https://phabricator.wikimedia.org/P11267 and previous config saved to /var/cache/conftool/dbconfig/20200521-095100-marostegui.json
  • 09:28 mutante: deneb - sudo systemctl reset-failed to clear Icinga alerts about systemd degraded state
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143 and db1091', diff saved to https://phabricator.wikimedia.org/P11266 and previous config saved to /var/cache/conftool/dbconfig/20200521-091245-marostegui.json
  • 09:01 mutante: LDAP - added lmata to wmf group (T253277)
  • 08:55 XioNoX: Advertise Anycast 198.35.27.0/24 from esams - T253196
  • 08:52 XioNoX: Advertise Anycast 198.35.27.0/24 from eqsin - T253196
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1143 with minimal weight for the first time T252512', diff saved to https://phabricator.wikimedia.org/P11265 and previous config saved to /var/cache/conftool/dbconfig/20200521-084933-marostegui.json
  • 08:47 XioNoX: Advertise Anycast 198.35.27.0/24 from eqiad/eqord - T253196
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1143 to the list of s4 hosts, depooled - T252512', diff saved to https://phabricator.wikimedia.org/P11264 and previous config saved to /var/cache/conftool/dbconfig/20200521-084226-marostegui.json
  • 08:34 XioNoX: Advertise Anycast 198.35.27.0/24 from dfw - T253196
  • 08:27 XioNoX: Advertise Anycast 198.35.27.0/24 from ulsfo - T253196
  • 08:20 XioNoX: Delete ARIN route object for 198.35.26.0/23 - T253196
  • 08:13 XioNoX: Delete ROA for 198.35.26.0/23 - T253196
  • 08:10 XioNoX: repool ulsfo - T253196
  • 08:03 XioNoX: Shrink ulsfo's 198.35.26.0/23 to 198.35.26.0/24 - T253196
  • 07:29 XioNoX: depool ulsfo - T253196
  • 07:22 marostegui: Purge events from tendril.global_status_log older than 24h - T252331
  • 07:03 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1019 fully', diff saved to https://phabricator.wikimedia.org/P11263 and previous config saved to /var/cache/conftool/dbconfig/20200521-070335-jynus.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 - T252512', diff saved to https://phabricator.wikimedia.org/P11261 and previous config saved to /var/cache/conftool/dbconfig/20200521-065858-marostegui.json
  • 06:28 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1019 with 50% weight', diff saved to https://phabricator.wikimedia.org/P11260 and previous config saved to /var/cache/conftool/dbconfig/20200521-062823-jynus.json
  • 06:04 vgutierrez: pool cp5012 - T251219
  • 05:42 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1019 with low weight', diff saved to https://phabricator.wikimedia.org/P11259 and previous config saved to /var/cache/conftool/dbconfig/20200521-054231-jynus.json
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set enwiki as read-only=off after maintenance T251982', diff saved to https://phabricator.wikimedia.org/P11258 and previous config saved to /var/cache/conftool/dbconfig/20200521-050328-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set enwiki as read-only for maintenance T251982', diff saved to https://phabricator.wikimedia.org/P11257 and previous config saved to /var/cache/conftool/dbconfig/20200521-050029-marostegui.json
  • 01:03 krinkle@deploy1001: Synchronized wmf-config/mc.php: Ic9efa98312b (duration: 01m 08s)

2020-05-20

  • 20:16 herron: logstash1011:~# kafka-preferred-replica-election --zookeeper conf1004.eqiad.wmnet,conf1005.eqiad.wmnet,conf1006.eqiad.wmnet/kafka/logging-eqiad
  • 19:27 robh: cp5012 still offline for mem tests, "fast" testing complete without errors and extended testing in progress. system firmware was updated before testing. T251219
  • 18:10 XioNoX: accept 198.35.27.0/24 from Anycast peers on all routers - T253196
  • 18:01 XioNoX: add BGP between authdns2001 and cr1-codfw - T253196
  • 17:57 XioNoX: accept 198.35.27.0/24 from Anycast peers on cr3-ulsfo - T253196
  • 17:44 robh: cp5012 rebooting for troubleshooting
  • 17:02 bblack: dns* + authdns* - disabling puppet to test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/597311/
  • 16:53 bblack: kraz.wikimedia.org ( https://wikitech.wikimedia.org/wiki/IRCD ) - stopping ircecho then ircd, then restarting them in reverse order - T239993
  • 16:01 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 16:01 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
  • 15:42 elukey: update puppet compiler's facts
  • 15:21 moritzm: installing libssh security updates
  • 15:15 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 15:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T253096 [itwikivoyage] Undeploy Insider and Listings extensions (duration: 01m 08s)
  • 14:43 marostegui: Replace tendril_purge_global_status_log_5m event with the new one (purging every 2d of data and with a higher limit of rows) - T252331
  • 14:34 hnowlan@deploy1001: Finished deploy [restbase/deploy@6d2f88c]: Add awa.wikipedia.org to wikipedia list (duration: 19m 49s)
  • 14:15 hnowlan@deploy1001: Started deploy [restbase/deploy@6d2f88c]: Add awa.wikipedia.org to wikipedia list
  • 14:06 XioNoX: special-ranges6, remove 4000::/2 and 8000::/1
  • 14:03 bblack: authdns1001 - poweroff for T241770
  • 14:00 bblack: cr2-eqiad - re-routing ns[01] public IPs from authdns1001 (going offline for hw work) to dns1002 - T241770 (redo from earlier, commit didn't take for whatever reason)
  • 13:52 bblack: cr[12]-eqiad - re-routing ns[01] public IPs from authdns1001 (going offline for hw work) to dns1002 - T241770
  • 13:51 bblack: authdns1001 - downtimed for physical work - T241770
  • 13:24 milimetric@deploy1001: Finished deploy [analytics/refinery@a891999] (thin): Regular analytics weekly train THIN [analytics/refinery@a891999] (duration: 00m 10s)
  • 13:23 milimetric@deploy1001: Started deploy [analytics/refinery@a891999] (thin): Regular analytics weekly train THIN [analytics/refinery@a891999]
  • 13:23 milimetric@deploy1001: Finished deploy [analytics/refinery@a891999]: Regular analytics weekly train [analytics/refinery@a891999] (duration: 38m 33s)
  • 13:23 godog: remove stale tcp service on lvs codfw low-traffic 10.2.1.53:10902
  • 13:00 Amir1: creating two wikis is done
  • 12:52 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 10m 49s)
  • 12:45 milimetric@deploy1001: Started deploy [analytics/refinery@a891999]: Regular analytics weekly train [analytics/refinery@a891999]
  • 12:41 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Creating Wiktionary Konkani (gomwiktionary) - T249506 (duration: 01m 06s)
  • 12:40 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 12:38 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating Wiktionary Konkani (gomwiktionary) - T249506 (duration: 01m 05s)
  • 12:35 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Creating Wiktionary Konkani (gomwiktionary) - T249506
  • 12:33 ladsgroup@deploy1001: Synchronized dblists: Creating Wiktionary Konkani (gomwiktionary) - T249506 (duration: 01m 06s)
  • 12:28 godog: roll-restart pybal on codfw low-traffic - T233956
  • 12:26 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 12:22 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 12:22 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 01s)
  • 12:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:18 ladsgroup@deploy1001: Synchronized langlist: Create Awadhi Wikipedia (awawiki) - T251371 (duration: 01m 06s)
  • 12:16 ladsgroup@deploy1001: Synchronized static/images/project-logos: Create Awadhi Wikipedia (awawiki) - T251371 (duration: 01m 06s)
  • 12:14 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: Create Awadhi Wikipedia (awawiki) - T251371 (duration: 01m 06s)
  • 12:12 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Create Awadhi Wikipedia (awawiki) - T251371
  • 12:07 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 01m 08s)
  • 11:37 mutante: rebooting ganeti1009 and ganeti1011 to hopefully clear icinga alerts about microcode mitigations
  • 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool new host db1142 and db1084', diff saved to https://phabricator.wikimedia.org/P11253 and previous config saved to /var/cache/conftool/dbconfig/20200520-111013-marostegui.json
  • 11:07 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1018, es1015 fully', diff saved to https://phabricator.wikimedia.org/P11252 and previous config saved to /var/cache/conftool/dbconfig/20200520-110732-jynus.json
  • 11:04 jbond42: roll out update of exim4
  • 10:46 moritzm: installing 4.19.118 Linux packages on Buster hosts
  • 10:28 vgutierrez: rolling restart of ats-tls in text@esams - T249335
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1142 and db1084 on s4', diff saved to https://phabricator.wikimedia.org/P11250 and previous config saved to /var/cache/conftool/dbconfig/20200520-101928-marostegui.json
  • 10:07 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1018, es1015 at 50% weight', diff saved to https://phabricator.wikimedia.org/P11249 and previous config saved to /var/cache/conftool/dbconfig/20200520-100726-jynus.json
  • 09:43 vgutierrez: disable KA for POST/PUT requests on esams - T249335
  • 09:36 XioNoX: create ROAs for 198.35.26.0/24 and 198.35.27.0/24 - T253196
  • 09:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1142 and db1084 on s4', diff saved to https://phabricator.wikimedia.org/P11247 and previous config saved to /var/cache/conftool/dbconfig/20200520-093141-marostegui.json
  • 09:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:28 XioNoX: create ARIN inetnum 198.35.27.0/24 and route 198.35.26.0/24 + 198.35.27.0/24 - T253196
  • 09:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:26 marostegui: Upgrade db1083 (s1 master) to 10.1.43-2 without restarting T251982
  • 09:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for new host db1142 and start to repool db1084', diff saved to https://phabricator.wikimedia.org/P11246 and previous config saved to /var/cache/conftool/dbconfig/20200520-091153-marostegui.json
  • 09:08 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 09:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:01 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 09:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1142 with minimum weight for the first time T252512', diff saved to https://phabricator.wikimedia.org/P11245 and previous config saved to /var/cache/conftool/dbconfig/20200520-085757-marostegui.json
  • 08:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 08:49 _joe_: converting mw1266-1275 to use envoy T247389
  • 08:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:42 XioNoX: Remove bogons4 from policy-options on all routers - gerrit 597272
  • 08:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:33 _joe_: disabling puppet on mw1266-1275 for migration to envoy
  • 08:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:41 marostegui: alter table categorylinks engine=Innodb ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8,force on all labsdb1011 wikis - T249188
  • 07:24 moritzm: install systemd security updates
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 to clone db1142 T252512', diff saved to https://phabricator.wikimedia.org/P11241 and previous config saved to /var/cache/conftool/dbconfig/20200520-071010-marostegui.json
  • 00:05 RoanKattouw: Ran namespaceDupes.php on tiwiki and tiwiktionary for T251287
  • 00:03 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set sitename and meta namespace localizations for tiwiki and tiwiktionary (T251287) (duration: 01m 06s)

2020-05-19

  • 23:59 RoanKattouw: Ran namespaceDupes.php on jvwiki and jvwiktionary for T252754
  • 23:57 jforrester@deploy1001: Synchronized php-1.35.0-wmf.32/extensions/Insider/includes/InsiderHooks.php: T252846 Use SidebarBeforeOutput hook with correct format (duration: 01m 06s)
  • 23:55 catrope@deploy1001: Finished scap: i18n scap for namespace localizations (T251287, T252754) (duration: 62m 26s)
  • 22:53 catrope@deploy1001: Started scap: i18n scap for namespace localizations (T251287, T252754)
  • 18:46 herron: performing rolling restarts of codfw/eqiad ELK clusters for java updates
  • 18:41 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Grant template editors editcontentmodel on enwiki (T253081) (duration: 01m 06s)
  • 18:35 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments features on frwiki (T252420) (duration: 01m 08s)
  • 17:09 arturo: added tesseract suite to stretch-wikimedia component/tesseract-410-bpo (T247422)
  • 16:24 godog: power cycle thanos-fe* / thanos-be*
  • 15:23 kormat@cumin1001: dbctl commit (dc=all): 'Repool db2073 into s4 T252985', diff saved to https://phabricator.wikimedia.org/P11236 and previous config saved to /var/cache/conftool/dbconfig/20200519-152340-kormat.json
  • 15:20 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:20 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:16 cdanis: canary on ~150 hosts looks great, re-enabling puppet on all physical hosts ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo cumin 'F:virtual = physical' 'enable-puppet "cdanis deploying I68c97d5"'
  • 15:04 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:04 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:59 moritzm: installing fuse update from Buster point release
  • 14:47 cdanis: disabling puppet on all physical hosts ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo cumin 'F:virtual = physical' 'disable-puppet "cdanis deploying I68c97d5"'
  • 14:38 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 14:26 XioNoX: Set minimum-links 2 to AMS-IX LACP - T253122
  • 13:53 XioNoX: configure new AMS-IX port as quarantine - T251121
  • 13:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 13:09 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 13:09 jayme: updated helm: 2.16.7-1 -> 2.16.7-2 on deploy[1,2]001 and contint[1,2]001
  • 13:09 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:03 kormat@cumin1001: dbctl commit (dc=all): 'Pool db2136 into s4 T252985', diff saved to https://phabricator.wikimedia.org/P11233 and previous config saved to /var/cache/conftool/dbconfig/20200519-130313-kormat.json
  • 12:40 ariel@deploy1001: Finished deploy [dumps/dumps@a329605]: make page content fixup script move inprog files into place if good (duration: 00m 04s)
  • 12:40 ariel@deploy1001: Started deploy [dumps/dumps@a329605]: make page content fixup script move inprog files into place if good
  • 12:37 jayme: imported helm 2.16.7-2 to main for buster-wikimedia, stretch-wikimedia, jessie-wikimedia
  • 12:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:51 jynus: starting backups of es1, es2, es3 on eqiad into backup1002
  • 11:41 jynus@cumin1001: dbctl commit (dc=all): 'Depool es1018, es1015, es1019', diff saved to https://phabricator.wikimedia.org/P11232 and previous config saved to /var/cache/conftool/dbconfig/20200519-114148-jynus.json
  • 11:12 marostegui: Deploy schema change on db2124 (frwiki, jawiki, ruwiki) T238966
  • 10:34 mutante: releases2001 - restarted failed jenkins
  • 10:33 mutante: releases2001 - Failed to restart jenkins.service: The name org.freedesktop.PolicyKit1 was not provided by any .service files
  • 10:32 volans: flushed all Netbox caches (manage.py invalidate all) - T253091
  • 10:29 volans: start Netbox restore - T253091
  • 10:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 10:13 akosiaris: upgrade etherpad-lite to 1.8.4 on etherpad1002
  • 09:58 hnowlan: roll-restart of eqiad restbase hosts for java security updates
  • 09:58 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 09:55 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 09:55 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
  • 09:55 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 09:54 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 09:10 godog: eqiad-prod: decom ms-be101[678] - T252008
  • 08:07 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - eqsin
  • 08:04 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - esams
  • 08:01 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - eqiad
  • 07:55 volker-e@deploy1001: Finished deploy [design/style-guide@37c67dd]: Deploy design/style-guide: (duration: 00m 06s)
  • 07:54 volker-e@deploy1001: Started deploy [design/style-guide@37c67dd]: Deploy design/style-guide:
  • 07:52 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - *dfw
  • 07:49 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - ulsfo
  • 07:45 vgutierrez: rolling upgrade to trafficserver 8.0.7-1wm10 with puppet disabled on cp hosts
  • 07:09 jynus: starting es4 & es5 eqiad backups with low concurrency
  • 06:35 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:29 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 06:24 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:17 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 05:57 volker-e@deploy1001: Finished deploy [design/style-guide@7bfbd2a]: Deploy design/style-guide: (duration: 00m 06s)
  • 05:57 volker-e@deploy1001: Started deploy [design/style-guide@7bfbd2a]: Deploy design/style-guide:
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 and s8 as read-only=off for maintenance T251981', diff saved to https://phabricator.wikimedia.org/P11227 and previous config saved to /var/cache/conftool/dbconfig/20200519-050346-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 and s8 as read-only for maintenance T251981', diff saved to https://phabricator.wikimedia.org/P11226 and previous config saved to /var/cache/conftool/dbconfig/20200519-050043-marostegui.json
  • 04:27 marostegui: Repool labsdb1011 T249188
  • 03:29 volker-e@deploy1001: Finished deploy [design/style-guide@4b4bc51]: Deploy design/style-guide: (duration: 00m 07s)
  • 03:28 volker-e@deploy1001: Started deploy [design/style-guide@4b4bc51]: Deploy design/style-guide:

2020-05-18

  • 23:50 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:47 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:25 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:12 ryankemper: Restarted `wdqs-updater` across all wdqs nodes and restarted `wdqs-categories` across all nodes except 1010 (test wdqs server) and 1009 (automated deployment server)
  • 22:55 Krinkle: Clear module_deps on dewiki (group2, old mw version, s5) to monitor regeneration
  • 22:48 Krinkle: Clear module_deps on group0 (mostly s3) to monitor regeneration
  • 22:35 Krinkle: Clear module_deps on commonswiki (group1, s4) to monitor regeneration
  • 22:33 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@4886dc3]: 0.3.32 (duration: 17m 12s)
  • 22:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:18 Krinkle: Clear module_deps on s2 wikis to monitor regeneration
  • 22:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 22:15 ryankemper@deploy1001: Started deploy [wdqs/wdqs@4886dc3]: 0.3.32
  • 22:02 Krinkle: Clear module_deps on hewiki (group1, s7) to monitor regeneration, ref T247028
  • 21:40 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 21:23 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/includes/resourceloader/dependencystore/: I015fa5885, I972a93806006 (duration: 01m 07s)
  • 21:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:27 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@12efc14]: Update mobileapps to c960b349 (duration: 03m 31s)
  • 20:24 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@12efc14]: Update mobileapps to c960b349
  • 19:07 herron: performing rolling maintenance on kafka-main to pick up java security updates
  • 19:00 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Ic005093778d (duration: 01m 08s)
  • 18:58 krinkle@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Ic005093778d (duration: 01m 06s)
  • 18:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:46 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 18:38 volans: upgraded spicerack to 0.0.37-1 on cumin[12]001
  • 18:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix English Wikipedia wordmark dimensions (T252143) (duration: 01m 06s)
  • 17:14 XioNoX: update domain object for 56.15.185.in-addr.arpa - T247972
  • 17:06 bblack: dns1001 - removing downtimes, back in service - T241770
  • 16:45 bstorm_: updated views on labsdb1011 for the wb_terms changes T251598
  • 16:32 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:30 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:17 bblack: dns1001 - reimaging for new NIC - T241770
  • 16:10 volans: uploaded spicerack_0.0.37-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 15:52 hnowlan: rolling codfw cassandra for java security updates
  • 15:51 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 15:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 15:11 Krinkle: krinkle@mc1021 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 14:57 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 14:56 hnowlan: roll-restart of sessionstore cassandra hosts for java security update
  • 14:55 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 14:53 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 14:50 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 14:50 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 14:35 hnowlan@deploy1001: Finished deploy [changeprop/deploy@16bf19f]: Stop consuming purges topic, purged is now doing this (duration: 01m 22s)
  • 14:34 hnowlan@deploy1001: Started deploy [changeprop/deploy@16bf19f]: Stop consuming purges topic, purged is now doing this
  • 14:33 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of esams T133821
  • 14:29 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of eqiad T133821
  • 14:23 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of eqsin, ulsfo T133821
  • 14:19 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of codfw T133821
  • 14:15 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2073 while replacing it T252985', diff saved to https://phabricator.wikimedia.org/P11216 and previous config saved to /var/cache/conftool/dbconfig/20200518-141505-kormat.json
  • 14:12 bblack: dns1001 - shutting down for T241770
  • 14:09 volans: uploaded spicerack_0.0.36-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 14:07 bblack: authdns - ns[01] static routes on cr[12]-eqiad switching back to authdns1001 (oops, that's not the server we're taking offline today!)
  • 14:06 vgutierrez: upload trafficserver 8.0.7-1wm9 to apt.wm.o (buster)
  • 14:02 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 14:00 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 13:57 bblack: authdns - ns[01] static routes on cr[12]-eqiad switching from authdns1001 to dns1002 for T241770
  • 13:29 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 13:00 hashar@deploy1001: Synchronized php-1.35.0-wmf.32/skins/Vector/includes/VectorTemplate.php: VectorTemplate: SkinTemplateToolboxEnd hook isn't deprecated - T252906 (duration: 01m 07s)
  • 11:52 marostegui: Install 10.1.43-2 on db1122 and db1109 - T251981
  • 11:27 Lucas_WMDE: EU SWAT done
  • 11:25 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.32/extensions/Wikibase/: SWAT: Fix core's TitleFactory not being used correctly (T252803) (duration: 01m 12s)
  • 11:20 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Update GrowthExperiments mentor list page for viwiki (duration: 01m 06s)
  • 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Make the threshold for Chinese WP to prevent publishing 5% more strict (T252786) (duration: 01m 06s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (597033) (duration: 01m 06s)
  • 10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (597033) (duration: 01m 32s)
  • 10:37 elukey: copy prometheus-druid-exporter 0.8-1 from stretch to buster wikimedia
  • 10:20 _joe_: upgrading purged in the remaining datacenters
  • 10:07 elukey: upload druid 0.12.3-1.1 to stretch|buster-wikimedia
  • 10:02 vgutierrez: upload trafficserver 8.0.7-1wm8 to apt.wm.o (buster)
  • 09:53 _joe_: upgrading purged in codfw, ulsfo
  • 09:46 mutante: contint2001 - apt-get remove --purge openjdk-11-* - T224591
  • 09:43 _joe_: upload purged 0.13 to buster-wikimedia
  • 08:44 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 08:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 08:25 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 08:25 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 08:13 godog: set weight to 0 for all but objects in ms-be10[678] - T252008
  • 07:57 mutante: replacing apache module with httpd module on deployment servers
  • 07:47 moritzm: installing apt security updates on jessie systems
  • 07:36 marostegui: Remove pc2007 from tendril and add it back, as the Act is frozen after reimage - T250666
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088 after upgrade', diff saved to https://phabricator.wikimedia.org/P11214 and previous config saved to /var/cache/conftool/dbconfig/20200518-072234-marostegui.json
  • 07:20 marostegui: Upload MariaDB 10.4.13 to the buster repo - T250666
  • 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:41 marostegui: Stop MySQL on db2088
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088 for upgrade', diff saved to https://phabricator.wikimedia.org/P11213 and previous config saved to /var/cache/conftool/dbconfig/20200518-062452-marostegui.json
  • 05:55 _joe_: installing purged 0.12 on cp2027
  • 05:54 _joe_: uploaded purged 0.12 to apt.w.o
  • 05:00 marostegui: Stop MySQL on labsdb1011 to copy its content to backup1001 T249188

2020-05-16

  • 22:04 Krinkle: krinkle@mc1022 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 21:56 Krinkle: krinkle@mc1019 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 20:23 Krinkle: krinkle@mc1034,mc1035,mc1036 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 20:04 Krinkle: krinkle@mc1033 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 19:57 Krinkle: krinkle@mc1032 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 19:51 Krinkle: krinkle@mc1031 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 19:42 Krinkle: krinkle@mc1030 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 19:25 Krinkle: krinkle@mc1029 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 19:10 Krinkle: krinkle@mc1028 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
  • 18:58 Krinkle: krinkle@mc1027 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref T252945
  • 18:54 Krinkle: krinkle@mc1026 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref T252945
  • 18:30 Krinkle: krinkle@mc1024 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref T252945
  • 18:24 Krinkle: krinkle@mc1025 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref T252945
  • 17:56 Krinkle: krinkle@mc1023 Pruning old echo:seen: Redis keys that didn't use a ttl yet, ref T252945
  • 17:49 Krinkle: krinkle@mwmaint1002: Running cleanupRemovedModules.php to prune old module_deps rows T113916
  • 17:24 Krinkle: krinkle@mc1020 Prune old echo:seen: keys that have ttl:-1 from Redis main stash, ref T252945
  • 15:16 Krinkle: krinkle@mc1020 Looking at why there are still over 2M echo:seen keys in redis main stash
  • 00:55 krinkle@deploy1001: Synchronized wmf-config/logging.php: I046868190b472 (duration: 01m 13s)
  • 00:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:16 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:16 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:13 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:10 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:06 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 00:06 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 00:05 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 00:05 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer

2020-05-15

  • 23:50 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:47 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 23:46 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 23:46 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 23:46 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:43 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 23:43 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:37 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 23:35 ryankemper: Pooled wdqs2007 following successful query tests (all data transfers are done now)
  • 22:53 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I1b1578a57ef5 (duration: 01m 07s)
  • 22:51 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Iaa240eb8cf9 (duration: 01m 06s)
  • 21:41 ryankemper: depooled wdqs2007 while it catches up on lag
  • 21:40 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:36 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 20:33 ryankemper: pooled wdqs2003 and wdqs1007 following successful query tests
  • 19:46 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: If0fd1b51 (duration: 01m 08s)
  • 18:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:34 ryankemper: depooled wdqs2003 while lag catches up
  • 18:32 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:55 vgutierrez: upload acme-chief 0.25 to apt.wm.o (buster) - T252881
  • 17:27 XioNoX: renumber cr2-eqord:xe-0/1/1 to xe-0/1/3 - T221259
  • 17:02 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 17:01 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:00 ryankemper: depooled wdqs1007 in preparation for impending wdqs data xfer
  • 16:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:52 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:02 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:57 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 15:56 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:52 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 15:49 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:45 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 15:44 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:40 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 15:36 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:32 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 15:31 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:27 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 14:19 cdanis: reverting sysctl net.ipv4.udp_mem to its original value on netflow3001
  • 14:18 cdanis: re-enable puppet on netflow*
  • 14:14 cdanis: disable puppet on netflow*
  • 14:04 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:47 ema: cp2029, cp3050: varnish-fe-restart to clear 'child restarted' alerts
  • 13:47 vgutierrez: downgrade ats to version 8.0.7-1wm7 on cp4032
  • 13:42 vgutierrez: upgrade ats to version 8.0.7-1wm8 on cp4032
  • 13:37 mutante: rsyncing gerrit git data from gerrit1001 to gerrit1002 (T200739)
  • 13:13 cdanis: increase samplicator recvbuf on netflow3001 & restart samplicator
  • 13:01 cdanis: increasing sysctl net.ipv4.udp_mem on netflow3001
  • 09:57 vgutierrez: upload trafficserver 8.0.7-1wm7 to apt.wm.o (buster)
  • 09:21 ema: cp2029: attempt forced discard of stuck VCL T236754
  • 09:09 elukey: restart druid brokers on druid100[4-6] - locked up due to dropped datasources - T226035
  • 08:51 ema: cp2029: try out varnish 5.1.3-1wm15 T236754
  • 07:36 XioNoX: bumps prefix limit for AS16735 in eqiad
  • 05:35 jynus: stop replication on pc2009, pc2010 for benchmarking T252761
  • 04:53 volker-e@deploy1001: Finished deploy [design/style-guide@dc956a3]: Deploy design/style-guide: (duration: 00m 10s)
  • 04:52 volker-e@deploy1001: Started deploy [design/style-guide@dc956a3]: Deploy design/style-guide:
  • 04:42 vgutierrez: repool cp5006
  • 04:28 vgutierrez: depool and reboot cp5006

2020-05-14

  • 23:24 catrope@deploy1001: Synchronized static/images/project-logos/: Revert temporary 20k logo for vecwiki (T252770) (duration: 01m 06s)
  • 23:23 RoanKattouw: Ran namespaceDupes.php for T252343
  • 23:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create Gapura (Portal) namespace on jvwiki (T252343) (duration: 01m 06s)
  • 23:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add *.ub.uni-heidelberg.de and hq.eso.org to $wgCopyUploadDomains (T252600, T252726) (duration: 01m 07s)
  • 21:43 ryankemper: depooled wdqs2006 while lag recovers
  • 21:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:08 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:16 volans: moved codereview.tar.gz and with_r.tar.gz from miscweb1002 to cumin1001 to free space
  • 20:15 hashar@deploy1001: Synchronized php-1.35.0-wmf.32/skins/Vector/includes/VectorTemplate.php: Allow plain text labels in side bar - T252727 (duration: 01m 06s)
  • 19:51 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 19:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:49 ryankemper: Depooled wdqs1006 in preparation for impending wdqs data xfer
  • 18:36 Urbanecm: Morning SWAT done
  • 18:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 15adbbc: [thwikisource] Set ProofReadPage separator to an empty string (T252610) (duration: 01m 06s)
  • 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 4b8399c: Undeploy graphoid from mediawikiwiki (T242855) (duration: 01m 05s)
  • 18:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: f03a45c: Adding import to test wikis from mediawikiwiki (T242855) (duration: 01m 07s)
  • 17:03 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 1 member 1 - T252797
  • 16:55 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 3 member 1 - T252797
  • 16:51 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port set pic-slot 0 port 48 member 2 - T252797
  • 16:50 XioNoX: request virtual-chassis vc-port set pic-slot 1 port 2 member 1 - T252797
  • 16:42 XioNoX: request virtual-chassis vc-port delete pic-slot 1 port 2 member 1 - T252797
  • 16:36 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 0 port 48 member 2 - T252797
  • 15:59 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:57 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:56 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 15:25 XioNoX: disable asw2-d1-eqiad:et-1/1/0 - T251663
  • 14:39 mutante: kuai kuai is https://twitter.com/Arlieth/status/1257714333133357056 | https://en.wikipedia.org/wiki/Kuai_Kuai_culture
  • 13:31 _joe_: updating purged to 0.11 in eqiad,eqsin,esams
  • 12:47 vgutierrez: rolling upgrade ats to version 8.0.7-1wm7
  • 12:46 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 12:43 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 12:22 kormat: reverted iosched on pc1010 to `mq-deadline` T252761
  • 11:47 kormat: changed iosched on pc1010 to `none` as a test T252761
  • 11:07 matthiasmullie: EU swat done
  • 11:05 mlitn@deploy1001: Synchronized php-1.35.0-wmf.32/extensions/WikibaseMediaInfo/: [MediaInfo] Enable media search for all users by default (duration: 01m 12s)
  • 11:04 vgutierrez: upgrade ats to version 8.0.7-1wm7 on cp3064
  • 10:31 fdans@deploy1001: Finished deploy [analytics/refinery@6f13979]: Regular analytics weekly train (duration: 17m 14s)
  • 10:14 fdans@deploy1001: Started deploy [analytics/refinery@6f13979]: Regular analytics weekly train
  • 09:58 elukey: remove matomo 3.11 from the main component of stretch-wikimedia
  • 09:56 elukey: upgrade matomo on matomo1001 to 3.13.3 (latest upstream) - T252741
  • 09:30 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 09:29 elukey: upload matomo-3.13.3 to thirdparty/matomo on stretch|buster-wikimedia
  • 09:22 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 08:57 elukey: imported gpg key 1FD752571FE36FF23F78F91B81E2E78B66FED89E in apt1001 (Matomo public debian repo)
  • 08:56 moritzm: installing Java security updates on Presto
  • 08:43 jayme: updated helm: 2.12.2-1 -> 2.16.7-1 on deploy[1,2]001 and contint1001. 2.12.2-4 -> 2.16.7-1 on contint2001
  • 08:39 jayme: imported helm 2.16.7-1 to main for jessie-wikimedia
  • 08:32 moritzm: installing Java security updates on Hadoop/AQS/Druid
  • 08:20 jayme@deploy2001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 08:00 vgutierrez: upgrade ats to version 8.0.7-1wm7 on cp5011
  • 07:03 moritzm: installing apt security updates
  • 06:33 ryankemper: Pooled wdqs2005 following successful test queries
  • 04:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 04:02 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 02:59 ryankemper: wdqs1005 has been de-pooled pending wdqs data xfer
  • 02:57 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 02:57 ryankemper: wdqs1004 was repooled after successful test queries
  • 02:55 ryankemper: wdqs2006 was repooled after successful test queries
  • 01:32 ryankemper: depooled wdqs2006 while waiting for lag to recover
  • 00:54 foks: change password for "Python eggs"
  • 00:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:31 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:08 twentyafterfour: phabricator update appears to be stable.
  • 00:05 twentyafterfour: updating phabricator. 1 patch + new translations. Expect only brief downtime.

2020-05-13

  • 23:46 cstone: SmashPig revision changed from cd1a49da5f to 2702b04329
  • 23:43 ejegg: updated payments-wiki from dabba1804c to 3c465cb11c
  • 23:36 ejegg: rolled back payments-wiki to dabba1804c
  • 23:29 ejegg: updated payments-wiki from dabba1804c to 3c465cb11c
  • 22:40 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:39 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 22:36 ryankemper: Depooled wdqs1004 for subsequent wdqs data xfer
  • 22:29 ryankemper: Pooled wdqs2005 given that lag has returned to normal levels and the instance is responding to queries correctly
  • 22:26 ryankemper: Pooled wdqs1008 given that lag has returned to normal levels and the instance is responding to queries correctly
  • 21:30 elukey: powercycle analytics1055
  • 21:05 eileen: civicrm revision changed from cfb6101e39 to ed4c9522ac, config revision is 2eb75f8dff
  • 20:16 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T242430 Stop loading the ParsoidBatchAPI extension (duration: 01m 08s)
  • 19:09 hashar@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.32 (duration: 01m 05s)
  • 19:08 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.32
  • 18:54 twentyafterfour: restarted php-fpm on phab1001
  • 18:53 thcipriani: restarting gerrit
  • 18:52 twentyafterfour: restarting apache on phab1001 for lack of a better idea
  • 18:50 herron: restarted kafka broker on kafka-main1001 for java security updates
  • 18:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 38db3e0: Update production wordmarks (T252143) (duration: 01m 07s)
  • 18:17 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: 38db3e0: Update production wordmarks (T252143) (duration: 01m 09s)
  • 17:55 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 17:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 17:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 17:51 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:24 ryankemper: Manually depooled wdqs2005 while lag catches up following the data xfer
  • 17:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:18 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:12 urandom: restarted cassandra-c, restbase2017
  • 17:04 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 16:57 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 16:54 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 16:11 James_F: Running AbuseFilter updateVarDumps on group0 on mwmaint1002 T246539
  • 16:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:38 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:32 vgutierrez: upgrade ats to version 8.0.7-1wm7 on cp4032
  • 15:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:30 jayme: imported scap 3.14.0-1 to main for buster-wikimedia
  • 15:30 jayme: imported scap 3.14.0-1 to main for jessie-wikimedia
  • 15:29 ryankemper: Manually de-pooling `wdqs1008.eqiad.wmnet` in preparation for wdqs data transfer
  • 15:29 jayme: imported scap 3.14.0-1 to main for stretch-wikimedia
  • 15:26 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 15:23 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 15:08 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:06 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:55 _joe_: upgrading + restarting purged across ulsfo and codfw T133821
  • 14:50 filippo@deploy1001: Finished deploy [librenms/librenms@0a88d64]: Upgrade LibreNMS to 1.63 T251222 (duration: 00m 10s)
  • 14:50 filippo@deploy1001: Started deploy [librenms/librenms@0a88d64]: Upgrade LibreNMS to 1.63 T251222
  • 14:35 vgutierrez: upload trafficserver 8.0.7-1wm6 to apt.wm.o (buster) - T249335 T251537
  • 13:59 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:57 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:55 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 11:39 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add *.deutsche-digitale-bibliothek.de to the wgCopyUploadsDomains (T252296) (duration: 01m 06s)
  • 11:17 Amir1: EU SWAT is done
  • 11:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable wgLegacyJavaScriptGlobals on fawiki and wikidatawiki (T72470) (duration: 01m 06s)
  • 11:09 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:06 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Anchor RegExp for Data Bridge in Beta (BETA-ONLY) (duration: 01m 06s)
  • 11:00 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:00 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
  • 10:55 volans: imported tqdm 4.11.2-1 packages into buster-wikimedia component/spicerack
  • 10:34 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 10:09 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 as pc1 master T252182 (duration: 01m 05s)
  • 09:55 jbond42: deployed a fix to the ferm-status script; unmanaged ferm rules may get removed
  • 09:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:37 marostegui: Upgrade db2102 to the new 10.4.13 - T250666
  • 09:32 _joe_: installing purged 0.11 on cp2027 T133821
  • 09:21 _joe_: installing purged 0.11 on cp2028 T133821
  • 09:11 moritzm: re-enabling puppet
  • 09:08 mutante: rsyncing /home dirs from people.wikimedia.org to new backend people1002
  • 09:00 moritzm: disabling puppet temporarily
  • 08:53 _joe_: uploaded purged 0.11
  • 08:52 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool pc1010 as pc1 master T252182 (duration: 01m 17s)
  • 07:42 jayme: imported helm 2.16.7-1 to main for stretch-wikimedia
  • 07:41 jayme: imported helm 2.16.7-1 to main for buster-wikimedia
  • 07:29 godog: roll-restart logstash in codfw/eqiad for configuration change
  • 07:14 elukey: upload spark2_2.4.4-bin-hadoop2.6-2 for buster/stretch on apt1001
  • 05:33 ryankemper: wdqs2004 was depooled ~3 hours ago and was re-pooled ~10 mins ago after verifying the wdqs service was healthy
  • 05:32 ryankemper: wdqs1003 was depooled ~6 hours ago and was re-pooled ~10 mins ago after verifying the wdqs service was healthy
  • 05:27 _joe_: restarting php-fpm on mw1374, children dying with SIGILL
  • 05:11 root@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 05:11 root@cumin1001: Updating IPMI password on 1 hosts - root@cumin1001
  • 05:10 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 05:10 root@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 05:10 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 04:52 kart_: Updated cxserver to 2020-05-11-082207-production (T250004)
  • 04:47 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 04:44 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 04:42 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 02:27 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:33 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer

2020-05-12

  • 23:09 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 23:06 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 20:15 hashar@deploy1001: Synchronized php-1.35.0-wmf.32/includes/revisionlist/RevisionItemBase.php: Fix RevisionItemBase::getId to actually return an int, as intended - T252076 (duration: 01m 06s)
  • 19:55 dpifke@deploy1001: Finished deploy [performance/navtiming@48110b9]: Fixes swapped dc/host labels - T238086 (duration: 00m 05s)
  • 19:55 dpifke@deploy1001: Started deploy [performance/navtiming@48110b9]: Fixes swapped dc/host labels - T238086
  • 19:05 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.32
  • 18:41 legoktm: started codereview-archiver script in screen on mwmaint1002
  • 18:23 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:23 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:17 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:17 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:14 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:14 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:49 bblack: 'gdnsdctl replace' on all authdns to load new maxmind data
  • 17:43 bblack: updating maxmind database on puppetmasters (usually automated weekly; we're mid-cycle)
  • 17:10 James_F: Running AbuseFilter updateVarDumps on testwikis on mwmaint1002 T246539
  • 16:55 James_F: Running AbuseFilter updateVarDumps on closed wikis on mwmaint1002 T246539
  • 16:55 mstyles@deploy1001: Finished deploy [wdqs/wdqs@f617307]: v0.3.31 (duration: 14m 53s)
  • 16:40 mstyles@deploy1001: Started deploy [wdqs/wdqs@f617307]: v0.3.31
  • 16:35 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 15:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:34 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-query
  • 15:15 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:15 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:14 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:13 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:13 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:12 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:04 moritzm: installing 4.19.118 Linux updates on Buster nodes (reboots happening later)
  • 15:02 moritzm: upgrading contint2001 to openjdk-8 u252
  • 15:01 godog: bounce pybal on lvs2010 and lvs2009 - T252186
  • 14:40 moritzm: imported openjdk-8 u252 forward port for buster-wikimedia component/jdk8
  • 14:40 ema: rolling thumbor upgrade to 2.8-1+deb10u1 T252509 T219569 T236240
  • 14:39 andrewbogott: rebuilding cloudcontrol1003 and 1004
  • 14:38 hashar: 1.35.0-wmf.32 is on test wikis. Will be pushed to group0 later today during the American window (19:00 - 21:00 UTC) # T249964
  • 14:34 ema: thumbor2001: repool
  • 14:33 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - Test everywhere, SearchSatisfaction on testwiki only - T249261 (duration: 01m 06s)
  • 14:33 ema: thumbor2001: upgrade python-thumbor-wikimedia to 2.8-1+deb10u1 T252509 T219569 T236240
  • 14:23 moritzm: installing Java security updates on WDQS hosts
  • 14:20 hashar@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.32 (duration: 72m 04s)
  • 14:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:05 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:05 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:02 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 14:00 ema: thumbor2001: depool due to minor bug in 2.7-1+deb10u1 T252509 T219569 T236240
  • 13:54 ema: thumbor2001: pool thumbor 2.7-1+deb10u1 for prod traffic T252509 T219569 T236240
  • 13:50 ema: thumbor2001: upgrade python-thumbor-wikimedia to 2.7-1+deb10u1 T252509 T219569 T236240
  • 13:42 jbond42: disable puppet on all CP hosts to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/583342
  • 13:36 kormat: reimaging pc2007 to buster T252182
  • 13:36 moritzm: rebooting netflow* hosts for kernel update
  • 13:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:33 vgutierrez: rolling upgrade of ATS to version 8.0.7-1wm5 - T249335
  • 13:31 moritzm: rebooting deneb for kernel update
  • 13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:24 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 13:24 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:08 hashar@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.32
  • 13:05 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.28 (duration: 23m 47s)
  • 12:37 moritzm: installing iputils update from Buster point release
  • 12:08 hashar: Cutting branch 1.35.0-wmf.32 # T249964
  • 12:08 gehel: restart blazegraph + updater on wdqs2002 - JVM upgrade
  • 11:56 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 11:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:55 vgutierrez: upgrade trafficserver to version 8.0.7-1wm5 on cp5011 - T249335
  • 10:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 10:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:53 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 10:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:43 kormat: reimaging pc2010 to buster T252182
  • 10:30 vgutierrez: upgrade trafficserver to version 8.0.7-1wm5 on cp4032 - T249335
  • 10:30 ema: rolling thumbor upgrade to 2.6-1+deb10u1 T226707
  • 10:19 ema: repool thumbor2001 with upgraded python-thumbor-wikimedia
  • 10:13 ema: thumbor2001: upgrade python-thumbor-wikimedia to 2.6-1+deb10u1
  • 10:04 godog: update compiler facts
  • 09:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:34 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 09:34 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 09:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 09:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:29 filippo@cumin1001: conftool action : set/pooled=yes:weight=100; selector: cluster=thanos
  • 09:07 moritzm: rebooting contint2001 for kernel update
  • 09:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:46 godog: reboot thanos hosts for kernel upgrade
  • 07:41 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:41 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:12 moritzm: rebooting the IDP hosts, SSO sessions will need to be renewed
  • 07:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:56 vgutierrez: upload trafficserver 8.0.7-1wm4 to apt.wm.o (buster) - T242767 T249335
  • 05:29 marostegui: Restart docker-report-releng on deneb
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only=off for maintenance T251502', diff saved to https://phabricator.wikimedia.org/P11180 and previous config saved to /var/cache/conftool/dbconfig/20200512-050339-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance T251502', diff saved to https://phabricator.wikimedia.org/P11179 and previous config saved to /var/cache/conftool/dbconfig/20200512-050054-marostegui.json
  • 04:46 marostegui: Stop mysql on labsdb1011 to transfer its content - T249188
  • 02:14 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:45 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:43 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:16 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:14 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 00:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:34 pt1979@cumin2001: START - Cookbook sre.hosts.downtime

2020-05-11

  • 21:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 21:00 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 20:19 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 20:19 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 19:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:03 Zoranzoki21: T235414 is wrong task number, T235415 is correct
  • 19:02 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add *.bollywoodhungama.in and *.britishmuseum.org to $wgCopyUploadsDomains (T235414, T251882) (duration: 00m 57s)
  • 18:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove "Create a book" link on enwiki (T241683) (duration: 00m 57s)
  • 18:44 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable modern Vector on officewiki, reveal preference on testwiki (T251285) (duration: 00m 58s)
  • 18:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:40 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add tw-photometa.de to $wgCopyUploadsDomains (T252141) (duration: 00m 58s)
  • 18:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:28 catrope@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: Drop mainpage special casing for scowiki and itwiki (T252048, T252065) (duration: 00m 58s)
  • 18:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:20 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:11 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/includes/Revision/RevisionStore.php: T252156 T212428 RevisionStore: fall back to master db if main slot is missing (duration: 00m 58s)
  • 18:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:30 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/AbuseFilter/maintenance/updateVarDumps.php: updateVarDumps: wait for replication after each batch (duration: 00m 58s)
  • 17:27 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/skins/Vector/includes/VectorTemplate.php: T251521 Correctly populate the language variants drop-down rather than breaking early (duration: 00m 59s)
  • 17:24 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/skins/Vector/includes/VectorTemplate.php: T251521 Correctly populate the language variants drop-down rather than breaking early (duration: 00m 59s)
  • 17:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:04 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
  • 16:47 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.31 (duration: 04m 43s)
  • 16:42 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 16:42 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.31
  • 16:40 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 16:34 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.31
  • 16:17 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 16:13 brennen@deploy1001: rebuilt and synchronized wikiversions files: mediawikiwiki to 1.35.0-wmf.31 (T249963) for testing T252179
  • 16:10 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 16:06 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/WikimediaMaintenance: Revert "Remove use of WikiPage::doEditContent" (duration: 01m 06s)
  • 16:05 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/UploadWizard: Revert "Remove use of WikiPage::doEditContent" (duration: 01m 06s)
  • 16:04 hnowlan@deploy1001: Finished deploy [changeprop/deploy@82276cb]: Enabling consumption of purges topic (duration: 01m 58s)
  • 16:04 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Babel: Revert "Remove use of WikiPage::doEditContent" (duration: 01m 07s)
  • 16:03 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Translate: Revert "Remove uses of WikiPage::doEditContent" (duration: 01m 08s)
  • 16:02 hnowlan@deploy1001: Started deploy [changeprop/deploy@82276cb]: Enabling consumption of purges topic
  • 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:54 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:52 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 15:49 cdanis@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=eventgate-analytics.*
  • 15:45 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:42 brennen: syncing backports to 1.35.0-wmf.31 (T249963) for T252179
  • 15:42 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:01 moritzm: installing puma security updates
  • 14:29 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:44 vgutierrez: upgrade ATS to 8.0.7-1wm4 in cp4032 - T249335
  • 13:36 hashar: Rolling back CI system switch to previous known state # T224591
  • 13:20 marostegui: Upgrade mysql package on s4 master in preparation for tomorrow's maintenance T251502
  • 12:50 hashar: Pointing CI Jenkins to contint2001 Gearman server T224591
  • 12:46 mutante: contint2001 - chown -R jenkins-slave:jenkins-slave /srv/.git
  • 12:45 mutante: contint1001 - rsync -avz --delete /srv/.git/ rsync://contint2001.wikimedia.org/ci--srv/.git/
  • 12:43 mutante: contint1001 - rsync -avz --delete /srv/.git/ rsync://contint2001.wikimedia.org/ci--srv-/org/.git/
  • 12:40 mutante: contint1001 - rsync -avz --delete /srv/org/wikimedia/integration/ rsync://contint2001.wikimedia.org/ci--srv-/org/wikimedia/integration/
  • 12:24 mutante: contint2001 - find /var/lib/jenkins/ -group bacula -exec chown jenkins:jenkins {} \;
  • 12:21 mutante: contint2001 - find /var/lib/jenkins/ -user statsite -exec chown jenkins {} \;
  • 12:19 mutante: contint2001 - chown -R jenkins:jenkins /srv/jenkins/*
  • 12:19 mutante: contint1001 - rsync -avz --delete /srv/jenkins/ rsync://contint2001.wikimedia.org/ci--srv-/jenkins/
  • 12:17 mutante: contint1001 - rsync -avz --delete /var/lib/jenkins/ rsync://contint2001.wikimedia.org/ci--var-lib-jenkins-
  • 12:14 hashar: shutting down Zuul and Jenkins for system switch # T224591
  • 12:02 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:59 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:32 Lucas_WMDE: EU SWAT done
  • 11:30 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/WikimediaEvents/: SWAT: Update Banner Interaction Schema (T250791, wmf.30) (duration: 01m 08s)
  • 11:23 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/WikimediaEvents/: SWAT: Update Banner Interaction Schema (T250791, wmf.31) (duration: 01m 07s)
  • 11:14 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 595478|Revert limit adjustment for Chinese translation with ContentTranslation (T252371) (duration: 01m 09s)
  • 10:58 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (595498) (duration: 01m 06s)
  • 10:56 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (595498) (duration: 01m 07s)
  • 10:15 vgutierrez: upload trafficserver 8.0.7-1wm3 to apt.wm.o (buster) - T242767 T249335
  • 09:44 mutante: contint2001 - find /var/lib/jenkins -user statsite -exec chown jenkins:jenkins {} \;
  • 09:31 hashar: contint2001 started zuul-merger again (had permission issues in /var/lib/zuul )
  • 09:07 mutante: contint1001 - rsync -avpz --delete /srv/jenkins/ rsync://contint2001.wikimedia.org/ci--srv-/jenkins/ (T224591)
  • 09:05 mutante: contint2001 - mkdir /srv/jenkins
  • 08:55 hashar: contint2001 stopping zuul-merger, permission problem
  • 08:46 godog: bounce ferm on kubernetes1007 to resolve icinga UNKNOWN
  • 08:40 mutante: rsyncing /var/lib/jenkins from contint1001 to contint2001 with --delete
  • 08:32 mutante: rsynced data from contint1001 to contint2001 - paths per T224591#6039192 for the migration later today
  • 08:30 ema: cp3050: upgrade atskafka to 0.6 T237993
  • 08:30 _joe_: removing the iptables DROP rule on mc1020 T251378
  • 07:54 moritzm: installing squid security updates
  • 07:21 moritzm: updated buster netboot images to 10.4 (updated to latest point release)
  • 07:09 _joe_: dropping requests to mc1020 via a firewall rule T251378
  • 06:04 elukey: restart wikimedia-discovery-golden on stat1007 - apparently killed because the system ran out of allocatable memory

2020-05-10

  • 12:18 marostegui: Start event scheduler on db1115 after a massive delete - T252324
  • 11:05 marostegui: Stop event scheduler on db1115 to perform a massive delete - T252324
  • 10:27 dcausse: restarting blazegraph on wdqs1004: T242453
  • 09:56 marostegui: Change scaling_governor from powersave to performance on db1115 - T252324
  • 09:25 marostegui: Stop MySQL and restart db1115 - T252324
  • 08:50 marostegui: Restart mysql on db1115 to change buffer pool size from 20GB to 40GB T252324
  • 08:44 elukey: Power cycle analytics1052 after eno1 issue
  • 08:01 marostegui: Disable unused events like %_schema T252324 T231185
  • 07:11 marostegui: Restart mysql on db1115 T231185
  • 07:11 marostegui: Truncate tendril.processlist_query_log T231185

2020-05-08

  • 21:45 bstorm_: cleaned up wb_terms_no_longer_updated view for testwikidatawiki and testcommonswiki on labsdb1010 T251598
  • 21:45 bstorm_: cleaned up wb_terms_no_longer_updated view on labsdb1012 T251598
  • 21:33 bstorm_: cleaning up wb_terms_no_longer_updated view on labsdb1009 T251598
  • 21:06 ottomata: running preferred replica election for kafka-jumbo to get preferred leaders back after reboot of broker earlier today - T252203
  • 19:16 jhuneidi@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 19:12 jhuneidi@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 19:07 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 18:12 andrewbogott: reprepro copy buster-wikimedia stretch-wikimedia prometheus-openstack-exporter for T252121
  • 17:59 marostegui: Extend /srv by 500G on labsdb1011 T249188
  • 16:55 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:53 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:36 ottomata: starting kafka broker on kafka-jumbo1006, same issue on other brokers when they are leaders of offending partitions - T252203
  • 15:31 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:28 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:27 ottomata: stopping kafka broker on kafka-jumbo1006 to investigate camus import failures - T252203
  • 14:50 otto@deploy1001: Finished deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only (duration: 00m 03s)
  • 14:50 otto@deploy1001: Started deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only
  • 14:05 akosiaris: T243106 undo experiment with DROP iptable rules this time around. Use mw1331, mw1348
  • 13:22 vgutierrez: rolling restart of ats-tls on eqiad, codfw, ulsfo and eqsin - T249335
  • 13:20 akosiaris: T243106 redo experiment with DROP iptable rules this time around. Use mw1331, mw1348
  • 13:16 akosiaris: T243106 undo experiment with REJECT, DROP iptable rules now that we have envoy in the middle. Use mw1331, mw1348. Experiment done successfully, no issues to the infrastructure.
  • 12:49 akosiaris: T243106 redo experiment with REJECT, DROP iptable rules now that we have envoy in the middle. Use mw1331, mw1348
  • 12:49 akosiaris: T243106 redo experiment with REJECT, DROP iptable rules now that we have envoy in the middle
  • 11:49 hnowlan: restarting cassandra on restbase2009 for java updates
  • 11:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:08 akosiaris: repool eqiad eventgate-analytics. Test concluded
  • 11:08 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
  • 09:54 mutante: disabling puppet on puppetmasters temporarily to switch them carefully to use httpd module and not apache module which we want to get rid of
  • 09:52 akosiaris: depool eqiad eventgate-analytics for a test involving reinitializing the eqiad kubernetes cluster
  • 09:52 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
  • 09:51 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
  • 09:45 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=eventgate-analytics.*
  • 08:20 vgutierrez: rolling restart of ats-tls on esams - T249335
  • 07:19 vgutierrez: ats-tls restart on cp3050 and cp3052 (max_connections_active_in experiment) - T249335
  • 07:07 mutante: phabricator rmdir /var/run/phd/pid - empty and now unused
  • 07:01 moritzm: installing php5 security updates
  • 05:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:10 marostegui: Upgrade pc1010
  • 00:30 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert all wikis except test to 1.35.0-wmf.30 for T252179
  • 00:19 brennen: rolling 1.35.0-wmf.31 train back to group0 for T252179

2020-05-07

  • 22:36 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
  • 22:31 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Scribunto/includes/engines/LuaCommon/TitleLibrary.php: Handle RevisionAccessException with try-catch (T252156) (duration: 01m 08s)
  • 20:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:37 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 20:10 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingStreamNames: set initial stream names, as yet unused - T238230 (duration: 01m 07s)
  • 19:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.35.0-wmf.30
  • 19:09 brennen: rolling 1.35.0-wmf.31 back to group1
  • 19:09 XioNoX: Upgrade Routinator 3000 to 0.7.0 on rpki1001 - T252010
  • 19:05 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
  • 18:25 ppchelko@deploy1001: Finished deploy [changeprop/deploy@383fba5]: Enable both purging types T252142 (duration: 01m 17s)
  • 18:23 ppchelko@deploy1001: Started deploy [changeprop/deploy@383fba5]: Enable both purging types T252142
  • 18:15 Urbanecm: Morning SWAT done
  • 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 899c175: Update project icons to refreshed SVGs (T249047; part 2/2) (duration: 01m 06s)
  • 18:13 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: 899c175: Update project icons to refreshed SVGs (T249047; part 1/2) (duration: 01m 08s)
  • 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 54bd2f1: Add the investigate right to the checkuser group on testwiki (T251932) (duration: 01m 08s)
  • 17:50 bsitzmann@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:46 bsitzmann@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:44 bsitzmann@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 17:44 otto@deploy1001: Finished deploy [analytics/refinery@4a2c530]: (no justification provided) (duration: 05m 31s)
  • 17:38 otto@deploy1001: Started deploy [analytics/refinery@4a2c530]: (no justification provided)
  • 17:18 ejegg: updated payments-wiki from afb84cc391 to dabba1804c
  • 16:46 hnowlan@deploy1001: Finished deploy [changeprop/deploy@cd1386e]: Rollback varnish consumption (duration: 01m 05s)
  • 16:45 hnowlan@deploy1001: Started deploy [changeprop/deploy@cd1386e]: Rollback varnish consumption
  • 16:42 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 16:36 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 16:32 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:29 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 16:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:26 hnowlan@deploy1001: Finished deploy [changeprop/deploy@cd1386e]: Enabling consumption of purges topic (duration: 01m 45s)
  • 16:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:24 hnowlan@deploy1001: Started deploy [changeprop/deploy@cd1386e]: Enabling consumption of purges topic
  • 16:23 hnowlan@deploy1001: Finished deploy [changeprop/deploy@6c65779]: Enabling consumption of purges topic (duration: 00m 24s)
  • 16:23 hnowlan@deploy1001: Started deploy [changeprop/deploy@6c65779]: Enabling consumption of purges topic
  • 15:59 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:51 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:36 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Collection/includes/Specials/SpecialCollection.php: T251460 Set skin on BaseTemplates if you are using getSkin (duration: 01m 08s)
  • 15:28 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 15:27 vgutierrez: rolling restart of ats-tls on text@esams - T249335
  • 15:26 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:12 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:09 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:03 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:59 moritzm: imported component/facter3 for stretch-wikimedia into "main"
  • 14:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:50 moritzm: imported component/puppet5 for stretch-wikimedia into "main"
  • 14:49 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 14:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:42 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:40 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:17 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:07 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:06 moritzm: imported component/facter3 for jessie-wikimedia into "main"
  • 13:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 13:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:04 jynus: disabling puppet on all db hosts to control deployment of new paging alert T172489
  • 13:02 zpapierski@deploy1001: Finished deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI - new servers (duration: 02m 43s)
  • 13:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:59 zpapierski@deploy1001: Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI - new servers
  • 12:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:43 zpapierski@deploy1001: Finished deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI (duration: 16m 20s)
  • 12:27 zpapierski@deploy1001: Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI
  • 12:13 addshore@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Wikibase: gerrit:594920 T252079 Revert "Move prefetching-term-lookup-callback service wiring" (duration: 01m 12s)
  • 12:12 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:33 moritzm: imported component/puppet5 for jessie-wikimedia into "main"
  • 11:31 jbond42: enable ferm-status script https://gerrit.wikimedia.org/r/c/operations/puppet/+/576102
  • 11:10 matthiasmullie: EU swat done
  • 11:07 mlitn@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/WikibaseMediaInfo/: [MediaInfo] Add dummy concept chips without thumbnail (duration: 01m 09s)
  • 10:07 moritzm: installing Java security updates on restbase/sessionstore
  • 09:11 elukey: roll restart cassandra on aqs1005 to pick up new openjdk upgrades (canary)
  • 08:32 moritzm: upgrading restbase-dev to latest OpenJDK security update
  • 08:06 jynus: setting pc2007, pc2009 as read-write
  • 07:44 godog: further decrease weight for ms-be10[678] - T252008
  • 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:33 elukey: restart hadoop yarn nodemanager on analytics1071
  • 05:22 marostegui: Reimage db2078
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 and s7 as read-only=off for maintenance T251158', diff saved to https://phabricator.wikimedia.org/P11167 and previous config saved to /var/cache/conftool/dbconfig/20200507-050419-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 and s7 as read-only for maintenance T251158', diff saved to https://phabricator.wikimedia.org/P11166 and previous config saved to /var/cache/conftool/dbconfig/20200507-050046-marostegui.json
  • 02:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.35.0-wmf.30 for T252079
  • 02:55 brennen: reverting group1 to 1.35.0-wmf.30 for T252079
  • 00:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)

2020-05-06

  • 23:59 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable GrowthExperiments guidance on testwiki (duration: 01m 07s)
  • 23:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable password-reset-update on Wikipedias (T245791) (duration: 01m 07s)
  • 22:22 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/includes/revisionlist/RevisionItem.php: RevisionItem: Fix providing timestamp in getRevisionLink (duration: 01m 09s)
  • 21:45 andrewbogott: updating puppet compiler facts
  • 21:07 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:05 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:04 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:35 ejegg: updated Fundraising CiviCRM from b15b2cfbb5 to cfb6101e39
  • 19:08 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.31 (duration: 01m 08s)
  • 19:07 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.31
  • 19:03 brennen: CORRECTION: 1.35.0-wmf.31 train unblocked (T249963), rolling forward to group1
  • 19:03 brennen: 1.35.0-wmf.31 train unblocked (T249963), rolling forward to group0
  • 18:58 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/DeletedContribsPager.php: deploy https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/594778/ fixes UBN T252052 (duration: 01m 09s)
  • 18:54 volans: upgraded spicerack to spicerack_0.0.34-1_amd64.deb on cumin[12]001
  • 18:45 volans: uploaded spicerack_0.0.34-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 18:44 volans@deploy1001: Finished deploy [homer/deploy@8224f0a]: Release v0.2.2 (duration: 00m 18s)
  • 18:44 volans@deploy1001: Started deploy [homer/deploy@8224f0a]: Release v0.2.2
  • 18:28 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/DeletedContribsPager.php: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/594768/ fixes T252043 (duration: 01m 08s)
  • 17:34 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:12 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:06 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:05 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 16:21 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 15:41 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 15:27 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 13:36 mutante: puppetmaster - revoking cert for webserver-misc-apps, recreating it with static-codereview.wikimedia.org as additional SAN (T243056)
  • 13:32 hashar: Restarting CI Jenkins
  • 13:27 mutante: puppetmaster - revoking cert for webserver-misc-static, not used anymore, merged into webserver-misc-apps
  • 13:27 moritzm: installing graphicsmagick security updates
  • 13:26 XioNoX: Upgrade Routinator 3000 to 0.7.0 on rpki2001 - T252010
  • 13:25 XioNoX: add routinator 3000 0.7.0 to buster-wikimedia - T252010
  • 13:19 ema: cp: upgrade purged to v0.10
  • 13:08 godog: start swift decom ms-be101[678] - T252008
  • 11:22 kart_: EU SWAT done.
  • 11:13 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 594668|Enable ContentTranslation in Armenian WP as a default tool (T249229) (duration: 01m 08s)
  • 10:27 ema: cp2027: test purged v0.10
  • 10:20 moritzm: restarting apache on dbmonitor/grafana/miscweb/graphite/netmon to pick up openldap update
  • 10:00 moritzm: installing remaining openldap security updates (client-side libs, tools)
  • 09:52 jbond42: enable remember-me feature of CAS
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 and remove db1103:3314 from vslow in s4', diff saved to https://phabricator.wikimedia.org/P11159 and previous config saved to /var/cache/conftool/dbconfig/20200506-093940-marostegui.json
  • 09:12 marostegui: Upgrade package on s3 and s7 master (db1123 and db1086) in preparation for tomorrow's restart - T251158
  • 08:56 jbond42: restarting ps1-a4-eqiad.mgmt.eqiad.wmnet.
  • 08:53 jynus: kill FTWRL on db2101
  • 08:43 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Reverting change on mw1407 T99740 (duration: 01m 16s)
  • 08:02 _joe_: restarted php-fpm with tweaked parameters on mw1407, now briefly pooling for traffic (T99740)
  • 07:38 kormat@cumin1001: dbctl commit (dc=all): 'Set es1023 (es5 master) to 0 weight after reimaging es1024 T250666', diff saved to https://phabricator.wikimedia.org/P11158 and previous config saved to /var/cache/conftool/dbconfig/20200506-073856-kormat.json
  • 07:32 vgutierrez: downgrade to ATS 8.0.7-1wm3 on cp4026, cp4031, cp5006 and cp5011
  • 06:00 elukey: powercycle analytics1060 - host stuck - T251973
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1103:3314 in vslow on s4 while db1121 is out T250055', diff saved to https://phabricator.wikimedia.org/P11157 and previous config saved to /var/cache/conftool/dbconfig/20200506-050340-marostegui.json
  • 05:02 marostegui: Deploy schema change on db1121

2020-05-05

  • 23:44 catrope@deploy1001: Synchronized wmf-config/flaggedrevs.php: Restore the reviewer group on fawiki (T249643) (duration: 01m 06s)
  • 23:22 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part3) (duration: 00m 11s)
  • 23:22 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part3)
  • 23:22 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1) (duration: 01m 14s)
  • 23:21 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1)
  • 23:21 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1) (duration: 01m 20s)
  • 23:20 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1)
  • 22:00 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/parser/CoreParserFunctions.php: T251952 take 2 (duration: 01m 06s)
  • 21:57 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/parser/CoreParserFunctions.php: T251952 (duration: 01m 05s)
  • 21:55 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/SpecialNewpages.php: T251950 (duration: 01m 06s)
  • 20:02 herron: added ryankemper to wmf and ops ldap groups T251572
  • 19:38 mforns@deploy1001: Finished deploy [analytics/refinery@6868fc0] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 00m 08s)
  • 19:38 mforns@deploy1001: Started deploy [analytics/refinery@6868fc0] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
  • 19:38 mforns@deploy1001: Finished deploy [analytics/refinery@6868fc0]: Regular analytics weekly train (2nd try) [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 25m 18s)
  • 19:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.31
  • 19:13 mforns@deploy1001: Started deploy [analytics/refinery@6868fc0]: Regular analytics weekly train (2nd try) [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
  • 19:12 brennen@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.31 (duration: 97m 23s)
  • 19:02 brennen: train status: 1.35.0-wmf.31: presently pressing enter through scap-cdb-rebuild; at 8% (T249963, T223287)
  • 18:39 cdanis: depool mw2221 for some manual testing
  • 18:35 mforns@deploy1001: Finished deploy [analytics/refinery@ebd624a] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 00m 09s)
  • 18:35 mforns@deploy1001: Started deploy [analytics/refinery@ebd624a] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
  • 18:34 mforns@deploy1001: Finished deploy [analytics/refinery@ebd624a]: Regular analytics weekly train [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 18m 54s)
  • 18:15 mforns@deploy1001: Started deploy [analytics/refinery@ebd624a]: Regular analytics weekly train [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
  • 17:35 brennen@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.31
  • 16:48 brennen: 1.35.0-wmf.31 was branched at 4d3fed3 for T249963
  • 16:34 brennen: triggering branch cut for 1.35.0-wmf.31 (T249963) via https://releases-jenkins.wikimedia.org/job/MediaWiki%20Train%20Branch%20Cut/build?delay=0sec
  • 16:18 brennen: notice: planning branch cut for 1.35.0-wmf.31 (T249963) at 16:30 UTC
  • 15:47 cstone: SmashPig revision changed from 8c30ed7fe5 to cd1a49da5f
  • 15:38 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 100% after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11153 and previous config saved to /var/cache/conftool/dbconfig/20200505-153843-kormat.json
  • 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:58 hnowlan@deploy1001: Finished deploy [changeprop/deploy@6c65779]: Enabling on_transclusion_update on k8s, disabling on scb (duration: 01m 31s)
  • 14:56 hnowlan@deploy1001: Started deploy [changeprop/deploy@6c65779]: Enabling on_transclusion_update on k8s, disabling on scb
  • 14:45 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 14:43 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 14:32 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 75% after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11149 and previous config saved to /var/cache/conftool/dbconfig/20200505-143158-kormat.json
  • 13:46 akosiaris: deploy cxserver chart 0.0.15 to staging, codfw, eqiad. T219921
  • 13:45 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 13:41 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 13:41 hashar: Updated Jenkins job https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler to have it defined in JJB # T97513
  • 13:36 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 13:18 vgutierrez: upgrade ATS to version 8.1 () on cp4026, cp4032, cp5006 and cp5011
  • 13:15 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 50% after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11147 and previous config saved to /var/cache/conftool/dbconfig/20200505-131520-kormat.json
  • 12:52 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 at 25% after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11145 and previous config saved to /var/cache/conftool/dbconfig/20200505-125254-kormat.json
  • 12:37 XioNoX: push pfw policy - T251769
  • 12:07 jbond42: updating cas login page
  • 12:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:03 moritzm: rolling restart of apache on puppetboard* to pick up OpenLDAP update
  • 11:47 moritzm: rolling restart of apache on kibana hosts
  • 11:41 mutante: LDAP - added eamedia to wmf group (T251358)
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 T248086', diff saved to https://phabricator.wikimedia.org/P11144 and previous config saved to /var/cache/conftool/dbconfig/20200505-113152-marostegui.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 T248086', diff saved to https://phabricator.wikimedia.org/P11143 and previous config saved to /var/cache/conftool/dbconfig/20200505-113100-marostegui.json
  • 11:30 marostegui: Drop T248086_wb_terms table on labsdb hosts - T248086
  • 11:26 moritzm: rolling restart of apache/FPM on mw1261-mw1265
  • 11:22 kart_: EU SWAT done.
  • 11:09 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 592479|Adjust ContentTranslation MT threshold for Chinese WP to 70% (T246383) (duration: 01m 01s)
  • 11:01 moritzm: installing remaining openldap security updates (client-side libs, tools)
  • 11:00 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1024 for reimaging, add es1023 (master) for reading in the meantime T250666', diff saved to https://phabricator.wikimedia.org/P11141 and previous config saved to /var/cache/conftool/dbconfig/20200505-110031-kormat.json
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 T248086', diff saved to https://phabricator.wikimedia.org/P11140 and previous config saved to /var/cache/conftool/dbconfig/20200505-104540-marostegui.json
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 T248086', diff saved to https://phabricator.wikimedia.org/P11139 and previous config saved to /var/cache/conftool/dbconfig/20200505-104441-marostegui.json
  • 10:33 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:23 arturo: copy prometheus-rabbitmq-exporter v0.4 from stretch-wikimedia to buster-wikimedia in apt1001 (T251660)
  • 10:18 arturo: copy prometheus-pdns-exporter v0.5.1 from stretch-wikimedia to buster-wikimedia in apt1001 (T251575)
  • 10:16 mutante: temp disabling puppet on all ganeti hosts to carefully deploy change related to rapi cert location
  • 09:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:36 moritzm: removing boron.eqiad.wmnet
  • 09:36 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 09:03 gehel: restarting wdqs updater on all servers
  • 08:53 moritzm: installing Java security updates on releases*
  • 08:44 kormat: reimaging es1024 to buster T250666
  • 08:27 ema: cp2028 and cp2030 (both upload): varnish-fe restart to clear cache and evaluate 'exp' admission policy T144187 T249809
  • 08:26 moritzm: upgrading slapd on serpens/seaborgium
  • 08:19 ema: cp2027 and cp2029 (both text): varnish-fe restart to clear cache and evaluate 'exp' admission policy T144187 T249809
  • 08:08 moritzm: installing Java security updates on notebook/stat hosts
  • 07:54 gehel@deploy1001: Finished deploy [wdqs/wdqs@d37a059]: rollback wdqs to v 0.3.22 (duration: 04m 18s)
  • 07:50 gehel@deploy1001: Started deploy [wdqs/wdqs@d37a059]: rollback wdqs to v 0.3.22
  • 07:36 zpapierski@deploy1001: Started deploy [wdqs/wdqs@d37a059]: fix for the duplicated jars
  • 06:59 addshore: depool wdqs1006 heavy lag
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 and s6 as read-only=off for maintenance T251154', diff saved to https://phabricator.wikimedia.org/P11133 and previous config saved to /var/cache/conftool/dbconfig/20200505-052334-marostegui.json
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 and s6 as read-only for maintenance T251154', diff saved to https://phabricator.wikimedia.org/P11132 and previous config saved to /var/cache/conftool/dbconfig/20200505-052058-marostegui.json
  • 05:19 marostegui: Start s5 and s6 maintenance - T251154
  • 04:39 marostegui: Restart mysql on tendril host: db1115 - T231769

2020-05-04

  • 23:38 mstyles@deploy1001: Finished deploy [wdqs/wdqs@6518a8d]: v.0.3.26 (duration: 14m 39s)
  • 23:37 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Use namespaced EventBus classes (duration: 00m 57s)
  • 23:35 reedy@deploy1001: Synchronized wmf-config/logging.php: Use namespaced EventBus classes (duration: 00m 56s)
  • 23:33 reedy@deploy1001: Synchronized rpc/RunSingleJob.php: Use namespaced EventBus classes (duration: 00m 58s)
  • 23:29 reedy@deploy1001: Synchronized wmf-config/logging.php: Replace AuthManagerStatsdHandler with WikimediaEventsAuthManagerStatsdHandler::class (duration: 00m 57s)
  • 23:23 mstyles@deploy1001: Started deploy [wdqs/wdqs@6518a8d]: v.0.3.26
  • 22:42 sbassett@deploy1001: Synchronized private/PrivateSettings.php: T251835: Restore dc752af (duration: 00m 57s)
  • 22:16 eileen: process-control config revision is 2eb75f8dff
  • 22:06 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Partial mitigation for T250887 (duration: 00m 57s)
  • 21:45 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Revert partial mitigation for T250887 (duration: 00m 57s)
  • 21:41 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Deploy partial mitigation for T250887 (duration: 00m 57s)
  • 18:20 dpifke@deploy1001: Finished deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - T249822, T238086 (duration: 00m 05s)
  • 18:19 dpifke@deploy1001: Started deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - T249822, T238086
  • 18:16 Urbanecm: Morning SWAT done
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c04fbdd: Adding upload_by_url user right to all registered users on Commons (T251474) (duration: 00m 57s)
  • 18:11 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/DiscussionTools/includes/DiscussionToolsHooks.php: SWAT: b85fc16: Enable on all ExtraSignaturesNamespaces (T249036) (duration: 01m 00s)
  • 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 18c1efb: Load DiscussionTools on en.wiki (T249376) (duration: 00m 58s)
  • 17:57 XioNoX: configure singtel interface on cr1-eqsin
  • 17:36 volans: upgraded spicerack on cumin[12]001 to 0.0.33-1
  • 17:02 joal@deploy1001: Finished deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [2252f9a] (duration: 00m 09s)
  • 17:02 joal@deploy1001: Started deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [2252f9a]
  • 17:01 joal@deploy1001: Finished deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [2252f9a] (duration: 16m 45s)
  • 16:44 joal@deploy1001: Started deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [2252f9a]
  • 16:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.30
  • 15:59 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.30 (duration: 01m 05s)
  • 15:58 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.30
  • 15:53 root@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 15:53 root@cumin1001: Updating IPMI password on 1 hosts - root@cumin1001
  • 15:53 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 15:52 root@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 15:52 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 15:47 kormat@cumin1001: dbctl commit (dc=all): 'Repool es2025 after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11128 and previous config saved to /var/cache/conftool/dbconfig/20200504-154747-kormat.json
  • 15:45 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/includes/libs/rdbms/database/DatabaseMysqlBase.php: T251457 rdbms: don't treat lock() as a write operation (duration: 01m 04s)
  • 15:43 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/resources/src/mediawiki.diff.styles/diff.less: T250393 Follow-up I07dd6f7: Fix font size in diff (duration: 01m 05s)
  • 15:34 volans: uploaded spicerack_0.0.33-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 15:26 volans: deploy1001: deleted old .hhvm.hhbc files (/home/*/.hhvm.hhbc) https://phabricator.wikimedia.org/P11127
  • 15:23 volans: deploy1001: deleted old .hhvm.hhbc files moved from tin (/home/*/home-tin/.hhvm.hhbc) https://phabricator.wikimedia.org/P11126
  • 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 fully after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11125 and previous config saved to /var/cache/conftool/dbconfig/20200504-151243-kormat.json
  • 15:11 ppchelko@deploy1001: Finished deploy [restbase/deploy@74db57e]: Enable greek community wiki, fix analytics endpoints (duration: 14m 36s)
  • 15:05 joal@deploy1001: Finished deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [3396279] (duration: 00m 10s)
  • 15:05 joal@deploy1001: Started deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [3396279]
  • 15:05 joal@deploy1001: Finished deploy [analytics/refinery@3396279]: Analytics hotfix deploy (sqoop) [3396279] (duration: 15m 07s)
  • 15:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:57 ppchelko@deploy1001: Started deploy [restbase/deploy@74db57e]: Enable greek community wiki, fix analytics endpoints
  • 14:50 joal@deploy1001: Started deploy [analytics/refinery@3396279]: Analytics hotfix deploy (sqoop) [3396279]
  • 14:19 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 fully and db1101:3318 to 75% after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11123 and previous config saved to /var/cache/conftool/dbconfig/20200504-141919-kormat.json
  • 14:15 XioNoX: add static nat for fran1001 - T251763
  • 13:50 kormat@cumin1001: dbctl commit (dc=all): 'Depool es2025 for reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11122 and previous config saved to /var/cache/conftool/dbconfig/20200504-135039-kormat.json
  • 13:34 kormat: reimaging es2025 to buster T250666
  • 13:27 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 and db1101:3318 some more after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11121 and previous config saved to /var/cache/conftool/dbconfig/20200504-132744-kormat.json
  • 13:02 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T248664 Stop setting legacy wmgWikibase(Repo/Client)Repositories for TEST wikis (duration: 01m 06s)
  • 12:47 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 and db1101:3318 after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11120 and previous config saved to /var/cache/conftool/dbconfig/20200504-124659-kormat.json
  • 12:10 marostegui: Temporary enable slow query log on db1099:3311 - T206103
  • 12:09 Amir1: EU SWAT is done
  • 11:53 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Increase wmgMemoryLimit from 660MB to 666MB (duration: 01m 06s)
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 T206103 after removing tmp_2 index', diff saved to https://phabricator.wikimedia.org/P11119 and previous config saved to /var/cache/conftool/dbconfig/20200504-114727-marostegui.json
  • 11:46 tgr@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: Help panel: Check if guidance feature flag is set before loading mobile peek (T251589) (duration: 01m 06s)
  • 11:46 marostegui: Remove index tmp_2 from recentchanges on db1099:3311 T206103
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 T206103 to remove tmp_2 index', diff saved to https://phabricator.wikimedia.org/P11118 and previous config saved to /var/cache/conftool/dbconfig/20200504-114539-marostegui.json
  • 11:43 tgr@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: Help panel: Check if guidance feature flag is set before loading mobile peek (T251589) (duration: 01m 10s)
  • 11:38 jbond42: rebooting ps1-a7-codfw.mgmt.eqiad.wmnet.
  • 11:30 jbond42: rebooting ps1-a7-codfw.mgmt.eqiad.wmnet.
  • 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 4d00236: Enable cross-project search on frwikibooks (T251683) (duration: 01m 05s)
  • 11:25 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/elwikiversity*.png (T251050)
  • 11:24 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 64556ba: Correct typo in Greek Wikiversity logo (T248391) (duration: 01m 06s)
  • 11:20 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/jvwiki*.png (T251050)
  • 11:20 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 3b8c618: Update jvwiki logos (T251050) (duration: 01m 05s)
  • 11:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: cc94ea7: Enable VisualEditor for more namespaces on vecwiki (T250419) (duration: 01m 07s)
  • 10:49 arturo: update packages in buster-wikimedia | thirdparty/kubeadm-k8s-1-15 and thirdparty/kubeadm-k8s-1-16 (T250866)
  • 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (563985) (duration: 01m 05s)
  • 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (563985) (duration: 01m 29s)
  • 10:39 vgutierrez: rolling upgrade of ATS to version 8.0.7-1wm3
  • 10:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:33 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:30 arturo: running `aborrero@apt1001:~ $ sudo -i reprepro --delete clearvanished` to cleanup buster-wikimedia|thirdparty/kubeadm-k8s (T250866)
  • 09:46 vgutierrez: upload trafficserver 8.0.7-1wm2 to apt.wm.o (buster)
  • 09:22 kormat: reimaging db1101 to buster T250666
  • 08:50 XioNoX: configure BGP peering with AS132203
  • 08:20 godog: add 50G to prometheus-ops on prometheus100[34]
  • 08:17 marostegui: Deploy schema change on s5 codfw - T251188
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 and db1101:3318 for reimage', diff saved to https://phabricator.wikimedia.org/P11113 and previous config saved to /var/cache/conftool/dbconfig/20200504-075148-marostegui.json
  • 07:31 marostegui: Drop unused flagged* tables from mediawikiwiki - T248298
  • 07:26 moritzm: removed jmorgan from cn=wmf
  • 07:24 marostegui: Install 10.1.43-2 on s5 (db110) and s6 (db1131) masters in preparation for tomorrow's restart - T251154
  • 07:24 moritzm: removed Kerberos principal for lexnasser and jmorgan
  • 07:23 moritzm: removed lexnasser from cn=nda
  • 07:07 elukey: execute ifdown eno1; ifup eno1 on analytics1052 - interface neg speed flapping
  • 06:41 elukey: upload prometheus-druid-exporter 0.8-1 to stretch-wikimedia

2020-05-03

  • 22:52 Krinkle: scap pull mwmaint1002 and mw2001 for noc.wm.o. – https://gerrit.wikimedia.org/r/593929
  • 22:42 Krinkle: scap pull mwmaint1002 and mw2001 for noc.wm.o. – https://gerrit.wikimedia.org/r/591459
  • 21:37 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@0c68d62]: Update the recommendation API service (duration: 04m 22s)
  • 21:32 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@0c68d62]: Update the recommendation API service

2020-05-02

  • 07:49 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(49|5[0-9]|6[0-2])\.eqiad\.wmnet
  • 07:08 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 0 member 1
  • 02:36 volker-e@deploy1001: Finished deploy [design/style-guide@f0d467b]: Deploy design/style-guide: (duration: 00m 07s)
  • 02:36 volker-e@deploy1001: Started deploy [design/style-guide@f0d467b]: Deploy design/style-guide:

2020-05-01

  • 19:56 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw13(5[6-9]|6[0-2]).eqiad.wmnet
  • 18:57 gehel: restart blazegraph on wdqs1006 - T242453
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104 - T232446', diff saved to https://phabricator.wikimedia.org/P11110 and previous config saved to /var/cache/conftool/dbconfig/20200501-142354-marostegui.json
  • 14:18 hknust: holger@mwmaint1002 finished renameInvalidUsernames.php (fail) as part of T219279
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1104 - T232446', diff saved to https://phabricator.wikimedia.org/P11109 and previous config saved to /var/cache/conftool/dbconfig/20200501-140603-marostegui.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1104 - T232446', diff saved to https://phabricator.wikimedia.org/P11108 and previous config saved to /var/cache/conftool/dbconfig/20200501-134707-marostegui.json
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly warm up db1104 - T232446', diff saved to https://phabricator.wikimedia.org/P11107 and previous config saved to /var/cache/conftool/dbconfig/20200501-132804-marostegui.json
  • 13:06 hknust: holger@mwmaint1002 Starting renameInvalidUsernames.php as part of T219279
  • 13:01 vgutierrez: rolling restart of ats-tls in text@esams - T249335
  • 12:24 mutante: mw230* - rolling restart of php-fpm - icinga warnings about opcache health in codfw
  • 12:20 mutante: mw2376 - restarting php-fpm - icinga warnings about opcache health in codfw
  • 12:07 mutante: notebook1004 - puppet had failed due to the removal of jmorgan while one of his processes was still running. "change to absent failed.. user jmorgan currently used by process 29038". killing 29038, running puppet T251560
  • 12:05 mutante: notebook1003 - puppet had failed due to the removal of jmorgan while one of his processes was still running. "change to absent failed.. user jmorgan currently used by process 3288". killing 3288, running puppet T251560
  • 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:54 _joe_: depooled all servers in the app pool in rack D1
  • 08:54 oblivian@cumin1001: conftool action : set/pooled=no:weight=30; selector: name=mw13(49|5[0-5])\.eqiad\.wmnet
  • 08:50 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw13(49|5[0-5])\.eqiad\.wmnet
  • 08:48 _joe_: repooling mw1407 with LCStoreStaticArray, increased opcache, puppet disabled
  • 08:45 _joe_: repooling mw1409
  • 08:39 _joe_: repool mw1352
  • 08:37 _joe_: depooling mw1352
  • 07:44 marostegui: Copy wikireplica dump from labsdb1009 to labsdb1011 - T249188
  • 01:36 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@5f47cd7]: Update the recommendation API service (duration: 04m 33s)
  • 01:32 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@5f47cd7]: Update the recommendation API service