Server Admin Log/Archive 38

2019-08-31

13:33 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: 88ba4f8f4d49 (duration: 00m 55s)

2019-08-30

22:30 ejegg: disabled fundraising targetsmart import jobs
22:09 gehel: regenerating tiles around Lake Huron for maps eqiad / codfw - T231691
22:04 gehel: forcing osm replication on maps[12]004 - lake Huron has dried up
19:33 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/Kartographer/includes/ApiQueryMapData.php: T231561 UBN fix for PHP fatal when ParserOutput has no map data (duration: 00m 56s)
19:24 ebernhardson: cloudelastic-chi index.merge.policy.deletes_pct_allowed=20
16:45 urandom: restarting restbase2017-b with live hack startup script (adds logging) -- T231027
16:38 ebernhardson: cloudelastic-chi all indices auto_expand_replicas set to '0-1'
14:17 krinkle@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/Thanks/includes/: T231617 - 8a3c458c4d937 (duration: 00m 54s)
13:19 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
13:19 ema: cp1075: pause ats-be testing during the weekend T228629
12:43 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
12:12 marostegui: Start replication s2 on labsdb1009 and labsdb1010
11:57 marostegui: Start replication s2 on labsdb1011
11:48 marostegui: Start s2 replication on labsdb1012
11:33 jynus: switching db1125:s2 (eqiad sanitarium) to replicate from codfw T231638
11:31 marostegui: Temporarily stop s2 replication on labsdb1009-labsdb1012
10:23 jynus: resetting db1074 from iLO
10:10 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Mirror dbctl depool of db1074 (duration: 00m 55s)
09:57 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1074 after crash', diff saved to https://phabricator.wikimedia.org/P9013 and previous config saved to /var/cache/conftool/dbconfig/20190830-095747-jynus.json
09:24 ema: cp1075: depool ats-be due to low but constant 504 rate after 8.0.5-1wm4 upgrade
09:20 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
09:13 ema: cp1075: upgrade ATS to 8.0.5-1wm4
08:50 ema: repool ats-be on cp1075 and verify if T231504 is fixed
08:49 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1076 after upgrade', diff saved to https://phabricator.wikimedia.org/P9011 and previous config saved to /var/cache/conftool/dbconfig/20190830-080334-marostegui.json
07:42 marostegui: Upgrade db2055 db2071 db2072 db2092
07:10 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1076 after upgrade', diff saved to https://phabricator.wikimedia.org/P9010 and previous config saved to /var/cache/conftool/dbconfig/20190830-071043-marostegui.json
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1076 after upgrade', diff saved to https://phabricator.wikimedia.org/P9009 and previous config saved to /var/cache/conftool/dbconfig/20190830-063949-marostegui.json
06:25 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1076 after upgrade', diff saved to https://phabricator.wikimedia.org/P9008 and previous config saved to /var/cache/conftool/dbconfig/20190830-062517-marostegui.json
06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1076 after upgrade', diff saved to https://phabricator.wikimedia.org/P9007 and previous config saved to /var/cache/conftool/dbconfig/20190830-061546-marostegui.json
06:07 marostegui: Upgrade db1076
06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for upgrade - T230785', diff saved to https://phabricator.wikimedia.org/P9006 and previous config saved to /var/cache/conftool/dbconfig/20190830-060702-marostegui.json
05:25 marostegui: Stop MySQL on db2060 - T231625
05:23 marostegui: Remove db2060 from tendril and zarcillo - T231625
05:15 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2060 from config T231625 (duration: 00m 53s)
05:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2060 from config T231625 (duration: 00m 53s)
05:10 marostegui: Restart wikibugs

2019-08-29

23:23 ejegg: updated payments-wiki from 1d5d7503b0 to 51d9ed79b6
23:15 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: 4cdfebe (duration: 00m 54s)
21:36 ejegg: re-enabled fundraising python jobs
20:18 ejegg: updated fundraising python tools from c0f4e7a379 to b42bda6bf3
20:14 foks: removing two files for legal compliance
20:14 ejegg: disabled fundraising python jobs
19:56 ebernhardson: cloudelastic-chi run frwiki_content/_forcemerge?only_expunge_deletes=true to try and fix 5gb segments with 96% deleted documents
18:59 ebernhardson: restart elasticsearch on cloudelastic1003 (T231517)
18:50 ebernhardson: restart elasticsearch on cloudelastic1002 (T231517)
18:41 ebernhardson: set index.merge.scheduler.max_thread_count to null to accept default values on cloudelastic-chi (T231517)
18:36 krinkle@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/AbuseFilter/includes/AbuseFilterVariableHolder.php: T231542 f37f0bd50cf (duration: 00m 53s)
18:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/CentralAuth/modules/ext.centralauth.ForeignApi.js: e7cd3cd (duration: 00m 55s)
18:23 ebernhardson: restart elasticsearch on cloudelastic1001 (T231517)
18:22 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Fix "Assign all rights assigned to suppress group to oversight group" (T230601) (duration: 00m 54s)
18:07 ebernhardson: increase index.refresh_interval to 5m for all indices on cloudelastic-chi
17:22 crusnov@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
17:19 crusnov@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
17:15 dcausse: restarted elasticsearch on cloudelastic1004 (T231517)
17:10 crusnov@cumin1001: START - Cookbook sre.ganeti.makevm
17:09 crusnov@cumin1001: START - Cookbook sre.ganeti.makevm
17:09 crusnov@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
16:59 crusnov@cumin1001: START - Cookbook sre.ganeti.makevm
16:49 crusnov@cumin1001: START - Cookbook sre.ganeti.makevm
16:49 crusnov@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
16:49 crusnov@cumin1001: START - Cookbook sre.ganeti.makevm
14:17 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
14:16 ema: depool ats-be on cp1075 to investigate T231504
11:54 Lucas_WMDE: EU SWAT done
11:45 mlitn@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/UploadWizard: [SDC] Add "copy statements" functionality (UploadWizard part) (duration: 00m 52s)
11:44 mlitn@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/WikibaseMediaInfo: [SDC] Add "copy statements" functionality (MediaInfo part) (duration: 00m 54s)
11:37 mutante: scholarships.wikimedia.org app moving to new backend and using TLS. backend upgraded from jessie to stretch and PHP7 (T210411)
09:19 mutante: iegreview.wikimedia.org switched to new stretch backend and using TLS (T210411)
09:08 marostegui: Reboot db1133 to upgrade kernel - T229657
08:43 marostegui: Change min_replicas to 4 on s2 for eqiad and codfw T231019
08:41 mutante: cp1085 - puppet run stuck after Loading facts, possibly related to ACKed IPMI sensor status issue in Icinga
08:39 mutante: cp1085 - kill stuck puppet processes and run manually
08:36 marostegui: Change min_replicas to 4 on s4 for eqiad and codfw T231019
08:30 marostegui: Change min_replicas to 2 on s3 for eqiad and codfw T231019
08:26 mutante: running puppet on cp-text_eqiad
08:23 mutante: switching iegreview app to stretch backend with TLS and discovery record
08:23 kart_: Updated cxserver to 2019-08-29-074757-production (T230200)
08:21 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
08:18 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
08:15 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
08:11 _joe_: disabling zend GC on mw1347, testing a hypothesis for T231011
08:03 _joe_: live tweak on mw1270: apc.ttl removed; apc size 4 GB; tideways disabled.
05:00 marostegui: Stop MySQL on db2053 for decommission T231407
04:59 marostegui: Remove db2053 from tendril and zarcillo T231407
03:28 ejegg: updated payments-wiki from 231b7b0850 to 1d5d7503b0

2019-08-28

23:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 8bfe43a: Add scielo.br to wgCopyUploadsDomains for commonswiki (T231402) (duration: 00m 55s)
21:39 bd808: Set downtime/ack for showmount on labstore1004 (T229448)
21:03 ejegg: deleted fredge_multiqueue_consumer process-control job
19:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/includes/upload/UploadFromChunks.php: T231488 Speculatively hot-deploy fix ahead of landing in git (duration: 00m 54s)
19:15 James_F: Live hacking php-1.34.0-wmf.20/includes/upload/UploadFromChunks.php on mwdebug1002 for T231488
18:57 XioNoX: update cloud firewall policies on cr1/2-eqiad - T231418
18:32 urandom: rebooting restbase-dev1006 -- T229421
18:25 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.20 (duration: 00m 53s)
18:24 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.20
17:42 XioNoX: re-enable both sides of the reline link between knams and esams - T230448
17:14 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/Kartographer/includes/ApiQueryMapData.php: T231453 Fix array access as object (duration: 00m 54s)
17:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.19/extensions/MobileFrontend/includes: T231014 Postpone call to MobileContext::shouldDisplayMobileView() (duration: 00m 55s)
16:51 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgGraphIsTrusted (no longer used) (duration: 00m 56s)
16:06 hashar: upgrading Jenkins on contint1001
16:03 mutante: imported new jenkins package to thirdparty/ci stretch-wikimedia
16:01 hashar: contint2001: upgraded Debian packages / Jenkins
15:15 jeh: restart puppetdb on compiler1002.puppet-diffs.eqiad.wmflabs
14:35 mutante: racktables - down for maintenance
13:59 ema: cp1075 ats-be repooled to resume testing T228629
13:58 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
13:38 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.34.0-wmf.20"
13:30 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.20 (duration: 00m 55s)
13:29 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.20
13:20 marostegui: Change min_replicas to 4 on s8 for eqiad and codfw T231019
13:18 marostegui: Change min_replicas to 3 on s6 for eqiad and codfw T231019
13:15 marostegui: Optimize pc2010 after deleting old rows - T210725
12:17 hashar: contint1001: manually gzip a few mw-debug-cli.log.gz files # T219850
12:06 Urbanecm: Closing EU SWAT
12:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 389919f: [rowiki] Allow sysops to name patrollers (T231099) (duration: 00m 53s)
12:03 Urbanecm: EU SWAT is taking a few mins out of the sanity break, last patch
12:01 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 34f1552: Disable search engine indexing in some namespaces of Icelandic Wikipedia (T231179) (duration: 00m 54s)
11:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 2aebc15: Enable Page Previews as default on zhwikivoyage (T230624) (duration: 00m 52s)
11:55 Urbanecm: Purge /static/images/project-logos/specieswiki-1.5x.png and /static/images/project-logos/specieswiki-2x.png (T230113)
11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: f86baa3: Create HIDPI logo for Wikispecies (T230113) (duration: 00m 52s)
11:52 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: f86baa3: Create HIDPI logo for Wikispecies (1/2, T230113) (duration: 00m 54s)
11:48 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable WRITE_BOTH for items term store for testwikidatawiki (T225055) (duration: 00m 54s)
11:42 mlitn@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/WikibaseMediaInfo: [SDC] Check existence of objects before using it (duration: 00m 54s)
11:31 marostegui: Optimize pc1010 after deleting old rows - T210725
11:30 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 52s)
11:30 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 622cb63: Enable AMC Outreach modal (T231206) (duration: 00m 54s)
11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 4ebddb8: Enable webfonts for ru,uk,be of wikisource, and for sourceswiki (T220752) (duration: 00m 55s)
11:04 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T212886)
10:42 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T212886)
09:58 vgutierrez: repooling cp5001 - T231287
09:56 vgutierrez: upgrading trafficserver on cp5001 to version 8.0.5-1wm4 - T231287
09:28 mutante: notebook1004 - systemctl start jupyter-ebernhardson-singleuser (T231365)
09:19 mutante: notebook1003 - systemctl start jupyter-iflorez-singleuser
09:14 mutante: mwdebug1002 - restart php-fpm
09:11 mutante: miscweb2001 - edit /etc/apache2/ports.conf and replace port 444 with 443 again; a2dismod ssl; systemctl restart apache2; systemctl restart envoyproxy; now also has envoy listening on 443, matches miscweb1001 and manual hack removed (T210411)
09:06 mutante: miscweb1001 - a2dismod ssl; restart apache - stop listening on 443 to make room for envoy
08:17 marostegui: Deploy grants on labsdb hosts for dbproxy1018 - T202367
08:10 vgutierrez: uploaded trafficserver-8.0.5-1wm4 to apt.wikimedia.org (stretch) - T231287
08:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
06:41 marostegui: Upgrade mysql on s7 codfw hosts: db2054, db2061, db2068, db2077 - T230106
06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2053 from config T231407 (duration: 00m 53s)
06:34 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2053 from config T231407 (duration: 00m 55s)
05:54 marostegui: Remove old rows from pc1010 - T210725
05:19 marostegui: Start dropping neodymium grants across all the databases, parsercache, es, dbstore... T229796
05:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 after optimize T210725 (duration: 00m 54s)

2019-08-27

23:45 Urbanecm: Evening SWAT done
23:44 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.20/skins/MinervaNeue/: SWAT: 4d04797: Restore contributions icon to non-AMC menu (T231363) (duration: 00m 54s)
23:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 1422870: [sqwikiquote] Enable WikiLove and SandboxLink (T230390) (duration: 00m 54s)
23:36 Urbanecm: Run mwscript extensions/WikimediaMaintenance/createExtensionTables.php sqwikiquote wikilove (T230390)
23:30 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/MobileFrontend/resources/dist/: SWAT: a109b25: Build assets reflecting edit change (duration: 00m 55s)
23:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 3704bb7: Enable partial blocks on ruwiki (T231298) (duration: 00m 54s)
23:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 1687ec9: Whitelist *.wikimedia.cz in wgCopyUploadsDomains for commonswiki (T231247) (duration: 00m 54s)
23:02 eileen: civicrm revision is 049c9666b6, config revision is 24aed9745e
22:40 eileen: civicrm revision changed from 517e6ee4e0 to 049c9666b6, config revision is 24aed9745e
22:29 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/VisualEditor/lib/ve/src/ce/nodes/ve.ce.GeneratedContentNode.js: T231381 Follow-up I196f5bd88: Fix typo (set node=this) (duration: 00m 57s)
21:51 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
21:11 XioNoX: disable both sides of the reline link between knams and esams - T230448
20:36 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove unused wgEnableBlockNoticeStats setting (duration: 00m 54s)
19:08 gehel: starting deployment of Apache config for lexemes / SDoC - T222321
18:59 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/includes/gallery/ImageGalleryBase.php: T231340 T231353 BadFileLookup::isBadFile() expects null, not false for galleries (duration: 00m 53s)
18:58 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/includes/api/ApiQueryImageInfo.php: T231340 T231353 BadFileLookup::isBadFile() expects null, not false for the API (duration: 00m 53s)
18:56 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/skins/MinervaNeue/skin.json: T231358 Fix userSandbox image path (duration: 00m 53s)
17:46 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1869f79]: Fix definition endpoint TypeError (T230503) (duration: 04m 39s)
17:42 krinkle@deploy1001: Synchronized php-1.34.0-wmf.20/includes/password/PasswordPolicyChecks.php: 098755622f7 (duration: 00m 54s)
17:41 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1869f79]: Fix definition endpoint TypeError (T230503)
17:04 krinkle@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/Echo: 34084279089f (duration: 00m 55s)
16:38 krinkle@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/TwoColConflict/extension.json: d6b5d441b, T229791 (duration: 00m 55s)
15:41 James_F: That was T231279 Set `$wgRelatedArticlesDescriptionSource` to `wikidata`
15:41 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T231279 Set to (duration: 00m 54s)
14:52 _joe_: running scap pull on mw1280
14:50 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1280.eqiad.wmnet
14:49 _joe_: powercycling mw1280
14:49 bblack: deploying anycast recdns resolv.conf setting to all codfw - T228190
14:45 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.20
14:39 ema: cp1081: restart crashed services varnishkafka-{statsv,webrequest}.service
14:38 vgutierrez: depool cp5001 - T231287
14:33 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.20 and rebuild l10n cache (duration: 30m 48s)
14:03 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.20 and rebuild l10n cache
13:52 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.16 [keeping static files] (duration: 01m 35s)
13:47 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.15 (duration: 06m 44s)
13:43 vgutierrez: repool cp5001 - T231287
12:36 ema: pool cp1075 w/ ATS backend (for real) T228629
12:29 marostegui: Rename table filejournal on enwiki on db1089 - T51195
12:17 ema: depool cp1075, confd is not watching the key "ats-be"
12:15 ema: pool cp1075 w/ ATS backend T228629
11:55 mutante: miscweb1001 - a2dismod mpm_event ; a2enmod php7.0 ; systemctl restart apache2 (T224247, T196968) please also see https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206
11:52 dcausse: EU SWAT done
11:51 mutante: miscweb1001 - manually remove tin.eqiad.wmnet (!) from /srv/iegreview/iegreview-cache/.config and replace with deploy1001 after first puppet run. still existing bug that tin is not fully removed (T224247, T175288, T197470)
11:49 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T231194 [cirrus] Stop generating new cirrusSearchChecker jobs (duration: 00m 45s)
11:43 dcausse: reopening EU SWAT
11:18 raynor: EU SWAT finished
11:15 vgutierrez: depooling cp5001
11:11 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Drop MobileWebUIActionsTracking sampling rate to 0.01% (T220016) (duration: 00m 46s)
11:10 mutante: ganeti1001 - starting an OS install of new VM miscweb1001
10:25 marostegui: Remove grants from sarin from all the dbs, dbstore, parsercache, es, labsdb - T229796
10:25 mutante: ganeti eqiad - creating new VM with same specs as krypton to replace it with a stretch instance and mirror miscweb2001. krypton to be removed (T224323, T105507, T224247)
10:12 dcausse: cirrus: reindexing lost updates since 2019-08-12T10:00:00Z for wikitech (T230994)
09:39 marostegui: Deploy grants for dbproxy1016 on m3 - T202367
09:21 marostegui: Remove grants for dbproxy1004 and dbproxy1009 from m4 hosts (db1107 and db1108) - T231280
09:21 vgutierrez: upgrading trafficserver to version 8.0.5-1wm3 on cp5001 - T221594
09:20 vgutierrez: uploaded trafficserver-8.0.5-1wm3 to apt.wikimedia.org (stretch) - T221594
09:11 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@c2bc1a3]: Increase cirrusSearchLinksUpdate concurrency to 150 - T231194 (duration: 01m 09s)
09:09 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@c2bc1a3]: Increase cirrusSearchLinksUpdate concurrency to 150 - T231194
08:46 ema@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:44 ema@cumin1001: START - Cookbook sre.hosts.downtime
08:36 vgutierrez: repooling cp5001 - T231262
08:18 ema: depool cp1075 and reimage as text_ats T228629
07:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s6 codfw weights and roles T230106 (duration: 00m 44s)
07:48 marostegui@cumin1001: dbctl commit (dc=codfw): 'Reorganize s6 codfw weights and roles T230106', diff saved to https://phabricator.wikimedia.org/P8983 and previous config saved to /var/cache/conftool/dbconfig/20190827-074802-marostegui.json
07:28 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2129 as s6 codfw master T230106 (duration: 00m 46s)
07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2046, this host will be decommissioned T230106', diff saved to https://phabricator.wikimedia.org/P8982 and previous config saved to /var/cache/conftool/dbconfig/20190827-072847-marostegui.json
07:25 marostegui@cumin1001: dbctl commit (dc=codfw): 'Promote db2129 to codfw s6 master T230106', diff saved to https://phabricator.wikimedia.org/P8981 and previous config saved to /var/cache/conftool/dbconfig/20190827-072556-marostegui.json
07:16 marostegui: Switchover codfw s6 master from db2046 to db2129 T230106
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2129 weight to 0 before promoting it to codfw s6 master T230106', diff saved to https://phabricator.wikimedia.org/P8980 and previous config saved to /var/cache/conftool/dbconfig/20190827-071456-marostegui.json
07:07 vgutierrez: depooling cp5001 - T231262
07:04 vgutierrez: repooling cp5001 - T231262
06:11 _joe_: updating reprepro sources for jessie-wikimedia
05:36 XioNoX: update cloud acls on cr1/2-eqiad - T230980
05:28 marostegui: Optimize pc1009 - T210725
05:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1009 for optimize T210725 (duration: 00m 45s)
05:24 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc1009 for optimize T210725 (duration: 00m 45s)
05:12 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2009 after optimize T210725 (duration: 00m 47s)
03:59 vgutierrez: depooling cp5001 - T231262
03:53 vgutierrez: repooling cp5001 - T231262
02:59 vgutierrez: rebooting cp5001
01:47 eileen: process-control config revision is 24aed9745e
00:18 eileen: civicrm revision changed from ab2a9b264b to 517e6ee4e0, config revision is 8c900d909f

2019-08-26

20:50 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@0463394]: Update mobileapps to 6bdc333 (duration: 06m 18s)
20:44 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@0463394]: Update mobileapps to 6bdc333
20:25 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@d9042a1]: Update mobileapps to fbe3cc6 (duration: 13m 08s)
20:12 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@d9042a1]: Update mobileapps to fbe3cc6
18:30 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:528546 lvwiki damaging model adjustment (duration: 00m 46s)
18:15 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:528506 Enable Related Article cards in Timeless across all projects (duration: 00m 46s)
17:53 XioNoX: add new IP to labsdb-tcp4 on cr1/2-eqiad - T230980
17:34 herron: beginning roll out of prometheus-ipsec-exporter in ulsfo T230236
15:38 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T228051 Load the Translate extension via static extension registration (duration: 00m 46s)
15:02 vgutierrez: depooling cp5001
14:59 marostegui: Change min_replicas to 3 on s5 for eqiad and codfw T231019
14:16 @: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
14:05 @: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
14:05 vgutierrez: repooling cp5001 using trafficserver as TLS termination layer - T221594
14:02 herron: uploaded prometheus-ipsec-exporter-0.3.1-1 package to stretch-wikimedia and buster-wikimedia
14:00 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
13:58 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
13:49 vgutierrez: upgraded trafficserver to version 8.0.5-1wm2 in cp5001
13:49 marostegui: Rename table filejournal on enwiki on db2112 - T51195
13:38 mobrovac@deploy1001: Finished deploy [restbase/deploy@38c313d]: Expose RB on both 7231 and 7233 - T223953 (duration: 23m 00s)
13:28 vgutierrez: Replacing nginx with ats-tls in cp5001 - T221594
13:21 marostegui: Change MySQL.monitoring queries latency graph parameters to support buster+mariadb 10.3 - T231190
13:15 mobrovac@deploy1001: Started deploy [restbase/deploy@38c313d]: Expose RB on both 7231 and 7233 - T223953
13:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@38c313d] (dev-cluster): Bring the dev cluster up to date and expose RB on both 7231 and 7233 in it - T223953 (duration: 03m 22s)
13:06 mobrovac@deploy1001: Started deploy [restbase/deploy@38c313d] (dev-cluster): Bring the dev cluster up to date and expose RB on both 7231 and 7233 in it - T223953
13:06 mobrovac@deploy1001: deploy aborted: Bring the dev cluster up to date and expose RB on both 7231 and 7233 in it - T223953 (duration: 00m 04s)
13:06 mobrovac@deploy1001: Started deploy [restbase/deploy@38c313d]: Bring the dev cluster up to date and expose RB on both 7231 and 7233 in it - T223953
12:48 marostegui: Restart MySQL on db2114 to pick up binlog format change
12:48 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db2114 status (duration: 00m 45s)
11:57 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@e742ecf]: Increase the concurrency of cirusSearchCheckerJobs to 20 - T231194 (duration: 01m 31s)
11:55 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@e742ecf]: Increase the concurrency of cirusSearchCheckerJobs to 20 - T231194
11:36 Amir1: EU SWAT is done
11:34 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.19/extensions/UniversalLanguageSelector: SWAT: Revert "Return target of redirect languages in mw.uls.getFrequentLanguageList" (T217770 T121747) (duration: 00m 46s)
11:10 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Switch property terms migration to WRITE_NEW on client wikis (T225053) (duration: 00m 46s)
10:47 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 46s)
10:47 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 46s)
10:26 vgutierrez: uploaded trafficserver-8.0.5-1wm2 to apt.wikimedia.org (stretch) - T221594
09:54 _joe_: codfw/appserver/*/mw2231.codfw.wmnet: pooled changed yes => inactive T231192
09:43 Urbanecm: Run scap pull on mwdebug1001, test ended
09:38 Urbanecm: Enable partial blocks on test2wiki and mwdebug1001 to test something
08:46 _joe_: hard powercycle of mw2231, down with a blank console
06:51 ema: cp-upload: rolling ats-backend-restart to enable compress plugin
05:25 marostegui: Upload new mariadb 10.3 packages to repo
05:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2009 for optimize T210725 (duration: 02m 53s)
05:08 marostegui: Optimize tables on pc2009 - T210725

2019-08-25

13:46 volans: uploaded spicerack_0.0.27-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
02:22 cdanis: clear downtimes on cr2-eqiad/cr2-codfw, link supposedly stable now
00:35 herron: set icinga downtimes on flapping cr2-eqiad and cr2-codfw alerts until Monday

2019-08-24

15:27 Urbanecm: Run mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Shangkuanlc /home/urbanecm/T231129 (T231129)

2019-08-23

23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
23:34 robh@cumin1001: START - Cookbook sre.hosts.decommission
23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
23:03 robh@cumin1001: START - Cookbook sre.hosts.decommission
22:05 eileen: process-control config revision is 8c900d909f
21:48 XioNoX: increase ospf cost of zayo codfw-eqiad link to 1320 (was 320) to make it secondary
19:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
19:11 robh@cumin1001: START - Cookbook sre.hosts.decommission
18:14 James_F: Dropped 2FA for User:DBrant (WMF), per request.
17:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
17:39 robh@cumin1001: START - Cookbook sre.hosts.decommission
12:24 _joe_: pooling mw1270 temporarily, debugging performance issues
12:15 _joe_: depooling mw1270 temporarily, performance issues
11:13 marostegui: Upgrade db1114 from 10.3.16 to 10.3.17
10:06 dcausse: elastic: reindexing wikis with old mappings in eqiad & codfw (T230990)
05:52 moritzm: installing squid3 security updates
05:11 marostegui: Stop MySQL on db2066 for decommission T230885
05:08 marostegui: Remove db2066 from tendril and zarcillo T230885

2019-08-22

23:41 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.19/extensions/WikibaseLexeme: SWAT: e4a5457: Fix Lexemes RDF generation (T230974) (duration: 00m 49s)
23:32 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: eb1c4ea: Rename globals and rights in AbuseFilter config (duration: 00m 47s)
23:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 66b719d: General cleanup of `groupOverrides` (T231041) (duration: 00m 47s)
23:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 872f4b0: Change language code for punjabiwikimedia, resyncing, got broken pipe at the end (T230680) (duration: 00m 47s)
23:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 872f4b0: Change language code for punjabiwikimedia (T230680) (duration: 00m 48s)
23:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: a5917e4: Clean up `wgRateLimits` to remove unneeded entries (T231040) (duration: 00m 48s)
22:07 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Update MachineVision Beta config (duration: 00m 47s)
21:19 eileen: tools revision changed from 5c080bac63 to c0f4e7a379
20:35 ejegg: updated payments-wiki from 85dce8f79f to 231b7b0850
17:14 elukey: remove analytics-tool1002 from ganeti - T231021
17:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
17:12 elukey@cumin1001: START - Cookbook sre.hosts.decommission
14:37 _joe_: restarting php-fpm on mw1348 to observe the effect on the slowdown, T231011
13:47 elukey: update puppet compiler's facts
13:42 jijiki: Restart php-fpm on mw1348 and mw1347
13:41 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.19
13:36 joal@deploy1001: Finished deploy [analytics/refinery@a9b99e9]: Regular weekly analytics deployment train (1 day late) (duration: 18m 57s)
13:32 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Reverting PHP7 traffic back to 20% - T219150 (duration: 00m 57s)
13:27 tarrow@deploy1001: Synchronized php-1.34.0-wmf.19/extensions/Wikibase/client/: Revert "Use the backwards-compatible HTML ID for the wikidata item link" (T230958, T66315) (duration: 00m 58s)
13:18 joal@deploy1001: Started deploy [analytics/refinery@a9b99e9]: Regular weekly analytics deployment train (1 day late)
12:56 _joe_: restarting mw1270 with slowlog disabled
12:41 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
12:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime
12:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime
12:38 _joe_: disabled slowlog on mw1348, repooling after reload
12:37 jijiki: Pooling mw1347, not mw1247
12:35 jijiki: Pooling mw1247
12:16 moritzm: upgrading mariadb (packaged Debian version) on matomo1001
12:15 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
12:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime
12:07 jijiki: Depooling mw1347 and mw1348
10:55 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Push PHP7 traffic to 33.3% - T219150 (duration: 01m 01s)
09:09 ema: rolling ats-backend-restart to enable @debug system call family
09:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
08:17 moritzm: restarting oozie on an-coord1001
07:54 tarrow@deploy1001: Synchronized php-1.34.0-wmf.19/extensions/Wikibase/repo/: Backport for UBN Hack to avoid trying to termbox render page before save (T230937) (duration: 00m 56s)
07:46 marostegui: Deploy grants on labsdb1009-labsdb1012 to allow connections for haproxy from dbproxy1019 - T202367
06:52 moritzm: installing mariadb-10.1 updates from Stretch 9.9 point release (unrelated to wmf-mariadb, mostly client-side clients/libraries as shipped in Debian)
06:37 moritzm: installing python-pip updates from Stretch 9.9 point release
05:41 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2066 from config T230885 (duration: 00m 54s)
05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2066 from config T230885 (duration: 00m 54s)
05:14 marostegui: Remove db2059 from tendril and zarcillo - T230884
05:08 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2059 from config T230884 (duration: 00m 55s)
05:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2059 from config T230884 (duration: 00m 59s)
00:37 XioNoX: run /usr/local/sbin/restart-php7.2-fpm on mwdebug1001/2
00:27 XioNoX: push L3 ECMP to eqiad routers - T230955
00:23 XioNoX: push L3 ECMP to esams routers - T230955
00:22 XioNoX: push L3 ECMP to eqsin routers - T230955
00:21 twentyafterfour: phabricator update completed without incident
00:19 XioNoX: push L3 ECMP to codfw routers - T230955
00:15 twentyafterfour: Starting phabricator upgrade from tag release/2019-08-14/1 to release/2019-08-22/1

2019-08-21

21:44 eileen: civicrm revision changed from d7370a9d0b to ab2a9b264b, config revision is 58cd6b7ae6
20:51 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@fc270fd]: bulk_daemon: Retune popularity_score bulk sizing (duration: 03m 49s)
20:48 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@fc270fd]: bulk_daemon: Retune popularity_score bulk sizing
20:17 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@67103e9]: bulk_daemon: Correct super() call (duration: 04m 19s)
20:13 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@67103e9]: bulk_daemon: Correct super() call
20:02 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@556c4d0]: bulk_daemon: Track timeouts, log indices used, increase thread counts (duration: 04m 42s)
20:00 XioNoX: test l3 ECMP in ulsfo
19:57 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@556c4d0]: bulk_daemon: Track timeouts, log indices used, increase thread counts
19:54 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@adff5ad]: bulk_daemon: Track timeouts, log indices used, increase thread counts (duration: 02m 34s)
19:52 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@adff5ad]: bulk_daemon: Track timeouts, log indices used, increase thread counts
19:34 XioNoX: repool codfw and eqsin - T226422
19:31 XioNoX: Rollback: Varnish: redirect eqsin/ulsfo text to eqiad - T226422
19:29 ayounsi@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
19:29 ayounsi@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
19:26 XioNoX: rollback: increase OSPF cost on cr2-codfw links - T226422
19:25 XioNoX: rollback deactivate transit links on cr2-codfw - T226422
19:24 XioNoX: rollback: move VRRP master from cr2-codfw to cr1-codfw - T226422
19:16 XioNoX: restart both REs on cr2-codfw - T226422
19:14 XioNoX: failover master RE to RE0 on cr2-codfw - T226422
18:37 XioNoX: shutdown re0:cr2-codfw (backup) - T226422
18:32 XioNoX: failover master RE to RE1 on cr2-codfw - T226422
18:19 XioNoX: shutdown re1:cr2-codfw (backup) - T226422
18:18 jforrester@deploy1001: Synchronized php-1.34.0-wmf.19/includes/specialpage/RedirectSpecialPage.php: T230932 RedirectSpecialArticle: Fix PHP notice about undefined index (duration: 00m 54s)
18:18 XioNoX: move VRRP master from cr2-codfw to cr1-codfw - T226422
18:15 ayounsi@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
18:15 ayounsi@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
18:15 tarrow@deploy1001: Synchronized php-1.34.0-wmf.19/extensions/Wikibase/client/: SWAT: Use the backwards-compatible HTML ID for the wikidata item link (T66315) (duration: 00m 58s)
18:14 ayounsi@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
18:14 ayounsi@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
18:12 XioNoX: deactivate transit links on cr2-codfw - T226422
18:04 XioNoX: increase OSPF cost on cr2-codfw links - T226422
17:56 XioNoX: rollback: apply BGP graceful shutdown to cr1-codfw transits - T226422
17:55 XioNoX: Rollback: increase OSPF cost on ulsfo-codfw link - T226422
17:53 XioNoX: rollback: disable BGP from cr1-codfw to lvs2001/2/3 - T226422
17:43 XioNoX: restart both REs on cr1-codfw - T226422
17:33 XioNoX: failover master RE to RE0 on cr1-codfw - T226422
17:33 cmjohnson1: cloudvirt1015 down for a new motherboard
17:25 XioNoX: shutdown RE0 on cr1-codfw - T226422
17:17 bstorm_: reboot cloudvirt1024 to try and reset raid T230289
17:17 XioNoX: failover master RE to RE1 on cr1-codfw - T226422
17:08 XioNoX: disable BGP from cr1-codfw to lvs2001/2/3 - T226422
17:02 cmjohnson1: rebooting cloudvirt1024
17:00 tarrow: continuing the SWAT window to backport train blocker fixes
16:56 XioNoX: Varnish: redirect eqsin/ulsfo text to eqiad - T226422
16:51 XioNoX: increase OSPF cost on ulsfo-codfw link - T226422
16:46 XioNoX: apply BGP graceful shutdown to cr1-codfw transits - T226422
16:37 XioNoX: depool eqsin and codfw - T226422
16:01 moritzm: fixed apt config on krypton, broken getenvoy-jessie.list made apt-get update fail
15:16 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Rollback to 0.32 (duration: 00m 25s)
15:15 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Rollback to 0.32
15:07 moritzm: installing python-cryptography update from Stretch point release
15:00 jbond42: adding interface::add_ip6_mapped to mediawiki servers
14:46 elukey@deploy1001: Finished deploy [analytics/superset/deploy@868635a]: Upgrading superset to 0.34rc1 (duration: 00m 33s)
14:46 elukey@deploy1001: Started deploy [analytics/superset/deploy@868635a]: Upgrading superset to 0.34rc1
14:42 moritzm: installing java-common update from Stretch point release
14:36 moritzm: installing dns-root-data update from Stretch point release
14:29 godog: silence average mw appserver latency alerts for 24h, too noisy
14:28 elukey: swap turnilo backend in varnish from analytics-tool1002 to an-tool1007
14:27 moritzm: installing ca-certificates-java update from Stretch point release
14:10 marostegui: Upgrade mysql on db2075
13:12 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.19 (duration: 00m 55s)
13:11 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.19
11:59 jbond42: add ipv6 mapped address to mw codfw servers
11:41 Amir1: EU SWAT is done
11:38 jijiki: Restarting ores on ores1004 and ores1005
11:37 elukey: restart celery-ores-worker on ores1002
10:57 Urbanecm: Run scap pull on mwdebug1002 (T230601)
10:52 Urbanecm: Move 0a87e3c's code to abusefilter.php on mwdebug1002 (T230601)
10:49 Urbanecm: Previous log entry was for mwdebug1002
10:49 Urbanecm: Wrapped code added to CommonSettings.php in T230601 to wgExtensionFunctions
10:45 Urbanecm: Run mwscript namespaceDupes.php --wiki=zhwikisource --add-prefix=FIXME --fix (T230548)
10:02 moritzm: installing puppetdb1002
09:46 tarrow: finished enabling termbox on wikidatawiki
09:36 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Termbox on wikidatawiki (T230896) (duration: 00m 55s)
09:29 moritzm: rebooting db2102 (reverting to a proper stretch 4.9 kernel, it used a bpo kernel due to some hardware debugging a while back)
09:20 @: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
09:15 @: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
09:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:09 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
09:07 _joe_: uploaded python-poolcounter to stretch,buster
08:57 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
08:29 moritzm: upgrading PHP on contint*
08:18 moritzm: installing puppetdb2002
08:11 marostegui: Stop MySQL on db2052 T230883
08:11 marostegui: Remove db2052 from tendril and zarcillo T230883
08:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2052 from config T230883 (duration: 00m 54s)
08:04 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2052 from config T230883 (duration: 00m 54s)
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1122', diff saved to https://phabricator.wikimedia.org/P8953 and previous config saved to /var/cache/conftool/dbconfig/20190821-075813-marostegui.json
07:56 ema: upload@eqsin: rolling ats-backend-restart to enable compress plugin
05:45 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1122', diff saved to https://phabricator.wikimedia.org/P8952 and previous config saved to /var/cache/conftool/dbconfig/20190821-054542-marostegui.json
05:28 eileen: civicrm revision changed from 0d1b7f107a to d7370a9d0b, config revision is 58cd6b7ae6
05:26 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1122', diff saved to https://phabricator.wikimedia.org/P8951 and previous config saved to /var/cache/conftool/dbconfig/20190821-052613-marostegui.json
05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1122 after restart', diff saved to https://phabricator.wikimedia.org/P8950 and previous config saved to /var/cache/conftool/dbconfig/20190821-051441-marostegui.json
05:05 marostegui: Restart MySQL on db1122 for binlog format change - T230785
05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 for binlog format change', diff saved to https://phabricator.wikimedia.org/P8949 and previous config saved to /var/cache/conftool/dbconfig/20190821-050501-marostegui.json
05:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1122 status: candidate master for s2 - T230785 (duration: 00m 55s)
02:28 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@adff5ad]: bulk_daemon: Handle non-integer status_code in json response (duration: 04m 09s)
02:24 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@adff5ad]: bulk_daemon: Handle non-integer status_code in json response

2019-08-20

23:53 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@a7bf6cf]: bulk_daemon: Increase bulk request_timeout (duration: 03m 40s)
23:50 eileen: that just changes us to php7 csv so watch for any fail mail
23:49 eileen: civicrm revision changed from 9c7b2ffbc9 to 0d1b7f107a, config revision is 58cd6b7ae6
23:49 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@a7bf6cf]: bulk_daemon: Increase bulk request_timeout
23:42 Urbanecm: Evening SWAT aborted due to no logs logged for some period of time (T230847), no patches were reverted
23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fd2cece: Enable RelatedArticles on all skins on eswikinews (T230660) (duration: 00m 52s)
23:22 urbanecm@deploy1001: Synchronized wmf-config/throttle-analyze.php: SWAT: a3927a7: Grant skipcaptcha to everyone coming from whitelisted IP (T227487) (duration: 00m 54s)
23:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 13be059: Disable Wikimedia ReadingDepth (T229042) (duration: 00m 56s)
23:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0c08257: Remove unused remnant from old menu click tracking (T228681) (duration: 00m 55s)
23:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: b94b647: Update wgSkipSkins to experiment with not showing skins to users (T223824) (duration: 00m 58s)
21:20 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@9b40607]: bulk_daemon: Increase max_poll_interval_ms to 15 minutes (duration: 06m 22s)
21:14 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@9b40607]: bulk_daemon: Increase max_poll_interval_ms to 15 minutes
20:49 XioNoX: push BGP_Wikimedia_pops to ams - T227808
19:28 XioNoX: push BGP_Wikimedia_pops to eqsin - T227808
19:25 thcipriani: cleanup old (pre 1.34.0-wmf.14) wmf/* branches for core and extensions on gerrit
19:25 XioNoX: push BGP_Wikimedia_pops to cr4-ulsfo - T227808
19:04 XioNoX: push BGP_Wikimedia_pops to cr3-ulsfo - T227808
19:00 cdanis@deploy1001: Synchronized docroot/noc/db.php: 80a6743dd noc: read dbctl JSON T229631 (duration: 00m 58s)
17:57 bblack: deploying anycast recdns settings to resolv.conf on 41 live hosts in eqiad - https://gerrit.wikimedia.org/r/528524 - T228190
16:54 cdanis: ✔️ cdanis@deploy1001.eqiad.wmnet /srv/mediawiki-staging 🕐☕ sudo chmod g+w -R /srv/mediawiki-staging/
16:49 krinkle@deploy1001: Synchronized php-1.34.0-wmf.17/includes/resourceloader/ResourceLoaderWikiModule.php: T229433 - f84a4abb418de8 (debugging) (duration: 00m 56s)
16:42 Krinkle: php-1.34.0-wmf.17/extensions/TimedMediaHandler is dirty. A merged patch was not deployed - https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/TimedMediaHandler/+/530558/
16:25 hoo: Updated the Wikidata property suggester with data from the 2019-08-12 JSON dump and applied the T132839 workarounds
16:15 krinkle@deploy1001: Synchronized php-1.34.0-wmf.19/includes/resourceloader/ResourceLoaderWikiModule.php: T229433 - 44607c984016b (debugging) (duration: 00m 55s)
16:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fa903b7: Enable DNS blacklist for es.wikiquote (T230796) (duration: 00m 55s)
16:04 oblivian@deploy1001: Pruned MediaWiki: 1.34.0-wmf.13 (duration: 04m 09s)
15:59 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5ab38dc: Restrict account creation on es.wikiquote to 1 day/IP (T230796) (duration: 01m 00s)
15:49 urandom: creating Parsoid/PHP storage schema in restbase-dev -- T230792
15:48 Urbanecm: Run sudo -u mwdeploy chmod g+w /srv/mediawiki-staging/wmf-config on deploy1001
15:17 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.19
14:54 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.19 and rebuild l10n cache (duration: 30m 31s)
14:29 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
14:24 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.19 and rebuild l10n cache
14:21 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.15 [keeping static files] (duration: 01m 43s)
14:13 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.14 (duration: 06m 44s)
13:58 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
13:32 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
13:07 cdanis: ✔️ cdanis@cobalt.wikimedia.org ~ 🕘 sudo systemctl restart gerrit.service
13:03 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
12:05 awight@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/Wikibase: SWAT: Initialize DatabaseTermIdsResolver and DatabaseTypeIdsStore with repo database name in client. (T230119, T225053) (duration: 00m 52s)
10:51 marostegui: Stop MySQL on db2051 and db2056 for decommission T230777 T230778
10:30 ema: cp5002: restart trafficserver for compress.so config change
10:11 tarrow: termbox 2nd smoketests finished
09:52 marostegui: Remove db2051 and db2056 from tendril and zarcillo - T230777 T230778
09:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2051 and db2056 from config T230777 T230778 (duration: 00m 48s)
09:00 tarrow: Starting 2nd smoketest of termbox service on eqiad: T229907
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2123 as codfw s5 master - T230106', diff saved to https://phabricator.wikimedia.org/P8936 and previous config saved to /var/cache/conftool/dbconfig/20190820-082802-marostegui.json
08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Re-organize s5 codfw weights - T230106', diff saved to https://phabricator.wikimedia.org/P8935 and previous config saved to /var/cache/conftool/dbconfig/20190820-082411-marostegui.json
08:19 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2123 to s5 codfw master T230106 (duration: 00m 48s)
08:05 marostegui: Switchover s5 codfw master db2052 -> db2123 T230106
07:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s5 codfw weights T230106 (duration: 00m 47s)
07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2059 and db2066, those two will be decommissioned T228258', diff saved to https://phabricator.wikimedia.org/P8934 and previous config saved to /var/cache/conftool/dbconfig/20190820-074900-marostegui.json
06:59 moritzm: installing failoid1001/2001 T229903
05:59 marostegui: Stop MySQL and shutdown db1114 for on-site maintenance - T229452
05:55 marostegui: Stop MySQL on db2044 for decommissioning - T221594
05:37 marostegui: Remove db2049 from tendril and zarcillo T230721
05:35 marostegui: Stop MySQL on db2049 for decommissioning - T230721
05:24 marostegui: Reload haproxy on dbproxy2002 T230705
05:18 marostegui: Switchover m2 codfw master, db2044 -> db2067 T230705

2019-08-19

21:21 ejegg: updated payments-wiki from 7b8091ba87 to 85dce8f79f
21:21 ejegg: updated payments-wiki subdesarrollo
19:35 ejegg: updated payments-wiki from e3b378f65d to 7b8091ba87
18:57 Urbanecm: Morning SWAT done
18:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Raise rollback limit for all groups (T228708) (duration: 00m 48s)
18:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 26317c7: Fix zhwikisource wgExtraNamespaces entry (T230294) (duration: 00m 48s)
18:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: b21bbc0: Add `WS` and `CAT` as aliases for zhwikisource namespaces (T230548) (duration: 00m 47s)
18:26 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: 0a87e3c: Assign all rights assigned to suppress group to oversight group (T230601) (duration: 00m 48s)
17:56 ebernhar1son: freeze cloudelastic writes to let prod clear 30 min backlog
17:23 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@2d36896]: Fix Blazegraph dictionary mixup (duration: 18m 18s)
17:17 shdubsh: restarting icinga to disable UI autocomplete
17:04 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@2d36896]: Fix Blazegraph dictionary mixup
16:45 onimisionipe: pool elastic2050. mgmt issue has been resolved - T230597
15:39 ejegg: updated payments-wiki from 00eb090dcc to e3b378f65d
13:57 vgutierrez: repooling cp5001
12:51 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2049 from config T230721 (duration: 00m 48s)
12:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2049 from config T230721 (duration: 00m 48s)
12:38 vgutierrez: depooling cp5001 prior to ats-tls deployment
12:02 Urbanecm: EU SWAT done
11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert 483691c (T225053) (duration: 00m 48s)
11:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 483691c: Revert "Revert "Switch property terms migration to WRITE_NEW on client wikis"" (T225053) (duration: 00m 48s)
11:15 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
11:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
11:03 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
11:02 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
11:00 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
10:53 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
10:53 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
10:52 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
10:32 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
10:22 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
09:57 jbond42: add mapped ipv6 to conf200* servers https://gerrit.wikimedia.org/r/c/operations/puppet/+/528475
09:26 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
09:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
08:57 godog: add 100G to graphite1004 / graphite2003 /srv LVs
07:59 onimisionipe: shutdown elastic2050 to prepare for mgmt reset - T230597
07:40 marostegui: Redact napwikisource on db1124 and db2094 - T210762
07:19 moritzm: installing golang-1.11 security updates on buster
07:08 moritzm: installing ffmpeg security updates on buster
06:37 vgutierrez: upgrading acme-chief to version 0.20 on production servers - T229096
06:30 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir1001.eqiad.wmnet
06:29 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir1001.eqiad.wmnet
06:28 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir1002.eqiad.wmnet
06:27 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir1002.eqiad.wmnet
06:26 moritzm: installing ghostscript security updates on scb/proton/notebook* hosts
06:25 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir2001.codfw.wmnet
06:25 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir2001.codfw.wmnet
06:24 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet
06:22 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet
06:21 vgutierrez: rolling upgrade of nginx in ncredir hosts
06:03 moritzm: installing php5 security updates
05:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2067 from config T230705 (duration: 00m 47s)
05:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2067 from config T230705 (duration: 00m 50s)
05:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2067, will be moved to m1 T230705', diff saved to https://phabricator.wikimedia.org/P8930 and previous config saved to /var/cache/conftool/dbconfig/20190819-054606-marostegui.json
05:29 elukey: reboot cp2004 due to bnx2x crash (kern.log saved into my home on the host if needed)

2019-08-18

08:28 onimisionipe: running `_cluster/reroute?pretty&explain=true&retry_failed` on eqiad production-search cluster to force allocation of shards

2019-08-16

19:48 sbassett: Deployed security patch for T230576 (ex:MobileFrontend)
18:57 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
16:38 XioNoX: add BGP sessions to Scaleway (AS12876) in esams
16:12 elukey: upload prometheus-druid-exporter 0.7-1 to stretch/buster-wikimedia
15:42 elukey: roll restart of druid broker/historicals to pick up new logging/metrics settings
14:39 onimisionipe: run `bmc-device --cold-reset; echo $?` in elastic2050 hoping it resets mgmt interface - T230597
14:24 gehel: rolling reboot of cloudelastic
13:52 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision (beta): Request labels targeting Beta Wikidata (duration: 00m 50s)
08:18 _joe_: stopping php on phab1003, to restart it with systemd
06:50 _joe_: upgrading envoyproxy across production (http2 CVEs)
02:51 vgutierrez: repooling cp5002, running compress.so experiment

2019-08-15

23:35 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@b4da6e4]: Rollback blazegraph due to T230588 (duration: 09m 48s)
23:25 smalyshev@deploy1001: Started deploy [wdqs/wdqs@b4da6e4]: Rollback blazegraph due to T230588
21:54 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@fce8177]: Weekly deploy (duration: 25m 28s)
21:28 smalyshev@deploy1001: Started deploy [wdqs/wdqs@fce8177]: Weekly deploy
21:27 ebernhardson: finish restarting cloudelastic-chi-eqiad with -XX:NewRatio=3
21:18 ebernhardson: increase cloudelastic indices.recovery.max_bytes_per_sec from 40mbit to 512mbit as these have 10G networking
21:07 ebernhardson: restart cloudelastic1002 with -XX:NewRatio=3 to match cloudelastic1001
20:22 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
19:37 ema: depool cp5002 during the EU night, running compress.so experiment
19:28 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot-wdqs (exit_code=0)
19:19 sbassett: Deployed security patch for T230402 (1.34.0-wmf.17)
19:18 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:18 sbassett: Deployed security patch for T229541 (1.34.0-wmf.17)
19:17 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
19:17 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:01 ebernhardson: restart elasticsearch on cloudelastic1001 with -XX:NewRatio=3
18:51 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
17:58 mbsantos@deploy1001: Finished deploy [proton/deploy@fb0b2a5]: Update chromium-renderer to 3f1cc72 (T218220) (duration: 00m 43s)
17:58 mbsantos@deploy1001: Started deploy [proton/deploy@fb0b2a5]: Update chromium-renderer to 3f1cc72 (T218220)
17:47 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@1bd2bea]: Update service-mobileapp-node to 5c1da03 (T230067 T229984) (duration: 05m 53s)
17:41 mbsantos@deploy1001: Started deploy [mobileapps/deploy@1bd2bea]: Update service-mobileapp-node to 5c1da03 (T230067 T229984)
17:11 ejegg: updated payments-wiki from 44eae2d65f to 00eb090dcc
17:02 cstone: civicrm revision changed from 3caf54a0d2 to 9c7b2ffbc9
16:53 reedy@deploy1001: Synchronized docroot/noc/db.php: Use WmfClusters from separate file (duration: 00m 47s)
16:52 reedy@deploy1001: Synchronized src/WmfClusters.php: Move WmfClusters.php (duration: 00m 47s)
16:27 XioNoX: advertise core v4 range (208.80.152.0/22) from eqord - T167841
16:09 ori: Finished messing around with mwdebug1002
16:06 reedy@deploy1001: Synchronized docroot/: phpcs fixes (duration: 00m 47s)
16:05 reedy@deploy1001: Synchronized wmf-config/arclamp.php: phpcs (duration: 00m 47s)
16:04 reedy@deploy1001: Synchronized tests/: phpunit (duration: 00m 47s)
16:03 reedy@deploy1001: Synchronized phpcs.xml: more exclusions! (duration: 00m 47s)
15:40 ebernhardson: unfreeze writes to cloudelastic cluster
15:37 ema: cp5002: re-pool with compress.so cache:false
15:34 herron: performing rolling restarts of eqiad kafka-main brokers for security updates
15:34 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot-wdqs (exit_code=0)
15:13 ori: Messing around with CommonSettings.php on mwdebug1002 to profile config loading
14:58 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
14:58 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.reboot-wdqs (exit_code=97)
14:56 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
14:52 reedy@deploy1001: Synchronized wmf-config/: phpcs cleanup (duration: 00m 47s)
14:51 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.reboot-wdqs (exit_code=97)
14:51 reedy@deploy1001: Synchronized multiversion/: phpcs cleanup (duration: 00m 47s)
14:50 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
14:50 ema: cp5002 depool due to compress.so crash
14:50 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot-wdqs (exit_code=0)
14:49 reedy@deploy1001: Synchronized phpcs.xml: remove exclusions (duration: 00m 49s)
14:47 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
14:44 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot-wdqs (exit_code=0)
14:41 gehel@cumin1001: START - Cookbook sre.wdqs.reboot-wdqs
14:33 papaul: shutting down db2063 for maintenance
13:17 reedy@deploy1001: Synchronized phpcs.xml: remove excess lines (duration: 00m 46s)
12:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove account creation restrictions (T230304, T230521) (duration: 00m 48s)
12:21 Urbanecm: EU SWAT done
12:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: d036388: Increase default thumb size to 260px on Dutch Wikipedia (T215106) (duration: 00m 48s)
12:16 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/AbuseFilter/extension.json: SWAT: e9422c5: Rearrange config to provide better experience (T191740, T200032, T226987) (duration: 00m 47s)
12:14 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: 7e95f6d: Update AbuseFilter config to keep the status quo (T191740, T200032, T226987) (duration: 00m 49s)
12:04 Urbanecm: EU SWAT is going a few minutes out of its window
12:01 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
12:01 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
12:00 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
12:00 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
11:37 Urbanecm: Run mwscript namespaceDupes.php --wiki=zhwikisource --add-prefix="FIXME" --fix (T230294)
11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fe9b6ed: Add Portal namespace on zhwikisource (T230294) (duration: 00m 47s)
11:29 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 377cc53: Add new throttle rule for cawiki editathon (T230313) (duration: 00m 47s)
11:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove napwikisource from wgProofreadPageNamespaceIds (T230541) (duration: 00m 47s)
11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0d8c516: Fix addition of Hubblesite.org and Spacetelescope.org to commons wgCopyUploadsDomains (T230083) (duration: 00m 48s)
10:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T230533: Add more import sources for napwikisource (duration: 00m 52s)
08:54 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
08:54 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
08:52 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
08:52 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
07:35 ema: cp5002: ats-backend-restart to enable compress plugin
06:38 ema: wdqs1009: restart wdqs-updater.service
00:15 robh: scs-ulsfo offline due to networking issues, rob returning tomorrow with fix T230077
00:03 twentyafterfour: starting phabricator upgrade to 2019-08-14/1 refs T215697

2019-08-14

23:13 ebernhardson: leave cloudelastic writes paused, and dropping from backlog queue, to allow primary clusters to catch up
22:41 eileen: civicrm revision changed from 569e52e23d to 3caf54a0d2, config revision is 1c76e94ac3
22:38 ebernhardson: freeze writes to cloudelastic for real this time
22:03 ejegg: updated fundraising python tools from 827ce3750e to 5c080bac63
22:01 robh: starting scs-ulsfo replacement. There will be icinga errors and they are intentionally being allowed so we know when things don't recover properly T230077
21:37 XioNoX: advertise core v6 range (2620:0:860::/46) from eqord - T167841
21:30 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
21:26 ebernhardson: thaw writes to cloudelastic
21:24 ejegg: updated payments-wiki from 9533f70fab to 44eae2d65f
21:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
21:13 ebernhardson: apply freeze to cloudelastic writes, to determine if backlog processing can catchup while deferring cloudelastic writes
20:49 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
20:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
20:44 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
20:44 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
18:01 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
17:29 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
16:32 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
16:32 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
16:31 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
15:50 ema: cp5002: ats-backend-restart to disable compress plugin while I'm not around
15:45 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
15:41 gehel: powercycling elastic101[789]
15:30 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
14:55 vgutierrez: upgrade nginx to 1.13.9-1wm2 in cp3032
14:17 fsero: upgrading envoy package to 1.11.1
14:09 vgutierrez: rolling back nginx upgrade in cp3032
14:01 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 04s)
13:58 reedy@deploy1001: Synchronized static/images/project-logos/: T210752 (duration: 00m 47s)
13:56 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T210752 (duration: 00m 47s)
13:55 reedy@deploy1001: rebuilt and synchronized wikiversions files: T212881
13:53 reedy@deploy1001: Synchronized dblists/: T212881 (duration: 00m 48s)
12:48 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
12:47 James_F: <sadtrombone> Wiki creation is still not working correctly, unfortunately.
Away: We're going to try making a new wiki. T212881
12:20 vgutierrez: rolling upgrade of nginx to 1.13.9-1+wmf2 in the cache cluster
12:17 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
11:20 vgutierrez: repooling cp5002
11:19 tarrow: termbox smoketests finished
11:06 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
10:46 ema: depool cp5002 after crash. See /var/log/trafficserver/crash-2019-08-14-104502.log
10:28 tarrow: Starting smoketest of termbox service on eqiad: T229907
09:40 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
09:20 ema: cp5002: ats-backend-restart to enable compress plugin
08:52 vgutierrez: upgrading nginx to 1.13.9-1+wmf2 in cp1075, cp2001, cp3030 and cp4027 (text) and cp1076, cp2002, cp3034, cp4021 (upload)
08:25 vgutierrez: upgrading nginx to 1.13.9-1+wmf2 in cp5001 (upload) and cp5007 (text)
08:17 vgutierrez: uploaded nginx-1.13.9-1+wmf2 to apt.wikimedia.org (stretch)
08:16 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
08:12 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
08:10 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
07:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2063 from config T230459 (duration: 00m 47s)
07:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2063 from config T230459 (duration: 00m 48s)

2019-08-13

20:43 ejegg: rolled back payments-wiki from 9ed8be0532 to 9533f70fab
20:34 ejegg: updated payments-wiki from 9533f70fab to 9ed8be0532
20:32 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix MachineVision provider config (duration: 00m 47s)
19:48 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
19:23 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@3882ddb]: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 T220625 (duration: 00m 58s)
19:22 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@3882ddb]: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 T220625
19:19 ppchelko@deploy1001: deploy aborted: Revert on canary (duration: 00m 18s)
19:18 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@f1a562e]: Revert on canary
19:17 ppchelko@deploy1001: deploy aborted: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 T220625 (duration: 01m 30s)
19:15 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@f1a562e]: Increase cirrusSearchLinksUpdatePrioritized concurrency 150 -> 200 T220625
19:03 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
18:50 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
18:41 ebernhardson: set cpufreq scaling_governor to performance on cloudelastic100[1-4] to test any changes to indexing performance
18:38 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable MachineVision on Beta (4/4) (duration: 00m 48s)
18:34 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable MachineVision on Beta (3/4) (duration: 00m 47s)
18:33 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: T223292 (fix perms) (duration: 00m 09s)
18:33 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: T223292 (fix perms)
18:33 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: T223292 (duration: 00m 43s)
18:32 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: T223292
18:32 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: T223292 (duration: 00m 36s)
18:31 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: Update Netbox to v2.6.1-wmf3 affects: T223292
18:30 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MachineVision on Beta (2/4) (duration: 00m 48s)
18:27 mholloway-shell@deploy1001: Synchronized wmf-config/extension-list: Enable MachineVision on Beta (1/4) (duration: 00m 48s)
17:44 XioNoX: set target netflow port to 2000 in eqiad
17:11 XioNoX: repool eqsin
17:06 XioNoX: rollback: disable all peering and transit on cr2-eqsin
16:57 XioNoX: reboot cr2-eqsin
16:46 XioNoX: disable all peering and transit on cr2-eqsin
16:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:25 filippo@cumin1001: START - Cookbook sre.hosts.downtime
16:25 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
16:25 filippo@cumin1001: START - Cookbook sre.hosts.downtime
16:07 ppchelko@deploy1001: Finished deploy [restbase/deploy@8fca708]: Expose transform/wikitext/to/mobile-html endpoint T211026, take 2 (duration: 10m 12s)
15:56 ppchelko@deploy1001: Started deploy [restbase/deploy@8fca708]: Expose transform/wikitext/to/mobile-html endpoint T211026, take 2
15:56 ppchelko@deploy1001: Finished deploy [restbase/deploy@8fca708]: Expose transform/wikitext/to/mobile-html endpoint T211026 (duration: 07m 35s)
15:49 ppchelko@deploy1001: Started deploy [restbase/deploy@8fca708]: Expose transform/wikitext/to/mobile-html endpoint T211026
15:46 XioNoX: fail vrrp master to cr1-eqsin
15:42 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
15:39 bblack: puppet re-enabled on lvs1014, lvs1016, icinga1001
15:35 XioNoX: depool eqsin for cr2-eqsin upgrade
15:32 bblack: disabled puppet on lvs1014, lvs1016, icinga1001 ahead of deploying https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/528885/ - T229621
15:32 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
15:30 XioNoX: rollback ospf + bgp changes on cr2-eqord
15:19 XioNoX: restart cr2-eqord - T227886
15:12 XioNoX: disable all peering and transit on cr2-eqord
15:01 XioNoX: increase ospf cost of cr2-eqord<->cr2-eqiad link (+1000)
14:57 ema: cp5002: reboot for kernel upgrade
14:42 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
14:42 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
14:31 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
14:31 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
14:29 XioNoX: rollback: disable all peering and transit on cr2-eqdfw
14:18 XioNoX: reboot cr2-eqdfw for software upgrade - T227886
14:14 XioNoX: disable all peering and transit on cr2-eqdfw
14:04 volans@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
14:04 volans@cumin2001: START - Cookbook sre.hosts.decommission
13:20 jbond42: rolling update of postgresql-9.6
13:07 jijiki: rolling restart hhvm on api servers in eqiad
12:57 jijiki: Restart hhvm on mw1235
12:17 fsero@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=eqiad
12:08 _joe_: restarted php-fpm on mw1221
12:03 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
12:00 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
11:56 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
11:56 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
11:49 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
11:44 fsero: recreating cxserver blubber and sessionstore namespace - T228836
11:39 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
11:35 gehel: restart wdqs-blazegraph on wdqs2001
11:34 gehel: restart wdqs-updater on wdqs2001
11:30 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
11:29 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
11:25 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
11:21 fsero: recreating citoid eventgate-analytics eventgate-main mathoid namespace - T228836
11:20 fsero@: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
11:18 raynor: EU SWAT finished
11:15 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Undeploy editor gender surveys (T227793) (duration: 00m 48s)
11:13 fsero: recreating termbox namespace - T228836
11:06 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'zotero' for release 'production' .
11:04 fsero: resetting net.netfilter.nf_conntrack_tcp_timeout_time_wait to 65 in kubernetes2006
10:59 _joe_: [eqiad] downtiming zotero on icinga for 10 minutes while recreating the deployment with helmfile
10:57 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
10:57 oblivian@cumin1001: START - Cookbook sre.hosts.downtime
10:56 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
10:56 oblivian@cumin1001: START - Cookbook sre.hosts.downtime
10:49 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
10:44 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
10:39 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
10:39 _joe_: recreating rbac roles via helmfile
10:32 oblivian@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
10:29 _joe_: deleting calico deploy and configmap in kubernetes in eqiad, recreating with helmfile
10:25 jbond42: rolling update of ghostscript
10:23 fsero@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=eqiad
10:10 fsero: initialize_cluster.sh kube-system kubemaster.svc.eqiad.wmnet 6443 - T228836
10:10 fsero: creating tiller in kube-system for helmfile T228836
09:58 vgutierrez: upgrading the rest of cache@upload to 8.0.3-1wm3 - T221594
08:49 marostegui: Stop MySQL on db2057 - T230394
08:48 marostegui: Remove db2057 from tendril and zarcillo T230394
07:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2057 from config T230394 (duration: 00m 47s)
07:54 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2057 from config T230394 (duration: 00m 48s)
06:59 volans: upgrading spicerack to 0.0.26 on cumin2001
06:49 vgutierrez: Rolling restart of fifo-log-demux and atsmtail services across cache@upload
06:38 vgutierrez: upgrading fifo-log-demux to version 0.5 in cache@upload
06:11 vgutierrez: Upgrading ATS to 8.0.3-1wm3 in cp2002, cp1076, cp3034 and cp4021 - T221594
05:47 marostegui: Stop mysql on db2050 - T230391
05:40 marostegui: Remove db2050 from tendril and zarcillo T230391
05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2050 from config, host will be decommissioned T230391', diff saved to https://phabricator.wikimedia.org/P8904 and previous config saved to /var/cache/conftool/dbconfig/20190813-053514-marostegui.json
05:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2050 from config T230391 (duration: 00m 48s)
05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2050 from config T230391 (duration: 00m 48s)
05:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2122 into s7 T228969 (duration: 00m 47s)
05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2122 into s7 T228969 (duration: 00m 49s)
05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Provision db2122 into s7 T228969', diff saved to https://phabricator.wikimedia.org/P8903 and previous config saved to /var/cache/conftool/dbconfig/20190813-051019-marostegui.json

2019-08-12

23:24 XioNoX: add samplicator to buster-wikimedia repo
21:33 eileen: tools revision changed from 2a56e5e283 to 827ce3750e
20:43 eileen: civicrm revision changed from be5b5a150b to 569e52e23d, config revision is 1c76e94ac3
20:17 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@615004f]: Update service-mobileapp-node to f0a2847 (duration: 05m 05s)
20:12 mbsantos@deploy1001: Started deploy [mobileapps/deploy@615004f]: Update service-mobileapp-node to f0a2847
20:08 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
19:15 mforns@deploy1001: Finished deploy [analytics/refinery@5418d3b]: deploying analytics-refinery up to 5418d3b (duration: 39m 23s)
19:14 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
18:35 mforns@deploy1001: Started deploy [analytics/refinery@5418d3b]: deploying analytics-refinery up to 5418d3b
17:42 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@8579f50]: Updated GUI, New endpoints and New Blazegraph and Updater build (duration: 05m 04s)
17:37 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@8579f50]: Updated GUI, New endpoints and New Blazegraph and Updater build
15:05 jijiki: rolling restart php-fpm on mw122[4-8] - T219150
15:01 ema: cp1076, cp500[12]: restart trafficserver with compress plugin disabled
14:39 jijiki: disable puppet on mw122[4-8]
14:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Account creation throttle to 2 everywhere (T230304) (duration: 00m 47s)
13:51 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
13:18 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
12:51 ema: cp1076,cp5001,cp5002: ats-backend-restart to disable ATS systemd hardening features
11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: More restrictive account creation throttle (T230304) (duration: 00m 47s)
11:34 vgutierrez: restart atsmtail@backend on cp1076
11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable global abuse filters on warwiki as an emergency measure (T230304) (duration: 00m 48s)
10:59 vgutierrez: restarting trafficserver in cp5002
10:47 vgutierrez: Upgrade trafficserver to 8.0.3-1wm3 in cp5002 - T221594
10:47 jijiki: Enabling puppet and rolling restarting nginx across the fleet - T224538
10:39 jijiki: Restarting nginx on mwmaint2001.codfw.wmnet,mwmaint1002.eqiad.wmnet,scandium.eqiad.wmnet,snapshot[1005-1009].eqiad.wmnet, deploy2001.codfw.wmnet,deploy1001.eqiad.wmnet
10:28 jijiki: Disable puppet on all servers running a services_proxy - T224538
10:09 marostegui: Remove empty table globalblocks from s3 (where it exists) - T230055
10:07 vgutierrez: Upgrade trafficserver to 8.0.3-1wm3 in cp5001 - T221594
10:01 marostegui: Remove empty table wikidatawiki.globalblocks from s8 - T230055
09:36 jijiki: Disable puppet on mwmaint for 425027
09:36 marostegui: Remove empty table enwikivoyage.globalblocks from s5 - T230055
09:32 marostegui: Stop MySQL on db2043 T230311
09:24 marostegui: Remove empty table testcommonswiki.globalblocks from s4 - T230055
09:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2043 from config T230311 (duration: 00m 47s)
09:22 marostegui: Remove db2043 from tendril and zarcillo T230311
09:21 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2043 from config T230311 (duration: 00m 48s)
09:06 jijiki: depool and pool back mw1222
08:22 elukey: restart Analytics hadoop HDFS namenodes to pick up new heap settings
08:12 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s3 codfw weights T220170 (duration: 00m 48s)
08:07 marostegui@cumin1001: dbctl commit (dc=codfw): 'Reorganize s3 codfw weights T220170', diff saved to https://phabricator.wikimedia.org/P8901 and previous config saved to /var/cache/conftool/dbconfig/20190812-080731-marostegui.json
07:46 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2105 as s3 codfw master (duration: 00m 47s)
07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2105 to s3 codfw master T230106', diff saved to https://phabricator.wikimedia.org/P8900 and previous config saved to /var/cache/conftool/dbconfig/20190812-074314-marostegui.json
07:34 marostegui: Switchover s3 codfw master db2043 -> db2105 - T230106
07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2121 into s7', diff saved to https://phabricator.wikimedia.org/P8899 and previous config saved to /var/cache/conftool/dbconfig/20190812-072617-marostegui.json
07:20 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2121 into s7 T228969 (duration: 00m 47s)
07:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2121 into s7 T228969 (duration: 00m 48s)
05:04 marostegui: Remove math table from s3 - T196055
05:02 marostegui: Remove math table from s1 - T196055

2019-08-11

22:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily make account creation limits more restrictive (T230304) (duration: 00m 50s)

2019-08-10

01:49 mutante: mwmaint - running (1 of 8, the one for en) refreshLinks maintenance cron manually to verify it works after switching mwscriptwikiset to PHP7.2 (T195392)
00:52 mutante: mwmaint - running update_flaggedrevs_stats - updates the flagged revs statistics table on each wiki
00:47 mutante: mwmaint - running cirrus sanitize jobs maintenance cron

2019-08-09

21:28 mutante: mwmaint - generating new captchas for ConfirmEdit extension by running generatecaptcha maintenance cron command
20:55 mutante: mwmaint - running update_special_pages maintenance cron manually
20:31 mutante: contint1001 - added entry to /etc/fstab for /mnt/docker to survive reboots ( 13 /dev/mapper/contint1001--data-docker /mnt/docker ext4 defaults 0 2$
19:46 mutante: mwdebug1001 - temp stopped puppet, editing nginx config to test making it listen on IPv6 for upstream proxies (529401) (T224538)
19:37 mutante: mwmaint - running cirrussearch maintenance jobs manually (completion indices, sanitize cirrus jobs)
18:14 elukey: add BGP peer for AS 38758 on cr1-eqsin
17:54 mutante: mwmaint - running initsitestats maintenance job - initializes or updates statistics table on all wikis
17:23 elukey: set BGP peer "BrightRidge" on cr2-eqiad
17:19 mutante: mwmaint - running purgeParserCache maintenance cron manually with PHP 7.2 - ..slowly
16:52 mutante: mwmaint - manually running updatePageTriageQueue maintenance cron with PHP 7.2
16:15 arturo: add phamhi to 'wmf' and 'ops' LDAP groups (T228942)
15:48 jijiki: Disable puppet on mw1222 and depool
11:50 ema: root@puppetmaster2001:/srv/private# su -c "export GIT_SSH=/srv/private/.git/ssh_wrapper.sh ; git push ssh://puppetmaster1001.eqiad.wmnet/srv/private master" gitpuppet
11:44 ema: puppetmaster1001: resetting last 3 /srv/private commits due to broken replication
10:38 thcipriani: gerrit restart on cobalt.
09:36 marostegui: Drop math table from s7 T196055
09:04 marostegui: Drop math table from s4 - T196055
08:58 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011 - T196055
08:51 moritzm: upgrading ghostscript on thumbor1001
08:32 marostegui: Stop MySQL on db2069 T230107
08:29 marostegui: Remove db2069 from tendril and zarcillo T230107
08:24 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011 - T196055
07:31 vgutierrez: uploaded trafficserver-8.0.3wm3 to apt.wikimedia.org (stretch) - T220383 T228135
06:19 elukey: powercycle thumbor2004 (no ssh, serial console showing a frozen OS)
05:37 marostegui: Run maintain-views script with --clean to clean up math table views - T196055
02:30 mutante: mwmaint1002 - manually running cleanup_upload_stash maintenance cron to confirm no issues with PHP 7.2 in maintenance/cleanupUploadStash.php
02:24 mutante: mwmaint1002 - manually running purge_expired_userrights maintenance cron to confirm no issues with PHP 7.2 in maintenance/purgeExpiredUserrights.php
02:17 mutante: mwmaint1002 - manually running purge_abusefilter maintenance cron

2019-08-08

23:50 Urbanecm: Evening SWAT done
23:49 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/WikiEditor/modules/jquery.wikiEditor.dialogs.config.js: SWAT: 6dcab39: Follow-up Ia75d685c: Fix the insert file dialog (T230078) (duration: 00m 50s)
23:48 mutante: mwmaint1002 - manually running purge_securepoll maintenance script
23:42 mutante: mwmaint1002 - manually running TranslationNotifications DigestEmailer maintenance cron
22:05 mutante: rolling out new scap version 3.12.0-1 on all of eqiad
22:02 mutante: mwdebug2002 - scap pull to test new scap, nothing to do
22:00 mutante: rolling out new scap version 3.12.0-1 on all of codfw
21:54 mutante: (purge unpublished articles from ContentTranslation older than 455 days)
21:52 mutante: mwmaint1002 - manually running purge_old_cx_drafts maintenance job for ContentTranslation - after switching helper script to PHP 7.2
21:50 mutante: mwmaint1002 - manually running purgeUnusedProjects with PageAssessments extension to confirm no issues after switch to PHP7.2
21:40 mutante: mwmaint1002 - manually running (weekly) echo_mail cron job (user notifications) to confirm it works after switching foreachwikiindblist to use php7.2 (T195392)
21:30 mutante: rolling out new scap package 3.12.0-1 on mw-canary servers via debdeploy (T230144)
21:28 mutante: rolling out new scap package 3.12.0-1 on contint servers
21:22 mutante: built new scap version 3.12.0-1 on boron, imported packages on install1002 (apt.wm.org), copied from stretch to jessie and buster (T230144)
20:33 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
19:36 thcipriani: restart gerrit on cobalt to pick up new config
19:34 thcipriani: restart gerrit-replica on gerrit2001 to pick up new config
19:27 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:06 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.17
17:52 XioNoX: run /usr/local/sbin/restart-php7.2-fpm on mwdebug1001
17:33 fdans@deploy1001: Finished deploy [analytics/refinery@cef01d3]: deploy analytics refinery, second attempt (duration: 16m 52s)
17:21 XioNoX: add user jbond to network devices
17:16 fdans@deploy1001: Started deploy [analytics/refinery@cef01d3]: deploy analytics refinery, second attempt
16:56 ppchelko@deploy1001: Finished deploy [changeprop/deploy@069d297]: Remove workaround for ORES not supporting eventgate events T228688 (duration: 01m 24s)
16:55 ppchelko@deploy1001: Started deploy [changeprop/deploy@069d297]: Remove workaround for ORES not supporting eventgate events T228688
16:40 fdans@deploy1001: Started deploy [analytics/refinery@cef01d3]: deploying analytics refinery
15:49 XioNoX: set virtual-chassis vcp-snmp-statistics to all VC - T228824
15:13 herron: rebooting fermium (lists) for security updates
15:11 XioNoX: commit synchronize on cr1-codfw - T226422
14:52 XioNoX: continue cr1-codfw:re1 replacement - T226422
13:09 marostegui: Drop table math from s8 T196055
12:15 tarrow: EU midday SWAT done
12:15 tarrow@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/Wikibase/: SWAT: Add hook to invalidate cache entries missing TermboxOption (T228978) (duration: 01m 14s)
12:01 tarrow@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/Wikibase/: SWAT: Split ParserCache on Termbox (T228978) (duration: 01m 21s)
12:00 tarrow: Running SWAT a little over time because late start and slow jenkins
11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: dfeb2a9: HD logo for enwikivoyage (T230114) (duration: 00m 56s)
11:44 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: dfeb2a9: HD logo for enwikivoyage (T230114) (duration: 00m 56s)
11:31 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/zhwikisource.png (T229715)
11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: be886ad: Add hd variations for zhwikisource project logo (T229715) (duration: 00m 55s)
11:28 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: be886ad: Add hd variations for zhwikisource project logo (T229715) (duration: 00m 56s)
11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 9a4494a: Add Hubblesite.org and Spacetelescope.org to commons wgCopyUploadsDomains (T230083) (duration: 00m 57s)
11:05 Urbanecm: Run scap pull on mwdebug1001 to revert local modifications (T207627)
10:53 jijiki: Disable puppet, depool and pool mw1221, mw1222, mw1223 for 529061
10:46 Urbanecm: Set $wgContentHandlers["flow-board"] = $wgContentHandlers["wikitext"]; locally on mwdebug1001 to fix few bad pages (T207627)
10:43 moritzm: installing exim4 security updates on buster hosts (our exim config is not vulnerable)
09:41 moritzm: installing OpenJDK security updates on WDQS servers
09:30 jbond42: disabling puppet fleet wide
09:26 marostegui: Drop table math from labswiki (wikitech) and labtestwiki T196055
09:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2069 from config T230107 (duration: 00m 55s)
09:19 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2069 from config T230107 (duration: 00m 57s)
08:45 elukey: restart hadoop namenodes on an-master100* to pick up new GC settings (CMS -> G1 switch)
08:44 moritzm: installing OpenJDK security updates on elastic* servers
08:36 marostegui: Remove math table from s5 T196055
08:13 marostegui: Stop MySQL on db2065 to test dbproxy2003
07:48 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2096 as codfw x1 master T220170 (duration: 00m 57s)
07:39 marostegui: Switchover x1 codfw master db2069 -> db2096 T220170
06:40 _joe_: restarting php-fpm on the application servers to pick up the change
05:54 marostegui: Stop MySQL on db2035 for decommissioning T229784
05:52 marostegui: Remove db2035 from tendril and zarcillo T229784
00:48 mutante: mwdebug2002 - sudo -i restart-php7.2-fpm
00:20 ejegg: re-enabled both recurring charge jobs
00:02 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: hack for Parsoid testing on scandium (duration: 00m 55s)

2019-08-07

23:58 tstarling@deploy1001: Synchronized w/rest.php: Creating rest.php endpoint disabled by default (duration: 00m 55s)
23:46 ejegg: disabled newer recurring charge job to test one at a time on existing recur records
23:22 mutante: elastic2054 - powercycling after it went down unexpectedly and Icinga alerted, this happened before in T227298
23:08 XioNoX: set virtual-chassis vcp-snmp-statistics on asw2-ulsfo - T228824
23:07 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625: Send writes for all non-private wikis to cloudelastic (duration: 01m 02s)
23:03 XioNoX: set virtual-chassis vcp-snmp-statistics on asw-a-codfw - T228824
22:50 ebernhardson: mwmaint start cirrussearch saneitize.php against all non-private group1 wikis for cloudelastic cluster
22:48 mutante: mwmaint1002 - manually running the purgeOldData cron command to verify it with PHP 7.2 for 528730 (T195392)
22:12 jgleeson: switched on all fundraising process-control except ingenico_recurring_charge
21:50 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@a151f4e]: Prepare for eventgate transition T230049 T230048 (duration: 00m 59s)
21:49 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@a151f4e]: Prepare for eventgate transition T230049 T230048
21:25 mutante: restarting gerrit service to apply config change (528769)
21:00 ebernhardson: apply transient logger settings from prod search clusters to cloudelastic
20:34 reedy@deploy1001: rebuilt and synchronized wikiversions files: labswiki back to .17
20:34 jgleeson: updated civicrm from 727a2c193b to be5b5a150b
20:32 reedy@deploy1001: rebuilt and synchronized wikiversions files: labswiki back to .16 temporarily
20:28 jgleeson: switched off fundraising process-control jobs
19:36 brennen@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.17 (duration: 00m 54s)
19:35 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.17
19:16 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert Switch property terms migration to WRITE_NEW on client wikis T225053 (duration: 00m 58s)
18:15 jijiki: Restart hhvm and php-fpm on canary mw hosts
17:54 shdubsh: install2002 add fstab entry for /srv mount - T229997
17:46 shdubsh: install2002 stop nginx and squid for resync /srv to spare disk and restore mount - T229997
17:42 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Retry - Revert "Switch high-traffic jobs to eventgate." (duration: 00m 58s)
16:40 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: JobQueue: Revert switching high-traffic jobs to eventgate (duration: 00m 55s)
16:34 mobrovac@deploy1001: scap failed: average error rate on 6/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
16:00 thcipriani: restarting jenkins for update
15:58 jijiki: restart nrpe on stat1004
15:08 _joe_: freeing APCu on mw1270, which has degraded performance
14:24 marostegui: Reboot dbproxy2003 for kernel upgrades
14:16 jbond42: puppet *now* re-enabled
14:16 jbond42: puppet not re-enabled
14:01 jbond42: disable puppet fleet wide for puppetdb restart
13:57 marostegui: Remove labsdb1004 and labsdb1005 from zarcillo database (instance table), as those hosts were decommissioned months ago
13:55 marostegui: Remove labsdb1004 and labsdb1005 from zarcillo database, as those hosts were decommissioned months ago
13:48 marostegui: Apply grants for dbproxy2003 on m3 - T202367
13:22 elukey: roll restart aqs on aqs100[4-9] to pick up new Druid backend settings
11:48 Amir1: EU SWAT is done
11:37 kart_: Updated cxserver to 2019-08-06-100812-production (T227571)
11:33 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Switch property terms migration to WRITE_NEW on client wikis (T225053) (duration: 00m 56s)
11:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
11:26 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable AMC on all wikipedias (T228916) (duration: 00m 55s)
11:26 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
11:22 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
11:09 marostegui: Restart gerrit
10:11 moritzm: deleting poolcounter1001, poolcounter1003, poolcounter2001, poolcounter2002 in Ganeti (T224572)
10:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
10:03 jmm@cumin2001: START - Cookbook sre.hosts.decommission
09:14 marostegui: Drop math table from s6 - T196055
08:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2131 into x1 T228969 (duration: 00m 55s)
08:48 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2131 into x1 T228969 (duration: 00m 56s)
08:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
08:37 jmm@cumin2001: START - Cookbook sre.hosts.decommission
08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2130 into s1 - T228969', diff saved to https://phabricator.wikimedia.org/P8877 and previous config saved to /var/cache/conftool/dbconfig/20190807-080059-marostegui.json
07:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
07:36 jmm@cumin2001: START - Cookbook sre.hosts.decommission
07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1100 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P8876 and previous config saved to /var/cache/conftool/dbconfig/20190807-073349-marostegui.json
07:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
07:31 jmm@cumin2001: START - Cookbook sre.hosts.decommission
07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2130 into s1 T228969 (duration: 00m 56s)
07:27 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2130 into s1 T228969 (duration: 00m 55s)
05:57 marostegui: Stop MySQL on db1071 - T229381
05:55 marostegui: Remove db1071 from tendril and zarcillo - T229381
05:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1071 from config T229381 (duration: 00m 55s)
05:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1071 from config T229381 (duration: 00m 57s)
05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1100 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P8875 and previous config saved to /var/cache/conftool/dbconfig/20190807-053903-marostegui.json
00:48 mutante: restarting gerrit to apply config change 528276 to exclude some projects from github replication
00:21 mutante: gerrit2001 - restarting gerrit to apply 528276

2019-08-06

23:51 catrope@deploy1001: Synchronized static/images/project-logos/: Update HD logos for enwikisource and sourceswiki (T229769) (duration: 00m 56s)
23:50 catrope@deploy1001: Synchronized php-1.34.0-wmf.17/extensions/Flow/includes/Import/OptInController.php: Unbreak disabling of Flow beta feature (T229795) (duration: 00m 55s)
23:49 catrope@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/Flow/includes/Import/OptInController.php: Unbreak disabling of Flow beta feature (T229795) (duration: 00m 56s)
23:36 mutante: phabricator - added ssingh to acl*sre-team (group 29), WMF-NDA-requests (group 974) and WMF-NDA (group 61) (T229860)
23:25 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update HD logos for enwikisource and sourceswiki (T229769) (duration: 00m 55s)
23:24 catrope@deploy1001: Synchronized static/images/project-logos/: Update HD logos for enwikisource and sourceswiki (T229769) (duration: 00m 56s)
23:19 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch updateBetaFeaturesUserCounts job to eventgate (T228705) (duration: 00m 57s)
23:12 eileen: civicrm revision changed from 2e03f9bb1e to 727a2c193b, config revision is 84b785d41c
22:33 ebernhardson: restart mjolnir-kafka-daemon across all elasticsearch servers
22:25 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@9e95ab4]: Deploy latest mjolnir daemon to handle bulk imports via swift (duration: 05m 35s)
22:19 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@9e95ab4]: Deploy latest mjolnir daemon to handle bulk imports via swift
21:53 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8e513f6]: Deploy latest mjolnir daemon to handle bulk imports via swift (duration: 16m 35s)
21:36 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8e513f6]: Deploy latest mjolnir daemon to handle bulk imports via swift
21:35 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@860fb33]: Deploy latest mjolnir daemon to handle bulk imports via swift (duration: 01m 50s)
21:34 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@860fb33]: Deploy latest mjolnir daemon to handle bulk imports via swift
21:28 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
20:17 subbu: repooled wtp2019 ( after papaul finished upgrade as part of T221572 )
19:52 papaul: shutting down wtp2019 for firmware upgrade
19:50 herron: disabling puppet on logstash collectors for rolling deploy of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/528306/ T166107
19:42 subbu: depooled wtp2019 ( to assist papaul with T221572 )
19:22 thcipriani: gerrit restart on cobalt
19:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: Group0 to 1.34.0-wmf.17
18:38 brennen@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.17 and rebuild l10n cache (duration: 19m 02s)
18:19 brennen@deploy1001: Started scap: testwiki to php-1.34.0-wmf.17 and rebuild l10n cache
18:13 brennen@deploy1001: Pruned MediaWiki: 1.34.0-wmf.14 [keeping static files] (duration: 08m 28s)
17:37 accraze@deploy1001: Finished deploy [ores/deploy@d08fa62]: T229848 (duration: 17m 21s)
17:20 accraze@deploy1001: Started deploy [ores/deploy@d08fa62]: T229848
17:14 volans: uploaded spicerack_0.0.26-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
16:54 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=mathoid,name=codfw
16:52 @: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
16:50 brennen: cutting branch for 1.34.0-wmf.17
16:50 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=citoid,name=codfw
16:50 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=codfw
16:48 @: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'analytics' .
16:47 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=mathoid,name=codfw
16:43 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-analytics,name=codfw
16:40 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625: Re-sync enable group1 on cloudelastic, job runners are claiming it's not enabled while app servers are sending jobs (duration: 00m 47s)
16:39 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
16:37 @: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
16:36 @: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
16:33 @: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
16:33 @: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
16:33 @: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
16:32 @: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
16:31 @: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
16:19 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625: Turn on cloudelastic writes for group1 (duration: 00m 47s)
16:08 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=citoid,name=codfw
15:13 moritzm: installing bind9 security updates (client-side tools/libs only) for jessie
15:04 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-08-06-conftool.yaml -s all
14:55 moritzm: rebooting mwlog1001 for kernel update
14:55 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo cumin -p99 -b100 'A:all' 'apt-get update'
14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
14:52 herron: restarting logstash service on logstash1007 to pick up puppet managed log4j2 config
14:50 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-08-06-conftool.yaml -s mw-canary
14:45 cdanis: ✔️ cdanis@install1002.wikimedia.org ~/conftool-1.1.4-2 🕥☕ sudo -E reprepro -C main include buster-wikimedia conftool_1.1.4-2+deb10u1_amd64.changes
14:44 cdanis: ✔️ cdanis@install1002.wikimedia.org ~/conftool-1.1.4-2 🕥☕ sudo -E reprepro -C main include stretch-wikimedia conftool_1.1.4-2_amd64.changes
14:37 cdanis: ✔️ cdanis@install1002.wikimedia.org ~/conftool-1.1.4-2 🕥 sudo -E reprepro -C main include jessie-wikimedia conftool_1.1.4-2+deb8u1_amd64.changes
14:36 marostegui: Start mysql on db1100 after on-site maintenance - T228732
12:30 elukey: roll restart cassandra on aqs for openjdk-8 upgrades
12:06 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
12:05 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
11:49 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
11:49 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
11:36 Urbanecm: EU SWAT done
11:21 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/AbuseFilter/: SWAT: 8cc96db: Better handling of DNONE (T214674, T228677) (duration: 00m 48s)
11:11 moritzm: rebooting install1002 to pick up MDS-enabled qemu
11:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
11:08 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable EntitySchema in production wikidata (duration: 00m 48s)
10:52 moritzm: rebooting install2002 to pick up MDS-enabled qemu
10:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:07 moritzm: rebooting etherpad1001 to pick up MDS-enabled qemu
10:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:59 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:59 filippo@cumin1001: START - Cookbook sre.hosts.downtime
09:59 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
09:58 filippo@cumin1001: START - Cookbook sre.hosts.downtime
08:52 @: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
08:39 marostegui: Add db2130 to tendril and zarcillo T228969
08:22 @: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
07:27 marostegui: Stop MySQL on db1100 before powering the host off - T228732
07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool for firmware and BIOS upgrade T228732', diff saved to https://phabricator.wikimedia.org/P8869 and previous config saved to /var/cache/conftool/dbconfig/20190806-072720-marostegui.json
07:10 onimisionipe: pool maps1001. Postgres init complete - T229788
05:59 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/CheckUser: Fix T229893 (duration: 00m 47s)
05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2127 into s3 T228969', diff saved to https://phabricator.wikimedia.org/P8868 and previous config saved to /var/cache/conftool/dbconfig/20190806-055357-marostegui.json
05:49 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2127 into s3 T228969 (duration: 00m 48s)
05:34 marostegui: Restart wikibugs
05:06 marostegui: Reload haproxy on dbproxy1011 to repool labsdb1010 T222978
03:58 ebernhardson: start importing group[12] to cloudelastic from mwmaint1002
02:08 eileen: civicrm revision changed from 857dcc9461 to 2e03f9bb1e, config revision is 84b785d41c
02:05 MaxSem: Creating local accounts for Community Tech bot on every Wikipedia

2019-08-05

23:34 mutante: mwmaint1002 - remove getJobQueueLengths.php from www-data's crontab (T195392)
23:03 Urbanecm: Evening SWAT done
23:03 urbanecm@deploy1001: Synchronized wmf-config/ProductionServices.php: SWAT: 87b428d: Repoint cloudelastic at LB dns (T220625) (duration: 00m 48s)
21:55 papaul: powering down wtp2011 for BIOS upgrade
21:39 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo debdeploy deploy -u 2019-08-05-conftool.yaml -s all
21:35 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo debdeploy deploy -u 2019-08-05-conftool.yaml -s eqsin
21:29 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo cumin -p99 -b100 'A:all' 'apt-get update'
21:28 mutante: 🔔 scandium - re-enabled icinga notifications for various services
21:27 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo debdeploy deploy -u 2019-08-05-conftool.yaml -s mw-canary
21:25 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕠🍺 sudo -E reprepro -C main include jessie-wikimedia conftool-1.1.4-1/conftool_1.1.4-1+deb8u1_amd64.changes
21:25 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕠🍺 sudo -E reprepro -C main include buster-wikimedia conftool-1.1.4-1/conftool_1.1.4-1+deb10u1_amd64.changes
21:24 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕠 sudo -E reprepro -C main include stretch-wikimedia conftool-1.1.4-1/conftool_1.1.4-1_amd64.changes
21:22 ebernhardson: start importing group0 to cloudelastic from mwmaint1002
20:49 ebernhardson: nuke all search indices on cloudelastic preparing for fresh imports and live updates T220625
20:34 arlolra: Updated Parsoid to 7232dff (T228223)
20:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@d3a2937]: Updating Parsoid to 7232dff (duration: 09m 02s)
20:19 arlolra@deploy1001: Started deploy [parsoid/deploy@d3a2937]: Updating Parsoid to 7232dff
20:06 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@e774a05]: Update mobileapps to c713c2e (duration: 04m 51s)
20:01 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@e774a05]: Update mobileapps to c713c2e
19:51 gehel: depool wdqs1005 - T229876
19:35 thcipriani: gerrit restart on cobalt for configuration updates
19:34 bblack: fixing up cloudelastic LVS IPv6 stuff on lvs1014, lvs1016, cloudelastic* - possible monitoring noise
19:33 thcipriani: gerrit restart for gerrit-replica on gerrit2001
18:44 Urbanecm: Morning SWAT done
18:39 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/AbuseFilter/: SWAT: d358f17: Revert "Better handling of DNONE" (T214674, T228677) (duration: 00m 47s)
18:32 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/AbuseFilter/: SWAT: 936a462: Better handling of DNONE (T214674, T228677) (duration: 00m 47s)
18:29 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/WikimediaEvents/: SWAT: 3ee0e84: Temporarily log search to two schemas (duration: 00m 47s)
18:25 Urbanecm: Deployed patch for T207094
18:21 urbanecm@deploy1001: Synchronized dblists/: SWAT: a9e4ed8: Remove related-articles-footer-blacklisted-skins.dblist (T229644, 3/3) (duration: 00m 46s)
18:20 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: a9e4ed8: Remove related-articles-footer-blacklisted-skins.dblist (T229644, 2/3) (duration: 00m 47s)
18:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: a9e4ed8: Remove related-articles-footer-blacklisted-skins.dblist (T229644, 1/3) (duration: 00m 49s)
18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 254ecc1: Switch testwiki to use kask (only) for sessions (T222099) (duration: 00m 48s)
18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: e44a6e6: Enable editor gender surveys (T227793) (duration: 00m 48s)
18:06 onimisionipe: reinit postgres on maps1001 - T229788
17:33 jijiki: Pool restbase2009 - T227408
17:28 fsero@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=codfw
16:53 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
16:53 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
16:52 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
16:37 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
16:32 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
16:22 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
16:22 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
16:18 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
16:16 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
16:10 fsero: recreating citoid eventgate-analytics eventgate-main mathoid sessionstore namespaces and redeploying from helmfile T228837
16:06 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'zotero' for release 'production' .
16:04 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
16:02 crusnov@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
15:58 Urbanecm: Deploy patch for T200104
15:41 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet
15:36 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'zotero' for release 'production' .
15:32 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
15:27 fsero: recreating zotero and termbox namespaces and services from helmfile codfw - T228837
15:26 fsero: recreating zotero and termbox from helmfile codfw - T228837
15:21 marostegui: Add db2127 to tendril and zarcillo (s3) - T228969
15:18 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet
14:32 marostegui: Reload haproxy on dbproxy1011 to depool labsdb1010 T222978
14:24 papaul: shut down restbase2009 for battery replacement
14:12 fsero@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=codfw
14:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
14:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime
14:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:07 jiji@cumin1001: START - Cookbook sre.hosts.downtime
14:06 jijiki: Depool and restart restbase2009 for maint - T227408
14:05 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
14:04 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
14:00 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
13:57 fsero@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
13:56 fsero: deploying calico controller in codfw via helmfile - T228837
13:42 fsero: deploying tiller in kube-system for helmfile changes - T228837
13:37 volans: run cumin 'A:cumin' 'rm -v /usr/local/sbin/{wmf-upgrade-varnish,wmf-upgrade-and-reboot,wmf-downtime-host,wmf-decommission-host}' T205886
13:28 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
13:16 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
13:01 jbond42: rolling update of openjdk-8 on restbase
12:44 moritzm: restarting cassandra on restbase-dev1004
12:44 moritzm: restarting cassandra on restbase-dev1040
12:33 moritzm: uploaded openjdk-8 u222 for jessie-wikimedia
12:26 Krinkle: mwscript deleteEqualMessages.php --wiki fywiktionary (requested at m:Steward_requests/Miscellaneous)
12:13 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Switch property terms migration to WRITE_NEW on production wikidata (T225053) (duration: 00m 48s)
12:01 Urbanecm: EU SWAT done
11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0032b0a: Enable Page Previews as default on hewikivoyage (T222017) (duration: 00m 47s)
11:43 jbond@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
11:43 jbond@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
11:42 jbond@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-restart (exit_code=97)
11:42 jbond@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
11:38 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/MobileFrontend/: SWAT: b7ae4fb: Revert "[AMC] [desktop] [mobile] use AMC by default for desktop users" (T229722) (duration: 00m 49s)
11:33 marostegui: Upgrade MySQL on db2074 db2057 db2050 db2035 db2098
11:29 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/Wikibase: SWAT: 3ecaa57: Add only needed entity usages in AddUsagesForPageJob (T226818, T205045) (duration: 01m 12s)
11:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 9eb74c2: Define import sources for fawiki (T229717) (duration: 00m 48s)
10:40 jbond42: update java on sessionstore
10:40 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 46s)
10:39 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
10:27 ema: upload fifo-log-demux 0.5 to stretch-wikimedia
10:12 jbond42: rolling update of openjdk on maps servers
09:30 marostegui: Stop MySQL on db2105 to change binlog format
09:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
09:07 arturo: downtime toolschecker for 5 hours
09:05 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
08:56 moritzm: installing vim security updates for jessie (stretch/buster already fixed)
08:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2035 from config T229784 (duration: 00m 46s)
08:45 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2035 from config T229784 (duration: 00m 47s)
08:43 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
08:32 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8861', previous config saved to /var/cache/conftool/dbconfig/20190805-083254-marostegui.json
08:21 marostegui: Switchover s2 codfw master from db2035 to db2107 - T221533 T220170
07:53 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s2 T228969 (duration: 00m 47s)
07:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Reorganize s2 T228969 (duration: 00m 48s)
07:52 marostegui@deploy1001: sync-file aborted: Reorganize s2 T228969 (duration: 00m 06s)
07:49 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8859', previous config saved to /var/cache/conftool/dbconfig/20190805-074930-marostegui.json
07:45 moritzm: installing unzip regression DLA for jessie
07:43 moritzm: removed orespoolcounter[12]00[12] from debmonitor T227640
07:23 marostegui: Move db2095:3312 from db2063 to db2126 - T228969
05:58 marostegui: Update rack column on zarcillo.servers for the new servers T229683
05:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2124 into s6 T228969 (duration: 00m 46s)
05:29 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2124 into s6 T228969 (duration: 00m 49s)
05:28 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8858', previous config saved to /var/cache/conftool/dbconfig/20190805-052839-marostegui.json

2019-08-04

18:45 krinkle@deploy1001: Synchronized wmf-config/abusefilter.php: labs-only noop - f740f89c594979 (duration: 00m 50s)

2019-08-03

12:02 gilles: purging ruwiki articles on mwmaint1002
11:30 gilles: purging eswiki articles on mwmaint1002
10:01 ema: cp1085: restart varnish-be
09:36 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 T216594 Renew origin trial tokens (duration: 00m 48s)
00:40 ejegg: rolled back fundraising python tools from 493a38f9e0 to 2a56e5e283

2019-08-02

23:58 mutante: scandium - apt-get remove --purge prometheus-hhvm-exporter - not needed here, no HHVM (T228069)
23:16 XioNoX: Make the Level3 link between eqiad-knams primary - T228827
23:06 mutante: mwdebug1001/mwdebug1002 - restart-php7.2-fpm - low opcache
20:48 sbassett: Deployed security patch for T229541
20:14 Urbanecm: Run mwscript deleteEqualMessages.php --wiki=cswiki --delete
19:24 mutante: gerrit2001 - re-enabling puppet, starting as slave for the first time ever, thanks to codfw dbproxy, gerrit service running (T176532)
18:37 mutante: gerrit2001 - disabling puppet, stopping gerrit service
18:36 mutante: adding gerrit2001 to ferm rules on dbproxy for misc
18:14 Lucas_WMDE: recached all WikibaseView messages in ResourceLoader for T229604, cf. https://w.wiki/6kc
17:46 XioNoX: flap NTT link in eqsin
17:42 lucaswerkmeister-wmde@deploy1001: Finished scap: Fix WikibaseView i18n globals (T229604) (duration: 16m 51s)
17:26 XioNoX: add avoid_path to cr1/2-eqsin
17:25 lucaswerkmeister-wmde@deploy1001: Started scap: Fix WikibaseView i18n globals (T229604)
17:19 krinkle@deploy1001: Synchronized docroot/noc/db.php: a75d23ecb1b (duration: 00m 47s)
17:10 krinkle@deploy1001: Synchronized docroot/noc/db.php: ee528e8 (duration: 00m 48s)
16:42 XioNoX: replace rhenium with netflow1001 netflow target + iBGP peer on all routers
15:52 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@250f711]: Fix MCS production crashers (T229521, T229630) (duration: 04m 41s)
15:47 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@250f711]: Fix MCS production crashers (T229521, T229630)
15:14 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
15:12 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
14:14 mforns@deploy1001: Finished deploy [analytics/refinery@b50a939]: deploying refinery up to b50a939 (rollback of cassandra and edit_hourly hive2 actions to unbreak production) (duration: 16m 47s)
13:57 mforns@deploy1001: Started deploy [analytics/refinery@b50a939]: deploying refinery up to b50a939 (rollback of cassandra and edit_hourly hive2 actions to unbreak production)
13:54 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
13:45 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: cluster=api_appserver,dc=eqiad,service=nginx,name=mw12[23].*
12:33 marostegui: Restarted wikibugs a few minutes ago as it was not sending anything on IRC
11:56 Amir1: aborted l10nupdate
11:54 Amir1: start of l10nupdate
11:48 ladsgroup@deploy1001: scap sync-l10n completed (1.34.0-wmf.16) (duration: 00m 44s)
11:39 ladsgroup@deploy1001: Finished scap: Rebuilding l10n cache (duration: 05m 06s)
11:34 ladsgroup@deploy1001: Started scap: Rebuilding l10n cache
10:51 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/Wikibase: Revert "fix eslint errors in lib after moving submodule files into lib" (duration: 01m 08s)
10:01 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
09:22 marostegui: Compress s7 on labsdb1010 - T222978
09:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert: Switch property terms migration to WRITE_NEW on production wikidata (T225053) (duration: 00m 48s)
09:12 elukey: umount /sys/kernel/debug/tracing on analytics1043
08:57 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
08:56 @: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
08:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add db2129 to s6 (duration: 00m 46s)
07:56 marostegui@cumin2001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8852', previous config saved to /var/cache/conftool/dbconfig/20190802-075548-marostegui.json
07:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Add db2129 to the config T228969 (duration: 00m 47s)
07:52 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add db2129 to the config T228969 (duration: 00m 47s)
07:43 marostegui: Restart hhvm on mw1226
07:40 _joe_: restarting php-fpm on mw1270, with 80 pms - static, apc 6 GB no ttl
07:38 _joe_: disabling puppet on mw1270 for testing of different php settings
07:21 marostegui: Add db2124 to tendril and zarcillo T228969
07:00 _joe_: running systemd-tmpfiles --create nutcracker.conf on scandium
06:46 vgutierrez: upgrading acme-chief to version 0.20 in acme-chief test instances - T229096
05:21 vgutierrez: uploaded acme-chief 0.20 to apt.wikimedia.org (buster) - T229096
05:10 marostegui: Stop MySQL on db2058 for decommissioning T229543
05:06 marostegui: Remove db2058 from tendril and zarcillo T229543

2019-08-01

23:32 Urbanecm: Evening SWAT done
23:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 819073a: Add `autopatrolled` group to az wikisource (T229371) (duration: 00m 49s)
23:29 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: 8aca0eb: Remove the "autoreview" user group from ru.wikipedia (T229596) (duration: 00m 47s)
23:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: cf01272: Add importing to english wikiquote (T228607) (duration: 00m 48s)
23:10 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: T229614: Pass proper types to eventlogging to resolve eventlogging errors in wmf.16 (duration: 00m 47s)
22:52 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@5ebf93e]: Update mobileapps to 2ee48ab (duration: 04m 34s)
22:47 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@5ebf93e]: Update mobileapps to 2ee48ab
22:17 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/WikimediaEvents/extension.json: T229614: Update eventlogging schema version to resolve eventlogging errors in wmf.16 (duration: 00m 47s)
22:13 mutante: scandium apt-get autoremove
22:13 mutante: scandium apt-get remove --purge wikimedia-lvs-realserver (T228069)
21:48 mutante: scandium - apt-get remove --purge hhvm* (T228069)
21:23 brennen@deploy1001: Synchronized php: group1 and group2 to 1.34.0-wmf.16 (duration: 00m 46s)
21:22 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 and group2 to 1.34.0-wmf.16
20:57 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/includes/Revision/RevisionRenderer.php: T229589 - 3f1b32e (duration: 00m 50s)
20:47 mutante: scandium - turning into an mw appserver
20:46 mutante: puppetmaster: create mcrouter certs for scandium.eqiad.wmnet needed to make it an appserver (https://wikitech.wikimedia.org/wiki/Mcrouter#Generate_certs_for_a_new_host) (T228069)
20:29 bblack: restart pybal on lvs1014
19:57 bblack: lvs1016 - restart pybal for slight LVS config change for cloudelastic - T224324
19:40 brennen@deploy1001: Synchronized php: Revert group1 and group2 back to 1.34.0-wmf.15 (duration: 00m 53s)
19:39 twentyafterfour: finished phabricator database dump
19:34 bblack: lvs1014 - puppetize and restart pybal for cloudelastic LVS - T224324
19:31 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 and group2 to 1.34.0-wmf.15
19:20 brennen: rolling back to wmf.15 on group1 and group2 while we investigate T229575
19:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.16
18:52 mutante: scandium (parsoid testing) - added mw application server roles - puppet work / maintenance
18:47 mutante: stat1004 - starting nagios-nrpe-server which got killed again - jbd2/md0-8 invoked oom-killer
18:32 bblack@puppetmaster1001: conftool action : set/pooled=yes; selector: name=^cloudelastic.*
18:30 bblack: lvs1016: puppet re-enabled, pybal restarted, cloudelastic deploy - T224324
18:06 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: 469c42d: Switch testwiki to read sessions from kask, with fallback to redis (T222099) (duration: 00m 55s)
17:42 bblack: disable puppet on lvs1014 + lvs1016 for cloudelastic LVS merge - T224324
17:36 twentyafterfour: running db dump on phab1003 (in tmux). command: sudo ./bin/storage dump --output /srv/dumps/phabricator_db_20190801.sql.gz --compress
16:05 XioNoX: power down msw1-codfw
15:47 XioNoX: start codfw mgmt work - T228112
15:40 brennen@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.16 (duration: 00m 54s)
15:39 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.16
15:16 mholloway-shell@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/Wikibase: Do not warn about entity that was not found in WikiPageEntityRevisionLookup (T229482) (duration: 01m 14s)
15:13 mholloway-shell@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/Wikibase: Do not warn about entity that was not found in WikiPageEntityRevisionLookup (T229482) (duration: 01m 20s)
14:51 herron: performing rolling restarts of eqiad logstash cluster for security updates
14:38 cdanis@deploy1001: Synchronized wmf-config/CommonSettings.php: Iaaa1238 comment-only no-op change (dbctl to 100% of production!) (duration: 00m 55s)
14:22 cdanis@deploy1001: Synchronized wmf-config/etcd.php: Iaaa1238 dbctl to 100% of production! (duration: 00m 54s)
12:38 jbond42: add cp1008 to canary hosts https://github.com/wikimedia/puppet/blob/production/hieradata/role/common/puppetmaster/frontend.yaml#L22
12:18 marostegui: Rename math table on db1089 (enwiki) - T196055
11:42 Urbanecm: EU SWAT done
11:40 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c51baa3: Add files.geocollections.info to the wgCopyUploadsDomains whitelist for commonswiki (T229547) (duration: 00m 55s)
11:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 1e4458e: Add nlm.nih.gov to the wgCopyUploadsDomains whitelist for commonswiki (T229470) (duration: 00m 53s)
11:23 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c164132: Revert "Revert "Switch property terms migration to WRITE_NEW on production wikidata"" (T225053) (duration: 00m 55s)
11:19 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/ExternalGuidance/: SWAT: 9402c36: Provide the messages in the target language of translation (T228019) (duration: 00m 56s)
11:09 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: 7db98f3: flaggedrevs.php: Remove useless wgAddGroups/wgRemoveGroups declarations (duration: 00m 55s)
11:05 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: aa82657: flaggedrevs.php: Allow wikis to remove ability to promote to/demote from autoreview/editor (T229346) (duration: 00m 54s)
10:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2058 from config T229543 (duration: 00m 57s)
10:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2058 from config T229543 (duration: 00m 55s)
10:12 jbond42: rolling upgrade for patch
10:10 _joe_: repooling mw1348 after reimaging as pure-php7
07:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2126 into s2 T228969 (duration: 00m 55s)
07:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2126 into s2 T228969 (duration: 00m 54s)
07:35 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8844', previous config saved to /var/cache/conftool/dbconfig/20190801-073459-marostegui.json
07:29 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1348.eqiad.wmnet
07:27 _joe_: removing mw1348 from rotation - reimaging for T228976
07:10 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8843', previous config saved to /var/cache/conftool/dbconfig/20190801-071022-marostegui.json
07:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1112 (duration: 00m 54s)
06:59 elukey: install python3-docopt manually on lithium to test check_anycast_healthchecker
06:51 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1270.eqiad.wmnet
06:42 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1270.eqiad.wmnet
06:42 _joe_: depooling mw1270 while migrating it to pure-php7
06:28 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=mw1348.eqiad.wmnet
06:19 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1348.eqiad.wmnet
06:18 _joe_: depooling mw1348 while moving it to no hhvm support.
00:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/resources/Resources.php: acfff67 (duration: 00m 54s)
00:32 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/includes/specials/SpecialJavaScriptTest.php: acfff67 (duration: 00m 54s)
00:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.16/includes/resourceloader/ResourceLoader.php: acfff67 (duration: 00m 55s)
00:28 krinkle@deploy1001: sync-file aborted: composer.json composer.lock dblists debug.json docroot errorpages fc-list fonts images langlist langlist-labs multiversion php php-1.34.0-wmf.13 php-1.34.0-wmf.14 php-1.34.0-wmf.15 php-1.34.0-wmf.16 phpcs.xml phpunit.xml portals private README requirements.txt robots.txt rpc scap setup.py src static test-requirements.txt tests tox.ini typos vendor w wikiversions.json wikiversions-labs.js

2019-07-31

23:34 eileen: civicrm revision changed from 218328b29d to 857dcc9461, config revision is 84b785d41c
23:22 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@db795ec]: Update mobileapps to b8c4166 (duration: 04m 21s)
23:17 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@db795ec]: Update mobileapps to b8c4166
23:14 Urbanecm: Evening SWAT done
23:12 urbanecm@deploy1001: Synchronized wmf-config/: SWAT: Add kask session storage configuration. Use only on testwiki, (ede989e, 862df8d, T222099) (duration: 00m 56s)
21:56 ejegg: updated fundraising python tools from 2a56e5e283 to 493a38f9e0
21:32 XioNoX: set cr1-eqiad's netflow target port to 2100 (nfacctd)
20:58 brennen@deploy1001: Synchronized php: Revert group1 back to 1.34.0-wmf.15 (duration: 00m 53s)
20:55 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 back to 1.34.0-wmf.15
20:48 brennen@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.16 (duration: 00m 54s)
20:47 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.16
20:37 brennen@deploy1001: Synchronized php-1.34.0-wmf.16/skins/MinervaNeue/includes/MinervaHooks.php: Limit Recent Changes disable-table mode to Minerva skin T228280 (duration: 00m 56s)
20:32 mdholloway: mobileapps deploy failed, investigating
20:32 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@7c6ce69]: Update mobileapps to 5eb9068 (duration: 01m 39s)
20:30 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@7c6ce69]: Update mobileapps to 5eb9068
20:01 mbsantos@deploy1001: Finished deploy [proton/deploy@ed6ebd8]: Update chromium-renderer to 529c493 (T227124) (duration: 01m 43s)
19:59 mbsantos@deploy1001: Started deploy [proton/deploy@ed6ebd8]: Update chromium-renderer to 529c493 (T227124)
19:55 ejegg: updated payments-wiki from 70b432d309 to 9533f70fab
18:49 mutante: phab1003 - manually running project_changes.sh to create mail to phabricator-reports@lists (T228575)
17:46 cdanis@deploy1001: Synchronized wmf-config/etcd.php: I45b705c8 disable dbctl on half of canary hosts (duration: 00m 57s)
17:21 volans@deploy1001: Synchronized wmf-config/db-codfw.php: depool db2058, I/O error, T229449 (duration: 00m 54s)
17:15 volans@cumin1001: dbctl commit of MediaWiki config (dc=codfw), diff saved to 'https://phabricator.wikimedia.org/P8841', previous config saved to /var/cache/conftool/dbconfig/20190731-171536-volans.json
16:52 Urbanecm: Morning SWAT done
16:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable MobileWebUIActionsTracking schema with 50% sampling rate (T220016) (duration: 00m 58s)
16:37 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/WikimediaEvents/: SWAT: Improved MobileUIActions tracking schema (T220016) (duration: 00m 54s)
16:26 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/GrowthExperiments/: SWAT: Only set relevant title on mobile skin (T229263, T225659) (duration: 00m 51s)
16:25 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/: SWAT: Only set relevant title on mobile skin (T229263, T225659) (duration: 00m 56s)
16:14 bblack: deploying VCL for H/2 coalesce 421 responses - T207340
16:12 marostegui: Poweroff pc2010 for on-site maintenance T227552
15:52 mforns@deploy1001: Finished deploy [analytics/refinery@eb2d9b0]: deploying analytics-refinery up to eb2d9b0 (duration: 13m 09s)
15:45 bstorm_: restarting nfs service on labstore1004
15:39 mforns@deploy1001: Started deploy [analytics/refinery@eb2d9b0]: deploying analytics-refinery up to eb2d9b0
15:24 thcipriani: restarting jenkins for update
15:22 ema: cp-ats: upgrade fifo-log-demux to 0.4 and restart atsmtail@backend.service T229414
15:17 brennen@deploy1001: rebuilt and synchronized wikiversions files: Group0 to 1.34.0-wmf.16
15:15 ema: upload fifo-log-demux 0.4 to stretch-wikimedia T229414
15:03 XioNoX: power down re1:cr1-codfw (backup) - T226422
14:57 godog: ms-be2018 disablepd 1I:1:1 - T225630
14:47 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8838', previous config saved to /var/cache/conftool/dbconfig/20190731-144731-marostegui.json
14:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1112 (duration: 00m 46s)
14:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1078 after upgrade and alter (duration: 00m 47s)
14:28 herron: beginning rolling reboots of codfw logstash hosts for security updates
14:28 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8837', previous config saved to /var/cache/conftool/dbconfig/20190731-142814-marostegui.json
14:18 cdanis@deploy1001: Synchronized wmf-config/etcd.php: I02d66736 expand dbctl to 25% of the fleet (duration: 00m 46s)
14:04 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
14:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1078 after upgrade and alter (duration: 00m 46s)
14:01 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8836', previous config saved to /var/cache/conftool/dbconfig/20190731-140124-marostegui.json
13:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1078 after upgrade and alter (duration: 00m 46s)
13:51 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8835', previous config saved to /var/cache/conftool/dbconfig/20190731-135129-marostegui.json
13:49 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
13:46 ema: cp4021: test fifo-log-demux 0.4 T229414
13:37 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
13:35 herron: beginning rolling restarts of codfw kafka-main brokers for security updates
13:32 jbond42: rolling update of exim
13:31 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
13:27 elukey: roll restart of zookeeper on conf100[4-6] and conf200[1-3] for openjdk upgrades
13:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 for alter and upgrade (duration: 00m 47s)
13:19 marostegui: Upgrade db1078
13:19 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8834', previous config saved to /var/cache/conftool/dbconfig/20190731-131900-marostegui.json
13:15 marostegui: Drop abuse_filter_log.afl_log_id in s3 eqiad - T226851
13:12 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
13:05 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
12:59 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
12:53 marostegui: Drop abuse_filter_log.afl_log_id from s3 codfw with replication (this will cause lag in s3 codfw) - T226851
12:53 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
12:22 Amir1: EU SWAT is done
12:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert: Switch property terms migration to WRITE_NEW on production wikidata (T225053) (duration: 00m 47s)
12:06 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Switch property terms migration to WRITE_NEW on production wikidata (T225053) (duration: 00m 47s)
12:05 ladsgroup@deploy1001: sync-file aborted: SWAT: Switch property terms migration to WRITE_NEW on production wikidata (T225053) (duration: 00m 03s)
11:56 jbond42: enable puppet fleet wide https://gerrit.wikimedia.org/r/c/operations/puppet/+/526645 deployed
11:52 kartik@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/ExternalGuidance: SWAT: 526637|Provide the messages in the target language of translation (T228019) (duration: 00m 46s)
11:41 jbond42: disable puppet to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/526645
11:40 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: gerrit:526646|Fix typo in name of config (T225055) (duration: 00m 47s)
11:25 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Decrease idwiki MT threshold for publishing (T228971) (duration: 00m 48s)
11:16 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable other statements on Commons (duration: 00m 48s)
10:08 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
10:05 jbond42: rolling back https://gerrit.wikimedia.org/r/q/c9f876e9990fb171f27616515e7d125824d7a6ac
09:56 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
09:49 _joe_: pruning orphaned images on contint1001
08:37 elukey: restart Yarn Resource Managers on an-master100[12] to pick up the new openjdk version
08:06 _joe_: running puppet (and restarting mtail) on all eqiad appservers
08:05 elukey: restart hadoop Namenodes on an-master100[12] to pick up new heap settings and new openjdk
07:40 marostegui: Drop abuse_filter_log.afl_log_id in s1 eqiad - T226851
07:36 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=codfw), diff saved to 'https://phabricator.wikimedia.org/P8833', previous config saved to /var/cache/conftool/dbconfig/20190731-073608-marostegui.json
07:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2125 into s2 T228969 (duration: 00m 47s)
07:34 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2125 into s2 T228969 (duration: 00m 49s)
07:29 elukey: restart-hhvm on mw1290
07:25 marostegui: Add db2125 to tendril and zarcillo T228969
05:44 marostegui: Drop abuse_filter_log.afl_log_id from s1 codfw with replication (this will cause lag in s1 codfw) - T226851
05:39 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify that db2128 is the new sanitarium master (duration: 00m 47s)
05:00 marostegui: Compress s6 on labsdb1010 - T222978
04:00 tstarling@deploy1001: Synchronized php-1.34.0-wmf.16/tests/phpunit/includes/parser/ParserOutputTest.php: T229366 (duration: 00m 46s)
03:59 tstarling@deploy1001: Synchronized php-1.34.0-wmf.16/includes/parser/ParserOutput.php: T229366 (duration: 00m 47s)
02:24 TimStarling: on mwmaint1002 reverted previous change using scap pull
01:08 TimStarling: on mwmaint1002, editing wikiversions.json locally to move wikimania2006wiki to .16, to investigate T229366
00:24 eileen: tools revision changed from 4910f1507c to 2a56e5e283
00:04 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/CentralNotice/: T227711 among others (duration: 00m 47s)
00:01 catrope@deploy1001: Synchronized php-1.34.0-wmf.16/extensions/CentralNotice/: T227711 among others (duration: 00m 48s)

2019-07-30

23:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Enable MobileWebUIActionsTracking schema with 50% sampling rate" (T220016) (duration: 00m 47s)
23:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Specify CentralAuth and OAuth session storage separately from per-wiki session storage (T227097, T227696) (duration: 00m 47s)
23:06 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable MobileWebUIActionsTracking schema with 50% sampling rate (T220016) (duration: 00m 48s)
22:26 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 3) - T226331 (duration: 00m 09s)
22:26 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 3) - T226331
22:23 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 2) - T226331 (duration: 00m 10s)
22:23 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 (pass 2) - T226331
22:19 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - T226331 (duration: 00m 47s)
22:18 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - T226331
22:18 crusnov@deploy1001: Finished deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - T226331 (duration: 00m 20s)
22:18 crusnov@deploy1001: Started deploy [netbox/deploy@b76139e]: Upgrade Netbox to v2.6.1 - T226331
22:15 eileen: tools revision changed from 8a464c4f0d to 4910f1507c (reverted pgmysql switch)
22:13 ppchelko@deploy1001: Finished deploy [changeprop/deploy@76b6639]: Report 400 errors by default. T229277 (duration: 01m 29s)
22:11 ppchelko@deploy1001: Started deploy [changeprop/deploy@76b6639]: Report 400 errors by default. T229277
22:02 ppchelko@deploy1001: Finished deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. T229060, take 2, feeds timed out (duration: 01m 03s)
22:00 ppchelko@deploy1001: Started deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. T229060, take 2, feeds timed out
22:00 ppchelko@deploy1001: Finished deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. T229060 (duration: 18m 40s)
21:42 ppchelko@deploy1001: Started deploy [restbase/deploy@c7e0e33]: Enable language variants filter for PCS endpoints. T229060
19:39 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.34.0-wmf.15
19:19 mutante: restbase2017 - sudo systemctl start cassandra-b after it had failed for unknown reason
19:19 XioNoX: repool ulsfo
19:13 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.16
18:49 XioNoX: rollback vrrp priority changes on cr4-ulsfo
18:48 XioNoX: rollback bump cr4-ulsfo<->cr1-codfw ospf metric
18:39 XioNoX: restart cr4-ulsfo
18:38 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
18:38 XioNoX: bump cr4-ulsfo<->cr1-codfw ospf metric
18:26 XioNoX: failover VRRP master to cr3-ulsfo
18:25 XioNoX: activate transit BGP groups on cr3-ulsfo
18:25 XioNoX: rollback - bump cr3-ulsfo<->cr2-eqord ospf metric
18:15 XioNoX: restart cr3-ulsfo
18:15 brennen@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.16 and rebuild l10n cache (duration: 18m 23s)
18:14 XioNoX: bump cr3-ulsfo<->cr2-eqord ospf metric
18:07 XioNoX: deactivate transit BGP groups on cr3-ulsfo
18:06 XioNoX: failover VRRP master to cr4-ulsfo
17:56 brennen@deploy1001: Started scap: testwiki to php-1.34.0-wmf.16 and rebuild l10n cache
17:55 brennen@deploy1001: Pruned MediaWiki: 1.34.0-wmf.11 (duration: 07m 40s)
17:53 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@af8b471]: Update mobileapps to ec865a7 (duration: 05m 45s)
17:47 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@af8b471]: Update mobileapps to ec865a7
17:20 XioNoX: depool ulsfo for routers upgrades - T227886
17:15 godog: use wezen.codfw.wmnet instead of syslog.codfw.wmnet for production hosts
17:00 thcipriani: gerrit restart incoming -- gc time increasing causing timeouts
16:46 XioNoX: adding port 9105 to term prometheus in filter labs-in4 - T225296
16:41 cdanis@deploy1001: Synchronized wmf-config/etcd.php: Icf57a2ab enable dbctl on all mw canaries (duration: 00m 47s)
16:37 brennen: cutting 1.34-wmf.16
16:33 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
16:22 godog: bounce rsyslog on centrallog1001 - T199406
15:41 elukey@cumin1001: END (FAIL) - Cookbook sre.kafka.roll-restart-brokers (exit_code=99)
15:28 legoktm@deploy1001: Finished scap: Rebuild l10n cache for SecureLinkFixer message (duration: 18m 51s)
15:21 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
15:18 jijiki: Disable puppet on mw1347 and mw2136, depool and pool back - T219150
15:13 elukey: remove snakebite from buster-wikimedia (not needed anymore)
15:09 legoktm@deploy1001: Started scap: Rebuild l10n cache for SecureLinkFixer message
15:06 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable SecureLinkFixer everywhere (T200751) (duration: 00m 47s)
14:48 cdanis@deploy1001: Synchronized wmf-config/etcd.php: I17c55428 dbctl canary on mwdebug*, mw1261, mw1276 (duration: 00m 47s)
14:36 cdanis@deploy1001: Synchronized wmf-config/CommonSettings.php: Ie98a8d9e dbctl canary on mwdebug1001 (duration: 00m 47s)
14:34 cdanis@deploy1001: Synchronized wmf-config/etcd.php: Ie98a8d9e dbctl canary on mwdebug1001 (duration: 00m 47s)
14:33 cdanis@deploy1001: Synchronized docroot/noc/db.php: Ie98a8d9e dbctl canary on mwdebug1001 (duration: 00m 48s)
14:14 fsero: refreshing calico policy from code in eqiad
14:13 fsero: refreshing calico policy from code in codfw
13:38 marostegui: Move db2094:3315 from db2066 to db2128 - T228258
13:14 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
13:13 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
12:36 marostegui@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8824', previous config saved to /var/cache/conftool/dbconfig/20190730-123630-marostegui.json
12:21 @: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
12:15 @: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
12:13 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
12:13 jbond42: while testing some changes on the puppet master, a bad config caused a small blip in catalogue compilation
12:09 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
11:34 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
11:31 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
11:30 jijiki: Depool mw1348 and pool back
11:28 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
09:49 elukey: upload python-snakebite to buster-wikimedia (rebuilt for buster from source)
09:31 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
09:27 elukey: add thirdparty/cloudera to buster-wikimedia and import packages to it (pull from the jessie component)
08:17 marostegui: Stop MySQL on db2038 T227565
08:10 marostegui: Remove db2038 from tendril and zarcillo T227565
08:04 akosiaris: delete orespoolcounter{1,2}00{1,2} T227640
08:04 akosiaris: revoke and deactivate orespoolcounter{1,2}00{1,2} T227640
07:30 godog: bounce hhvm on mw1221
05:36 marostegui: Disable puppet on cumin2001 to investigate a backups issue
05:25 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/jobqueue/jobs/AssembleUploadChunksJob.php: T228929 (duration: 00m 46s)
05:24 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/api/ApiUpload.php: T228929 (duration: 00m 47s)
05:23 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/upload/UploadBase.php: T228929 (duration: 00m 48s)
05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove s8 read-only T227062 (duration: 00m 24s)
05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover s8 master eqiad from db1071 to db1104 T227062 (duration: 00m 24s)
05:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s8 on read-only T227062 (duration: 00m 26s)
05:00 marostegui: Starting s8 failover from db1071 to db1104 - T227062
04:48 eileen: civicrm revision changed from 1d57aca19c to 218328b29d, config revision is 3f960c48f6
04:15 marostegui: Start pre-steps for s8 primary master failover - T227062
02:37 eileen: civicrm revision changed from 121feb5d53 to 1d57aca19c, config revision is 3f960c48f6

2019-07-29

23:37 XioNoX: replace export policy BGP_Wikimedia_own_space with BGP_Wikimedia_no_dfz in ams
23:36 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/: Make welcome and discovery tours fully mutually exclusive (T229044) (duration: 00m 48s)
23:26 XioNoX: replace export policy BGP_Wikimedia_own_space with BGP_Wikimedia_no_dfz in ulsfo
23:22 XioNoX: replace export policy BGP_Wikimedia_own_space with BGP_Wikimedia_no_dfz in Dallas
22:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/includes/cache/MessageCache.php: T208897 - fa817b0 (duration: 00m 47s)
22:32 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/AbuseFilter/: T214674 - bfcaf0c26d6 (duration: 00m 48s)
22:28 XioNoX: roll out anycast DNS and syslog to all network devices - T228190
22:16 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/AbuseFilter/: T214674 - 940955e (duration: 00m 48s)
22:05 XioNoX: replace ulsfo network devices' DNS target with 10.3.0.1
22:00 Krinkle: krinkle@deploy1001: Dirty git status on extensions/AbuseFilter and extensions/CheckUser in php-1.34.0-wmf.15
21:43 XioNoX: replace ulsfo network devices' syslog target with syslog.anycast.wmnet
19:22 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@c3ffbee]: Weekly deploy (duration: 11m 42s)
19:10 smalyshev@deploy1001: Started deploy [wdqs/wdqs@c3ffbee]: Weekly deploy
18:23 Urbanecm: Morning SWAT done
18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Rename Image-reviewer to image-reviewer on fawiki (T216406) (duration: 00m 47s)
18:19 Urbanecm: Run mwscript migrateUserGroup.php --wiki=fawiki Image-reviewer image-reviewer (T216406)
18:18 XioNoX: switch traffic to the GTT link between Ashburn and Amsterdam (set GTT metric to 820 vs. 1820 before) - T228827
18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add several rights to eliminators in fawiki (T176553, 2/2) (duration: 00m 47s)
18:08 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: Add several rights to eliminators in fawiki (T176553, 1/2) (duration: 00m 47s)
18:04 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/AbuseFilter: SWAT: Initialize user-defined variables during shortcircuit (T214674) (duration: 00m 49s)
17:37 ejegg: updated payments-wiki config to a7dacbf8e9
17:08 XioNoX: reprepro copy buster-wikimedia stretch-wikimedia python3-anycast-healthchecker
17:05 XioNoX: reprepro copy buster-wikimedia stretch-wikimedia python3-json-logger
17:05 XioNoX: reprepro copy buster-wikimedia stretch-wikimedia anycast-healthchecker
16:47 godog: add anycast syslog to wezen/centrallog1001
16:19 elukey: manually stopped the sre.kafka.roll-restart-brokers cookbook after 4 broker restarts since the sleep interval (10 mins) is too tight.
16:17 elukey@cumin1001: END (ERROR) - Cookbook sre.kafka.roll-restart-brokers (exit_code=97)
15:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Retry - Produce resource_change stream to eventgate-main - T211248 (duration: 00m 46s)
15:34 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
15:30 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce resource_change stream to eventgate-main - T211248 (duration: 00m 47s)
14:35 papaul: shutting down pc2010 for maintenance
13:57 cdanis@cumin1001: dbctl commit of MediaWiki config (dc=all), diff saved to 'https://phabricator.wikimedia.org/P8816', previous config saved to /var/cache/conftool/dbconfig/20190729-135730-cdanis.json
13:30 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
13:28 marostegui: Stop MySQL on pc2010 - T227552
13:23 arturo: T228870 reboot cloudvirt1007.eqiad.wmnet for kernel updates
13:23 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:23 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
13:09 arturo: T228870 reboot cloudvirt1006.eqiad.wmnet for kernel updates
13:09 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:09 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
13:01 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
12:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2128 into s5 api T221533 (duration: 00m 47s)
12:45 marostegui: Provision db2128 into s5 codfw - T228969
12:44 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2128 into s5 api T221533 (duration: 00m 47s)
12:39 arturo: T228870 reboot cloudvirt1005.eqiad.wmnet for kernel updates
12:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
12:20 arturo: T228870 reboot cloudvirt1004.eqiad.wmnet for kernel updates
12:20 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:20 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
11:58 arturo: T228870 reboot cloudvirt1003.eqiad.wmnet for kernel updates
11:57 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:57 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
11:36 arturo: icinga downtime toolschecker for 6h
11:31 arturo: T228870 reboot cloudvirt1002.eqiad.wmnet for kernel updates
11:31 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:31 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
11:14 arturo: T228870 reboot cloudvirt1001.eqiad.wmnet for kernel updates
11:14 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:13 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
11:13 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
11:13 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
11:11 dcausse: EU SWAT done
11:10 dcausse@deploy1001: Synchronized wmf-config/SearchSettingsForWikidata.php: [cirrus] Use correct factory declaration for EntityFullTextQueryBuilder (duration: 00m 47s)
10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 47s)
10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 47s)
09:49 marostegui: Add db2128 to tendril and zarcillo - T228969
09:24 elukey@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99)
09:22 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
09:21 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
08:55 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
08:51 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
08:47 elukey: set mcrouter async behavior for codfw replication to all mw app/api servers (changes will be picked up when puppet runs on the hosts) - T225642
08:35 godog: temp stop puppet on cp hosts to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/525259
08:32 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97)
08:32 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
08:16 marostegui: Drop abuse_filter_log.afl_log_id in s7 eqiad - T226851
07:49 dcausse: elastic@eqiad force recovery of failed shards (eswiki stuck)
07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2038 from config T221533 (duration: 00m 46s)
07:29 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2038 from config T221533 (duration: 00m 50s)
07:18 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
06:45 akosiaris: poweroff orespoolcounter{1,2}00{1,2} for removal T227640
06:37 _joe_: restarted php7.2 on mwdebug1002, low opcache
06:36 _joe_: restarted coherence report on netmon1002, it failed earlier this morning
06:31 _joe_: restarting nrpe on restbase-dev1006 T224260
06:30 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
05:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 in preparation for Tuesday 30th failover in s8 (duration: 00m 54s)
05:18 marostegui: Drop abuse_filter_log.afl_log_id from s7 codfw with replication (this will cause lag in s7 codfw) - T226851
05:05 marostegui: Remove db1072 from tendril and zarcillo T228956

2019-07-28

15:13 arturo: disable 1m load average check in icinga for labstore1007 for 24h

2019-07-27

17:39 bd808: Updated profile & images for @wikimediatech twitter account
14:49 godog: bounce rsyslog on wezen / centrallog1001
06:43 elukey: powercycle mw1300 - no ssh, serial com2 stuck with no root login available
00:35 mutante: restbase-dev1006 - starting nagios-nrpe-server
00:33 mutante: wikitech-static - fix /etc/letsencrypt/renewal/wikitech-static.wikimedia.org.conf - remove webroot_map and the line for status.wm.org that caused errors when doing a renewal dry-run. Now the dry run finishes successfully and we are using the "webroot" authenticator and not "apache" anymore. This should have resolved what this ticket was about. No more Apache kills/restarts on renewal. (T214640)

2019-07-26

23:51 mutante: restbase-dev1006 - manually booting into PXE to debug boot issue / start Debian installer (T224260)
23:27 mutante: restbase-dev1006 - does not boot - hangs at "attempting to boot from C:" - entering "Legacy BIOS One Time Boot Menu" (T224260)
21:52 mutante: restbase-dev1006 - power reset via mgmt
20:48 mutante: restbase-dev1006 - rebooting from busybox shell where it was idling since a failed reimage attempt
20:22 foks: reset password for Sharons36
18:43 XioNoX: remove lvs100[1-6] switch config from asw2-d-eqiad - T224223
18:33 mutante: deploy2001 - delgroup gerrit-root (follow-up to https://gerrit.wikimedia.org/r/c/operations/puppet/+/525444)
18:32 mutante: deploy1001 - delgroup gerrit-root (follow-up to https://gerrit.wikimedia.org/r/c/operations/puppet/+/525444)
18:20 XioNoX: remove lvs100[1-6] switch config from asw2-c-eqiad - T224223
18:08 XioNoX: remove lvs100[1-6] switch config from asw2-b-eqiad - T224223
18:01 XioNoX: remove lvs100[1-6] switch config from asw2-a-eqiad - T224223
17:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
17:37 robh@cumin1001: START - Cookbook sre.hosts.decommission
16:05 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/Flow/includes/Search/Iterators/TopicIterator.php: T229114 make orderUUID public, as it is needed by other classes for Dumps (duration: 00m 47s)
15:12 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: de08224 (duration: 00m 48s)
15:02 Krinkle: krinkle@deploy1001: php-1.34.0-wmf.15 is still dirty on extensions/CheckUser
14:23 ema: re-enable puppet on cache nodes T229091
14:10 ema: disable puppet on cache nodes T229091
13:41 fsero: sudo -i reprepro --ignore=wrongdistribution include stretch-wikimedia /home/fsero/envoyproxy_1.11.0~wmf1_amd64.changes
13:41 jeh: updated labstore100[67].wikimedia.org performance scaling_governor T225713
13:07 jeh: rebooting labstore1006.wikimedia.org for updates T224228
13:00 Urbanecm: Change user email assigned to SUL user Stansfield (T229004)
12:45 jeh: rebooting labsdb1012.eqiad.wmnet for updates T224228
12:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2123 into s5 vslow T221533 (duration: 00m 50s)
09:32 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db2123 into s5 T228969 (duration: 00m 47s)
09:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db2123 into s5 T228969 (duration: 00m 48s)
08:42 marostegui: Add db2123 to tendril and zarcillo - T228969
06:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1096 (duration: 00m 47s)
06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1096 (duration: 00m 47s)
05:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1096 (duration: 00m 46s)
05:40 marostegui: Stop MySQL on db1072 to get it ready for decommission - T228956
05:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1096 (duration: 00m 48s)
05:05 marostegui: Stop MySQL on db1096 for upgrade
05:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1096 (duration: 00m 49s)
00:53 ejegg: re-enabled dedupe_civicrm_contacts and major_gifts_addresses fundraising jobs
00:51 ejegg: re-enabled donations queue consumer
00:15 ejegg: disabled donations queue consumer

2019-07-25

23:47 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/GrowthExperiments/extension.json: Fix over-eager GrowthExperiments popups (T229045) (duration: 00m 50s)
23:19 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Revert "Delete Image-reviewer group from commonswiki for good" (T228098) (duration: 00m 47s)
23:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add sju, sjd, and rmf to wmgExtraLanguageNames (T226701) (duration: 00m 47s)
23:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable VisualEditor in namespace Wikipédia on Slovak Wikipedia (T229014) (duration: 00m 48s)
22:34 ejegg: re-enabled donations queue consumer
22:07 bblack: lvs1013 - restart pybal for resolv.conf changes - T228190
22:04 bblack: lvs1014 - restart pybal for resolv.conf changes - T228190
22:02 bblack: lvs1015 - restart pybal for resolv.conf changes - T228190
22:02 ejegg: turned off dedupe_civicrm_contacts fundraising job
21:59 bblack: lvs1016 - restart pybal for resolv.conf changes - T228190
21:47 bblack: primary high-traffic2 lvses in codfw, esams, ulsfo: restart pybal for resolv.conf changes - T228190
21:46 XioNoX: apply export BGP_Wikimedia_no_dfz to eqiad's Confed_esams - T227808
21:40 ejegg: turned off major_gifts_addresses fundraising job
21:38 bblack: primary high-traffic1 lvses in codfw, esams, ulsfo: restart pybal for resolv.conf changes - T228190
21:07 bblack: backup lvses in codfw, esams, ulsfo: restart pybal for resolv.conf changes - T228190
20:54 hashar: Rebasing mediawiki/extensions/MobileFrontend@wmf/1.34.0-wmf.15 for a build/CI related change to package.json https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MobileFrontend/+/525632/
20:37 XioNoX: add prometheus-bird-exporter to stretch-wikimedia repo
20:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
20:15 robh@cumin1001: START - Cookbook sre.hosts.decommission
20:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
20:02 robh@cumin1001: START - Cookbook sre.hosts.decommission
19:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016, feeds timing out. (duration: 05m 34s)
19:53 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016, feeds timing out.
19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
19:53 robh@cumin1001: START - Cookbook sre.hosts.decommission
19:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016, take 3 (duration: 03m 14s)
19:49 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016, take 3
19:49 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016, take 2 (duration: 06m 33s)
19:44 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
19:44 robh@cumin1001: START - Cookbook sre.hosts.decommission
19:42 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016, take 2
19:42 ppchelko@deploy1001: Finished deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016 (duration: 13m 42s)
19:29 ppchelko@deploy1001: Started deploy [restbase/deploy@279cf27]: Set proper CSP headers for mobile-html T229016
19:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
19:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
19:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
19:03 robh@cumin1001: START - Cookbook sre.hosts.decommission
19:01 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
18:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
18:36 robh@cumin1001: START - Cookbook sre.hosts.decommission
18:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
18:19 robh@cumin1001: START - Cookbook sre.hosts.decommission
18:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
18:19 robh@cumin1001: START - Cookbook sre.hosts.decommission
18:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
18:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
18:04 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
18:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
18:00 robh@cumin1001: START - Cookbook sre.hosts.decommission
18:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
17:59 robh@cumin1001: START - Cookbook sre.hosts.decommission
17:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
17:59 robh@cumin1001: START - Cookbook sre.hosts.decommission
17:58 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@11d9d4a]: Update service-mobileapp-node to 200a323 (T228938 T228287) (duration: 04m 39s)
17:53 mbsantos@deploy1001: Started deploy [mobileapps/deploy@11d9d4a]: Update service-mobileapp-node to 200a323 (T228938 T228287)
17:51 elukey: powercycle stat1007
17:44 volans: sudo cumin -s30 -b1 -m async 'A:wdqs-all and not A:wdqs-internal and not P{wdqs1009.eqiad.wmnet}' 'run-puppet-agent -e "volans - T228122 - deploying gerrit/524954"' 'systemctl restart wdqs-blazegraph'
17:33 volans: running sudo cumin -s30 -b1 -m async 'A:wdqs-internal' 'run-puppet-agent -e "volans - T228122 - deploying gerrit/524954"' 'systemctl restart wdqs-blazegraph'
17:18 volans: disabled puppet on A:wdqs-all, deploying gerrit/524954 - T228122
17:17 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.rolling-restart-workers (exit_code=0)
17:01 elukey@cumin1001: START - Cookbook sre.hadoop.rolling-restart-workers
16:54 bblack: lvs5001 - restart pybal for resolv.conf change - T228190
16:53 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/WikibaseMediaInfo/resources/statements/: T228807 Fix formatValue abort handling (duration: 00m 48s)
16:52 jijiki: Rolling restart of hhvm across the fleet
16:50 bblack: lvs5002 - restart pybal for resolv.conf change - T228190
16:44 bblack: lvs5003 - restart pybal for resolv.conf change - T228190
16:19 jijiki: Disable puppet on mw* servers for 525156
15:52 jeh: rebooting cloudstore1008.wikimedia.org for updates T224228
15:41 jeh: rebooting cloudstore1009.wikimedia.org for updates T224228
15:41 nuria@deploy1001: Finished deploy [analytics/refinery@f310917]: deploying refinery - migrations to hive2 actions (duration: 13m 40s)
15:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
15:35 robh@cumin1001: START - Cookbook sre.hosts.decommission
15:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
15:35 robh@cumin1001: START - Cookbook sre.hosts.decommission
15:32 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove redundant wgResourceLoaderStorageEnabled override (duration: 00m 50s)
15:27 nuria@deploy1001: Started deploy [analytics/refinery@f310917]: deploying refinery - migrations to hive2 actions
15:09 jeh: rebooting labstore1004.eqiad.wmnet for updates T224228
14:42 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@87b25f2]: Convert oozie actions from hive to hive2 (duration: 00m 19s)
14:42 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@87b25f2]: Convert oozie actions from hive to hive2
14:22 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
14:22 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
14:22 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
14:06 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
14:06 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
14:06 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
14:02 moritzm: installing Java security updates on Druid servers
13:52 moritzm: installing Java security updates on AQS, Hadoop and Kafka/Jumbo servers
13:49 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
13:49 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
13:49 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
13:42 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
13:42 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
13:42 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
13:39 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
13:38 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
13:35 robh: cloudvirt1015 offline for ram swap via T220853
13:20 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
13:19 fsero: recreating clusterrole deploy from helmfile in staging
13:09 marostegui: Drop abuse_filter_log.afl_log_id in s5 eqiad - T226851
13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.15
12:49 marostegui: Drop abuse_filter_log.afl_log_id in s4 codfw (lag will appear on codfw) - T226851
11:53 marostegui: Compress s3 wikis on labsdb1010 - T222978
11:03 arturo: update stretch-wikimedia/thirdparty/kubeadm-k8s on install1002 for T215531 (kubeadm 1.15.1)
10:53 moritzm: rebooting cloudvirt2003-dev
10:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:35 moritzm: rebooting cloudvirt1024 for kernel update
10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:21 marostegui: Failover m1 from dbproxy1006 to dbproxy1001 - T227139
08:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:54 moritzm: rebooting cloudvirt2001-dev
08:32 Urbanecm: Password reset for SUL user Strejc
08:04 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad,name=mw128[0-3].*
08:01 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad,name=mw12(6[89]|7[0-5]).*
08:01 _joe_: repooling mw1268-1275 in the appserver cluster
08:00 moritzm: rebooting cloudvirt2001-dev
07:59 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad,name=mw12(7[6-9|8[0-3]).*
07:59 _joe_: repooling mw1276-1283 in the API cluster
07:33 moritzm: rebooting cloudvirt2001-dev
07:23 marostegui: Upgrade MySQL on db1072
07:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
06:42 elukey: restart kafka* on kafka-jumbo1001 to pick up new openjdk-8 version
06:37 elukey: restart cassandra instances on aqs1004 to pick up new openjdk-8 version
06:34 elukey: add term eventgate to analytics-in4 on cr1/cr2-eqiad - T228882
05:31 twentyafterfour: set phabricator to read-write mode
05:30 marostegui: Failover m3 from db1072 to db1128 - T228243
05:30 twentyafterfour: phabricator set to read-only mode
04:51 marostegui: Start pre-failover steps on m3 T228243
02:02 XioNoX: remove peer AS63541 from cr1-eqsin

2019-07-24

23:46 nuria@deploy1001: Finished deploy [analytics/refinery@7d93398]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues). Try 2 (duration: 13m 34s)
23:43 catrope@deploy1001: Synchronized php-1.34.0-wmf.15/extensions/Flow: Fix JS error when saving Flow board descriptions (T228818) (duration: 01m 01s)
23:42 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: Fix JS error when saving Flow board descriptions (T228818) (duration: 01m 03s)
23:39 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable homepage for 50% of new users on arwiki (T228120) (duration: 00m 58s)
23:32 nuria@deploy1001: Started deploy [analytics/refinery@7d93398]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues). Try 2
23:30 nuria@deploy1001: Finished deploy [analytics/refinery@834db0a]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues) (duration: 18m 10s)
23:22 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments homepage on arwiki (T228120) (duration: 00m 55s)
23:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Correct typo in arwiki help panel config (T228820) (duration: 00m 57s)
23:12 nuria@deploy1001: Started deploy [analytics/refinery@834db0a]: deploying refinery 0.0.96 (skipping 0.0.95 due to some jenkins/archiva issues)
22:41 thcipriani@: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
22:36 thcipriani@: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
22:28 thcipriani@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
21:22 mutante: <+icinga-wm> RECOVERY - Device not healthy -SMART- on restbase-dev1006 is OK: All metrics within thresholds. (T224260)
21:18 cscott@deploy1001: Finished deploy [parsoid/deploy@abd05ab]: Updating Parsoid to df1af404 (T227216, T226523, T226451) (duration: 18m 35s)
21:16 nuria@deploy1001: Finished deploy [analytics/refinery@58e64c1]: deploying refinery 0.0.95 (duration: 03m 54s)
21:12 nuria@deploy1001: Started deploy [analytics/refinery@58e64c1]: deploying refinery 0.0.95
21:03 ppchelko@deploy1001: Finished deploy [restbase/deploy@7911f65]: Store PCS endpoints T222384 (duration: 18m 18s)
21:00 cscott@deploy1001: Started deploy [parsoid/deploy@abd05ab]: Updating Parsoid to df1af404 (T227216, T226523, T226451)
20:45 ppchelko@deploy1001: Started deploy [restbase/deploy@7911f65]: Store PCS endpoints T222384
20:39 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@2e2ce6c]: Update mobileapps to 1751a2e (duration: 04m 20s)
20:38 ppchelko@deploy1001: Finished deploy [changeprop/deploy@bf28187]: Rerender PCS endpoints T222384 (duration: 01m 34s)
20:36 ppchelko@deploy1001: Started deploy [changeprop/deploy@bf28187]: Rerender PCS endpoints T222384
20:35 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@2e2ce6c]: Update mobileapps to 1751a2e
20:12 jeh: redirecting dumps.wikimedia.org back to labstore1007.wikimedia.org T224228
19:43 ejegg: updated fundraising CiviCRM from 875ab97742 to 121feb5d53
19:08 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable SecureLinkFixer on group0 wikis - T200751 (duration: 00m 55s)
18:33 cmjohnson1: moving cloudvirt107 to 10G rack T228691
18:19 krinkle@deploy1001: Synchronized php-1.34.0-wmf.15/includes/cache/localisation/LocalisationCache.php: 31d99eb381bc (duration: 00m 54s)
18:15 ejegg: updated payments-wiki from a28ad541ed to 70b432d309
18:13 urandom: creating new restbase keyspaces -- T228804
18:12 Krinkle: krinkle@deploy1001: extensions/CheckUser is dirty in php-1.34.0-wmf.15
17:14 XioNoX: rollback failover master VIP of ae2.1202 inet6 away from cr1-eqiad - T226782
17:10 XioNoX: Add mr1-codfw<->cr1/2-codfw vlan/link config on asw-a-codfw - T228112
16:44 jijiki: Rolling puppet-enable and apache reload of jobrunners in codfw
16:12 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
16:12 bblack: re-pooling recdns on dns1001 via confctl - T226782
16:11 bblack: lvs1014 - restore puppet and resolv.conf contents, restart pybal
16:10 bblack: dns1001 - restart recursor and re-enable puppet - T226782
16:07 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/export/XmlDumpWriter.php: T228720 make XmlDumpwriter more resilient to blob store corruption (duration: 00m 55s)
16:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: T228720 make XmlDumpwriter more resilient to blob store corruption (duration: 00m 55s)
15:59 bblack: dns1001 - puppet disable, stop recursor service to kill anycast advert - T226782
15:59 bblack: lvs1014 - puppet disable, remove dns1001 from resolv.conf, restart pybal - T226782
15:58 XioNoX: failover master VIP of ae2.1202 inet6 away from cr1-eqiad - T226782
15:56 bblack@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
15:56 bblack: depooling recdns on dns1001 via confctl - T226782
15:56 bblack: depooling recdns on dns1001 via confctl
15:47 jijiki: Rolling puppet-enable and apache reload of jobrunners in eqiad
15:44 jeh: rebooting labstore1007.wikimedia.org for updates T224228
15:42 jijiki: Disable puppet on jobrunners for 525306
15:11 herron: resume ingesting [message] =~ /^SlowTimer/ logs on logstash1007 (as a canary)
15:02 XioNoX: re-enable vc link between asw2-a6 and asw2-a7 - T228823
14:58 jeh: unmounting dumps NFS clients from labstore1007.wikimedia.org T224228
14:54 XioNoX: cleared vc ports stats on asw2-a-eqiad - T228823
14:43 marostegui: Drop abuse_filter_log.afl_log_id in s5 eqiad - T226851
14:40 marostegui: Drop abuse_filter_log.afl_log_id in s5 codfw (lag will appear on codfw) - T226851
14:31 tarrow@: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
13:49 robh: rebooting cloudvirt1015 into OS, memory error confirmed. new memory replacement dispatch entered via T220853
13:31 marostegui: Drop abuse_filter_log.afl_log_id in s2 eqiad - T226851
13:25 robh: rebooting cloudvirt1015 into memtest for dell support repair via T220853
13:06 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.15 (duration: 00m 54s)
13:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.15
12:19 marostegui: Stop haproxy on dbproxy1004 and dbproxy1009 (m4 - eventlogging) - T228768
11:23 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable FileImporter source wiki edits (T228851) (duration: 00m 54s)
11:12 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove Content Translation event logging config (part 2/2) (duration: 00m 54s)
11:10 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove Content Translation event logging config (part 1/2) (duration: 00m 59s)
10:04 marostegui: Drop abuse_filter_log.afl_log_id from labswiki (wikitech) and labtestwiki - T226851
09:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1082 (duration: 00m 55s)
08:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 into API after upgrade (duration: 00m 55s)
08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1082 after upgrade (duration: 00m 54s)
08:40 marostegui: Stop MySQL on db1082 for upgrade
08:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 for upgrade (duration: 00m 57s)
08:35 marostegui: Drop abuse_filter_log.afl_log_id in s2 codfw (lag will appear on codfw) - T226851
07:58 marostegui: Drop abuse_filter_log.afl_log_id from wikidata in eqiad - T226851
07:21 marostegui: Stop MySQL on db1117:3322 to check dbproxy1013 notifications - T202367
07:10 marostegui: Deploy grants for dbproxy1013 in m2 - T202367
05:00 marostegui: Stop puppet on dbprov2001 to generate s5 mysqldump manually
04:52 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/MediaWiki.php: T227700 (duration: 00m 54s)
04:51 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/specials/SpecialGoToInterwiki.php: T227700 (duration: 00m 54s)
04:50 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/MediaWiki.php: T227700 (duration: 00m 53s)
04:49 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/specials/SpecialGoToInterwiki.php: T227700 (duration: 00m 54s)
04:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/MediaWiki.php: T227700 (duration: 00m 54s)
04:45 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/specials/SpecialGoToInterwiki.php: T227700 (duration: 00m 54s)
04:42 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/MediaWiki.php: T227700 (duration: 00m 54s)
04:40 tstarling@deploy1001: Synchronized php-1.34.0-wmf.15/includes/specials/SpecialGoToInterwiki.php: (no justification provided) (duration: 00m 56s)
03:41 tstarling@deploy1001: Synchronized w/fatal-error.php: Adding post-send exception test for T228462 (duration: 00m 54s)
03:39 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Adding DeferredUpdates log channel (T228462) (duration: 00m 56s)
02:01 eileen: payments-wiki revision changed from 224c6b2d7b to a28ad541ed, config revision is 8dcb77cf22

2019-07-23

23:44 eileen: civicrm revision changed from 88e9f24893 to 875ab97742, config revision is 4006d3bdc5
23:43 shdubsh: reverting logstash mitigations and re-enable puppet
23:42 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/diff/DifferenceEngine.php: T228766 Don't double wrap rollback links (duration: 00m 56s)
23:31 mutante: mw1267 - rm -rf /srv/mediawiki/php-1.33.0-wmf.23 ; rm -rf /srv/mediawiki/php-1.32.0-wmf.3 ; scap pull
23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet
22:36 mutante: rolling out scap 3.11.1-1 on mw-eqiad servers
22:14 mutante: continuing rollout of new scap version 3.11.1-1, starting with kafka-all followed by other cumin-alias groups (T228328)
22:06 herron: puppet temporarily disabled on eqiad/codfw logstash collectors while catching up with backlog. see /etc/logstash/conf.d/01-filter_temp_drops.conf
21:52 herron: logstash - temporarily dropping logs matching [message] =~ /^SlowTimer/ due to UTF-8 parsing errors that are stopping the logstash processing pipeline. will re-enable after logstash has caught up with the backlog
20:59 shdubsh: temporarily disable input-kafka-rsyslog-shipper and drop memcached logs on logstash nodes
20:08 paravoid: asw2-a-eqiad: request virtual-chassis vc-port set interface member 6 vcp-255/1/0 disable
19:58 eileen: process-control config revision is 4006d3bdc5 - disabled drush fill donor totals job
19:49 mutante: mwdebug1002 - restarting hhvm - mw1312 - restarted apache
19:44 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 and 1004
19:40 mutante: restarting hhvm on mw1312
19:28 cdanis: depool all appservers in eqiad A7 cdanis@cumin1001.eqiad.wmnet ~ 🍵 sudo cumin 'mw12[67-83]*' 'depool'
19:11 bblack: repool lvs1013 - T227143
19:10 bblack: repool cp1077 + cp1078 - T227143
19:09 elukey: depool mw1261 for investigation
19:06 herron: restarting logstash on logstash100[789]
18:53 robh: mw1271 had power loss event due to pdu swap via T227143
18:45 mutante: rolling out scap 3.11.1-1 on all mw codfw servers (T228328)
18:43 mutante: rolling out scap 3.11.1-1 on mw canary servers (T228328)
18:13 robh: started depooling servers in a7-eqiad for pdu work via T227143
18:11 cdanis: depool mw1267
18:10 cdanis: cdanis@mw1267.eqiad.wmnet /srv/mediawiki ☕ scap pull
18:09 cdanis: cdanis@mw1267.eqiad.wmnet ~ ☕ sudo apt install python-concurrent.futures
18:08 jforrester@deploy1001: Synchronized php-1.34.0-wmf.15/includes/export/XmlDumpWriter.php: T228720 Make XmlDumpwriter resilient to blob store corruption (duration: 00m 54s)
18:07 James_F: Belay that, error on mw1267.
18:06 James_F: Sync error on mw1314.eqiad.wmnet, No module named concurrent.futures
18:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: T228720 Make XmlDumpwriter resilient to blob store corruption (duration: 00m 57s)
18:05 bblack: lvs1013 - disable puppet and stop pybal - T227143
18:04 bblack: depool cp1077 + cp1088 - T227143
18:03 cdanis@deploy1001: Synchronized docroot/noc/db.php: 8def4af1d noc db.php: include readonly status & group loads (duration: 00m 55s)
17:52 moritzm: installing Java security updates on kafka/main and Logstash servers
17:38 ppchelko@deploy1001: Finished deploy [changeprop/deploy@6c5c0a3]: Switch internal events to the new schema T226522, step 2 (duration: 01m 37s)
17:36 ppchelko@deploy1001: Started deploy [changeprop/deploy@6c5c0a3]: Switch internal events to the new schema T226522, step 2
17:00 ppchelko@deploy1001: Finished deploy [changeprop/deploy@894f735]: Switch internal events to the new schema T226522 (duration: 01m 30s)
16:58 ppchelko@deploy1001: Started deploy [changeprop/deploy@894f735]: Switch internal events to the new schema T226522
16:22 godog: pool prometheus1003 - T227139
15:46 robh: side b of a5-eqiad swapping pdu via T227141
15:14 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
15:08 _joe_: uninstalling php-pear, php-mail, php-mail-mime from mw1267 T195364
14:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate T211248, attempt 2 (duration: 13m 08s)
14:39 ppchelko@deploy1001: Started deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate T211248, attempt 2
14:14 robh: a3-eqiad pdu swap taking place now via T227139
13:47 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
13:45 godog: depool restbase1016 restbase1019 restbase1011 restbase1010 prometheus1003 ahead of PDU work - T227139
13:45 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
13:44 moritzm: installing Java security updates on furud/flerovium
13:43 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
13:27 jeh: dumps switching active vps to labstore1006 T224228
13:17 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.15
13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:07 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:06 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.15
13:06 marostegui: Drop abuse_filter_log.afl_log_id from s8 codfw (lag will happen on codfw s8) - T226851
12:33 liw@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache (duration: 29m 46s)
12:04 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache
12:02 akosiaris: drain kubernetes1001. T227139
12:01 akosiaris: empty ganeti1007 from running instances. T227139
11:59 akosiaris: enable disable poolcounter1003, switchover codfw poolcounters T224572
11:58 tarrow: EU SWAT finished
11:58 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 46s)
11:56 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: T214902 Fix missing /termbox in SSRTermboxServerUrl (duration: 00m 44s)
11:54 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.10 (duration: 07m 55s)
11:43 jijiki: restart php-fpm on mwdebug*
11:25 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: T214902 Enable termbox on testwikidatawiki (duration: 01m 37s)
11:08 jijiki: enable puppet on jobrunners
10:17 marostegui: Drop abuse_filter_log.afl_log_id from db1096:3316, db1139:3316 and dbstore1005:3316 T226851
10:02 moritzm: installing Java security updates on notebook/stat hosts
09:59 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
09:59 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
09:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:53 marostegui: Drop abuse_filter_log.afl_log_id from s6 codfw with replication (this will cause lag in s6 codfw) - T226851
09:51 akosiaris: enable poolcounter1005, disable poolcounter1001 T224572
09:51 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 47s)
09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool into API db1100 after upgrade (duration: 00m 46s)
09:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool into API db1100 after upgrade (duration: 00m 47s)
09:09 akosiaris@deploy1001: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 00m 47s)
09:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1100 after upgrade (duration: 00m 46s)
08:34 marostegui: Upgrade db1100
08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 for upgrade (duration: 00m 53s)
08:08 marostegui: Stop MySQL on db2044 to test dbproxy2002 notifications - T202367
07:31 marostegui: Deploy grants for dbproxy2002 on m2 - T202367
04:52 eileen: civicrm revision changed from d951b07ce3 to 88e9f24893, config revision is f7b7622e27
04:43 marostegui: Failover m1 from dbproxy1001 to dbproxy1006 T227139
00:06 Urbanecm: slwiki updateCollation.php completed (T208984)

2019-07-22

23:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 524952 Increase hewiki rollback limit for patrollers to 50/60 (duration: 00m 48s)
23:54 Urbanecm: Run mwscript importImages.php --wiki=commonswiki --user=Meisam /home/urbanecm/T223052
23:42 Urbanecm: All updateCollation.php runs completed, except the one for slwiki (T208984)
23:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add flood group to ptwiki (T228521) (duration: 00m 47s)
23:39 Urbanecm: Run mwscript updateCollation.php --wiki=slwiktionary --previous-collation=uppercase (T208984)
23:39 Urbanecm: Run mwscript updateCollation.php --wiki=slwikiversity --previous-collation=uppercase (T208984)
23:37 Urbanecm: Run mwscript updateCollation.php --wiki=slwikisource --previous-collation=uppercase (T208984)
23:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix comment in IS.php (noop, T227000) (duration: 00m 46s)
23:34 Urbanecm: Run mwscript updateCollation.php --wiki=slwikiquote --previous-collation=uppercase (T208984)
23:34 Urbanecm: Run mwscript updateCollation.php --wiki=slwikibooks --previous-collation=uppercase (T208984)
23:33 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: Fix "Remove "עמוד" namespace from wgFlaggedRevsNamespaces for hewikisource" (T227000) (duration: 00m 47s)
23:29 Urbanecm: Run mwscript updateCollation.php --wiki=slwiki --previous-collation=uppercase (T208984)
23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set wgCategoryCollation to uca-sl-u-kn on Slovene projects (sl) (T208984) (duration: 00m 47s)
22:11 mutante: dropped zero.wikimedia.org from DNS (T187716)
21:50 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Further mitigations for T227416 (duration: 00m 46s)
21:38 ppchelko@deploy1001: Finished deploy [restbase/deploy@9a99b17]: Rollback: Switch event production to eventgate T211248 (duration: 13m 01s)
21:35 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Temporary make account creation limits more restrictive" (duration: 00m 47s)
21:27 eileen: civicrm revision is d951b07ce3, config revision is f7b7622e27
21:25 ppchelko@deploy1001: Started deploy [restbase/deploy@9a99b17]: Rollback: Switch event production to eventgate T211248
21:21 ppchelko@deploy1001: Finished deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate T211248 (duration: 16m 14s)
21:21 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
21:20 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
21:19 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
21:17 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
21:05 eileen: civicrm revision changed from f932e56cd2 to d951b07ce3, config revision is f7b7622e27
21:04 ppchelko@deploy1001: Started deploy [restbase/deploy@ea10fa5]: Switch event production to eventgate T211248
20:04 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@0be6045]: Weekly deploy (duration: 18m 42s)
19:46 smalyshev@deploy1001: Started deploy [wdqs/wdqs@0be6045]: Weekly deploy
19:09 ppchelko@deploy1001: Finished deploy [changeprop/deploy@3f8aad2]: Switch revision-score to eventgate T211248 (duration: 01m 31s)
19:07 ppchelko@deploy1001: Started deploy [changeprop/deploy@3f8aad2]: Switch revision-score to eventgate T211248
18:59 elukey: repool scb1001 after pdu maintenance
18:59 herron: repooling kafka1001 T227140
18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable help panel for 50% of new users on arwiki (T226729) (duration: 00m 47s)
18:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Trying the last sync again, because it's appearing inconsistently (duration: 00m 47s)
18:15 thcipriani: restarting gerrit due to T224448
18:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments help panel on arwiki (T226729) (duration: 00m 48s)
18:00 elukey: arm keyholder on netmon1002 after power loss
17:35 elukey: depool scb1001 for PDU work T227140
17:22 herron: depooling kafka1001 for PDU work T227140
17:17 nuria@deploy1001: Finished deploy [analytics/refinery@d889893]: deploying refinery jar bump for webrequest/load jobs (duration: 14m 51s)
17:02 nuria@deploy1001: Started deploy [analytics/refinery@d889893]: deploying refinery jar bump for webrequest/load jobs
17:02 jijiki: enable puppet on all jobrunners
16:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T87899 Use wfLoadExtension for Collection rather than deprecated entry point (duration: 00m 47s)
16:48 jforrester@deploy1001: Synchronized wmf-config/extension-list: Load Collection i18n via extension.json directly (duration: 00m 47s)
16:36 jeh: redirecting dumps.wikimedia.org dns to labstore1006 T224228
15:49 jijiki: Rolling depool and pool of mw1293, mw1294, mw1295, mw1296, mw1299 - T219148
15:38 marostegui: Stop mysql and power off pc2010 for on-site maintenance - T227552
15:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Wikibase/lib/WikibaseLib.php: T227814 Wikibase: Define $wgMessagesDirs in WikibaseLib PHP entry point (duration: 00m 48s)
15:27 jijiki: Depool mw1300 and pool back
15:24 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: T228614 XmlDumpWriter: don't load revision text content unless requested to (duration: 00m 48s)
15:17 jijiki: Disable puppet on jobrunners to enable php7_only
14:55 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
14:53 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
14:44 otto@: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
14:38 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
14:30 otto@: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
14:30 ottomata: deploying refactored eventgate chart using eventgate-wikimedia image to eventgate-* services - T226668
14:28 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.14
13:12 kart_: Updated cxserver to 2019-07-17-074415-production (T227553, T216812)
13:07 kartik@deploy1001: scap-helm cxserver finished
13:07 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
13:07 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
13:02 kartik@deploy1001: scap-helm cxserver finished
13:02 kartik@deploy1001: scap-helm cxserver cluster codfw completed
13:02 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
13:00 kartik@deploy1001: scap-helm cxserver finished
13:00 kartik@deploy1001: scap-helm cxserver cluster staging completed
12:59 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
12:58 marostegui: Stop MySQL on db1117:3321 to test dbproxy1014 (replacement for dbproxy1006) on m1 - T202367
12:22 moritzm: installing debian-archive-keyring Stretch update (SUA 164)
11:20 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable wgNamespacesWithSubpages on main NS for kowikiversity (T228481) (duration: 00m 54s)
11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Enable FileImporter source wiki edit and delete, (remove labs customizations) (T225617, T226532) (duration: 00m 54s)
11:13 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Enable FileImporter source wiki edit and delete (T225617, T226532) (duration: 00m 56s)
10:55 jijiki: Enable puppet on jobrunners
10:27 jijiki: Depool and pool mw1300
10:23 jijiki: Disable puppet on jobrunners for 524336 - T219148
10:21 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
10:20 fsero: deploy coredns in staging T226516
09:47 elukey: failover + restart of Hadoop HDFS namenode on an-master1001 to apply GC settings - T228620
09:40 marostegui: Deploy grants on m1 to allow connections from dbproxy1014 - T202367
09:32 elukey: restart hadoop hdfs namenode on an-master1002 to apply new GC settings - T228620
08:33 marostegui: Rename table enwiki.math on db2116 T196055
07:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1134 after schema change T226851 (duration: 00m 51s)
07:54 elukey: sudo -i depool on elastic1046 - broken disk (srv partition not available) - T228606
07:40 elukey: systemctl reset-failed restbase on restbase1007->15 (decommed nodes)
07:27 marostegui: Drop afl_log_id column from enwiki.abuse_filter_log on db1134 T226851
07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1134 for schema change T226851 (duration: 00m 56s)
07:17 moritzm: installing openjdk-11 security updates
06:47 marostegui: Stop MySQL on db2062 to test dbproxy2001 notification T202367
06:23 elukey: restart hadoop-hdfs-namenode on an-master1002 to verify if out-of-the-ordinary GC activity
06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1104 from s8 API (duration: 00m 55s)
05:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1109 into API (duration: 00m 58s)
05:24 marostegui: Compress more tables on labsdb1009 - T222978
04:48 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/extension.json: fixing UBN T228465 (duration: 00m 54s)
04:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/maintenance/loadExitNodes.php: fixing UBN T228465 (duration: 00m 54s)
04:44 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/TorBlock/includes/TorExitNodes.php: fixing UBN T228465 (duration: 00m 56s)
04:17 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: fix UBN bug T227772 (duration: 00m 56s)

2019-07-21

01:06 Urbanecm: Deployed patch for T228574

2019-07-19

22:36 mutante: phab2001 - switching apache to php-fpm and worker instead of mpm-prefork (to match phab1001) (T190568 T137928 T190572)
21:57 eileen: update process control process-control config revision is c913a5f261
21:34 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
21:25 eileen: civicrm revision changed from 21d3c5a3fc to f932e56cd2, config revision is 9f7eba2193
19:35 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:35 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
19:34 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:07 eevans@: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
19:02 eevans@: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
17:53 cdanis@deploy1001: Synchronized docroot/noc/db.php: noc: db.php: support ?dc=codfw, and cleanups (duration: 00m 56s)
17:44 XioNoX: change netflow target port to 2055 in eqiad
16:17 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
15:55 moritzm: rebooting mw2164 for a test
15:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:40 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
15:27 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
15:26 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
15:22 fsero: deploy coredns in staging T226516
15:03 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
14:42 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Collection/Collection.php: 90eed0fad / T87899 (duration: 00m 54s)
14:35 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/Collection/Collection.php: 66ce154 / T87899 (duration: 00m 56s)
14:29 ariel@deploy1001: Finished deploy [dumps/dumps@71e62ee]: better exception handling for misc dumps (duration: 00m 03s)
14:29 ariel@deploy1001: Started deploy [dumps/dumps@71e62ee]: better exception handling for misc dumps
14:28 Krinkle: krinkle@deploy1001: Untracked file found in php-1.34-wmf.13
14:28 Krinkle: krinkle@deploy1001: extensions/CheckUser is dirty in php-1.34-wmf.13 and php-1.34-wmf.14
13:30 tarrow@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
13:04 moritzm: installing bzip2 security updates on jessie
12:28 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
10:56 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
10:55 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
10:53 root@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
10:53 fsero: deploying calico from helmfile in staging T227775
10:35 jijiki: enable puppet on jobrunners
10:26 jijiki: disable puppet on jobrunners for 523908
08:37 ariel@deploy1001: Finished deploy [dumps/dumps@440faa0]: more error reporting for stubs/abstracts/pagelogs; more public table dumps by default (duration: 00m 04s)
08:37 ariel@deploy1001: Started deploy [dumps/dumps@440faa0]: more error reporting for stubs/abstracts/pagelogs; more public table dumps by default
08:36 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
08:24 gehel: repooling wdqs2004 - T228122
08:22 gehel: repooling wdqs2003 - T228122
08:20 vgutierrez: restart pybal on lvs2003
08:16 vgutierrez: restart pybal on lvs2006
08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1109 into API (duration: 00m 54s)
07:57 moritzm: installing idp1001 T228403
07:38 moritzm: rebooting tungsten for kernel update
07:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
07:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
07:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
07:03 elukey: restart php-fpm on mw1330 - op-cache hit ratio low
07:02 jynus: reloading dbproxy1004/9
07:01 elukey: depool wdqs2004 from all services (waiting for maintenance)
06:32 legoktm@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/EventBus/includes/EventBus.php: Add more debugging to figure out which events are invalid: T225199 (duration: 00m 55s)
06:30 legoktm@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/EventBus/includes/EventBus.php: Add more debugging to figure out which events are invalid: T225199 (duration: 00m 55s)
06:15 elukey: clear opcache on mwdebug*
05:26 fsero: repool ms-fe2005 - T228196
05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2116 (duration: 00m 55s)
04:11 eileen: I think I didn't push the turn it on commit - tried again process-control config revision is 9f7eba2193
03:03 eileen: process-control config revision is 7598dc1bf9 (jobs reenabled)
01:52 XioNoX: enable outbound sampling on eqiad's router
00:52 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Add even more severe rate limits for eswikiquote and some other, smaller wikis (T227416) (duration: 00m 58s)
00:38 mutante: mwmaint2001 - puppet fails - not removing a bunch of log dirs for maintenance crons
00:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
00:08 eileen: process-control config revision is 7598dc1bf9 - jobs disabled
00:04 mutante: install1002 - exported indices for new scap version - copied back from buster to stretch - upgraded scap version on mw2250 - scap pull now works and starts to rsync (T228482, T228328, T226948)

2019-07-18

23:50 mutante: built new scap version 3.11.1-1 on boron, copied to install1002, imported package with reprepro, copied from stretch to jessie and buster (T228482)
23:22 Lucas_WMDE: Evening SWAT done
23:17 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Configure Citoid+Wikibase integration on Beta (production no-op) (T228411) (duration: 00m 54s)
23:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Set $wgWBRepoSettings[enableRefTabs] in Wikibase.php (T228414) (duration: 01m 16s)
23:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Define settings for Citoid+Wikibase integration (T228414) (duration: 00m 55s)
22:23 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=wdqs1008.eqiad.wmnet
22:16 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
22:00 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
21:49 bd808: Cleaned up stale striker logs on labweb1001 and labweb1002. Logs go to journald now so log rotate is not triggered to rotate out logs from before that change.
21:42 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
21:36 bd808@deploy1001: Finished deploy [striker/deploy@91594df]: Fixes for deprecation warnings and editing Tool models (T228222, T228332) (duration: 01m 13s)
21:34 bd808@deploy1001: Started deploy [striker/deploy@91594df]: Fixes for deprecation warnings and editing Tool models (T228222, T228332)
21:15 mutante: gerrit (cobalt) - scheduled 1h downtime, rebooting for kernel upgrade
21:03 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: T228290 Fix fatal in ChangesListFormatter::getLogTextLinks() (duration: 01m 02s)
20:57 mutante: gerrit2001 - icinga downtime for 1h
20:56 mutante: gerrit2001 - reboot for kernel upgrade
20:51 mutante: gerrit2001 - apt-get upgrade; apt-get autoremove ; puppet agent -tv
19:55 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
19:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T228374 Enable SecureLinkFixer in beta cluster (2/2) (duration: 00m 55s)
19:31 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T228374 Enable SecureLinkFixer in beta cluster (1/2) (duration: 00m 55s)
19:27 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T207750 Revoke editmyuserjsredirect from all users (duration: 00m 54s)
19:25 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
19:21 otto@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
19:20 eevans@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
18:45 mutante: contint2001 - had puppet failure in puppet board / dpkg issue due to unfinished zuul install which was done on contint1001 - stopped zuul and zuul-merger, apt-install zuul (was already latest version but needed to finish configure step), apt-get autoremove to remove unused packages, ran puppet. dpkg and puppet happy again
17:45 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/includes/libs/objectcache/RedisBagOStuff.php: 69cd8b0 (duration: 00m 55s)
17:15 Krinkle: krinkle@deploy1001: Pull down https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralAuth/+/523844/ and https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralAuth/+/524276/ (no-op, not deploying)
16:36 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
16:29 XioNoX: upgrade Routinator to 0.5.0 in eqiad - T220669
16:24 krinkle@deploy1001: Synchronized php-1.34.0-wmf.14/resources/src/mediawiki.misc-authed-ooui/special.movePage.js: e97a284dbe54 (duration: 00m 58s)
16:17 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
16:06 XioNoX: upgrade Routinator to 0.5.0 in codfw - T220669
16:05 XioNoX: add routinator 0.5.0 to APT
15:54 fsero: depool ms-fe2005 - T228196
15:40 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.34.0-wmf.13 # T228436 T220739
15:19 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
14:46 godog: roll-restart thumbor in codfw - T228086
14:45 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
14:37 liw: all wikis at 1.34.0-wmf.14
14:36 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.14
14:28 bblack: cp hosts: apt autoremove to clean up pkgs on the fleet
14:27 nuria@deploy1001: Finished deploy [analytics/refinery@4f07755]: deploying v0.0.94 of refinery (duration: 00m 20s)
14:26 nuria@deploy1001: Started deploy [analytics/refinery@4f07755]: deploying v0.0.94 of refinery
14:24 godog: repool thumbor2003
14:20 godog: reboot thumbor2003
14:17 jijiki: Depool thumbor2003 for reboot
14:12 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
13:53 moritzm: installing php5 security updates
13:50 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
13:36 jeh: rebooting labstore1005.eqiad.wmnet - T224228
13:34 jbond42: remove mtail 3.0.0~rc24.1-1+wmf1 from stretch-wikimedia
13:30 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.14 (duration: 00m 53s)
13:29 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.14
13:24 jbond42: downgrade cp servers back to 3.0.0~rc5-1~bpo9+1
13:23 liw: promoting 1.34.0-wmf.14 to group1
13:22 godog: temporarily stop ircecho on icinga1001 to avoid spam
13:00 jbond42: rolling upgrade of mtail
12:57 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
12:53 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
12:53 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
12:51 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
12:34 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
12:26 jbond42: add mtail 3.0.0~rc24.1-1+wmf1 to stretch-wikimedia
11:13 dcausse: EU Swat done
11:08 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert [cirrus] switch search traffic (except completion) to codfw (duration: 00m 56s)
11:02 godog: swift eqiad-prod: put back ms-be1043 sdk1 - T218544
10:51 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
10:43 ema: cp-eqiad: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades T227672
10:37 jijiki: enable puppet on services_proxy hosts - T228063
10:29 godog: reboot wezen.codfw.wmnet - T225713
10:27 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
10:15 jijiki: Disable puppet on services_proxy hosts - T228063
09:33 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
09:26 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
09:09 godog: resume swift ms-be rolling restarts - T225713
09:03 fsero: reuploading missing layers T228196
08:57 hashar: contint1001: stopped zuul, ran apt install to get the new python2.7 copied to Zuul virtualenv, restarted zuul/zuul-merger. That clears a couple Icinga alarms from yesterday
08:56 marostegui: Drop afl_log_id column from enwiki.abuse_filter_log on db2116 T226851
08:54 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2116 (duration: 00m 55s)
08:18 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
08:14 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
06:56 dcausse: deleting zerowiki elastic indices (eqiad and codfw) T227718
05:22 marostegui: Stop MySQL on db2045, host will be decommissioned T228281
05:18 marostegui: Remove db2045 from tendril and zarcillo T228281
05:16 marostegui: Disable notifications on db2045 T228281
05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2045 from config, will be decommissioned T228281 (duration: 00m 54s)
05:08 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2045 from config, will be decommissioned T228281 (duration: 00m 56s)
04:31 legoktm: running query for T227843 on mwmaint1002

2019-07-17

23:51 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Add wmgUseTheWikipediaLibrary (false everywhere, no-op) (duration: 00m 54s)
23:48 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wmgUseTheWikipediaLibrary (false everywhere, no-op) (duration: 00m 53s)
22:35 mutante: reimaging mw2250 after disks have been replaced
22:16 hoo: Manually started the Wikidata RDF dumps on snapshot1008 (due to T228104)
21:42 apergos: started wikidata entity dumps json run on snapshot1008
21:37 nuria: deployment aborted for refinery 0.0.94
21:37 nuria@deploy1001: Finished deploy [analytics/refinery@4f07755]: refinery 0.0.94 (duration: 36m 28s)
21:16 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/includes/libs/rdbms/loadbalancer: T228104 rdbms: better handle a non-existing defaultGroup in LoadBalancer (duration: 00m 55s)
21:15 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/Flow: Clean up accidentally-deployed debugging code for T228290 (duration: 01m 02s)
21:10 otto@deploy1001: Finished deploy [eventstreams/deploy@dbc9bbb]: Fix ?doc to use openapi instead of swagger - T227958 (duration: 02m 52s)
21:07 otto@deploy1001: Started deploy [eventstreams/deploy@dbc9bbb]: Fix ?doc to use openapi instead of swagger - T227958
21:00 nuria@deploy1001: Started deploy [analytics/refinery@4f07755]: refinery 0.0.94
20:35 accraze@deploy1001: Finished deploy [ores/deploy@676f7ba]: T228331 (duration: 24m 59s)
20:10 accraze@deploy1001: Started deploy [ores/deploy@676f7ba]: T228331
19:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.14/includes/libs/rdbms/loadbalancer: T228104 rdbms: better handle a non-existing defaultGroup in LoadBalancer (duration: 00m 55s)
19:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2181.codfw.wmnet
18:36 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s eqiad
18:28 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s codfw
18:26 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s esams
18:25 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s ulsfo
18:23 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s eqsin
18:20 cdanis: cdanis@mw1261.eqiad.wmnet ~ % sudo -i pool
18:19 cdanis: testing conftool upgrade: cdanis@mw1261.eqiad.wmnet ~ % sudo -i depool
18:15 mutante: mw2181 - sudo: /usr/local/bin/mwscript: command not found on scap pull ??
18:14 mutante: mw2181 - scap pull (T205240)
18:06 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u 2019-07-17-conftool.yaml -s mw-canary
18:02 cdanis: upgrade to python3-conftool 1.1.1-1 on mwdebug2001
18:01 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include jessie-wikimedia conftool/conftool_1.1.1-1+deb8u1_amd64.changes
18:01 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include buster-wikimedia conftool/conftool_1.1.1-1+deb10u1_amd64.changes
18:01 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include stretch-wikimedia conftool/conftool_1.1.1-1_amd64.changes
17:09 papaul: shutting down restbase2009 for firmware upgrade
17:06 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "group[0|1] wikis to 1.34.0-wmf.13"
16:57 dcausse: morning swat done
16:54 dcausse@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CirrusSearch/includes/ElasticaErrorHandler.php: T228283: Log response data JSON on errors (duration: 00m 55s)
16:48 Urbanecm: Deployed patch for T207094
16:47 gehel@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
16:40 elukey: execute reprepro clearvanished on install1002 to clear buster-wikimedia|thirdparty/amd-rocm (not used anymore)
16:37 dcausse: reopening morning SWAT
16:24 papaul: shutting down mw2181 for firmware upgrade
16:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
16:19 jijiki: Depool mw2181 - T205240
16:08 Urbanecm: Morning SWAT done
16:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Raise zh_classicalwiki requirement for autoconfirmed (T228141) (duration: 00m 55s)
16:07 cmjohnson1: powering off cloudvirt1014 for rack move T226188
16:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable partial blocks on dewiki (T228150) (duration: 00m 54s)
16:01 jbond42: copy confd package from stretch-wikimedia to buster-wikimedia
15:47 Urbanecm: Re-syncing patch for T207094 T228284 and wmf.14
15:37 Urbanecm: Deployed patch for T207094 T228284 to wmf.13 and wmf.14
15:15 fsero: restarting swift-container-sync on ms-be* for getting logging configuration T228196
15:11 papaul: shutting down mw2250 for disk replacement
15:10 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
15:07 hashar: upgrading CI Jenkins # T228142
15:06 papaul: shutting down ms-be2022 for HW troubleshooting
15:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime
15:03 jijiki: Depool mw2269 to reboot it - T227548
15:00 godog: poweroff ms-be2022 - T227667
14:55 moritzm: updated jenkins in thirdparty/ci (stretch) and thirdparty (jessie) to 2.176.2 (T228142)
14:45 fsero: enabling container-sync logging T228196
14:41 otto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
14:41 otto@cumin1001: START - Cookbook sre.hosts.decommission
14:35 moritzm: restart pybal on lvs2002 (codfw primary) T227778
14:32 gehel@cumin1001: START - Cookbook sre.postgresql.postgres-init
14:31 gehel: repool maps1004 - T218097
14:11 liw@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.14 (duration: 00m 54s)
14:10 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.14
14:09 moritzm: restarting pybal on backup LVSes in codfw
14:02 liw@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CirrusSearch/includes/Searcher.php: Do not serialize ResultsType instance T228276 (duration: 00m 55s)
13:37 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
13:26 moritzm: disabled puppet on Icinga hosts in preparation of adding the LDAP replicas/codfw to LVS
13:10 ema: cp-codfw: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades T227672
13:06 ema: prometheus servers: remove varnish-upload_$dc_backend.yaml, replaced by ATS equivalent T227668
12:57 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
12:36 godog: upgrade hp raid firmware on ms-be1 hosts - T141756
12:15 Urbanecm: Running foreachwiki extensions/AbuseFilter/maintenance/normalizeThrottleParameters.php in tmux session on mwmaint1002 (T209565)
12:11 Urbanecm: Ran extensions/AbuseFilter/maintenance/normalizeThrottleParameters.php for cawiki and viwiki (T209565)
11:58 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
11:30 mlitn@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/WikibaseMediaInfo: [WikibaseMediaInfo] Revert "Add Wikidata links to statement UI elements" (duration: 00m 56s)
11:16 dcausse: reindexing wikidata (elastic@eqiad) T227136
11:08 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T227136: [cirrus] switch search traffic (except completion) to codfw (duration: 00m 54s)
10:53 moritzm: re-enabled icinga1001 in meta monitoring
10:41 godog: install updated linux-image-4.9.0-9-amd64 on ms-be hosts
10:30 godog: start rolling reboot of ms-be eqiad hosts - T225713
10:30 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
10:23 moritzm: rebooting icinga1001 for kernel update
10:20 moritzm: disabled icinga1001 in meta monitoring
10:18 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
10:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:08 moritzm: rebooting lithium for kernel update
10:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:33 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
09:33 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
09:23 moritzm: rebooting grafana1001 to pick up MDS-enabled qemu
09:21 ema: cp-ats: upgrade fifo-log-demux to 0.3 T227668
09:21 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool and clarify db2045 status T227862 (duration: 00m 55s)
09:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:19 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:15 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
09:07 ema: upload fifo-log-demux 0.3 to stretch-wikimedia T227668
08:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:36 jijiki: Disable puppet on thumbor* in eqiad, depool and pool back to apply 523728 - T224572
08:17 jijiki: Pool mw1239 - T227867
07:48 godog: swift eqiad-prod: put back ms-be1043 sdk1 - T218544
07:46 ema: cp-esams: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades T227672
07:33 moritzm: reimaging sarin for some tests
06:59 elukey: apply mcrouter async replication to mw2224 - T225642
06:25 elukey: reboot analytics1072 as attempt to clear the megacli's config (and add a new disk)
06:20 elukey: sudo -i /usr/local/sbin/restart-php7.2-fpm on mwdebug* to reset opcache
05:26 marostegui: Stop MySQL on db1065 for decommissioning - T227560
05:24 marostegui: Remove db1065 from tendril and zarcillo - T227560
03:46 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: T227772 (duration: 00m 54s)
03:42 tstarling@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: T227772 (duration: 00m 56s)
03:00 tstarling@deploy1001: Synchronized php-1.34.0-wmf.13/includes/Permissions/PermissionManager.php: (no justification provided) (duration: 00m 54s)
02:58 tstarling@deploy1001: Synchronized php-1.34.0-wmf.14/includes/Permissions/PermissionManager.php: (no justification provided) (duration: 00m 57s)
00:50 mutante: wikitech-static commented out cert renewal cron job out of caution - still needs fixing but continue tomorrow
00:12 mutante: wikitech-static - adding (undocumented!) option webroot-map to certbot config to use webroot authenticator with different document roots per domain while using the config file and not cli params (T214640)
00:01 mutante: wikitech-static certbot --dry-run renew (T214640)
00:01 mutante: wikitech-static changing certbot renewalparams: authenticator = webroot (changed from standalone), install = apache (unchanged) (T214640)

2019-07-16

23:53 RoanKattouw: Deployed patch for T207094
23:27 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/skins/MinervaNeue/: Do not load main menu icons in critical path (T227929) (duration: 00m 55s)
23:26 catrope@deploy1001: Synchronized php-1.34.0-wmf.13/skins/MinervaNeue/: Do not load main menu icons in critical path (T227929) (duration: 00m 56s)
23:26 mutante: wikitech-static - current status with method 'standalone' is that it's broken on cert renewal and gets fixed by restarting apache, which makes no sense since the previous fixes were the straight opposite and the ticket claims the fix was moving back from apache to standalone (T214640)
23:26 fsero: repool ms-fe2005 T228196
23:23 mutante: wikitech-static - testing cert renewal with dry-run option - getting some temp icinga alerts is now expected again because renewal method was changed back from 'apache' to 'standalone' (not by me -> T204840#5243222 i previously did the opposite change in T214640#4907685 to fix it) and that takes down apache during the renewal (T214640)
23:20 mutante: wikitech-static - testing cert renewal with dry-run option - getting some temp icinga alerts is now expected again because renewal method was changed back from 'apache' to 'standalone' (not by me) and that takes down apache during the renewal
23:17 catrope@deploy1001: Synchronized php-1.34.0-wmf.14/extensions/GrowthExperiments/: Don't use timestamp in help panel questions in Flow (T212433) (duration: 00m 56s)
23:09 mutante: wikitech-static got ssl config files in sync with the repo, the difference was really just that space on one line each though (T225258)
22:35 fsero: uploading only blobs on docker-registry-codfw from a backup on ms-fe2005 T228196
22:29 mutante: wikitech-static the diff between the ssl config files in the repo and on server were just a space at the end of the ServerAdmin line .... T225258
22:28 fsero: depooling ms-fe2005 for swift upload for registry T228196
22:26 mutante: wikitech-static ran certbot with --dry-run renew to confirm cert renewal works and it was just fine .. 2 minutes later apache errors which were fixed by restarting apache2 (T214640)
22:24 mutante: wikitech-static restarted apache
22:11 mutante: wikitech-static: turn /etc/apache2/sites-available/wikitech-static.wikimedia.org-ssl.conf and status.wikimedia.org-ssl.conf into symlinks to /wikitech-static/apache/ to match config for http vhosts (T225258)
22:06 mutante: wikitech-static: move /etc/apache2/sites-available/000-default.conf and default-ssl.conf out of directory and reload apache to confirm they are not used and get us in sync with the repo contents again (T225258)
21:17 bd808@deploy1001: Finished deploy [striker/deploy@247a8a6]: Fixes for ssh key management, git repo creation, and Django upgrade (T221657, T227508) (duration: 01m 08s)
21:15 bd808@deploy1001: Started deploy [striker/deploy@247a8a6]: Fixes for ssh key management, git repo creation, and Django upgrade (T221657, T227508)
20:55 SMalyshev: repooled wdqs2004 and wdqs2001 - reload done
20:26 mutante: ganeti1001 - gnt-instance remove netmon1003.wikimedia.org (T220355)
19:59 XioNoX: update ACLs on pfw3-eqiad/codfw - T228205
19:52 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
19:51 fsero: republishing base images for wikimedia-(stretch,jessie and buster) T228196
18:58 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
18:58 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
18:58 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
18:54 gehel: data copy from wdqs2004 to wdqs2001 - T228122
18:47 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: retry - Produce revision-create stream to eventgate-main - T211248 (duration: 00m 54s)
18:23 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce revision-create stream to eventgate-main - T211248 (duration: 00m 54s)
18:08 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Update ExtensionDistributor config to point to REL1_33 as the released version (duration: 00m 54s)
18:05 fsero: republishing base images for nodejs-slim due to registry T228196
18:02 andrewbogott: rebooting cloudcontrol2003-dev, cloudweb2001-dev, cloudcontrol1004 for T225713
17:39 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce centralnotice.campaign-* streams to eventgate-main - T211248 (duration: 00m 55s)
17:23 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@cb6e7bc]: Update mobileapps to 334a4c4 (T227907) (duration: 04m 51s)
17:19 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@cb6e7bc]: Update mobileapps to 334a4c4 (T227907)
16:55 mutante: netmon1003: shutdown -h now | ganeti1001: gnt-instance shutdown netmon1003.wikmedia.org - removed from icinga T198939 T220355
16:36 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@5d8128e]: Migrating videoscaling jobs to PHP7 - T219150 (duration: 00m 50s)
16:35 jiji@deploy1001: Started deploy [cpjobqueue/deploy@5d8128e]: Migrating videoscaling jobs to PHP7 - T219150
16:28 dcausse: reindexing wikidata (elastic@eqiad) T227136
15:57 tarrow@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
15:37 elukey: reboot analytics1072 as attempt to force the raid controller to set a drive failed - T226467
15:12 elukey: start mariadb on db1107 and re-enable mysql consumers on eventlog1002 and replication on db1108
14:53 elukey: stop mariadb on db1107 to allow maintenance
14:53 elukey: stop eventlogging mysql consumers on eventlog1002 and eventlogging_sync on db1108 to allow db1107 maintenance
14:52 jbond42: will restart redis on oresdb at 16:00 UTC - T228045
14:51 jbond42: enable puppet across the fleet
14:50 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:43 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
14:40 jbond42: disable puppet across the fleet to make a change to the hiera
14:30 jijiki: Enable puppet and rolling restart thumbor* in codfw - T224572
14:16 jijiki: Depool thumbor2001 and pool back - T224572
14:13 jijiki: Disabling puppet on thumbor*codfw.wmnet - T224572
14:08 liw: group0 to 1.34.0-wmf.14
14:06 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to php-1.34.0-wmf.14
13:41 liw@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.14 and rebuild l10n cache (duration: 26m 45s)
13:24 vgutierrez: restarting pybal on lvs2001 and lvs1013
13:20 vgutierrez: restarting pybal on lvs2004 and lvs1016
13:14 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.14 and rebuild l10n cache
12:59 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.8 (duration: 01m 46s)
12:57 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.7 (duration: 02m 01s)
12:54 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.6 (duration: 02m 04s)
12:52 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 (duration: 02m 11s)
12:49 liw@deploy1001: Pruned MediaWiki: 1.34.0-wmf.5 (duration: 07m 42s)
12:42 dcausse: deleting stale wikidata indices (elastic@eqiad) T227136
12:11 jijiki: Depool mw1293 and pool back
11:57 moritzm: synched docker-ce, docker-ce-cli, containerd.io to thirdparty/ci for stretch-wikimedia (T226236)
11:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime
11:12 moritzm: rebooting remaining swift frontends in eqiad to pick up a kernel with SACK fixed (T228086)
10:29 moritzm: rebooting ms-fe1005 to pick up kernel with SACK fixed (T228086)
10:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:17 vgutierrez: restart pybal on lvs1013
10:15 vgutierrez: restart pybal on lvs2001
10:11 vgutierrez: restarting pybal on lvs1016
10:08 vgutierrez: restarting pybal on lvs2004
10:04 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=ncredir,service=nginx
09:24 elukey: apply mcrouter async replication settings to mw1276 - T225642
09:23 elukey: pool mw1261 back with mcrouter async replication settings - T225642
08:50 fsero: upload coredns docker image into registry T226516
08:44 jynus: dropping servermon accounts from m1 dbs T198939
08:12 fsero: uploading coredns_1.5.2 for buster and stretch - T226516
08:11 fsero: uploading coredns_1.5.2 for buster and stretch
07:45 elukey: depool mw1261 to test mcrouter changes
00:24 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/includes/cache/LinkCache.php: 4a5f4ca2fd788 (duration: 00m 51s)
00:05 catrope@deploy1001: Synchronized php-1.34.0-wmf.13/skins/MinervaNeue/: Restrict AMC scripts and styles to AMC mode (T227929) (duration: 00m 52s)
00:03 shdubsh: restart logstash to revert mitigations - T228089

2019-07-15

23:55 XioNoX: rotate network-root password
23:31 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
23:31 robh@cumin1001: START - Cookbook sre.hosts.decommission
23:07 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Remove reference to non-existent feature flag (duration: 00m 51s)
22:33 XenoRyet: updated civicrm from 8a4451f390 to 3be1a8c77c
22:01 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgNonincludableNamespaces, default, never varied (duration: 00m 52s)
22:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Drop wmgEnableTabularData and wmgEnableMapData, unused (duration: 00m 55s)
21:58 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Use wmgEnableJsonConfigDataMode instead of wmgEnableTabularData and wmgEnableMapData (duration: 00m 56s)
21:56 jijiki: Depool mw1239 for maintenance - T227867
21:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wmgEnableJsonConfigDataMode to IS (duration: 00m 55s)
21:46 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Add more severe rate limits for eswikiquote (T227416) (duration: 00m 50s)
21:16 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
21:06 XioNoX: rollback `as-path HE ".* 6939 .*"` to AVOID-PATH in eqsin - T228015
20:59 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/includes/Title.php: T227700 / T227700: getSubpage should not lose the interwiki prefix (duration: 00m 52s)
20:54 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@bc3a2fd]: Update mobileapps to 7fd39da (T227907) (duration: 02m 24s)
20:52 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@bc3a2fd]: Update mobileapps to 7fd39da (T227907)
20:52 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@bc3a2fd]: Update mobileapps to 7fd39da (T227907) (duration: 07m 53s)
20:50 Krinkle: deploy1001: Unable to fetch git commits from Gerrit for php-1.34.0-wmf.13 due to "error: cannot update the ref 'refs/remotes/origin/fundraising/REL1_31': unable to append to '.git/logs/refs/remotes/origin/fundraising/REL1_31': Permission denied"
20:47 XioNoX: add `as-path HE ".* 6939 .*"` to AVOID-PATH in eqsin - T228015
20:44 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@bc3a2fd]: Update mobileapps to 7fd39da (T227907)
20:30 XioNoX: deactivate HE peering in eqsin - T228015
20:02 jynus: reducing consistency of db2045 to avoid lag at T227862
19:53 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:31 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@fd0a41a]: Change the name of the error log field for deduplicatio (duration: 01m 13s)
19:30 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@fd0a41a]: Change the name of the error log field for deduplicatio
19:27 ppchelko@deploy1001: Finished deploy [changeprop/deploy@df6322a]: Rename error field in deduplication logs (duration: 01m 28s)
19:26 ppchelko@deploy1001: Started deploy [changeprop/deploy@df6322a]: Rename error field in deduplication logs
19:25 XenoRyet: update payments-wiki from 59ace50d66 to 224c6b2d7b
19:10 thcipriani: gerrit back
19:09 thcipriani: gerrit restart for v2.15.14
19:09 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@40d88dc]: Bump gerrit version to 2.15.14 (cobalt - restart incoming) (duration: 00m 10s)
19:08 thcipriani@deploy1001: Started deploy [gerrit/gerrit@40d88dc]: Bump gerrit version to 2.15.14 (cobalt - restart incoming)
19:06 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@40d88dc]: Bump gerrit version to 2.15.14 (gerrit2001) (duration: 00m 12s)
19:06 thcipriani@deploy1001: Started deploy [gerrit/gerrit@40d88dc]: Bump gerrit version to 2.15.14 (gerrit2001)
19:05 shdubsh: restarting logstash on logstash1008
18:27 Urbanecm: Morning SWAT done
18:13 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Remove spam mitigations (T200104) (duration: 00m 50s)
18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Enable WelcomeSurvey A/B test for arwiki (T226221) (duration: 01m 02s)
18:07 jbond42: syncing puppetmaster1001 facts to compiler1001/1002
17:34 cdanis: downtime mr1-eqsin.oob IPv6 for 20h T227967
16:58 jynus: setting labsdb1009/10/11 to performance scaling_governor T225713
16:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce revision-visibility-change stream to eventgate-main - T211248 (duration: 00m 49s)
14:08 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 06s)
14:08 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
14:08 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 01s)
14:07 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
14:07 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 01s)
14:07 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
14:06 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 07s)
14:06 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
14:04 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 05s)
14:04 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
13:55 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 06s)
13:55 elukey: enable profile::base::firewall on notebook100[3,4]
13:55 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
13:55 otto@deploy1001: Finished deploy [analytics/refinery@3296aab] (notebook): (no justification provided) (duration: 00m 15s)
13:54 otto@deploy1001: Started deploy [analytics/refinery@3296aab] (notebook): (no justification provided)
13:23 Urbanecm: Running mwscript importImages.php --wiki=commonswiki --user=Meisam /home/urbanecm/T223052
13:16 gehel: repooling maps eqiad - T218097
13:02 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 50s)
13:01 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 50s)
12:59 gehel: depooling kartotherian eqiad - T225713
12:59 gehel: re-enabling kartotherian codfw - T225713
12:55 gehel: shutting down tilerator on maps eqiad to free some CPU - T225713
12:54 gehel: shutting down tilerator on maps eqiad to free some CPU
12:52 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Delete Image-reviewer group from commonswiki for good (T216406) (duration: 00m 51s)
12:50 gehel: restarting kartotherian on maps1002
12:35 gehel: reimporting OSM data for maps eqiad cluster - T218097
12:25 moritzm: installing openjpeg2 security updates
12:20 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=testwikidatawiki --force --bureaucrat Ladsgroup
12:16 jbond42: update redis on mwlog, pybal-test, maps and rdb*
12:10 moritzm: installing ldap-replica200[12] (T227778)
12:07 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Specify $wgWBRepoSettings['conceptBaseUri'] again (T225212) (duration: 00m 50s)
12:06 moritzm: removing myself from cn=tools.admin (currently not used, was mostly historical for debugging some Toollabs issue in the past)
12:00 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Specify $wmgWBRepoConceptBaseUri again (T225212) (duration: 00m 51s)
12:00 Urbanecm: Running mwscript initSiteStats.php --wiki=commonswiki --update to update Special:Statistics after a big change (T216406)
11:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Regrant image reviewers on commonswiki the ability to mass upload (T216406) (duration: 00m 50s)
11:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Rename `Image-reviewer` to `image-reviewer` for Commons (2/2, T216406) (duration: 00m 48s)
11:48 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Rename `Image-reviewer` to `image-reviewer` for Commons (1/2, T216406) (duration: 00m 50s)
11:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable partial blocks on the Finnish Wikipedia (T228008) (duration: 00m 51s)
11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Move private and fishbowl overrides from groupOverrides to groupOverrides2 (T227980) (duration: 00m 51s)
11:24 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.13/includes/libs/http/MultiHttpClient.php: SWAT: Raise default reqTimeout in MultiHttpClient (T226979) (duration: 00m 51s)
11:23 moritzm: installing python-django security updates on jessie
11:22 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.13/includes/Title.php: SWAT: When title contains only slashes, Title::getRootText() shouldnt return false (T227816) (duration: 00m 51s)
11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable WikiLove and SandboxLink on sqwiki (T227970) (duration: 00m 51s)
11:15 Urbanecm: Running mwscript extensions/WikimediaMaintenance/createExtensionTables.php sqwiki wikilove for T227970
11:13 Urbanecm: Running mwscript migrateUserGroup.php --wiki=commonswiki Image-reviewer image-reviewer for T216406
11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disallow admins to grant or revoke image reviewer due to migration (T216406) (duration: 00m 50s)
11:08 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Create image-reviewer for commonswiki with same rights as Image-reviewer (T216406) (duration: 00m 52s)
10:52 moritzm: installing ldap-replica200[12] (T227778)
10:34 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 50s)
10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
09:56 ema: cp-eqsin: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades T227672
09:39 fsero: repooling ms-fe2005 T227570
08:50 fsero: creating docker_registry_codfw on eqiad T227570
08:49 gehel: correction: set oemhp_powerreg=os + reboot for elastic1052 (NOT elastic1054) - T225713
08:49 fsero: T227570 changing container_synchronization on docker_registry_codfw to //docker_registry/eqiad/AUTH_docker/docker_registry_codfw
08:48 gehel: set oemhp_powerreg=os + reboot for elastic1054 - T225713
08:22 godog: set oemhp_powerreg=os on ms-be10[16-39] - T225713
08:01 vgutierrez: upgrading acme-chief to version 0.19 in acme-chief production instances - T225945

2019-07-14

13:18 godog: silence mr1-eqsin.oob IPv6 until tomorrow 8 UTC - T227967
12:01 Urbanecm: Running mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Sporti /home/urbanecm/T227968 for server side upload

2019-07-13

01:51 MaxSem: Disabled 2FA for my staff account

2019-07-12

23:35 mutante: netmon1003 - shutdown -h now after it's gone from Icinga now
23:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
23:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
23:28 mutante: netmon1003 - stopping apache2 service (decom of servermon.wikimedia.org)
19:41 James_F: Disabled 2FA for MSchottlender-WMF for device reset.
19:17 shdubsh: add prometheus-varnishkafka-exporter 0.1 to apt repo T196066
19:15 urandom: bootstrapping restbase1017-c -- T222960
19:08 jeh: rebooting cloudvirt1018.eqiad.wmnet T216040
18:53 mutante: cp1072 - enabling notifications for service checks in icinga, they were disabled but all green and no SAL/ticket. looked like forgotten from the past
18:49 gehel: setting CPU governor to performance for wdqs1010 - T225713
18:16 Krinkle: Remove bogus Graphite data at frontend.navtiming2.requet (typo from Nov 2018), graphite1004/2003
18:02 urandom: bootstrapping restbase1017-b -- T222960
16:32 urandom: bootstrapping restbase1017-a -- T222960
16:25 jijiki: Rolling restart swift proxy on ms-fe*
15:25 jeh: rebooting cloudvirt1018.eqiad.wmnet T216040
14:05 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
12:45 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
12:39 fsero: recreating ci staging namespaces T227775
12:39 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
12:38 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
12:36 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
12:33 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
12:33 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
12:22 fsero: recreating eventgate-* and blubberoid staging namespaces T227775
12:22 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
12:22 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
12:18 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
12:18 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
12:18 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
12:15 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
12:11 fsero: recreating sessionstore,cxserver and mathoid staging namespaces T227775
12:10 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
12:06 fsero: recreating citoid staging namespace T227775
12:05 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
12:01 fsero: recreating termbox staging namespace T227775
11:09 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Switchover db2045 x1 codfw master to db2069 (duration: 00m 51s)
10:24 jynus: switchover x1 codfw master from db2045 to db2069 T227862
10:23 jynus: switchover x1 codfw master from db2045 to db2069
09:43 moritzm: shut down ldap-codfw-replica01/ldap-codfw-replica02 (pending reimage)
08:18 jijiki: enable puppet on mw1222
06:35 vgutierrez: upgrading acme-chief to version 0.19 in acme-chief test instances - T225945
06:28 vgutierrez: uploaded acme-chief 0.19 to apt.wikimedia.org (buster) - T225945
05:45 elukey: sudo -i /usr/local/sbin/restart-php7.2-fpm on mwdebug* to clear opcache
01:01 Krinkle: mw1342 generated some ~ 11,500 additional PHP errors over a 4 hour period (18:00-22:30 UTC), ref T224491
00:59 Krinkle: mw1342 is generating strange PHP errors (php7 only), ref T224491
00:58 urandom: bootstrapping restbase1017-a -- T222960
00:50 mutante: restbase1018 - restart ferm service
00:15 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e4bd91f71b (duration: 00m 50s)
00:13 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: f309856f0912 (duration: 00m 50s)
00:03 eevans@deploy1001: Finished deploy [cassandra/metrics-collector@df909a1]: deploy logback to restbase1017 (T222960) (duration: 00m 03s)
00:03 eevans@deploy1001: Started deploy [cassandra/metrics-collector@df909a1]: deploy logback to restbase1017 (T222960)
00:01 eevans@deploy1001: Finished deploy [cassandra/metrics-collector@df909a1]: deploy logback to restbase1017 (T222960) (duration: 00m 25s)
00:01 eevans@deploy1001: Started deploy [cassandra/metrics-collector@df909a1]: deploy logback to restbase1017 (T222960)

2019-07-11

23:58 thcipriani@deploy1001: Synchronized php-1.34.0-wmf.13/includes/watcheditem/WatchedItemStore.php: SWAT: WatchedItemStore: Fix fatal when revision is deleted T226741 (duration: 00m 51s)
23:49 eevans@deploy1001: Finished deploy [cassandra/logstash-logback-encoder@d085ffa]: deploy logback to restbase1017 (T222960) (duration: 00m 47s)
23:48 eevans@deploy1001: Started deploy [cassandra/logstash-logback-encoder@d085ffa]: deploy logback to restbase1017 (T222960)
23:47 eevans@deploy1001: Finished deploy [cassandra/logstash-logback-encoder@d085ffa]: (no justification provided) (duration: 01m 56s)
23:45 eevans@deploy1001: Started deploy [cassandra/logstash-logback-encoder@d085ffa]: (no justification provided)
23:38 eevans@deploy1001: deploy aborted: (no justification provided) (duration: 02m 00s)
23:36 eevans@deploy1001: Started deploy [cassandra/logstash-logback-encoder@d085ffa]: (no justification provided)
23:15 thcipriani@deploy1001: Synchronized wmf-config: SWAT: Oversample all EditAttemptStep events on VE-as-mobile-default wikis T227317 (duration: 00m 50s)
22:59 mutante: netmon1003 - removing servermon - servermon.wikimedia.org is being decom'ed (T198939)
22:37 RoanKattouw: Deployed fix for T224240, accidentally rode along with Tyler's no-op scap
22:34 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: wikidatawiki back to 1.34.0-wmf.13
22:26 thcipriani@deploy1001: Finished scap: no op scap sync to rebuild l10n-cache (T227814) (duration: 19m 34s)
22:07 thcipriani@deploy1001: Started scap: no op scap sync to rebuild l10n-cache (T227814)
21:23 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 02m 02s)
21:21 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
20:22 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 00m 02s)
20:22 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
20:20 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 00m 03s)
20:20 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
20:19 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 00m 02s)
20:19 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
20:18 otto@deploy1001: Finished deploy [analytics/refinery@3296aab]: (no justification provided) (duration: 00m 02s)
20:18 otto@deploy1001: Started deploy [analytics/refinery@3296aab]: (no justification provided)
20:11 milimetric@deploy1001: deploy aborted: Fix to reimport cu_changes (duration: 27m 34s)
20:03 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: Revert wikidata to 1.34.0-wmf.11
19:44 milimetric@deploy1001: Started deploy [analytics/refinery@3296aab]: Fix to reimport cu_changes
19:29 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.13 refs T220738
18:09 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.13 refs T220738 (duration: 00m 57s)
18:08 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.13 refs T220738
18:02 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s eqiad
17:37 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s codfw
17:02 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s esams
16:48 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s eqsin
16:19 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s ulsfo
16:12 XioNoX: revert deactivate ping-offload in eqiad for server reboot
16:03 moritzm: rebooting ping1001 to pick up MDS-enabled qemu
16:02 cdanis: repool cp4022 after testing conftool change
15:59 XioNoX: deactivate ping-offload in eqiad for server reboot
15:58 cdanis: depool cp4022 for testing conftool change
15:58 XioNoX: revert deactivate ping-offload in codfw for server reboot
15:56 moritzm: installing dnspython update from stretch point release
15:53 moritzm: rebooting ping2001 to pick up MDS-enabled qemu
15:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:50 XioNoX: deactivate ping-offload in codfw for server reboot
15:45 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4daa16c]: it-phabricator plugin update (cobalt) (duration: 00m 11s)
15:45 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4daa16c]: it-phabricator plugin update (cobalt)
15:44 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4daa16c]: it-phabricator plugin update (gerrit2001 only) (duration: 00m 11s)
15:44 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4daa16c]: it-phabricator plugin update (gerrit2001 only)
15:28 gehel: setting CPU governor to performance for wdqs1004 - T225713
15:28 cdanis: upgrade to python3-conftool 1.1.0-1 on cp4022
15:05 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s cp-canary
15:00 hashar_: restarted Jenkins for plugins upgrades
14:57 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo debdeploy deploy -u T197126-2019-07-11-conftool.yaml -s mw-canary
14:55 gehel: setting CPU governor to performance for elastic1052 - T225713
14:51 cdanis: upgrade to python3-conftool 1.1.0-1 on mwdebug2001
14:45 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/includes/libs/rdbms/database/Database.php: 903f3f94f5d2e3 / T227708 (duration: 00m 59s)
14:26 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include stretch-wikimedia /home/volans/conftool/stretch/conftool_1.1.0-1_amd64.changes
14:26 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include jessie-wikimedia /home/volans/conftool/jessie/conftool_1.1.0-1+deb8u1_amd64.changes
14:26 cdanis: cdanis@install1002.wikimedia.org ~ % sudo -E reprepro -C main include buster-wikimedia /home/volans/conftool/buster/conftool_1.1.0-1+deb10u1_amd64.changes
14:17 ema: restart wikibugs
13:40 godog: roll restart ms-be2016 ms-be2017 ms-be2018 ms-be2019 ms-be2020 ms-be2021 ms-be2028 ms-be2029 ms-be2030 ms-be2031 ms-be2032 ms-be2033 ms-be2034 ms-be2035 ms-be2036 - T225713
13:00 ema: cp-ulsfo: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades T227672
12:48 ema: fleet-wide: remove obsolete file /etc/debdeploy-autorestarts.conf
12:44 ema: cp-ulsfo: upgrade mtail to 3.0.0~rc5-1~bpo9+1wmf1
12:44 Urbanecm: Running purgePage.php on pages in Page: NS on pawikisource (T226959)
12:39 jijiki: Disable puppet on mw1222, server will be depooled and pooled a few times for tests - T224538
12:07 godog: ms-be2031 raid controller firmware upgrade 4.52 -> 6.88 - T141756
12:03 godog: power reset ms-be2031, stuck and nothing on console
11:56 Urbanecm: EU SWAT done
11:54 urbanecm@deploy1001: Finished scap: Namespace translation for Punjabi (T226959) (duration: 30m 13s)
11:24 urbanecm@deploy1001: Started scap: Namespace translation for Punjabi (T226959)
11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove usergroup communityapps from officewiki (T227680) (duration: 01m 02s)
11:20 urbanecm@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: Remove commonswiki from mobilemainpagelegacy (T227719) (duration: 00m 58s)
11:14 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] Enable UTR30 as a lookup method for ns prefixes on group2 (duration: 01m 02s)
10:45 moritzm: installing ldap-codfw-replica*
10:28 fsero: depooling ms-fe2005 for docker_registry_backups T227570
10:08 fsero: creating swift docker_registry_container_backup T227570
09:56 moritzm: re-enabling puppet (puppetdb reboots completed)
09:47 moritzm: rebooting puppetdb1001 to pick up MDS-enabled qemu
09:35 moritzm: rebooting puppetdb2001 to pick up MDS-enabled qemu
09:31 moritzm: disabling puppet temporarily (for puppetdb reboots)
09:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:51 godog: upload mtail 3.0.0~rc5-1~bpo9+1wmf1 to stretch-wikimedia - T225604
08:14 ema: cp-ulsfo: downgrade mtail to 3.0.0~rc5-1~bpo9+1 to fix varnishmtail-backend T225604
07:43 moritzm: installing ldap-codfw-replica* T227669
07:31 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
07:21 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
07:21 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
07:11 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
07:10 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
07:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
02:27 ejegg: updated payments-wiki from 4c1261fe5d to 59ace50d66

2019-07-10

23:16 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/CirrusSearch/includes: T227691 RedirectsAndIncomingLinks: succeed or fail, but not both (duration: 01m 02s)
23:02 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/OAuth/includes/backend/MWOAuthUtils.php: T227688 OAuth: Do not rely on array autocreation for custom User properties; re-try (duration: 00m 58s)
22:59 jforrester@deploy1001: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
22:57 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/includes/user/User.php: T227688 User: support setting custom fields + array autocreation in non-existent field (duration: 00m 58s)
22:46 shdubsh: downgrading cp4031 to mtail_3.0.0~rc5-1~bpo9+1wmf1 to fix varnishmtail T225604
22:46 jforrester@deploy1001: Synchronized w: T156319 Remove /w/skin-1.5 symlink (duration: 00m 58s)
22:16 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T212865 Stop configuring ZeroBanner and ZeroPortal, unused (duration: 00m 58s)
22:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T212865 Drop the ability to use ZeroBanner and ZeroPortal from production (duration: 00m 57s)
22:03 jforrester@deploy1001: Synchronized wmf-config/mobile.php: T212865 Drop the ability to use ZeroBanner and ZeroPortal from production, mobile code (duration: 00m 57s)
21:59 jforrester@deploy1001: Synchronized w/robots.php: T212865 Drop the special treatment for Wikipedia Zero (duration: 00m 58s)
21:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T212865 Drop the Wikipedia Zero debug log channel (duration: 00m 58s)
21:51 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T187716 Drop all zerowiki configuration (duration: 00m 58s)
21:50 mutante: mwdebug1002 - php7adm /opcache-free because icinga showed a warning for opcache free space below 100MB
21:49 jforrester@deploy1001: Synchronized dblists/: T187716 Mark zerowiki as deleted in dblists (duration: 01m 00s)
21:41 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T212865 Disable ZeroBanner on all wikis (duration: 00m 59s)
21:36 mutante: mw1235 - restarting hhvm (socket timeout alert in icinga since about 1.5h)
21:35 mutante: mw1290 - restarting hhvm (socket timeout alert in icinga since about 5h)
19:45 hoo: Updated the Wikidata property suggester with data from the 2019-07-01 JSON dump and applied the T132839 workarounds
19:32 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce recentchange stream to eventgate-main - T211248 (duration: 00m 57s)
19:26 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: Use wgEventServiceStreamConfig to configure wgRCFeeds eventbus. No-op in prod. - T211248 (duration: 00m 58s)
19:05 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@8761480]: Migrating rest of hightraffic jobs to PHP7 - T219150 (duration: 01m 00s)
19:04 jiji@deploy1001: Started deploy [cpjobqueue/deploy@8761480]: Migrating rest of hightraffic jobs to PHP7 - T219150
18:15 jforrester@deploy1001: Synchronized php-1.34.0-wmf.13/includes/Linker.php: T227656 Fix visibility of IPs that aren't suppressed (duration: 00m 59s)
17:54 twentyafterfour: phabricator: hotfixing fatal error by pulling upstream fix ( see https://secure.phabricator.com/D20644 )
16:09 Urbanecm: Morning SWAT done
16:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change bawikibooks logo to correct one according to community wish (2/2, T227418) (duration: 00m 58s)
16:07 Urbanecm: Purged two urls for T227418
16:06 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: Change bawikibooks logo to correct one according to community (1/2, T227418) (duration: 01m 16s)
16:04 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: Disable local uploads on wuuwiki (T226764) (duration: 00m 58s)
15:23 ema: cp-ulsfo: upgrade varnish to 5.1.3-1wm11 T227672
15:08 ema: restart wb2-phab wikibugs job
14:51 ema: upload varnish 5.1.3-1wm11 to stretch-wikimedia T227672
14:42 godog: reimage ms-be2022 - T227667
14:03 jbond42: copy puppetdb-termini 4.4.0-1~wmf2 from stretch-wikimedia to jessie-wikimedia
13:47 ema: cp hosts: cleanup WP zero leftovers T213769
13:22 godog: reset ilo on ms-be2022 - bios can't talk to it on boot
12:49 godog: reboot ms-be2022 - T225713
11:53 Urbanecm: Purged 14 urls for T211413
11:51 Urbanecm: Purged 24 urls for T227635
11:11 Urbanecm: EU SWAT done
11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove autopromote to patroller on testwiki (T168718) (duration: 00m 58s)
11:10 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: Several logo changes (T227635 T211413) (duration: 01m 00s)
11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove fawikiquote HD logo (T211413) (duration: 00m 57s)
11:07 urbanecm@deploy1001: sync-file aborted: SWAT: Several logo changes (T227635 T211413) (duration: 00m 20s)
11:06 urbanecm@deploy1001: Synchronized docroot/noc/conf/highlight.php: SWAT: Fix non-working "raw text" links on noc.wikimedia.org web pages (T227606) (duration: 01m 02s)
09:57 moritzm: re-enabled puppet on hosts using acme_chief::cert for reboots of acmechief hosts (actually did that 20 minutes ago, but forgot to log it earlier)
09:54 jynus: disabling puppet on prometheus* hosts for upcoming deploy
09:38 fsero: doing the same on ms-be1030
09:37 fsero: docker-registry: manually running swift-container-sync once on ms-be2019
09:36 moritzm: rearmed keyholder on acmechief1001
09:29 moritzm: rebooting acmechief1001 to pick up MDS-enabled qemu
09:25 moritzm: rearmed keyholder on acmechief2001
09:22 moritzm: rebooting acmechief2001 to pick up MDS-enabled qemu
09:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:21 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:19 moritzm: disabled puppet on hosts using acme_chief::cert for reboots of acmechief hosts
08:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
08:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
08:06 vgutierrez@cumin1001: START - Cookbook sre.ganeti.makevm
08:06 vgutierrez@cumin1001: START - Cookbook sre.ganeti.makevm
05:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1079 after upgrade (duration: 00m 57s)
05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1079 after upgrade (duration: 00m 57s)
05:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1079 after upgrade (duration: 00m 58s)
05:05 marostegui: Upgrade db1079
05:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1079 for upgrade (duration: 00m 59s)

2019-07-09

23:07 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.13 refs T220738
23:06 robh: updating power ports on T209101 and disabling ports not in use (only turning off one side and awaiting any icinga alerts for 15 minutes before touching other side of power)
22:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/AbuseFilter/includes/AbuseFilter.php: 0096dff3022 / T227613 (duration: 00m 57s)
22:52 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/SecurePoll/includes/pages/: c7d7a55 / T227620 (duration: 00m 57s)
22:09 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/extensions/Collection/includes/CollectionProposals.php: T227407 / 69a30966c (duration: 00m 57s)
21:53 krinkle@deploy1001: Synchronized php-1.34.0-wmf.13/includes/libs/rdbms/: T226770 / 4c2a58589f2db (duration: 00m 59s)
20:58 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.34.0-wmf.11"
20:37 mutante: scb1001 - re-activate puppet, run puppet, stop pdfrender service, run puppet again (T226675)
20:36 mutante: scb2001 - sudo systemctl stop pdfrender (T226675)
20:25 mutante: temp disabling puppet on scb1001 - removing pdfrender classes from scb2001
20:23 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.13 refs T220738
20:12 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.13 (duration: 36m 39s)
19:36 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.13
19:17 XioNoX: enable sampling on cr2-eqiad:border-in4
19:14 XioNoX: replace netflow target on cr2-eqiad with netflow1001
18:19 longma: cutting the branch for 1.34.0-wmf.13 T220738
17:32 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@2a9d097]: Fix etag generation for the talk endpoint, take 2 (duration: 02m 04s)
17:30 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@2a9d097]: Fix etag generation for the talk endpoint, take 2
17:30 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@2a9d097]: Fix etag generation for the talk endpoint (T227481) (duration: 03m 49s)
17:26 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@2a9d097]: Fix etag generation for the talk endpoint (T227481)
16:59 godog: reboot ms-be2039 with oemhp_powerreg=os - T225713
16:54 godog: reboot ms-be2027 with oemhp_powerreg=os - T225713
16:42 godog: reboot ms-be2026 with oemhp_powerreg=os - T225713
16:29 godog: reboot ms-be2025 with oemhp_powerreg=os - T225713
15:44 XioNoX: reject RPKI invalids on Ashburn peering links - T220669
15:38 akosiaris: restart pybal on lvs2003, lvs1015. Removal of pdfrender service T226675
15:38 XioNoX: reject RPKI invalids on Amsterdam peering link - T220669
15:33 akosiaris: restart pybal on lvs2006, lvs1016. Removal of pdfrender service T226675
15:28 XioNoX: reject RPKI invalids on Chicago peering link - T220669
15:27 godog: reboot ms-be2024 with oemhp_powerreg=os - T225713
15:22 godog: reboot ms-be2023 with oemhp_powerreg=os - T225713
15:20 XioNoX: reject RPKI invalids on Singapore peering link - T220669
15:13 XioNoX: reject RPKI invalids on Dallas peering link - T220669
15:03 jeh: rebooting cloudnet1003.eqiad T224228
14:53 gehel: repooled elastic2054 - T227298
14:50 moritzm: installing orespoolcounter100[34] T227567
14:42 XioNoX: reject RPKI invalids on ulsfo peering link - T220669
14:29 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@8517fec]: Migrating cirrus* jobs to PHP7 - T219150 (duration: 01m 02s)
14:28 jiji@deploy1001: Started deploy [cpjobqueue/deploy@8517fec]: Migrating cirrus* jobs to PHP7 - T219150
14:28 jeh: rebooting cloudnet1004.eqiad T224228
14:21 tarrow@deploy1001: scap-helm termbox finished
14:21 tarrow@deploy1001: scap-helm termbox cluster staging completed
14:21 tarrow@deploy1001: scap-helm termbox upgrade staging stable/termbox -f termbox-staging-values.yaml [namespace: termbox, clusters: staging]
13:59 moritzm: installing orespoolcounter200[34] T227567
13:26 elukey: enable base::firewall on stat1007
13:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
13:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
12:27 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
12:21 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
12:18 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
12:13 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
12:11 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
12:11 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
12:09 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
12:04 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
12:02 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
11:57 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
11:47 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:47 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:30 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:13 Urbanecm: EU SWAT done
11:12 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: Disable flaggedrevs for hewikisource main page (T227000) (duration: 00m 48s)
11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Clean up `wgNamespacesWithSubpages` to remove unneeded entries (T227546) (duration: 00m 49s)
11:09 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Configuration migration for Translate (T87985) (duration: 00m 49s)
11:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure help urls for MediaInfo (T227226) (duration: 00m 50s)
10:39 elukey: update wikimedia-buster thirdparty/amd-rocm component with upstream packages - T224723
10:14 jbond42: upgrade openssl on canary systems
09:30 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=ats-be
09:26 ema: cp1076: restart trafficserver with storage.config set to /dev/nvme0n1
09:25 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=ats-be
09:13 elukey: enable per-server metrics on all prometheus-mcrouter-exporter(s) via puppet - T225059
09:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 after upgrade (duration: 00m 49s)
08:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 after upgrade (duration: 00m 47s)
08:49 elukey: upgrade prometheus-mcrouter-exporter to 0.0.0+git20190709-1 on mw-eqiad (cumin alias) via debdeploy - T225059
08:41 marostegui: Upgrade db1086
08:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 for upgrade (duration: 00m 51s)
08:36 elukey: upgrade prometheus-mcrouter-exporter to 0.0.0+git20190709-1 on mw-codfw (cumin alias) via debdeploy - T225059
08:08 moritzm: installing zeromq3 security updates
08:00 marostegui: Upgrade db1065 to 10.1.39
07:39 moritzm: pruning unused libzmq3/python-zmq packages from swift/parsoid hosts
07:26 elukey: upload prometheus-mcrouter-exporter 0.0.0+git20190709-1 to stretch-wikimedia - T225059
06:00 marostegui: Failover m2 from db1065 to db1132 - T226952
05:19 marostegui: Start switchover steps T226952
05:13 marostegui: Rebooting pc2010 for a second time as per papaul's suggestion T227552
04:53 marostegui: Reboot pc2010 to debug a memory issue
01:47 XioNoX: restart PHP FPM on mwdebug2001
01:35 XioNoX: restart PHP FPM on mwdebug1002

2019-07-08

23:03 tzatziki: changing password for user "Naomi.piquette"
20:57 bd808: Upgraded prometheus-pdns-exporter to 0.4.1 on cloudservices1004.wikimedia.org (T227411)
20:53 bd808: Upgraded prometheus-pdns-exporter to 0.4.1 on cloudservices1003.wikimedia.org (T227411)
19:38 reedy@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/OATHAuth/src/Key/TOTPKey.php: T227502 (duration: 00m 50s)
19:23 moritzm: uploaded prometheus-pdns-exporter 0.4.1 to stretch-wikimedia T227411
18:43 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce page-* streams to eventgate-main - T211248 (duration: 00m 50s)
18:33 moritzm: installing zeromq3 security updates
18:15 Urbanecm: Morning SWAT done
18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change liwikinews logo to correct one per community wish (2/2, T227418) (duration: 00m 49s)
18:13 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: Change liwikinews logo to correct one per community wish (1/2, T227418) (duration: 00m 49s)
18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add templateeditor user group and protection level on commons (T227420) (duration: 00m 49s)
18:06 urbanecm@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: SWAT: [cirrus] Increase elastic master timeout to 5m (T227136) (duration: 00m 49s)
18:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable RDF output for MediaInfo (T221916) (duration: 00m 49s)
17:20 gehel@deploy1001: Finished deploy [wdqs/wdqs@4b7cdf5]: new blazegraph and updater version (duration: 12m 47s)
17:08 gehel@deploy1001: Started deploy [wdqs/wdqs@4b7cdf5]: new blazegraph and updater version
16:40 eevans@deploy1001: scap-helm sessionstore finished
16:40 eevans@deploy1001: scap-helm sessionstore cluster staging completed
16:40 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
16:39 eevans@deploy1001: scap-helm sessionstore finished
16:38 eevans@deploy1001: scap-helm sessionstore cluster staging completed
16:38 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
16:38 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
16:36 eevans@deploy1001: scap-helm sessionstore finished
16:36 eevans@deploy1001: scap-helm sessionstore cluster staging completed
16:36 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
16:05 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily make account creation limits more restrictive - part III (duration: 00m 50s)
15:59 godog: bounce prometheus@k8s on prometheus200[34] - T227478
15:54 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2045 instead of db2069 as x1 codfw master (duration: 00m 49s)
15:45 marostegui: Failover db2069 to db2045 on x1 codfw
15:21 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2069 as x1 codfw master (duration: 00m 50s)
15:15 jynus: shutting down db2097 T225378 T216240
15:13 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@7379e91]: Migrating refreshLinks to PHP7 - T219150 (duration: 01m 26s)
15:12 jiji@deploy1001: Started deploy [cpjobqueue/deploy@7379e91]: Migrating refreshLinks to PHP7 - T219150
15:07 eevans@deploy1001: scap-helm sessionstore finished
15:07 eevans@deploy1001: scap-helm sessionstore cluster staging completed
15:07 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
15:04 eevans@deploy1001: scap-helm sessionstore finished
15:04 eevans@deploy1001: scap-helm sessionstore cluster staging completed
15:04 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
14:57 marostegui: Failover x1 codfw from db2045 to db2069
14:48 ppchelko@deploy1001: Finished deploy [restbase/deploy@9a99b17]: Loosen etag regex for talk endpoint and fix alert (duration: 16m 07s)
14:45 marostegui: Restart MySQL on db1132 to enable performance_schema - T226952
14:43 urandom: decommissioning restbase1017-c -- T222960
14:32 ppchelko@deploy1001: Started deploy [restbase/deploy@9a99b17]: Loosen etag regex for talk endpoint and fix alert
14:21 papaul: shutting down elastic2054 for troubleshooting
14:05 ppchelko@deploy1001: Finished deploy [restbase/deploy@8e81e98]: Release 1.0, expose talk endpoints T225733, suggestions endpoints T224754, fix summary purging T226983 (duration: 16m 11s)
14:03 eevans@deploy1001: scap-helm sessionstore finished
14:03 eevans@deploy1001: scap-helm sessionstore cluster staging completed
14:03 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
13:53 godog: reprepro --delete clearvanished on install1002 to cleanup trusty
13:52 elukey: import AMD ROCm's Debian repo key (9386B48A1A693C5C) manually on install1002 - T224723
13:51 moritzm: running "apt-get --allow-releaseinfo-update" on all buster hosts which were installed prior to the final buster release
13:48 ppchelko@deploy1001: Started deploy [restbase/deploy@8e81e98]: Release 1.0, expose talk endpoints T225733, suggestions endpoints T224754, fix summary purging T226983
13:30 godog: bounce prometheus@k8s on prometheus1003
12:52 godog: copy mtail to buster-wikimedia - T225604
12:42 kartik@deploy1001: scap-helm cxserver finished
12:42 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
12:42 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
12:39 kartik@deploy1001: scap-helm cxserver finished
12:39 kartik@deploy1001: scap-helm cxserver cluster codfw completed
12:39 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
12:36 kartik@deploy1001: scap-helm cxserver finished
12:36 kartik@deploy1001: scap-helm cxserver cluster staging completed
12:36 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
11:47 Urbanecm: EU SWAT done
11:44 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.11/includes/Title.php: SWAT: Title: ensure getBaseTitle and getRootTitle return valid Titles (T225585) (duration: 00m 50s)
11:39 Urbanecm: Purged 14 logo urls for T227418
11:36 urbanecm@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: SWAT: Fix array shape for $wgCirrusSearchExtraIndexes (T227379) (duration: 00m 51s)
11:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove HD logos for projects with no entry in wgLogo or add a wgLogo entry (2/2, T227418) (duration: 00m 49s)
11:30 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: Remove HD logos for projects with no entry in wgLogo or add a wgLogo entry (1/2, T227418) (duration: 00m 49s)
11:26 moritzm: installing poolcounter1004/1005
11:25 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/AbuseFilter/: SWAT: Fix query in normalizeThrottleParameters (T209565) (duration: 00m 51s)
11:22 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: Disable Wikidata for ProofreadPage namespaces (T227201) (duration: 00m 50s)
11:16 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable jsonld output format for wikibase entities everywhere (T207168) (duration: 00m 49s)
11:11 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: Remove "עמוד" namespace from wgFlaggedRevsNamespaces for hewikisource (T227000) (duration: 00m 49s)
11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add several Ukrainian government websites to wgCopyUploadsDomains (T227366) (duration: 00m 49s)
11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create "autopatrolled" user group on az.wiktionary (T227208) (duration: 00m 49s)
11:04 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: Create "autopatrolled" user group on az.wiktionary (T227208) (duration: 00m 50s)
10:56 moritzm: installing poolcounter2003/2004
10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 51s)
09:51 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
09:51 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
09:49 ema: removed /srv/prometheus/ops/targets/varnish-upload-ats_mtail_$DC.yaml from prometheus hosts
08:27 moritzm: updated buster installer images to final release
07:43 moritzm: rebooting hassium to pick up MDS-enabled qemu
07:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:43 jmm@cumin2001: START - Cookbook sre.hosts.downtime
07:40 moritzm: rebooting weblog1001 for kernel security update
07:38 jynus: deploying sys schema to missing db production hosts
07:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
07:00 elukey: add base::firewall to stat1004 - T170826
06:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1109 after changing its binlog format (duration: 00m 49s)
06:36 marostegui: Run compare for s5 main tables on db2038 vs db2059 - T221533
06:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1109 after changing its binlog format (duration: 00m 49s)
05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1094 after upgrade, slowly repool db1109 after changing its binlog format (duration: 00m 49s)
05:45 marostegui: Restart MySQL on db1109 to pick up STATEMENT as binlog format - T227062
05:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 for binlog format change (duration: 00m 49s)
05:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More weight to db1094 after upgrade (duration: 00m 51s)
05:31 marostegui: Compress medium wikis on labsdb1009 - T222978
05:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1094 after upgrade (duration: 00m 49s)
05:22 marostegui: Drop empty table edit_page_tracking from some s3 wikis - T57385
05:11 marostegui: Drop empty table edit_page_tracking from s7 - T57385
05:08 marostegui: Stop MySQL on db1094 for upgrade
05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094 for upgrade (duration: 00m 50s)
03:19 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily make account creation limits more restrictive (duration: 00m 53s)
01:16 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily make account creation limits more restrictive (duration: 00m 50s)

2019-07-07

20:13 urandom: decommissioning restbase1017-b -- T222960
17:25 urandom: decommissioning restbase1017-a -- T222960
15:14 godog: power reset restbase2009

2019-07-06

07:56 thcipriani: restarting gerrit, out of heap space

2019-07-05

17:18 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4b7cdf5]: Deploy new versions preparing for reload (duration: 00m 39s)
17:17 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4b7cdf5]: Deploy new versions preparing for reload
17:17 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4b7cdf5]: Deploy new versions preparing for reload (duration: 00m 01s)
17:17 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4b7cdf5]: Deploy new versions preparing for reload
15:32 fsero: uploaded debian buster base docker image
15:30 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
15:23 fsero: restarting swift-container-sync on swift backends
15:20 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
15:15 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
15:05 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
15:01 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
14:51 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
14:24 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
14:15 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
13:44 elukey: roll restart of aqs on aqs100* to pick up new druid settings
13:33 fsero: disabling puppet on swift backends
13:26 fsero: restarting swift-container-sync on swift backends
13:05 ema: pool cp1090 w/ ATS backend T226638
12:12 ema: depool cp1090 and reimage as upload_ats T226638
11:46 ema: pool cp1088 w/ ATS backend T226638
11:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime
11:38 jijiki: Reboot ms-be1021 - T141756 - T227076
11:32 jijiki: Upgrading smartarray firmware on ms-be1021 - T141756 - T227076
11:31 moritzm: installing postgresql-9.4 updates on jessie
11:10 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
11:09 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
11:05 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
11:05 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
11:05 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
11:04 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
11:00 ema: depool cp1088 and reimage as upload_ats T226638
10:55 ema: pool cp1086 w/ ATS backend T226638
10:29 moritzm: rebooting debug proxies to pick up MDS-enabled qemu
10:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:23 jiji@cumin1001: START - Cookbook sre.hosts.downtime
10:23 moritzm: rebooting seaborgium to pick up correct Stretch kernel
10:15 moritzm: rebooting serpens to pick up correct Stretch kernel
10:14 moritzm: fixed up kernel packages on serpens/seaborgium; these were dist-upgraded from jessie, but the correct kernel packages for Stretch were not set up, so they were still stuck with an old jessie kernel
10:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime
10:02 jijiki: Rolling reboot of rdb* hosts - T227304
10:00 moritzm: rebooting seaborgium to pick up MDS-enabled qemu
09:51 moritzm: rebooting serpens to pick up MDS-enabled qemu
09:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:39 ema: depool cp1086 and reimage as upload_ats T226638
09:31 moritzm: rebooting LDAP replicas in eqiad
09:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:15 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=elastic2054.codfw.wmnet
09:01 moritzm: rebooting kraz (irc.wikimedia.org) to pick up MDS-enabled qemu
08:54 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:54 jmm@cumin1001: START - Cookbook sre.hosts.downtime
07:57 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1087 (duration: 00m 48s)
07:35 moritzm: installing imagemagick security updates on jessie
07:23 moritzm: installing wireshark security updates on jessie
07:17 marostegui: Compress small wikis on labsdb1009 T222978
07:13 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1087 (duration: 00m 52s)
06:46 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 with full weight (duration: 00m 49s)
06:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove old comments (duration: 00m 50s)
05:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1104 after upgrade (duration: 00m 49s)
05:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1104 after upgrade (duration: 00m 49s)
05:23 marostegui: Upgrade db1104 T227062
05:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 for upgrade (duration: 00m 51s)
05:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
05:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
05:09 marostegui: Stop MySQL on db1069 for decommission T227166
05:08 vgutierrez@cumin1001: START - Cookbook sre.ganeti.makevm
05:08 vgutierrez@cumin1001: START - Cookbook sre.ganeti.makevm
05:02 marostegui: Remove db1069 from tendril and zarcillo - T227166

2019-07-04

21:50 volans@deploy1001: Finished deploy [debmonitor/deploy@0ee26a3]: Deploy Debmonitor v0.1.10 (duration: 00m 48s)
21:50 volans@deploy1001: Started deploy [debmonitor/deploy@0ee26a3]: Deploy Debmonitor v0.1.10
21:35 volans: forcing reboot of elastic2054 from console, host unresponsive - T227298
17:03 AndyRussG: re-enabled banner impressions loader job
16:36 ema: pool cp1084 w/ ATS backend T226638
16:02 AndyRussG: DjangoBannerStats revision changed from 02be6cbb74 to 8965666e17
15:56 AndyRussG: temporarily disabled banner impressions loader job
15:34 ema: depool cp1084 and reimage as upload_ats T226638
15:22 ema: pool cp1082 w/ ATS backend T226638
14:51 twentyafterfour: phabricator: lowered phd.taskmasters config to 1 from 10
14:28 ema: depool cp1080 and reimage as upload_ats T226638
13:51 volans: removing python-conftool (old py2 version) from all hosts - T226965
13:40 ema: pool cp1080 w/ ATS backend T226638
13:23 volans: upgraded scap to 3.11.0-1 on A:eqiad - T227225
13:15 godog: reboot ms-be2037 after setting "os control" for power regulator mode - T225713
13:05 volans: upgraded scap to 3.11.0-1 on A:codfw - T227225
12:43 marostegui: Restore default replication consistency options on db2065 - T227251
12:40 volans: upgraded scap to 3.11.0-1 on deploy[12]001 - T227225
12:39 ema: depool cp1080 and reimage as upload_ats T226638
12:24 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 with low weight (duration: 00m 49s)
12:21 hoo: Started a Wikidata JSON dump run (sudo -b -u dumpsgen /usr/local/bin/dumpwikidatajson.sh) on snapshot1008 (T227207)
12:01 moritzm: upgrading buster installations to final frozen package state
11:59 jynus: stop and upgrade db1109 T227062
11:53 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 for upgrade (duration: 00m 50s)
11:47 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 for upgrade (duration: 00m 45s)
11:38 volans: upgraded scap to 3.11.0-1 on A:mw-canary - T227225
10:47 marostegui: Ease replication consistency option on db2065 to allow it to catch up a bit - T227251
10:01 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
09:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:55 moritzm: rolling reboot of kubestagetcd* to pick up MDS-enabled qemu
09:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:41 moritzm: rearmed keyholder on netmon1002
09:36 moritzm: rebooting netmon1002 for kernel security update
09:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:25 volans: uploaded scap_3.11.0-1 to {jessie,stretch,buster}-wikimedia APT - T227225
09:07 moritzm: partly rearmed keyholder on deploy1001 (missing for apache2modsec)
09:00 moritzm: rebooting deploy1001 for kernel security update
08:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:59 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:41 marostegui: Repool labsdb1011 - T222978
08:29 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
08:29 vgutierrez: upgrading acme-chief to version 0.18 in acme-chief test instances - T225945
08:25 moritzm: rearmed keyholder on cumin1001
08:22 vgutierrez: uploaded acme-chief 0.18 to apt.wikimedia.org (buster) - T225945
08:22 ema: pool cp1078 w/ ATS backend T226638
08:21 moritzm: rebooting cumin1001 for kernel security update
08:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:20 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:08 marostegui: Upgrade db2044 - T226952
08:00 moritzm: rearmed keyholder on cumin2001
07:57 moritzm: rebooting cumin2001 for kernel security update
07:55 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:55 jmm@cumin1001: START - Cookbook sre.hosts.downtime
07:51 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1069 from config as it will be decommissioned T227166 (duration: 00m 48s)
07:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1069 from config as it will be decommissioned T227166 (duration: 00m 49s)
07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1101 after upgrade (duration: 00m 49s)
07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 after upgrade (duration: 00m 49s)
07:17 ema: depool cp1078 and reimage as upload_ats T226638
07:09 moritzm: rebooting restbase-dev* for kernel security updates
07:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
06:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 after upgrade (duration: 00m 48s)
06:45 moritzm: restarting archiva on archiva.wikimedia.org to pick up Java security update
06:42 elukey: update puppet compiler's facts
05:57 twentyafterfour: disabled phd on phab1003 while I clean things up. Registered the downtime in icinga
05:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1101 after upgrade (duration: 00m 49s)
05:16 marostegui: Upgrade db1101 - T227062
05:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101 for upgrade (duration: 00m 50s)
00:41 twentyafterfour: phabricator upgrade complete
00:27 twentyafterfour: Deploying Phabricator release/2019-07-03/1 from wmf/stable
00:21 cscott@deploy1001: Finished deploy [parsoid/deploy@af5fd0e]: Updating Parsoid to d355bc90 (deploy-20170703 branch, T227216) (duration: 06m 48s)
00:15 cscott@deploy1001: Started deploy [parsoid/deploy@af5fd0e]: Updating Parsoid to d355bc90 (deploy-20170703 branch, T227216)
00:03 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy PB to wikisource, wikivoyage and wiktionary projects; T218626 (duration: 00m 50s)

2019-07-03

23:26 foks: reset email for "Uwe Martens"
23:00 jforrester@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/MobileFrontend/resources/dist/: T221197 schemaEditAttemptStep: only set bucket and anonymous-user-token on defaults if non-null (duration: 00m 51s)
22:59 mutante: stat1007 - jbd2/md0-8 invoked oom-killer
22:57 mutante: stat1007 - systemctl restart nagios-nrpe-server after OOM from some python process
20:58 XioNoX: add static backup routes for anycast recdns on cr1/2-codfw/eqiad - T186550
20:45 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@350e74b]: Update mobileapps to 94d0233 (T205550) (duration: 05m 11s)
20:40 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@350e74b]: Update mobileapps to 94d0233 (T205550)
20:28 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@cf64319]: Update mobileapps to fdb0108 (T205550) (duration: 01m 10s)
20:27 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@cf64319]: Update mobileapps to fdb0108 (T205550)
20:25 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@cf64319]: Update mobileapps to fdb0108 (T205550) (duration: 01m 25s)
20:24 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@cf64319]: Update mobileapps to fdb0108 (T205550)
20:12 jeh: rebooting labmon1001 T224228
19:58 jeh: rebooting labmon1002 T224228
19:44 jeh: rebooting labpuppetmaster1001 T224228
19:22 jeh: rebooting labpuppetmaster1002 T224228
19:10 jeh: rebooting cloudelastic1004 T224228
19:02 jeh: rebooting cloudelastic1003 T224228
18:58 jforrester@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/Wikibase/data-access/src/GenericServices.php: T227207 Fix missing qualifier hashes in JSON output (duration: 00m 50s)
18:54 jeh: rebooting cloudelastic1002 T224228
18:46 jeh: rebooting cloudelastic1001 T224228
16:43 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
16:36 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
16:35 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
16:35 fsero@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
16:24 Urbanecm: Morning SWAT done
16:23 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/ReadingLists/: SWAT: Fix API continuation (T226640) (duration: 00m 49s)
16:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Enable DataBridge on Beta (T226816) (production no-op) (duration: 00m 54s)
16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
16:18 robh@cumin1001: START - Cookbook sre.hosts.decommission
16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
16:18 robh@cumin1001: START - Cookbook sre.hosts.decommission
16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
16:17 robh@cumin1001: START - Cookbook sre.hosts.decommission
16:16 fsero: deleting zotero namespace and recreating it with helmfile on staging cluster
16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
16:13 root@: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
16:10 moritzm: rearmed keyholder on netmon2001 (was rebooted earlier)
16:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Undeploy reader demographics surveys (T226273) (duration: 00m 49s)
16:07 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Clean expired throttle rules (duration: 00m 49s)
15:55 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2250.codfw.wmnet
15:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:46 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:46 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:33 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:28 moritzm: rolling reboot of Kubernetes etcd nodes in eqiad
15:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:26 jeh: rebooting cloudweb2001-dev.codfw T224228
15:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:18 jeh: rebooting clouddb2001-dev.codfw T224228
15:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:14 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:05 moritzm: rolling reboot of Kubernetes etcd nodes in codfw
15:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:04 jeh: rebooting cloudservices2002-dev.codfw T224228
15:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:55 jeh: rebooting cloudnet2003-dev.codfw T224228
14:52 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:47 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:47 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:39 jeh: rebooting cloudnet2002-dev.codfw T224228
14:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:38 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:33 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:22 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:13 ema: pool cp1076 w/ ATS backend T226638
14:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:06 XioNoX: power off msw1-codfw - T224250
14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:00 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:00 jbond@cumin1001: START - Cookbook sre.hosts.downtime
13:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
13:54 XioNoX: remove all mentions of sampling (currently disabled) on cr2-esams to try to reduce memory usage
13:51 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
13:33 moritzm: rebooting doc1001 to pick up MDS-enabled qemu
13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:24 jynus: upgrade and restart db2097 T225378
13:08 ema: depool cp1076 and reimage as upload_ats T226638
13:07 ema: depool cp1076 and reimage as upload_ats T226637
12:55 marostegui: Drop secret and scratch_tokens columns from centralauth (s7) T226826
12:53 ema: pool cp2026 w/ ATS backend T226637
12:50 Urbanecm: foreachwiki refreshImageMetadata.php --mediatype=AUDIO --mime=audio/mid --force completed (T226784)
12:40 Urbanecm: Started foreachwiki refreshImageMetadata.php --mediatype=AUDIO --mime=audio/mid --force for T226784 on mwmaint1002 in a tmux
12:40 moritzm: rebooting mendelevium (ticket.wikimedia.org) to pick up MDS-enabled qemu
12:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
12:35 moritzm: rebooting dubnium/pollux (corp LDAP replicas) to pick up MDS-enabled qemu
12:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
12:31 moritzm: rebooting neon (kubernetes staging master) to pick up MDS-enabled qemu
12:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
12:24 moritzm: rebooting bromine to pick up MDS-enabled qemu
12:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:24 jmm@cumin2001: START - Cookbook sre.hosts.downtime
12:21 moritzm: rebooting pybal-test hosts to pick up MDS-enabled qemu
12:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:19 jmm@cumin2001: START - Cookbook sre.hosts.downtime
12:14 ema: reimage cp2026 as upload_ats T226637
12:13 kart_: Updated cxserver to b447674 (T226611)
12:10 kartik@deploy1001: scap-helm cxserver finished
12:10 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
12:10 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
12:09 kartik@deploy1001: scap-helm cxserver finished
12:09 kartik@deploy1001: scap-helm cxserver cluster codfw completed
12:09 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
12:07 kartik@deploy1001: scap-helm cxserver finished
12:07 kartik@deploy1001: scap-helm cxserver cluster staging completed
12:07 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
11:55 reedy@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/TimedMediaHandler/: T226840 (duration: 00m 50s)
11:29 moritzm: ran puppet clean/deactivate and debdeploy removal for cp3037 (host has been broken for a long time and is causing failing Cumin/debdeploy runs) T227077
11:14 Urbanecm: EU SWAT done
11:14 Urbanecm: Ran mwscript namespaceDupes.php --wiki=pawikisource --fix for T226959
11:12 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for enwiki event (T227059) (duration: 00m 48s)
11:11 urbanecm@deploy1001: Synchronized wmf-config/throttle-analyze.php: SWAT: [throttle-analyze] Grant autoconfirmed permission to user when throttle rule is applied (T204583) (duration: 00m 49s)
11:11 moritzm: rebooting people1001 (people.wikimedia.org) to pick up MDS-enabled qemu
11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configuring Namespaces at pawikisource (T226959) (duration: 00m 52s)
11:05 moritzm: rebooting krypton nodes to pick up MDS-enabled qemu
11:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
11:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:36 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wiktionary extensions/Cognate/maintenance/populateCognatePages.php (T226358)
10:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:11 moritzm: rolling reboot of eventschema service hosts to pick up MDS-enabled qemu
10:00 marostegui: Drop secret and scratch_tokens columns from the private wiki list T226826
09:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:58 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:54 moritzm: rebooting netmon2001 for kernel security update
09:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:47 moritzm: rebooting debmonitor nodes to pick up MDS-enabled qemu
09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:46 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:27 moritzm: rebooting failoid nodes to pick up MDS-enabled qemu
09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:01 moritzm: rolling reboot of kubernetes masters in eqiad to pick up MDS-enabled qemu
08:44 moritzm: rolling reboot of kubernetes masters in codfw to pick up MDS-enabled qemu
08:44 moritzm: rolling reboot of kubernetes masters in codfw
08:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:43 jmm@cumin2001: START - Cookbook sre.hosts.downtime
07:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:45 jmm@cumin2001: START - Cookbook sre.hosts.downtime
07:34 godog: reenable puppet fleetwide
07:33 marostegui: Upgrade db2079 (s8 codfw master)
07:25 marostegui: Upgrade db2100 (snapshots on that host are finished)
07:24 godog: temporarily disable puppet to test/apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/520012
07:23 moritzm: updated buster installer d-i image to RC3
07:10 marostegui: Drop secret and scratch_tokens from labswiki (wikitech) and labstestwiki - T226826
07:06 marostegui: Drop secret and scratch_tokens from fishbowl wiki list T226826
07:05 godog: add 150G to graphite hosts lv, was at 94% utilization
06:55 godog: depool and roll-restart swift proxy - T209182
06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1069 status (duration: 00m 28s)
06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover x1 master eqiad from db1069 to db1120 T226358 (duration: 00m 27s)
06:00 marostegui: Starting x1 failover from db1069 to db1120 - T226358
06:00 elukey: move the zookeeper puppet submodule into operations/puppet - T226466
05:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
05:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
05:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
05:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
05:03 vgutierrez: restarting pybal on lvs4006
05:02 marostegui: Start pre-failover steps for x1 - T226358
04:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
04:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
04:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
04:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
04:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
04:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
04:23 vgutierrez: rebooting primary lvs servers for MDS security updates
00:14 eileen: process-control config revision is 8e215d07f2 (re-enable jobs)
00:08 eileen: civicrm revision is 8a4451f390, config revision is ec8c43ee86 Redis
00:05 eileen: process-control config revision is ec8c43ee86 (Redis turned on)

2019-07-02

23:42 eileen: civicrm revision is 8a4451f390, config revision is c02a038331 (mysql locks enabled)
23:36 catrope@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/Echo/: T226594 (duration: 00m 51s)
23:34 catrope@deploy1001: Synchronized php-1.34.0-wmf.11/skins/MonoBook/: T226594 (duration: 00m 50s)
22:35 eileen: civicrm revision changed from 96985fcc4b to 8a4451f390, config revision is af9e657134
20:35 mutante: contint1001 - created new partitions on /dev/sdc and /dev/sdd; created new RAID 1 over /dev/sdc1 and /dev/sdd1
20:28 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@cc60181]: Weekly WDQS deploy (duration: 14m 43s)
20:20 mutante: contint1001 - temp installing parted for labeling new disks sdc and sdd for raid for docker images (T207707)
20:13 smalyshev@deploy1001: Started deploy [wdqs/wdqs@cc60181]: Weekly WDQS deploy
19:37 krinkle@deploy1001: Finished scap: l10n sync did not work as expected, try full scap to fix missing i18n message for 9963d843622 (duration: 18m 24s)
19:18 krinkle@deploy1001: Started scap: l10n sync did not work as expected, try full scap to fix missing i18n message for 9963d843622
19:07 krinkle@deploy1001: scap sync-l10n completed (1.34.0-wmf.11) (duration: 00m 47s)
19:05 krinkle@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/AbuseFilter/: 9963d843622b / T227095 (duration: 00m 51s)
19:03 krinkle@deploy1001: scap sync-l10n completed (1.34.0-wmf.11) (duration: 00m 48s)
19:00 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@a29da76]: Update recommendation-api to 4f50c71 (duration: 02m 50s)
18:57 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@a29da76]: Update recommendation-api to 4f50c71
18:07 XioNoX: setup tunnel between eqord and eqiad - T226158
17:49 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@9ca9b0f]: Update mobileapps to 941e14f (T219998 T217352 T219909) (duration: 05m 49s)
17:43 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@9ca9b0f]: Update mobileapps to 941e14f (T219998 T217352 T219909)
16:59 hashar: CI is back, I had to restart Zuul :-\ T227111
16:55 hashar: Starting Jenkins and Zuul T227111
16:53 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
16:52 hashar: Stopping Jenkins and Zuul T227111
16:32 bblack: testing failure scenarios on dns2002, possible false-alarm alerts (depooled from LVS recdns)
16:31 bblack@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
16:31 bblack: depool dns2002 from recdns server for testing
16:30 hashar: CI code-review +2 changes are not quite processed for some unknown reason T227111
16:19 XioNoX: add term allow-anycast-dns in filter labs-in4
15:55 ema: depool cp2026 and reimage as upload_ats T226637
15:47 ema: pool cp2025 w/ ATS backend T226637
15:43 XioNoX: "Equinix will be expanding the DA IX subnet from a /24 to a /23." (cf. email)
15:34 XioNoX: Add BGP to AS15830 in AMS-IX
15:26 XioNoX: add centrallog1001 to routers ACLs - T226813
15:20 Krinkle: Set repo back from active to read-only https://gerrit.wikimedia.org/r/#/admin/projects/operations/puppet/cdh (T226474)
14:58 jijiki: Run restart-php-fpm in all-mw-codfw - T223391
14:49 ema: depool cp2025 and reimage as upload_ats T226637
14:47 XioNoX: add anycast BGP statement to eqsin
14:25 jbond42: restart apache2 on phab1003
14:22 XioNoX: add DNS anycast BGP statement to cr3-ulsfo
14:18 ema: pool cp2024 w/ ATS backend T226637
14:13 otto@deploy1001: Finished deploy [eventstreams/deploy@de1d356]: Limit concurrent number of connections per X-Client-IP - T226808 (duration: 06m 17s)
14:07 otto@deploy1001: Started deploy [eventstreams/deploy@de1d356]: Limit concurrent number of connections per X-Client-IP - T226808
14:02 bblack: deploying anycast_healthchecker changes to the recdnses (puppet disabled on all, testing dns4002 first) - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/397723/
13:31 marostegui: Upgrade db2086
13:27 ema: depool cp2024 and reimage as upload_ats T226637
13:26 XioNoX: test fix policy ASXXX_in (missing `then next policy`)
13:23 marostegui: Upgrade db2085
13:13 XioNoX: push RPKI classification to eqiad - T220669
13:09 XioNoX: push RPKI classification to eqsin - T220669
13:06 ema: pool cp2022 w/ ATS backend T226637
12:51 XioNoX: push RPKI classification to AMS - T220669
12:47 marostegui: Upgrade db2082 - T227062
12:30 jijiki: Power cycle ms-be1021 - T227076
11:51 ema: depool cp2022 and reimage as upload_ats T226637
11:40 Urbanecm: EU SWAT really done
11:37 Urbanecm: Ran mwscript resetAuthenticationThrottle.php --wiki=metawiki --signup --ip 86.49.134.37 for T225555
11:37 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for cswiki workshop (T225555) (duration: 00m 49s)
11:33 Urbanecm: Reopen EU SWAT for last-time throttle rule
11:33 moritzm: re-enabled meta monitoring for icinga2001
11:26 moritzm: rebooting icinga2001 for kernel security update
11:26 jijiki: Run restart-php-fpm in all-mw-eqiad - T223391
11:25 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
11:23 moritzm: temporarily disabled meta monitoring for icinga2001
11:16 dcausse: EU Swat done
11:15 dcausse@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/CirrusSearch/includes/Updater.php: T226592: Ignore broken redirects when updating incoming link counts (duration: 00m 49s)
11:06 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] Enable UTR30 as a lookup method for ns prefixes on group1 (duration: 00m 50s)
10:47 jijiki: Rollout Wikidiff 1.8.2 to eqiad - T223391
10:45 jijiki: Rollout Wikidiff 1.8.2 to codfw - T223391
10:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
10:21 moritzm: draining restbase1027 for eventual reboot for MDS security updates / OpenJDK security update
10:15 moritzm: draining restbase1026 for eventual reboot for MDS security updates / OpenJDK security update
10:15 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
10:05 elukey: powercycle analytics1056 (soft lockups logged in the serial console, no ssh, no net connectivity)
10:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
10:02 ema: pool cp2020 w/ ATS backend T226637
10:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
09:58 godog: restart rsyslog on wezen - T199406
09:55 moritzm: draining restbase1025 for eventual reboot for MDS security updates / OpenJDK security update
09:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
09:50 vgutierrez: rebooting secondary lvs servers for MDS security updates
09:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:46 moritzm: draining restbase1024 for eventual reboot for MDS security updates / OpenJDK security update
09:39 marostegui: Upgrade db2094 (codfw sanitarium) T227062
09:39 moritzm: draining restbase1023 for eventual reboot for MDS security updates / OpenJDK security update
09:34 marostegui: Upgrade mysql on db2080 db2081 db2083 - T227062
09:29 moritzm: draining restbase1022 for eventual reboot for MDS security updates / OpenJDK security update
09:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:22 moritzm: draining restbase1021 for eventual reboot for MDS security updates / OpenJDK security update
09:11 moritzm: draining restbase1020 for eventual reboot for MDS security updates / OpenJDK security update
09:00 moritzm: draining restbase1019 for eventual reboot for MDS security updates / OpenJDK security update
08:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:55 ema: depool cp2020 and reimage as upload_ats T226637
08:52 moritzm: draining restbase1018 for eventual reboot for MDS security updates / OpenJDK security update
08:50 ema: pool cp2018 w/ ATS backend T226637
08:36 moritzm: draining restbase1017 for eventual reboot for MDS security updates / OpenJDK security update
08:20 moritzm: draining restbase1016 for eventual reboot for MDS security updates / OpenJDK security update
08:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:19 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:10 godog: restbase spare hosts, mask and stop restbase - T227054
07:58 moritzm: draining restbase2020 for eventual reboot for MDS security updates / OpenJDK security update
07:55 ema: depool cp2018 and reimage as upload_ats T226637
07:48 moritzm: draining restbase2019 for eventual reboot for MDS security updates / OpenJDK security update
07:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
07:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
07:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1092 (duration: 00m 49s)
07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1092 (duration: 00m 49s)
05:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1092 into API (duration: 00m 49s)
05:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1092 (duration: 00m 48s)
05:23 marostegui: Upgrade MySQL and kernel on db1092
05:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1092 (duration: 00m 54s)
01:39 milimetric@deploy1001: Finished deploy [analytics/refinery@b8a496b]: fix private sqoop (duration: 17m 36s)
01:21 milimetric@deploy1001: Started deploy [analytics/refinery@b8a496b]: fix private sqoop

2019-07-01

20:33 milimetric@deploy1001: Finished deploy [analytics/refinery@4e9894c]: minor, just removing hiwikisource from sqoop list (duration: 01m 33s)
20:32 milimetric@deploy1001: Started deploy [analytics/refinery@4e9894c]: minor, just removing hiwikisource from sqoop list
20:32 milimetric@deploy1001: Finished deploy [analytics/refinery@4e9894c]: minor, just removing hiwikisource from sqoop list (duration: 16m 59s)
20:15 milimetric@deploy1001: Started deploy [analytics/refinery@4e9894c]: minor, just removing hiwikisource from sqoop list
19:31 tzatziki: removing nine files for legal compliance
19:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Special:Homepage for 50% of new users on viwiki (duration: 00m 49s)
18:53 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/ContentTranslation: SWAT: Require only one user group to allow publishing to main namespace (T225398) (duration: 00m 49s)
18:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Don't show cannot publish error to sysop users (T225398) (duration: 00m 49s)
18:46 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/CentralAuth: SWAT: Require only one user group to allow publishing to main namespace (T225398) (duration: 00m 51s)
18:35 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
18:29 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Special:Homepage on viwiki (T218237) (duration: 00m 49s)
18:23 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EditorJourney on arwiki (T225737) (duration: 00m 49s)
17:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
16:01 ema: pool cp2017 w/ ATS backend T226637
15:42 moritzm: draining restbase2018 for eventual reboot for MDS kernel updates
15:37 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
15:36 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
15:31 moritzm: draining restbase2017 for eventual reboot for MDS kernel updates
15:27 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
15:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
15:19 moritzm: draining restbase2016 for eventual reboot for MDS kernel updates
15:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:11 moritzm: draining restbase2015 for eventual reboot for MDS kernel updates
14:54 moritzm: draining restbase2014 for eventual reboot for MDS kernel updates
14:50 moritzm: installing openjdk-8 security updates on stretch-based restbase hosts
14:45 ejegg: updated payments-wiki from 86381aeeff to 5f974d2386
14:44 moritzm: draining restbase2013 for eventual reboot for MDS kernel updates
14:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
14:25 ema: depool cp2017 and reimage as upload_ats T226637
14:24 moritzm: draining restbase2012 for eventual reboot for MDS kernel updates
14:10 moritzm: rolling reboot of docker registry nodes to pick up MDS-enabled qemu
14:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
14:02 moritzm: draining restbase2011 for eventual reboot for MDS kernel updates
13:54 moritzm: draining restbase2010 for eventual reboot for MDS kernel updates
13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:43 ottomata: modified dt format of webrequest logs to use 'Z' suffix for timezone offset - T217040
13:42 jbond42: rolling update of expat
13:41 fsero: uploading helmfile to jessie as well
13:38 moritzm: draining restbase2009 for eventual reboot for MDS kernel updates
13:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:20 akosiaris: repool eqiad after kubernetes upgrades. T226256
13:20 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=eqiad
12:51 akosiaris: depool eqiad for kubernetes upgrades. T226256
12:51 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=eqiad
12:49 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=codfw
12:49 akosiaris: repool codfw after kubernetes upgrades. T226256
12:01 akosiaris: depool codfw for kubernetes upgrades. T226256
12:01 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=codfw
11:36 Urbanecm: EU SWAT done
11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Clean up wgNamespaceAliases (T226765) (duration: 00m 49s)
11:27 apergos: urbanecm@deploy1001 Synchronized php-1.34.0-wmf.11/includes/: SWAT: Join slot and content tables when dumping XML (T220493) (duration: 01m 14s)
11:12 jbond42: rolling upgrade of facter3
11:12 jbond42: upload facter_3.11.0-2~debu9u2+wmf1 to stretch-wikimedia component/facter3
11:10 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: Add abusefilter-view-private to checkusers on arwiki (T226899) (duration: 00m 49s)
11:06 urbanecm@deploy1001: Synchronized dblists/: Close wikimania2018.wikimedia.org (T201188) (duration: 00m 49s)
10:04 elukey: remove burrow-analytics.service from kafkamon1001 (the analytics cluster has been decommed)
09:55 elukey: reboot kafkamon1001 with 4g of dedicated ram (was 8g) - T224988
09:54 elukey: reboot kafkamon2001 with 4g of dedicated ram (was 8g) - T224988
09:54 godog: swift eqiad-prod eqiad-prod: put back ms-be1033 - T223518
09:33 _joe_: removing python-conftool from all hosts where it's still installed
09:16 _joe_: update python3-etcd, python3-conftool to their latest versions T226965
09:13 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Noop: Do not load InitialiseSettings-labs.php multiple times (T224899) (duration: 00m 51s)
08:39 elukey: restart hadoop-yarn-nodemanager on all hadoop workers to pick up new jvm settings - T225296
07:04 ema: pool cp2014 w/ ATS backend T226637
06:16 ema: depool cp2014 and reimage as upload_ats T226637
04:53 marostegui: Keep compressing tables on labsdb1011 - T222978
04:50 marostegui: Reload haproxy on dbproxy1010 and dbproxy1011 to depool labsdb1011 - T222978
04:49 marostegui: Change pt-kill value on labsdb1009 temporarily, from 300 to 14400 T222978

2019-06-30

23:27 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 45s)
07:05 Urbanecm: Remove 2FA from User:SQL (T226918)

2019-06-28

21:19 otto@deploy1001: Finished deploy [eventstreams/deploy@2af2719]: Manually blacklisting IP - T226808 (duration: 03m 07s)
21:16 otto@deploy1001: Started deploy [eventstreams/deploy@2af2719]: Manually blacklisting IP - T226808
20:09 jforrester@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/Wikibase/repo/RepoHooks.php: Make it possible for File pages to be moved on Commons again T224303 T226672 (duration: 00m 50s)
19:49 jforrester@deploy1001: Synchronized wmf-config/mobile.php: T221196 VE mobile A/B test part 2 (duration: 00m 49s)
19:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221196 VE mobile A/B test part 1 (duration: 00m 50s)
19:05 joal@deploy1001: Finished deploy [analytics/refinery@de8eb99]: Missing bit of regular analytics deploy (duration: 02m 08s)
19:03 joal@deploy1001: Started deploy [analytics/refinery@de8eb99]: Missing bit of regular analytics deploy
18:51 joal@deploy1001: Finished deploy [analytics/refinery@de8eb99]: Missing bit of regular analytics deploy (duration: 17m 47s)
18:33 joal@deploy1001: Started deploy [analytics/refinery@de8eb99]: Missing bit of regular analytics deploy
18:14 joal@deploy1001: Finished deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1004 only (duration: 01m 03s)
18:13 joal@deploy1001: Started deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1004 only
18:12 elukey: systemctl reset-failed kafka* units on kafka2001 (in decom phase)
18:12 joal@deploy1001: Finished deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1003 only again (duration: 00m 26s)
18:11 joal@deploy1001: Started deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1003 only again
18:09 joal@deploy1001: Finished deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1003 only (duration: 00m 05s)
18:09 joal@deploy1001: Started deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1003 only
18:08 joal@deploy1001: Finished deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1003 only (duration: 00m 04s)
18:08 joal@deploy1001: Started deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy - notebook1003 only
18:06 joal@deploy1001: Finished deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy (duration: 53m 35s)
17:53 cdanis: increasing nginx proxy_buffer_size / proxy_buffers 02d7bcaa
17:36 ottomata: restarting eventstreams on scb1001 with trace logging of X-Client-IP for T226808
17:13 joal@deploy1001: Started deploy [analytics/refinery@8d6fa30]: Late regular analytics weekly deploy
16:35 bblack: Raising varnish max_http_hdr (max allowed applayer response header count) from 64->128 in systemd config and live tuning - https://gerrit.wikimedia.org/r/519661 - T226840
15:04 eevans@deploy1001: scap-helm sessionstore finished
15:04 eevans@deploy1001: scap-helm sessionstore cluster codfw completed
15:04 eevans@deploy1001: scap-helm sessionstore upgrade production -f sessionstore-codfw-values.yaml stable/kask [namespace: sessionstore, clusters: codfw]
15:02 eevans@deploy1001: scap-helm sessionstore finished
15:02 eevans@deploy1001: scap-helm sessionstore cluster eqiad completed
15:02 eevans@deploy1001: scap-helm sessionstore upgrade production -f sessionstore-eqiad-values.yaml stable/kask [namespace: sessionstore, clusters: eqiad]
14:48 ema: pool cp2011 w/ ATS backend T226637
14:47 XioNoX: upload kafkatee to buster-wikimedia
14:11 eevans@deploy1001: scap-helm sessionstore finished
14:11 eevans@deploy1001: scap-helm sessionstore cluster staging completed
14:11 eevans@deploy1001: scap-helm sessionstore upgrade staging -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
14:07 eevans@deploy1001: scap-helm sessionstore upgrade production -f sessionstore-staging-values.yaml stable/kask [namespace: sessionstore, clusters: staging]
14:06 ema: depool cp2011 and reimage as upload_ats T226637
11:36 elukey: roll restart eventstreams on all scb1* nodes
11:33 elukey: restart eventstreams on scb1001
11:18 fsero: draining kubernetes1006 for applying updates
11:14 fsero: draining kubernetes1005 for applying updates
11:13 fsero: draining kubernetes2006 for applying updates
11:09 fsero: draining kubernetes2005 for applying updates
11:04 _joe_: uploading php-wmerrors to thirdparty/php72 - T187147
10:31 Reedy: running `foreachwiki extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --audio --mime=audio/midi --missing --throttle` on mwmaint1002 in screen T226713
10:20 reedy@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/TimedMediaHandler/maintenance/requeueTranscodes.php: Extra filtering option (duration: 00m 51s)
10:09 ema: pool cp2008 w/ ATS backend T226637
09:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:17 ema: depool cp2008 and reimage as upload_ats T226637
09:16 elukey: systemctl reset-failed kafka* units on kafka2002 (role spare, failed units, already masked)
09:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:10 moritzm: rebooting releases* hosts for MDS-enabled qemu/kernel
09:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:43 elukey: roll restart of eventstreams on all scb2* nodes, service now working (kafka transport failures logged)
08:02 moritzm: updating openssl packages on mw1265
07:57 ema: pool cp2005 w/ ATS backend T226637
07:11 _joe_: upgrading php-wikidiff2 on the mw canaries, only on php7 - T223391
07:05 ema: depool cp2005 and reimage as upload_ats T226637
01:22 Krinkle: Killing arclamp-log on webperf1002, no flame graphs for three days, presumably mwlog/redis connection dropped again. T215740

2019-06-27

23:28 catrope@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/TimedMediaHandler/: T226748 (duration: 00m 50s)
23:26 catrope@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/GrowthExperiments/includes/HomepageHooks.php: Fix JS error on Special:Homepage (duration: 00m 50s)
23:25 brion: roan is fixing deploy of T226748 which failed to include the patch (whoops)
21:58 cdanis: cdanis@cp1075.eqiad.wmnet ~ % sudo -i varnish-backend-restart
21:44 brion: deploying fix for TMH jobqueue bug T226748
20:31 jforrester@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/MobileFrontend/resources/dist: T221191: Log editor switches to visualeditorfeatureuse (duration: 00m 50s)
20:18 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/Wikibase: Avoid inserting a new addUsage job when the current usage stays untouched (gerrit:519492) (duration: 01m 14s)
19:23 Urbanecm: run namespaceDupes.php for wikis in P8674 (T173070)
19:23 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.11 refs T220736
19:16 ppchelko@deploy1001: Finished deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855 (duration: 11m 21s)
19:04 ppchelko@deploy1001: Started deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855
18:52 Urbanecm: Morning SWAT done for real
18:51 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Tidy up GroupOverrides (T173070) (duration: 00m 56s)
18:50 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: gerrit:Tidy up GroupOverrides, part 1 (T173070) (duration: 00m 57s)
18:48 Urbanecm: foreachwiki namespaceDupes.php --fix done (T173070)
18:46 Urbanecm: Reopen Morning SWAT
18:33 legoktm: gerrit set-account --active '"Dzahn"'
18:33 Urbanecm: Morning SWAT done, namespaceDupes.php still running for T173070
18:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Tidy up groupOverrides (T185898) (duration: 00m 56s)
18:22 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: Remove several wikis from commonsuploads.dblist (T185898) (duration: 00m 57s)
18:20 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: Restrict uploading on wikimaniawiki (T225505) (duration: 00m 56s)
18:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Restrict uploading on wikimaniawiki, Add + in front of wikimaniawiki in GroupOverrides (T225505) (duration: 00m 57s)
18:13 herron: kafka2001 -> kafka-main2001 migration complete. re-enabling alerting on kafka-main2001, and moving kafka2001 to role::spare::system T225005
18:08 Urbanecm: running namespaceDupes.php across all wikis in tmux on mwmaint1002 (T173070)
18:06 ppchelko@deploy1001: Finished deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855, restbase1016 (duration: 01m 41s)
18:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Revert "Set default aliases for Project_talk namespace"" (T173070) (duration: 00m 57s)
18:05 ppchelko@deploy1001: Started deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855, restbase1016
18:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
18:04 robh@cumin1001: START - Cookbook sre.hosts.decommission
18:03 ppchelko@deploy1001: Finished deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855, restbase1016 (duration: 00m 08s)
18:03 ppchelko@deploy1001: Started deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855, restbase1016
18:01 ppchelko@deploy1001: Finished deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855, rb2009 only, fixed mathoid config (duration: 02m 19s)
17:59 ppchelko@deploy1001: Started deploy [restbase/deploy@ff6f302]: Use new projects and new config layout T220855, rb2009 only, fixed mathoid config
17:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/WikimediaIncubator/includes/WikimediaIncubator.php: T204883 / 93643b44a52ea7 (duration: 01m 00s)
17:26 ppchelko@deploy1001: Finished deploy [restbase/deploy@da50001]: Use new projects and new config layout T220855, rb2009 only (duration: 02m 38s)
17:23 ppchelko@deploy1001: Started deploy [restbase/deploy@da50001]: Use new projects and new config layout T220855, rb2009 only
17:21 arturo: imported gpg keys 9DC858229FC7DD38854AE2D88D81803C0EBFCD88 and 54A647F9048D5688D7DA2ABE6A030B21BA07F4FB into install1002 for T215975
17:14 ejegg: updated fundraising tools from da82ed111d to 3089c0ec76
16:42 jynus: repool labsdb1011 T222978
16:39 ema: pool cp2002 w/ ATS backend T226637
14:43 herron: beginning replacement of kafka2001 with kafka-main2001 T225005
14:33 akosiaris: push newer calico outgoing policy rules. T225005
14:28 XioNoX: push RPKI classification to Dallas - T220669
14:23 Reedy: running `mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=commonswiki --audio --missing --throttle` in screen as me on mwmaint1002 T226713
14:13 XioNoX: push RPKI classification test to eqord - T220669
14:11 ema: depool cp2002 and reimage as upload_ats T226637
13:43 XioNoX: push RPKI classification test to cr3-ulsfo - T220669
13:26 XioNoX: push RPKI classification test to cr4-ulsfo - T220669
13:15 elukey: start druid drop datasource test - might affect AQS - T226035
13:11 godog: depool restbase10(0[7-9]|1[0-5]) before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/513262
12:01 Amir1: start of mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=testwikidatawiki --batch-size=100 --sleep=3 (T225052)
11:23 Amir1: EU SWAT is done
11:21 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add Portal Namespace to VisualEditor option on kowiki (T224813) (duration: 00m 57s)
10:48 jijiki: Rolling restart ms-fe* proxy services for T226373 and T211661
10:48 moritzm: updated buster d-i image to release candidate 2
10:40 _joe_: progressively restarting pybal in codfw, eqiad to pick up the change in monitoring for wdqs
10:39 volans: restarted stashbot on toolforge; it had not been !log-ing since 01:11 UTC this morning
01:11 bblack: depool eqiad front edge

2019-06-26

23:39 catrope@deploy1001: Synchronized php-1.34.0-wmf.10/extensions/Echo/modules/nojs/mw.echo.badge.monobook.less: Fix horizontal scrollbars in Monobook (T226594) (duration: 00m 55s)
23:38 catrope@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/Echo/modules/nojs/mw.echo.badge.monobook.less: Fix horizontal scrollbars in Monobook (T226594) (duration: 00m 57s)
21:36 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't set wgSentryEventGateUri in prod CS (duration: 00m 55s)
21:35 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Explicitly set wgSentryEventGateUri to false in prod IS (duration: 00m 56s)
21:22 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable other statements on test commons (duration: 00m 58s)
20:58 marktraceur: added cparle to wmf-deployment group on Gerrit (already has deploy access)
20:56 cscott@deploy1001: Finished deploy [parsoid/deploy@3d20703]: Updating Parsoid to 31d356a5 (ensure proper source texts when parsing) (duration: 20m 55s)
20:52 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@85fc707]: Update mobileapps to 4f9b376 (duration: 02m 08s)
20:50 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@85fc707]: Update mobileapps to 4f9b376
20:48 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@41a86f8]: Merge "Update prod config template to pass thru accept-language to the MW API" (duration: 03m 17s)
20:44 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@41a86f8]: Merge "Update prod config template to pass thru accept-language to the MW API"
20:37 bsitzmann@deploy1001: deploy aborted: Merge "Update prod config template to pass thru accept-language to the MW API" (duration: 02m 15s)
20:35 cscott@deploy1001: Started deploy [parsoid/deploy@3d20703]: Updating Parsoid to 31d356a5 (ensure proper source texts when parsing)
20:35 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@41a86f8]: Merge "Update prod config template to pass thru accept-language to the MW API"
19:42 shdubsh: file-read-backwards v2.0.0 deployed to apt repo
19:08 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.11 refs T220736 (duration: 00m 56s)
19:06 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.11 refs T220736
17:52 herron: finished migration of kafka2002 to kafka-main2002 — enabling alert notifications for kafka-main2002, and leaving kafka2002 disabled T225005
16:42 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Fix $wgSentryEventGateUri (T217142) (duration: 09m 52s)
16:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Reverting change scap had problems with (duration: 00m 55s)
16:25 urbanecm@deploy1001: scap failed: average error rate on 11/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
16:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change name of Serbian Wikinews in InitialiseSettings.php (part 2) (T226315) (duration: 00m 55s)
16:20 Urbanecm: Purged srwikinews.png, srwikinews-1.5x.png, srwikinews-2x.png (T226315)
16:19 urbanecm@deploy1001: Synchronized static/images/project-logos/: Change name of Serbian Wikinews (part 1) (T226315) (duration: 00m 56s)
16:15 jijiki: Pooling restbase1007 back
16:14 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable sending JS errors to EventGate (T217142) (duration: 00m 55s)
16:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable mobile homepage for cswiki and kowiki (T225676) (duration: 00m 56s)
16:08 ppchelko@deploy1001: Finished deploy [restbase/deploy@a915f69]: Really revert (duration: 01m 35s)
16:06 ppchelko@deploy1001: Started deploy [restbase/deploy@a915f69]: Really revert
16:04 ema: pool cp5006 w/ ATS backend T226477
15:56 jijiki: Depooling restbase1007
15:54 ppchelko@deploy1001: Finished deploy [restbase/deploy@574a678]: Revert (duration: 03m 47s)
15:51 ppchelko@deploy1001: Started deploy [restbase/deploy@574a678]: Revert
15:50 ppchelko@deploy1001: deploy aborted: Use new projects and new config layout T220855, canaries only (duration: 03m 31s)
15:46 ppchelko@deploy1001: Started deploy [restbase/deploy@995bc9d]: Use new projects and new config layout T220855, canaries only
15:04 ema: depool cp5006 and reimage as upload_ats T226477
15:01 ema: pool cp3043 as cache_text
14:16 herron: beginning replacement of kafka2002 with kafka-main2002 T225005
14:12 ema: depool cp3043 and convert it from upload to text
14:01 moritzm: rebooting graphite1004 for kernel security update
13:55 moritzm: rebooting puppetboard* to pick up MDS-enabled qemu and new kernel
13:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:48 XioNoX: push RPKI classification test to cr4-ulsfo - T220669
13:32 ema: pool cp5005 w/ ATS backend T226477
13:31 moritzm: rebooting graphite2003 for kernel security update
13:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:17 Lucas_WMDE: end (success) lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikibaseQualityConstraints/maintenance/ImportConstraintStatements.php wikidatawiki # T223372
13:16 Lucas_WMDE: begin lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikibaseQualityConstraints/maintenance/ImportConstraintStatements.php wikidatawiki # T223372
12:27 Amir1: EU SWAT is done for real
12:27 Amir1: end of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size=100 --sleep=3 (T225052)
12:25 ema: depool cp5005 and reimage as upload_ats T226477
12:07 Amir1: start of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --batch-size=100 --sleep=3
12:06 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set EntityUsageTable addUsage batch size to 100 (T225500) (duration: 00m 56s)
12:02 dcausse: Revert: EU swat done
12:02 dcausse: EU swat done
12:01 dcausse@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/CirrusSearch/includes/RequestLogger.php: T226568: Convert array params to string when logging requests (duration: 00m 56s)
11:55 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] Enable UTR30 as a lookup method for ns prefixes on group0 (duration: 00m 56s)
11:47 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] remove unused wgCirrusSearchRequestEventSampling (duration: 00m 54s)
11:40 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T226273: Enable reader demographics surveys (duration: 00m 55s)
11:17 urbanecm@deploy1001: sync-file aborted: Reverting gerrit:519167 (T226273) (duration: 00m 32s)
11:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch property terms migration to WRITE_BOTH on wikidata production (T225051) (duration: 00m 56s)
11:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Allow bureaucrats to remove sysop on nycwikimedia (T226591) (duration: 00m 57s)
10:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:21 ema: pool cp5004 w/ ATS backend T226477
09:49 _joe_: restarted php7.2-fpm on mwdebug1002, testing php-check-and-restart script
09:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1068 from config T217396 (duration: 00m 55s)
09:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1068 from config T217396 (duration: 01m 11s)
09:18 ema: depool cp5004 and reimage as upload_ats T226477
09:04 elukey: reboot druid100[4-6] for kernel and openjdk upgrades
09:00 kart_: Updated cxserver to 9bad239 (T226482)
08:58 kartik@deploy1001: scap-helm cxserver finished
08:58 kartik@deploy1001: scap-helm cxserver cluster codfw completed
08:58 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
08:56 kartik@deploy1001: scap-helm cxserver finished
08:56 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
08:56 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
08:52 kartik@deploy1001: scap-helm cxserver finished
08:52 kartik@deploy1001: scap-helm cxserver cluster staging completed
08:52 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
08:43 moritzm: rebooting deployment-mediawiki-07 for new kernel
08:30 ema: pool cp5003 w/ ATS backend T226477
07:50 godog: bounce rsyslog on lithium - T199406
07:30 godog: powercycle ms-be2032 - T226600
07:19 ema: depool cp5003 and reimage as upload_ats T226477
07:09 elukey: reboot of druid100[1-3] hosts for kernel + openjdk upgrades
05:59 elukey: systemctl mask + reset-failed kafka on kafka10[12-23] - T226517
05:57 marostegui: wikimedia_editor_tasks_entity_description_exists from s8:testwikidatawiki T226326
05:46 marostegui: wikimedia_editor_tasks_entity_description_exists from s3:testwikidatawiki T226326
05:29 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add db1133 into m5 depooled T222682 (duration: 00m 55s)
05:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Add db1133 into m5 depooled T222682 (duration: 00m 55s)

2019-06-25

22:50 jforrester@deploy1001: Synchronized php-1.34.0-wmf.11/extensions/Echo/modules/nojs/: T226503 Fix badge icons in Monobook (duration: 00m 56s)
22:48 jforrester@deploy1001: Synchronized php-1.34.0-wmf.10/extensions/Echo/modules/nojs/: T226503 Fix badge icons in Monobook (duration: 00m 57s)
21:30 jgleeson: updating civicrm from 5c02e62d6e to 98fd34417d
21:30 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.11
21:23 thcipriani: gerrit back on 2.15.13
21:19 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@7b379a6]: revert Gerrit to 2.15.13 on cobalt (restart incoming) (duration: 00m 11s)
21:19 thcipriani@deploy1001: Started deploy [gerrit/gerrit@7b379a6]: revert Gerrit to 2.15.13 on cobalt (restart incoming)
21:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@7b379a6]: revert Gerrit to 2.15.13 on gerrit2001 (duration: 00m 10s)
21:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@7b379a6]: revert Gerrit to 2.15.13 on gerrit2001
21:03 hashar: contint1001: running puppet to clear a puppet alarm (due to Gerrit restart)
20:44 moritzm: rebooting ununpentium for kernel security update
20:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
20:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
20:35 thcipriani: gerrit back
20:33 thcipriani: restarting gerrit due to T224448
20:28 moritzm: rebooting vega for kernel security update
20:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
20:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
20:24 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.11 (duration: 41m 35s)
20:10 moritzm: rebooting webperf hosts for kernel security update
20:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
20:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
20:00 moritzm: rebooting torrelay1001 for kernel security update
20:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
20:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
19:43 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.11
19:16 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.10 refs T220735
19:08 twentyafterfour: deploying MediaWiki 1.34.0-wmf.10 to all wikis
19:07 twentyafterfour: looks like we are unblocked for wmf.10, deploying that first
18:45 longma: cutting the branch for 1.34.0-wmf.11 T220736
17:51 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@eb8f692]: Migrating RecentChangesUpdate to PHP7 - T219148 (duration: 01m 37s)
17:49 jiji@deploy1001: Started deploy [cpjobqueue/deploy@eb8f692]: Migrating RecentChangesUpdate to PHP7 - T219148
17:24 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@ca96238]: undo: modify agents for T226471 (duration: 16m 14s)
17:08 smalyshev@deploy1001: Started deploy [wdqs/wdqs@ca96238]: undo: modify agents for T226471
17:01 herron: finished migration of kafka2003 to kafka-main2003 — enabling alert notifications for kafka-main2003, and leaving kafka2003 disabled T225005
16:45 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Remove some dupe config (duration: 00m 55s)
16:25 jynus: upgrade and restart db1114 (test-s1)
15:59 krinkle@deploy1001: Synchronized php-1.34.0-wmf.10/maintenance/: T226448 / 40e725b6502cd6 (duration: 01m 15s)
15:56 krinkle@deploy1001: Synchronized php-1.34.0-wmf.10/includes/: T226448 / 40e725b6502cd6 (duration: 01m 20s)
15:28 jforrester@deploy1001: Synchronized php-1.34.0-wmf.10/skins/MonoBook/includes/SkinMonoBook.php: T226503 Fix Notifications RL module dependency (duration: 00m 57s)
15:13 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce page-properties-change stream to eventgate-main - T211248 (duration: 00m 58s)
14:43 herron: beginning replacement of kafka2003 with kafka-main2003 T225005
14:26 ottomata: shutting down Kafka on old analytics brokers - T183303
14:21 andrewbogott: rebooting cloudvirt1014, 1018, 1024
14:02 ema: pool cp5002 w/ ATS backend T226477
13:54 onimisionipe: changing replication factor of v4 keyspace for maps codfw cluster - T226161
12:49 marostegui: Stop MySQL on db1117:m5 (checked dumps, they are done) to clone db1133 - T222682
12:48 godog: swift eqiad-prod: put back ms-be1033 - T223518
12:46 ema: depool cp5002 and reimage as upload_ats T226477
12:34 jijiki: Upgrade scap to eqiad - T224915
12:32 jijiki: Upgrade scap to codfw - T224915
12:27 akosiaris: fully depool kubernetes2001 T226237
12:26 akosiaris@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=kubernetes2001.*
12:24 jijiki: Upgrade to scap 3.10.0-1 on mw-api-canary as well - T224915
12:22 jijiki: Upgrade to scap 3.10.0-1 on mw* codfw
10:22 ema: pool cp5001 w/ ATS backend T226477
09:30 jijiki: enable puppet on dbproxy*
09:24 _joe_: restarting gerrit on cobalt
09:09 ema: depool cp5001 and reimage as upload_ats T226477
09:08 jijiki: Rolling haproxy restarts on thumbor* - T225284
09:02 jijiki: Disable puppet on dbproxy* - T225284
08:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Change parsercache key everywhere after deploying it in small batches for a few hours T210725 (duration: 00m 57s)
08:50 jijiki: Disable puppet on thumbor* - T225284
08:30 marostegui: Change parsercachekey on 20 more hosts
08:22 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@bd3df8c]: modify agents for T226471 (duration: 11m 02s)
08:18 marostegui: Change parsercachekey on 10 more hosts
08:11 smalyshev@deploy1001: Started deploy [wdqs/wdqs@bd3df8c]: modify agents for T226471
08:08 marostegui: Change parsercachekey on 10 more hosts
07:58 marostegui: Change parsercachekey on 20 more hosts
07:49 marostegui: Change parsercachekey on 20 more hosts
07:44 marostegui: Change parsercachekey on 20 more hosts
07:35 marostegui: Change parsercachekey on 20 more hosts
07:21 marostegui: Change parsercachekey on 10 more hosts
07:09 SMalyshev: depooled wdqs1004 due to lag
07:09 marostegui: Change parsercachekey on 10 more hosts
06:51 marostegui: Change parsercachekey on 20 more hosts
05:52 marostegui: Change parsercachekey on 10 more hosts
05:43 marostegui: Change parsercachekey on 10 more hosts
05:33 marostegui: Change parsercachekey on 20 more hosts
05:24 marostegui: Change parsercachekey on 20 more hosts
05:12 marostegui: Change parsercache key on 20 more hosts
05:01 marostegui: Change parsercache key on the canaries T210725
05:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Change parsercache key T210725 (duration: 00m 58s)
04:46 kart_: Updated cxserver to use nodejs10 (T226074)
04:44 kartik@deploy1001: scap-helm cxserver finished
04:44 kartik@deploy1001: scap-helm cxserver cluster codfw completed
04:44 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
04:37 kartik@deploy1001: scap-helm cxserver finished
04:37 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
04:37 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
04:31 kartik@deploy1001: scap-helm cxserver finished
04:31 kartik@deploy1001: scap-helm cxserver cluster staging completed
04:30 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
00:10 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: poke at autopromote config (duration: 00m 54s)

2019-06-24

23:51 twentyafterfour@deploy1001: Synchronized php-1.34.0-wmf.10/extensions/Wikibase/: Sync https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/518782/ refs T220735 (duration: 01m 21s)
23:42 reedy@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/AdvancedSearch/: (no justification provided) (duration: 00m 56s)
22:56 krinkle@deploy1001: Synchronized php-1.34.0-wmf.10/extensions/ProofreadPage/includes/Special/SpecialProofreadPages.php: ed556868f / T225813 (duration: 00m 53s)
22:53 Krinkle: krinkle@deploy1001: There is an untracked "wmf-config/event-schemas/" directory in the /srv/mediawiki deployment source, ref T226436
21:42 thcipriani: gerrit back
21:40 thcipriani: restart gerrit for https://gerrit.wikimedia.org/r/518811/
21:39 ppchelko@deploy1001: Finished deploy [changeprop/deploy@17e71b5]: Support .meta.stream as well as .meta.topic T226198 (duration: 01m 42s)
21:37 ppchelko@deploy1001: Started deploy [changeprop/deploy@17e71b5]: Support .meta.stream as well as .meta.topic T226198
21:32 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deployed r/518350 - Revert "Temporary make account creation limits more restrictive" (duration: 00m 56s)
21:23 mobrovac@deploy1001: Finished deploy [restbase/deploy@a915f69]: Add /page/media-lint - T226105 - and various other cleanups (duration: 19m 08s)
21:04 mobrovac@deploy1001: Started deploy [restbase/deploy@a915f69]: Add /page/media-lint - T226105 - and various other cleanups
20:42 XenoRyet: updated payments-wiki from 79d1822644 to a19e5ae077
20:13 andrewbogott: rebooting cloudvirt1024
19:57 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Update more fr config (duration: 00m 55s)
19:56 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: rm old comments move more FR config (duration: 00m 52s)
19:50 thcipriani: gerrit back
19:48 thcipriani: restarting gerrit for 2.15.14 update
19:47 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e3695fd]: Gerrit to 2.15.14 on cobalt (restart incoming) (duration: 00m 12s)
19:47 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e3695fd]: Gerrit to 2.15.14 on cobalt (restart incoming)
19:46 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e3695fd]: Gerrit to 2.15.14 (gerrit2001 only) (duration: 00m 11s)
19:46 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e3695fd]: Gerrit to 2.15.14 (gerrit2001 only)
19:43 otto@deploy1001: Synchronized .gitmodules: Remove the event-schemas submodule - .gitmodules - T226436 (duration: 00m 55s)
19:41 otto@deploy1001: Synchronized wmf-config: Remove the event-schemas submodule - wmf-config - T226436 (duration: 00m 55s)
19:32 elukey: restart yarn/hdfs on analytics1072 to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/518767/ (broken disk)
19:32 otto@deploy1001: Synchronized wmf-config: Remove remaining monolog kafka and avro related configs - wmf-config - T226436 (duration: 00m 55s)
19:30 otto@deploy1001: Synchronized tests/TestServices.php: Remove remaining monolog kafka and avro related configs - tests - T226436 (duration: 00m 56s)
19:16 otto@deploy1001: Synchronized wmf-config: Remove usages of monolog kafka handler and avro formatter - wmf-config - T226436 (duration: 00m 56s)
19:14 otto@deploy1001: Synchronized tests/loggingTest.php: Remove usages of monolog kafka handler and avro formatter - tests - T226436 (duration: 00m 55s)
19:13 otto@deploy1001: sync-file aborted: Remove usages of monolog kafka handler and avro formatter - tests - T226436 (duration: 00m 06s)
18:58 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgFlaggedRevsAutoReview to a boolean (duration: 00m 55s)
18:50 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Remove some now redundant config (duration: 00m 55s)
18:48 andrewbogott: rebooting cloudvirt1018
18:46 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move some basic FR config into IS (duration: 00m 55s)
18:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable CirrusSearchRequestSet avro monolog channel - T222268 (duration: 00m 55s)
18:27 otto@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add hualab.nl to $wgCopyUploadsDomains (T225917) (duration: 00m 55s)
18:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add "mass-upload" to autopatrollers and patrollers on commons (T226217) (duration: 00m 55s)
18:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix wgMetaNamespaceTalk for aswikisource (T226027) (duration: 00m 55s)
18:02 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@157f40c]: weekly WDQS deploy (duration: 18m 11s)
17:44 smalyshev@deploy1001: Started deploy [wdqs/wdqs@157f40c]: weekly WDQS deploy
17:26 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Cleanup (duration: 00m 55s)
17:10 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: T226410 (duration: 00m 54s)
16:25 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Remove some duplicated config (duration: 00m 55s)
16:13 mobrovac@deploy1001: Synchronized rpc/RunSingleJob.php: RunSingleJob: check that only the database param is set and leave the rest to JobExecutor - T226109 (duration: 00m 55s)
16:08 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: comments (duration: 00m 56s)
15:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:48 XioNoX: remove cwdent from all network devices - T226405
15:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:34 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:28 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Simple config outside callback (duration: 00m 56s)
15:17 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Remove some unnecessary copy pasted code (duration: 00m 55s)
15:05 gehel: re-enabling wdqs updater on wdqs-public / eqiad
14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:01 ema: cp3032: upgrade varnish to 5.1.3-1wm11 T226375
13:51 jbond42: rolling restart of the conf servers starting in 10 minutes; please let me know if you foresee any issue
13:51 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: T225144 T225276 T225414 T225776 T225797 T226054 (duration: 00m 56s)
13:26 moritzm: re-enabling TCP SACKs on cp4024-4029 (half of Varnish/text and Varnish/upload in ulsfo) T225998
13:25 jbond42: update libvirt on cloudvirt* stretch servers
13:19 moritzm: re-enabling TCP SACKs on cp3040-cp3047, cp3049 (half of Varnish/text and Varnish/upload in esams) T225998
13:10 moritzm: re-enabling TCP SACKs on cp2001,2002,2004-2008,2010,2011, 2014, 2017 (half of Varnish/text and Varnish/upload in codfw) T225998
13:04 moritzm: re-enabling TCP SACKs on cp1075-1082 (half of Varnish/text and Varnish/upload in eqiad) T225998
13:00 gehel: shutdown wdqs updater on wdqs/public/eqiad
12:49 gehel: restarting blazegraph on wdqs1004 (JVM thread out of control)
11:31 Lucas_WMDE: EU SWAT done
11:30 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Labs: enable QuickSurveys on hewiki (T225819) (duration: 00m 57s)
10:36 moritzm: re-enabling TCP SACKs on cp5007-cp5009 (half of Varnish/text in eqsin) T225998
10:28 moritzm: re-enabling TCP SACKs on cp5001-cp5003 (half of Varnish/upload in eqsin) T225998
09:23 elukey: reboot of kafka-jumbo100[1-6] for kernel + openjdk upgrades
08:56 elukey: re-enable eventlogging mysql consumers after maintenance on eventlog1002
08:52 marostegui: Upgrade MySQL on db1140 (checked that all snapshot backups are done) - T226358
08:42 elukey: reboot an-master100[1,2] for kernel + openjdk upgrades
08:38 jynus: upgrade, stop and restart db1108
08:34 jynus: reloading haproxy on dbproxy1004/9
08:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 after upgrade T226358 (duration: 00m 56s)
08:14 jynus: upgrade, stop and restart db1107
08:09 marostegui: Stop MySQL on db1120 for upgrade - T226358
08:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1120 for upgrade T226358 (duration: 00m 56s)
07:51 elukey: stop mysql consumer on eventlog1002 (so traffic to db1107 will be stopped, to allow maintenance to happen)
07:06 moritzm: installing vim update for stretch
06:31 _joe_: publishing docker-registry.wikimedia.org/nodejs10-slim:0.0.2, T226346
06:16 elukey: powercycle analytics1060 (stuck, no ssh, no console com2 available)
06:01 marostegui: Stop MySQL on db1117:3321 to clone db1135 (haproxy alert will be triggered) - T222682
05:57 _joe_: rebuilding base debian/alpine images to pick up security updates
05:07 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1135 from config T222682 (duration: 00m 55s)
05:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1135 from config T222682 (duration: 01m 07s)
04:59 marostegui: Rename table wikimedia_editor_tasks_entity_description_exists in db1123 (testwikidatawiki) T226326
04:54 marostegui: Rename table wikimedia_editor_tasks_entity_description_exists in db1092 T226326

2019-06-21

14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
14:51 moritzm: rebooting planet1001 to pick up MDS mitigations/new kernel
14:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
14:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
14:37 moritzm: rebooting kerberos1001 to pick up MDS mitigations/new kernel
14:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
14:23 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:23 moritzm: rebooting wezen
14:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:22 jmm@cumin2001: START - Cookbook sre.hosts.downtime
14:17 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:16 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:10 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:10 Urbanecm: Attached Carmen0428@metawiki to Carmen0428 global account (T223036)
14:09 Urbanecm: Renamed Carmen0429@metawiki to Carmen0428@metawiki as part of re-attaching to global account (T223036)
13:55 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:48 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:43 akosiaris@deploy1001: scap-helm mathoid finished
13:43 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
13:43 akosiaris@deploy1001: scap-helm mathoid upgrade --recreate-pods -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
13:33 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:26 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:16 moritzm: rebooting kafkamon instances to pick up MDS mitigations/new kernel
13:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:16 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
12:58 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:51 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:30 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
10:15 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
10:13 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
10:09 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
10:06 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:51 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
09:45 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:43 moritzm: rebooting cp1008
09:42 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
09:35 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
09:23 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:15 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
09:09 jiji@deploy1001: Synchronized wmf-config/ProductionServices.php: Remove kafka1018 from ProductionServices - T224538 (duration: 00m 56s)
09:08 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:08 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
09:01 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
08:48 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
08:46 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
08:42 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
08:40 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
07:39 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-analytics,name=kubernetes2001.codfw.wmnet
07:38 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet
07:24 moritzm: installing python-thumbor-wikimedia, python-opencv on stat1006
06:54 moritzm: installed radeontop on stat1005 to diagnose GPU usage (T220811)
06:44 moritzm: installed python-opencv on stat1005 (T220811)
05:07 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2051 into s2 to replace db2035 as a master (duration: 01m 00s)
00:45 RoanKattouw: Running FlowReserializeRevisionContent.php on testwiki

2019-06-20

23:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable TimedMediaHandler's new video player Beta Feature T148103 (duration: 00m 57s)
23:01 jforrester@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/TimedMediaHandler/resources/videojs/: Latest VideoJS for T222763 (duration: 00m 59s)
23:01 onimisionipe: pool maps1003 - node is ready to receive requests - T224395
22:31 jforrester@deploy1001: Finished scap: Full scap for new i18n in VisualEditor (duration: 31m 29s)
22:31 James_F: Scap is stuck in scap-cdb-rebuild with one server left to sync.
22:00 jforrester@deploy1001: Started scap: Full scap for new i18n in VisualEditor
21:49 James_F: Manually purged https://bn.m.wikipedia.org/w/load.php?lang=bn&modules=startup&only=scripts&skin=minerva&target=mobile from Varnish
21:43 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Ensure that wmgVisualEditorEnableNewMobileContext CS part is set on all servers (duration: 00m 59s)
21:42 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Ensure that wmgVisualEditorEnableNewMobileContext IS part is set on all servers (duration: 00m 59s)
21:34 jforrester@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.MobileArticleTarget.js: Revert 'MobileArticleTarget: Update loading interface for new design' (duration: 00m 57s)
21:23 jforrester@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/VisualEditor/: Pull VisualEditor wmf.8 all the way to wmf.10 (duration: 01m 08s)
20:23 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert Centralize enwiki's VisualEditor feedback page T224851 (duration: 00m 59s)
18:54 hashar: upgrading and restarting jenkins
18:52 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Deploy partial blocks on hewikivoyage on community request (Bug: T218626) (duration: 00m 58s)
18:47 tgr@deploy1001: Synchronized php-1.34.0-wmf.10/extensions/GrowthExperiments/extension.json: SWAT: HomepageModule: Use newer schema with start module name (Bug: T222836) (duration: 00m 58s)
18:29 tgr@deploy1001: Synchronized docroot/wwwportal/.well-known/: SWAT: Add .well-known/matrix for wikimedia.org (Bug: T223835) (duration: 00m 57s)
18:16 tgr@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Ensure no lossy WTE→VE switching in public wikis (no-op) (duration: 00m 58s)
18:15 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Centralize enwiki's VisualEditor feedback page (T224851) (duration: 00m 57s)
18:02 arlolra: Updated Parsoid to 4fa8d01 (T211251)
17:43 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@fd98900]: Deploy media-list endpoint (T225443) and service template upgrade to v0.7.0 (duration: 05m 38s)
17:37 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@fd98900]: Deploy media-list endpoint (T225443) and service template upgrade to v0.7.0
17:34 arlolra@deploy1001: Finished deploy [parsoid/deploy@1084a7b]: Updating Parsoid to 4fa8d01 (duration: 06m 17s)
17:27 arlolra@deploy1001: Started deploy [parsoid/deploy@1084a7b]: Updating Parsoid to 4fa8d01
17:25 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@7dc63ab]: Deploy Suggested Edits endpoints (T209997, T224233) (duration: 02m 55s)
17:22 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@7dc63ab]: Deploy Suggested Edits endpoints (T209997, T224233)
16:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert page-properties-change back to eventbus, new schema does not work with change prop - deploy take 3 (duration: 00m 56s)
16:37 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ACTUALLY Revert page-properties-change back to eventbus, new schema does not work with change prop (duration: 00m 57s)
16:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert page-properties-change back to eventbus, new schema does not work with change prop (duration: 00m 55s)
16:19 krinkle@deploy1001: Synchronized php-1.34.0-wmf.10/includes/specials/pagers/ImageListPager.php: T226102 / 294500d (duration: 00m 58s)
16:16 Krinkle: scb1001 is producing 120,000 errors per minute as of 16:09 UTC (under 500/min before that)
15:40 Krinkle: krinkle@deploy1001: pull down 98399b1032a0 to wmf.10 (test-only change)
15:05 jijiki: Rolling restart php-fpm on jobrunners to pick up new opcache settings - 518023
15:03 jijiki: Repool mw1311
15:01 jeh: T101631 updating replica views on labsdb1009
14:58 akosiaris: make sure all kubernetes hosts (except kubernetes2001 which is used to investigate some outgoing packet discards) are pooled and with the exact same weight
14:57 jijiki: enable puppet on jobrunners
14:57 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes1005.*
14:57 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes1006.*
14:56 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2006.*
14:56 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2005.*
14:54 jeh: T101631 updating replica views on labsdb1010
14:47 jeh: T101631 updating replica views on labsdb1011
14:41 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet
14:36 jeh: T101631 updating replica views on labsdb1012
14:28 Amir1: end of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=testwikidatawiki --batch-size=100 --sleep=3 (T225052)
14:22 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set EntityUsageTable addUsage batch size to 150 (T225500) (duration: 00m 56s)
14:18 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:18 Amir1: start of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=testwikidatawiki --batch-size=100 --sleep=3 (T225052)
14:16 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=eventgate-analytics,name=kubernetes2001.codfw.wmnet
14:16 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2001.*
14:14 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Switch property terms migration to WRITE_BOTH on test wikidata (T225051) (duration: 00m 56s)
14:14 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet
14:14 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:13 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-analytics,name=kubernetes2001.codfw.wmnet
14:12 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet
14:11 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:10 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes2001.*
14:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Switch property terms migration to WRITE_BOTH on test wikidata (T225051) (duration: 00m 56s)
14:06 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:04 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
13:58 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:56 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:50 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:38 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:35 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:31 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:28 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:23 marostegui: Stop replication on labsdb1011 to defragment tables T222978
13:22 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
13:21 jijiki: depool mw1311
13:20 marostegui: Reload haproxy on dbproxy1010 and dbproxy1011 to depool labsdb1011 - T222978
13:16 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:11 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:04 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:59 jijiki: Disable puppet on jobrunners to merge 518023 and 518018
12:56 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:50 ema: powercycle cp2017, stuck rebooting
12:44 hashar: Upgrading packages on contint1001
12:44 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:40 hashar: Upgrading java/jenkins on releases* hosts # T226159
12:37 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:36 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:36 moritzm: updated jenkins package on apt.wikimedia.org to 2.176.1 for jessie and stretch (T226159)
11:54 Amir1: EU SWAT is done
11:49 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Switch property terms migration to WRITE_BOTH on test wikidata (T225051) (duration: 00m 58s)
11:42 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Introduce config variables for new terms store in mediawiki-config (T226086), Part II (duration: 00m 57s)
11:39 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Introduce config variables for new terms store in mediawiki-config (T226086) (duration: 00m 57s)
11:20 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Remove ExternalGuidanceEnableContentDetection (T219819) (duration: 01m 00s)
11:14 moritzm: rebooting mw2235, mw2255, mw2271 for MDS kernel update
11:12 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix import group name (duration: 00m 57s)
11:09 mlitn@deploy1001: Finished scap: [SDC] Enable depicts qualifiers on Commons & increase rate limits (duration: 20m 34s)
10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:58 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:58 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:58 moritzm: rebooting scb100[12], mw2139 for MDS kernel update (their CPUs were previously unsupported by Intel, but are now covered with the new release)
10:48 mlitn@deploy1001: Started scap: [SDC] Enable depicts qualifiers on Commons & increase rate limits
10:33 marostegui: Deploy schema change on the fishbowl wikis list on T225643
10:31 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
10:24 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
10:23 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
10:17 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
10:11 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
10:10 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:58 _joe_: upgraded service-checker T225707
09:56 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
09:51 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
09:50 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:44 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
09:25 marostegui: Remove dbprov1001:/srv/backups/tmp/db1112 - T225981
09:24 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
09:21 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:17 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:17 ema: cache nodes: resume rolling reboots for kernel and varnish upgrades T224694 T225998 T226048
08:39 marostegui: Stop MySQL on db1124: s1, s3, s5 and s8 to upgrade MySQL, this will generate lag on labs
07:59 marostegui: Stop MySQL and reboot db2084
07:15 marostegui: Transfer dbprov1001:/srv/backups/tmp/db1112/sqldata to db1077 T225981
07:00 moritzm: installing intel-microcode updates to June 2019 release (microcode is unmodified for most CPUs except for Sandybridge/Core-X models)
06:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool and remove from config db1077 T225981 (duration: 00m 54s)
06:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 56s)
06:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
06:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
06:18 moritzm: rebooting sarin for some tests with updated intel-microcode for MDS (also covering Sandybridge server CPUs initially not supported by Intel)
06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 55s)
06:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 57s)
05:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 56s)
05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 56s)
05:37 marostegui: Deploy schema change on centralauth.oathauth_users T225643
05:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly pool db1112 into s3 T225981 (duration: 00m 55s)
05:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Slowly pool db1112 into s3 T225981 (duration: 00m 55s)
05:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 T225981 (duration: 00m 55s)
04:53 marostegui: Stop replication in sync on db1112 and db1077 to move db1124 under db1112 - T225981
04:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 T225981 (duration: 00m 59s)
04:00 onimisionipe: depooling maps1003 for reimage into new partition scheme - T224395

2019-06-19

18:09 legoktm: added MatmaRex to extension-VisualEditor-staff Gerrit group
16:50 moritzm: running racreset on multatuli
16:50 XioNoX: rollback redirect ns0 to authdns2001
16:45 moritzm: rebooting authdns1001 for kernel security update
16:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
16:39 XioNoX: redirect ns0 to authdns2001
16:37 XioNoX: rollback redirect ns1 to authdns1001
16:34 moritzm: rebooting authdns2001 for kernel security update
16:28 XioNoX: redirect ns1 to authdns1001
16:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
16:23 onimisionipe: pooling elastic1029 - T214283
16:01 ema: cache nodes: stop rolling reboots for today, 47/80 done T224694 T225998
15:43 reedy@deploy1001: rebuilt and synchronized wikiversions files: group0 back to .8 T226109
15:43 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
15:40 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
15:37 onimisionipe: pooled maps1002 - postgres init is complete and successfully joined to its cluster - T224395
15:36 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
15:33 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
15:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:24 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:21 moritzm: rolling reboot of proton* for kernel security update
15:18 moritzm: rebooting boron for kernel security update
15:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:16 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
15:13 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
15:08 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
15:06 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:57 XioNoX: update syslog target on frack network devices (T224128)
14:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
14:55 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
14:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
14:55 XioNoX: jnt push to knams, remove old protect-old-lvs-servers term + update syslog target (T224128) + replace /28 with /29 (T211254)
14:54 moritzm: rolling reboot of URL downloaders for kernel security update
14:48 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:48 XioNoX: jnt push to eqiad, remove old protect-old-lvs-servers term + update syslog target T224128
14:48 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:46 reedy@deploy1001: rebuilt and synchronized wikiversions files: group1 back to .8 T226109
14:40 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:40 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:20 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:20 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:13 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:11 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:06 moritzm: rolling reboot of mwdebug servers for kernel security update
14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disabling Avro ApiAction Monolog channel - T222267 (duration: 00m 57s)
13:53 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:51 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:50 cdanis: rebooting wikitech-static
13:48 cdanis: apt upgrade on wikitech-static
13:47 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:44 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:27 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:24 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:20 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:17 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:00 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:57 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:53 marostegui: Deploy schema change on the private wikis listed at T225643
12:51 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:51 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:31 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:31 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:25 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:21 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:20 ema: cache nodes: resume rolling reboots for kernel and varnish upgrades T224694 T225998
11:07 ema: cache nodes: pause rolling reboots for kernel and varnish upgrades T224694 T225998
10:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
10:54 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
10:52 moritzm: rebooting mx1001 for kernel security update
10:50 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
10:47 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
10:38 ladsgroup@deploy1001: scap-helm termbox finished
10:38 ladsgroup@deploy1001: scap-helm termbox cluster codfw completed
10:38 ladsgroup@deploy1001: scap-helm termbox upgrade -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: codfw]
10:36 moritzm: rebooting mx2001 for kernel security update
10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:33 akosiaris@deploy1001: scap-helm termbox finished
10:33 akosiaris@deploy1001: scap-helm termbox cluster staging completed
10:33 akosiaris@deploy1001: scap-helm termbox upgrade -f termbox-staging-values.yaml staging stable/termbox [namespace: termbox, clusters: staging]
10:30 jbond42: update late-install so it installs the correct puppet version https://gerrit.wikimedia.org/r/c/operations/puppet/+/515087
10:30 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
10:30 moritzm: installing glibc and ca-certificates-java updates from stretch point release
10:29 akosiaris@deploy1001: scap-helm termbox finished
10:29 akosiaris@deploy1001: scap-helm termbox cluster eqiad completed
10:29 akosiaris@deploy1001: scap-helm termbox upgrade -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: eqiad]
10:27 ema@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
10:23 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
10:21 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
10:05 ema: cp3030: increase varnish-be thread_pool_max from 12000 (250 * 48) to 14400 (300 * 48) to observe impact on fetcherrors
10:03 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
10:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1077 (duration: 00m 55s)
10:01 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
09:56 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:54 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1077 (duration: 00m 55s)
09:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
09:34 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
09:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1077 (duration: 00m 55s)
09:29 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:25 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1077 T225981 (duration: 01m 00s)
09:20 XioNoX: jnt push to esams, remove old protect-old-lvs-servers term + update syslog target T224128
09:14 marostegui: Start MySQL on db1077 - s3 labsdb lag should start catching up T225981
09:13 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2001.*
09:09 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
09:06 akosiaris: repool kubernetes2002, kubernetes2003. Point proven, chasing down lead
09:06 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2002.*
09:06 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes2003.*
09:05 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
09:03 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
08:57 akosiaris: depool kubernetes200{2,3} for the same out discards investigation
08:56 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
08:56 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes2003.*
08:56 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes2002.*
08:54 akosiaris: uncordon kubernetes2001, reschedule some pods on it. Investigating out discards still
08:51 XioNoX: jnt push to codfw, remove old protect-old-lvs-servers term + update syslog target T224128
08:43 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
08:43 akosiaris: depool kubernetes2001 from all services to investigate some IP out discard statistics
08:42 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes2001.*
08:36 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
08:36 akosiaris: cordon kubernetes2001 to investigate some IP out discard statistics
08:34 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
08:28 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
08:24 moritzm: installing new kernels with SACK fix on jessie servers
08:21 akosiaris: upgrade citoid, mathoid, termbox to latest chart releases to address the GC metric naming issue T220709 T222795
08:20 akosiaris@deploy1001: scap-helm termbox finished
08:20 akosiaris@deploy1001: scap-helm termbox cluster staging completed
08:20 akosiaris@deploy1001: scap-helm termbox upgrade -f termbox-staging-values.yaml staging stable/termbox [namespace: termbox, clusters: staging]
08:20 akosiaris@deploy1001: scap-helm termbox finished
08:20 akosiaris@deploy1001: scap-helm termbox cluster codfw completed
08:20 akosiaris@deploy1001: scap-helm termbox cluster eqiad completed
08:20 akosiaris@deploy1001: scap-helm termbox upgrade -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: eqiad,codfw]
08:19 akosiaris@deploy1001: scap-helm mathoid finished
08:18 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
08:18 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
08:18 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
08:14 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
08:13 akosiaris@deploy1001: scap-helm mathoid finished
08:13 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
08:13 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
08:13 akosiaris@deploy1001: scap-helm citoid finished
08:13 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
08:13 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
08:08 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
08:07 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
08:02 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
08:01 ema: cache nodes: resume rolling reboots for kernel and varnish upgrades T224694
08:00 akosiaris@deploy1001: scap-helm citoid finished
08:00 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
08:00 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
08:00 akosiaris@deploy1001: scap-helm citoid finished
07:59 akosiaris@deploy1001: scap-helm citoid cluster staging completed
07:59 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
07:56 moritzm: rearmed keyholder on acmechief-test2001
07:51 moritzm: installing vim security updates on stretch
07:46 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
07:35 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
07:34 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
07:18 XioNoX: jnt push to eqdfw, remove old protect-old-lvs-servers term + update syslog target T224128
07:17 XioNoX: jnt push to eqord, remove old protect-old-lvs-servers term + update syslog target T224128
07:13 XioNoX: jnt push to eqsin, remove old protect-old-lvs-servers term + update syslog target T224128
07:12 marostegui: s3 will be lagging on labsdb hosts due to maintenance on db1077 - T225981
07:02 XioNoX: jnt push to ulsfo, remove old protect-old-lvs-servers term + update syslog target T224128
06:57 marostegui: Stop MySQL on db1077 to transfer its data to db1112 - T225981
06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 T225981 (duration: 01m 06s)
05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1135 T222682 (duration: 00m 56s)
05:37 marostegui: Upgrade db1068 (old s4 master) to 10.1.39
05:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1138 status (duration: 00m 55s)
05:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove s4 read-only T224852 (duration: 00m 33s)
05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover s4 master eqiad from db1068 to db1081 T224852 (duration: 00m 33s)
05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s4 on read-only T224852 (duration: 00m 34s)
05:00 marostegui: Starting s4 failover from db1068 to db1081 - T224852
04:40 kartik@deploy1001: scap-helm cxserver finished
04:40 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
04:40 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
04:40 kartik@deploy1001: scap-helm cxserver finished
04:40 kartik@deploy1001: scap-helm cxserver cluster codfw completed
04:40 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
04:40 kartik@deploy1001: scap-helm cxserver finished
04:40 kartik@deploy1001: scap-helm cxserver cluster staging completed
04:39 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
04:28 marostegui: Starting pre-steps for the s4 failover that will happen at 05:00 UTC - T224852
04:25 kartik@deploy1001: scap-helm cxserver finished
04:25 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
04:25 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
04:24 kartik@deploy1001: scap-helm cxserver finished
04:24 kartik@deploy1001: scap-helm cxserver cluster codfw completed
04:24 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
04:21 onimisionipe: depooling maps1002 for reimaging into new partition scheme - T224395
04:20 kartik@deploy1001: scap-helm cxserver finished
04:20 kartik@deploy1001: scap-helm cxserver cluster staging completed
04:20 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
04:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 T224852 (duration: 00m 57s)

2019-06-18

22:33 jijiki: pool thumbor1001
22:20 krinkle@deploy1001: Synchronized php-1.34.0-wmf.8/includes/htmlform/fields/HTMLSelectAndOtherField.php: 90b513d96e36 / T222170 (duration: 00m 57s)
21:48 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.10 refs T220735 (duration: 00m 54s)
21:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.10 refs T220735
21:33 twentyafterfour: Promoting Group 1 wikis to MediaWiki 1.34.0-wmf.10 ahead of schedule because tomorrow is a WMF holiday.
20:49 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@0a1c946]: deploy new GUI for T226017 (duration: 30m 03s)
20:31 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.10 refs T220735
20:24 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.10 refs T220735 (duration: 37m 10s)
20:19 smalyshev@deploy1001: Started deploy [wdqs/wdqs@0a1c946]: deploy new GUI for T226017
19:47 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.10 refs T220735
19:15 twentyafterfour: branching 1.34.0-wmf.10
19:04 ebernhardson: deployed discovery.query_clicks_{hourly,daily} fill jobs updated to use eventgate to oozie
18:57 jijiki: depool thumbor1001
18:53 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@deb30dc]: Ship search analytics jobs updated to source from eventgate (duration: 00m 17s)
18:52 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@deb30dc]: Ship search analytics jobs updated to source from eventgate
18:20 jynus: running data compare on s4 (commons) databases T224852
17:38 jynus: testing switchover automation on es2001/es2002 T224852
17:34 mbsantos@deploy1001: Finished deploy [mobileapps/deploy@dea8e94]: Update mobileapps to c6804c5 (duration: 04m 41s)
17:29 mbsantos@deploy1001: Started deploy [mobileapps/deploy@dea8e94]: Update mobileapps to c6804c5
16:38 otto@deploy1001: scap-helm eventgate-analytics finished
16:38 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
16:38 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f /srv/scap-helm/eventgate/analytics/eqiad-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: eqiad]
16:35 otto@deploy1001: scap-helm eventgate-analytics finished
16:35 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
16:34 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f /srv/scap-helm/eventgate/analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
16:32 otto@deploy1001: scap-helm eventgate-analytics finished
16:32 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:32 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f /srv/scap-helm/eventgate/analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
16:27 otto@deploy1001: scap-helm eventgate-main finished
16:27 otto@deploy1001: scap-helm eventgate-main cluster eqiad completed
16:26 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/eqiad-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: eqiad]
16:25 otto@deploy1001: scap-helm eventgate-main finished
16:25 otto@deploy1001: scap-helm eventgate-main cluster codfw completed
16:25 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: codfw]
16:23 otto@deploy1001: scap-helm eventgate-main finished
16:23 otto@deploy1001: scap-helm eventgate-main cluster staging completed
16:23 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
16:07 ema: cache nodes: stop rolling reboots for today, 17/80 done T224694
16:06 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
16:01 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
16:01 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@09404fb]: Update the recommendation API service (duration: 03m 09s)
15:59 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
15:58 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@09404fb]: Update the recommendation API service
15:55 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
15:39 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
15:35 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
15:30 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
15:26 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
15:10 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
15:06 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
15:03 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:57 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:43 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:37 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:35 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:30 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:16 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=termbox
14:16 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore
14:15 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:12 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce page-restrictions-change to eventgate-main - T211248 (duration: 00m 47s)
14:10 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:09 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:05 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce page-links-change to eventgate-main - T211248 (duration: 00m 48s)
14:04 ottomata: deploying mediawiki-config to Produce page-links-change stream to eventgate-main - T211248
14:04 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:56 Amir1: ladsgroup@mwmaint1002:~$ mwscript sql.php --wiki=wikidatawiki /srv/mediawiki/php-1.34.0-wmf.8/extensions/Wikibase/repo/sql/AddNormalizedTermsTablesDDL.sql (T225039)
13:56 Amir1: ladsgroup@mwmaint1002:~$ mwscript sql.php --wiki=testwikidatawiki /srv/mediawiki/php-1.34.0-wmf.8/extensions/Wikibase/repo/sql/AddNormalizedTermsTablesDDL.sql (T225039)
13:49 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:48 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce page-properties-change to eventgate-main - T211248 (duration: 00m 48s)
13:46 ottomata: deploying mediawiki-config to produce page-properties-change events to eventgate-main
13:44 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:42 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:42 XioNoX: push new syslog target to msw* - T224128
13:37 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:31 XioNoX: push new syslog target to mr* - T224128
13:22 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:17 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:12 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:10 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:10 ema: cache nodes: begin rolling reboots for kernel and varnish upgrades T224694
12:59 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:55 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:53 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:49 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:49 ema: cp3034 (ats-be upload) cp2002 (varnish-be upload): reboot for kernel and varnish upgrade T224694
12:38 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:37 XioNoX: merge puppet change to make all router down alerts paging - T224535
12:29 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:27 ema: cp5007 (varnish-be text): reboot for kernel and varnish upgrade T224694
12:23 XioNoX: activate bgp to telia on cr1-codfw - T222967
12:13 Urbanecm: Assigned an email address to Eritha@enwiki per user request (T223960)
12:12 akosiaris: slowly rolling restart php7 on mw1299-mw1338 to avoid opcache exhaustion
12:03 Urbanecm: EU SWAT done
12:03 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: Allow sysops to manage flaggedrevs group membership only if the group exists (T225797) (duration: 00m 47s)
12:00 Urbanecm: EU SWAT is going a few minutes beyond its slot
11:55 Urbanecm: running namespaceDupes.php for eswikibooks (T216143)
11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set two new namespace aliases for es.wikibooks (T216143) (duration: 00m 47s)
11:49 akosiaris: set all termbox backends with weight 10 (from 0) for consistency's sake
11:49 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: [cirrus] drop most wmgCirrusSearch* ephemeral config vars [3/3] (3/3) (duration: 00m 46s)
11:49 akosiaris@puppetmaster1001: conftool action : set/weight=10; selector: service=termbox
11:47 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: [cirrus] drop most wmgCirrusSearch* ephemeral config vars [3/3] (2/3) (duration: 00m 47s)
11:46 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: [cirrus] drop most wmgCirrusSearch* ephemeral config vars [3/3] (1/3) (duration: 00m 47s)
11:39 akosiaris: restart pybal on lvs1016
11:39 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] drop most wmgCirrusSearch* ephemeral config vars [2/3] (duration: 00m 47s)
11:38 akosiaris: restart pybal on lvs2003
11:34 jijiki: restarting php-fpm on mwdebug1001
11:29 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: [cirrus] drop most wmgCirrusSearch* ephemeral config vars [1/3] (duration: 00m 46s)
11:26 akosiaris: pool all hosts for termbox
11:26 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: service=termbox
11:25 dcausse@deploy1001: Synchronized wmf-config/extension-list: [cirrus] Load cirrus using wfLoadExtension 2/2 (duration: 00m 46s)
11:24 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: [cirrus] Load cirrus using wfLoadExtension 1/2 (duration: 00m 47s)
11:22 akosiaris: set elastic1029 as inactive in all conftool data. Command was sudo confctl select "name=elastic1029.eqiad.wmnet" set/pooled=inactive T214283
11:21 akosiaris@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1029.eqiad.wmnet
11:15 akosiaris: deploy lvs termbox configuration changes
11:12 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add 'sms' and 'smn' langcodes to commons for use in captions (T222309) (duration: 00m 48s)
10:58 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:57 jbond42: reboot bast1002
10:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:53 moritzm: rebooting pybal-test2001 for some tests with the new 4.9 kernel for jessie
10:52 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:46 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:46 jbond42: reboot bast2002
10:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:28 jbond42: reboot bast3002
10:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:24 jbond42: reboot iron.wikimedia.org
10:19 jbond42: reboot bast4001
10:08 jbond42: reboot bast5001
10:01 moritzm: upgrading acmechief* to latest Buster
09:53 ema: upgrade varnish packages to 5.1.3-1wm10 on all A:cp (no restarts yet)
09:42 jbond42: I will start a rolling reboot of all bastion servers at 10:00UTC
09:36 jijiki: restarting php-fpm in mwdebug1002
09:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:19 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:08 elukey: reboot analytics-tool1004 a second time to pick up the new kernel upgrades
08:54 akosiaris@deploy1001: scap-helm termbox finished
08:54 akosiaris@deploy1001: scap-helm termbox cluster codfw completed
08:54 akosiaris@deploy1001: scap-helm termbox cluster eqiad completed
08:54 akosiaris@deploy1001: scap-helm termbox upgrade --install -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: eqiad,codfw]
08:52 akosiaris: deploy termbox T220402
08:52 akosiaris@deploy1001: scap-helm termbox finished
08:52 akosiaris@deploy1001: scap-helm termbox cluster codfw completed
08:52 akosiaris@deploy1001: scap-helm termbox cluster eqiad completed
08:52 akosiaris@deploy1001: scap-helm termbox upgrade --install -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: eqiad,codfw]
08:23 marostegui: Stop MySQL on db2039 - T225988
08:16 marostegui: Remove db2039 from tendril and zarcillo - T225988
07:45 elukey: roll restart of cassandra on aqs* to pick up new openjdk upgrades
07:39 elukey: reboot matomo1001 for kernel upgrades
07:36 elukey: reboot archiva1001 for kernel upgrades
07:32 elukey: reboot analytics-tool100* and an-tool100* for kernel upgrades
07:21 elukey: upload matomo_3.9.1-3 to stretch-wikimedia and upgrade matomo1001
07:06 moritzm: disabling TCP selective acknowledgements on a number of internal test hosts
07:04 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2039 from config T221533 (duration: 00m 46s)
07:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2039 from config T221533 (duration: 00m 51s)
06:56 onimisionipe: pooling maps1001 - reimage is complete - T224395
06:19 marostegui: Stop slave and mysql on db1112 to copy its content to dbprov1001:/srv/backups/tmp/db1112 - T225981
05:54 marostegui: Stop slave and mysql on db1112 to copy its content to dbstore1001:/srv/tmp/db1112 - T225981
04:44 marostegui: Deploy schema change on db1073 (labtestwiki and labswiki) - T225643
04:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1008 after optimizing its tables T210725 (duration: 00m 47s)
03:46 kartik@deploy1001: scap-helm cxserver finished
03:46 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
03:46 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
03:45 kartik@deploy1001: scap-helm cxserver finished
03:45 kartik@deploy1001: scap-helm cxserver cluster codfw completed
03:45 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
03:42 kartik@deploy1001: scap-helm cxserver finished
03:42 kartik@deploy1001: scap-helm cxserver cluster staging completed
03:42 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
00:22 RoanKattouw: Running populateRevisionSha1.php on dewikivoyage for T219816
00:15 RoanKattouw: Running populateRevisionSha1.php on testwiki for T219816

2019-06-17

23:40 Krinkle: Repopulating lost "coal.*" data in Graphite from NavigationTiming for 2019-04-17, ref T221401
23:27 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: No further use of ShortUrl (duration: 00m 47s)
23:22 Krinkle: Prune debugging data "coal_tmp2.*" and "coal_tmp3.*" from graphite1004 and graphite2003 from last week, ref T221401
23:21 Krinkle: Prune random spare "BetaMediaWiki.*" data points from graphite1004 and graphite2003 from pre Nov 2018.
20:53 arlolra: Updated Parsoid to 2bf94f0 (T225217)
20:45 arlolra@deploy1001: Finished deploy [parsoid/deploy@a8d9f6e]: Updating Parsoid to 2bf94f0 (duration: 10m 28s)
20:34 arlolra@deploy1001: Started deploy [parsoid/deploy@a8d9f6e]: Updating Parsoid to 2bf94f0
20:18 halfak@deploy1001: Finished deploy [ores/deploy@04fbd58]: T224484 (duration: 15m 17s)
20:02 halfak@deploy1001: Started deploy [ores/deploy@04fbd58]: T224484
18:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments (testwiki): Switch on mobile homepage feature (duration: 00m 47s)
18:36 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce mediawiki.user-blocks-change stream to eventgate-main, again (duration: 00m 49s)
18:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ExtensionDistributor log channel to help with T225243 (duration: 00m 47s)
18:24 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: Enable REL1_33 (beta), drop pre-REL1_30 (duration: 00m 48s)
18:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Deploy Partial blocks to English wikisource, wiktionary and wikivoyage T218626 (duration: 00m 47s)
18:18 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Extend wgCopyUploadsDomains T213901 T224875 T225852 (duration: 00m 47s)
18:14 jforrester@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/EventBus/includes/EventFactory.php: SWAT: Ensure user-blocks-change expiry_dt is in ISO-8601 (duration: 00m 48s)
18:07 jforrester@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/FlaggedRevs/frontend/modules/ext.flaggedRevs.advanced.js: SWAT: FlaggedRevs: Bring back diff toggle T225351 (duration: 00m 48s)
18:05 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Turn off mobile-ab test for VE section editing (duration: 00m 48s)
18:03 moritzm: disabled TCP selective acknowledgements on caches/bastions
18:00 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@dcf3338]: New Updater, GUI and Blazegraph build (duration: 17m 37s)
17:56 otto@deploy1001: scap-helm eventgate-main finished
17:56 otto@deploy1001: scap-helm eventgate-main cluster eqiad completed
17:56 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/eqiad-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: eqiad]
17:56 otto@deploy1001: scap-helm eventgate-main finished
17:56 otto@deploy1001: scap-helm eventgate-main cluster codfw completed
17:56 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: codfw]
17:54 otto@deploy1001: scap-helm eventgate-main finished
17:54 otto@deploy1001: scap-helm eventgate-main cluster staging completed
17:54 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
17:43 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@dcf3338]: New Updater, GUI and Blazegraph build
17:29 onimisionipe: pooled wdqs1003 - after rolling back failed deployment.
17:26 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@d6ed70b]: New Updater, GUI and Blazegraph build (duration: 10m 19s)
17:23 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert - Produce user-blocks-change to eventgate-main. Depends on https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/514560 (duration: 00m 47s)
17:16 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@d6ed70b]: New Updater, GUI and Blazegraph build
17:14 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Produce user-blocks-change to eventgate-main - T211248 (duration: 00m 48s)
17:10 ottomata: mw-config change to produce user-blocks-change event to eventgate-main - T211248
16:27 jynus: starting data check on db2097+db2046, expect increase in read row rate T225378
16:02 otto@deploy1001: scap-helm eventgate-main finished
16:02 otto@deploy1001: scap-helm eventgate-main cluster eqiad completed
16:02 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/eqiad-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: eqiad]
15:57 otto@deploy1001: scap-helm eventgate-main finished
15:57 otto@deploy1001: scap-helm eventgate-main cluster codfw completed
15:57 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: codfw]
15:55 otto@deploy1001: scap-helm eventgate-main finished
15:55 otto@deploy1001: scap-helm eventgate-main cluster staging completed
15:55 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
15:42 ema: cp4026: ats-backend-restart to apply systemd unit hardening changes
15:32 otto@deploy1001: scap-helm eventgate-main finished
15:32 otto@deploy1001: scap-helm eventgate-main cluster staging completed
15:32 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
15:30 otto@deploy1001: scap-helm eventgate-main finished
15:30 otto@deploy1001: scap-helm eventgate-main cluster staging completed
15:30 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
15:17 thcipriani: gerrit back
15:16 thcipriani: gerrit restart to pick up new config changes.
14:45 elukey: stop eventlogging on eventlog1002 and reboot for kernel upgrades
14:32 otto@deploy1001: scap-helm eventgate-main finished
14:32 otto@deploy1001: scap-helm eventgate-main cluster codfw completed
14:32 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: codfw]
14:26 otto@deploy1001: scap-helm eventgate-main finished
14:26 otto@deploy1001: scap-helm eventgate-main cluster staging completed
14:26 otto@deploy1001: scap-helm eventgate-main upgrade main -f /srv/scap-helm/eventgate/main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
14:15 moritzm: installing poppler security updates on jessie
14:03 cdanis: cdanis@cobalt.wikimedia.org ~ % sudo systemctl start gerrit.service
13:53 ema@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:49 moritzm: installing libav security updates
13:45 ema@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:45 ema: reboot cp4027 for dist and Varnish upgrade T224694
13:34 elukey: reboot of an-worker* (Hadoop worker nodes) for kernel + openjdk upgrades
13:25 ema: cp4027: upgrade Varnish packages to 5.1.3-1wm10 T224694
12:37 jbond42: upgrade mtail on lithium - T225604
12:35 jbond42: add mtail_3.0.0~rc24.1-1+wmf1_amd64.deb to jessie-wikimedia backports
12:13 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: FileImporter configuration to fetch sitelinks from Wikidata (T225609 T224007) - finishing partial deployment (duration: 00m 47s)
12:06 awight: EU SWAT complete
12:05 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 517391 Enable AMC mode for Persian, Japanese, Thai and Italian wikis (T225123) (duration: 00m 47s)
12:02 Urbanecm: EU SWAT is going a few minutes beyond its window
11:55 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 516608 Enable feature flag for breaking Wikibase API change (T223303) (duration: 00m 47s)
11:49 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 516478 Set EntityUsageTable addUsage batch size to 200 (T225500) (duration: 00m 47s)
11:46 awight@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/ContentTranslation: SWAT: Fix undefined index notices (T225198) (duration: 00m 49s)
11:33 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add autoreview protection level on ar.wikipedia (T225896) (duration: 00m 47s)
11:28 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable VisualEditor in draft namespace on sr.wiki (T223024) (duration: 00m 47s)
11:23 awight: ran mwscript namespaceDupes.php nds_nlwiki, no dupes found
11:22 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set nds_nlwiki's sitename and metanamespace back to defaults (T224349) (duration: 00m 47s)
11:12 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: wmf-config/CommonSettings-labs.php SWAT: FileImporter configuration to fetch sitelinks from Wikidata (T225609 T224007) (duration: 00m 47s)
10:51 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 47s)
10:51 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
09:39 moritzm: rebooting mw2184, mw1265 for some tests
09:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:31 elukey: set cpu governor to performance (was powersave) on analytics1070 (hadoop worker node)
09:17 moritzm: rebooting sulfur for some tests
09:15 _joe_: The governor was set to "powersave", not "ondemand"
09:13 _joe_: setting cpufreq governor to "ondemand" on mw1348, T225713
08:52 onimisionipe: remove maps1001 from cassandra cluster - T224395
07:25 XioNoX: restart snmp daemon on mr1-eqsin
07:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2107 (duration: 00m 47s)
06:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2084 (duration: 00m 47s)
06:12 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2084 for a reboot (duration: 00m 48s)
06:04 marostegui: Stop MySQL on db2084 to reboot the host T225884
05:16 marostegui: Stop MySQL on db2107 to clone db2051 - T221533
05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2107 to clone db2051 (duration: 00m 47s)
05:03 marostegui: Optimize all pc1008's tables T210725
05:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1008 and pool pc1010 temporarily while pc1008 gets all its tables optimized T210725 (duration: 00m 59s)

2019-06-16

14:20 Urbanecm: running mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='AKA MBG' /home/urbanecm/T225886
08:21 elukey: roll restart of druid brokers on druid100[4-6], stuck after regular data drop maintenance

2019-06-15

20:38 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@55174a4]: deploy new pattern for bots (duration: 21m 42s)
20:17 smalyshev@deploy1001: Started deploy [wdqs/wdqs@55174a4]: deploy new pattern for bots
20:16 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@55174a4]: deploy new pattern for bots (duration: 00m 54s)
20:15 smalyshev@deploy1001: Started deploy [wdqs/wdqs@55174a4]: deploy new pattern for bots
19:14 SMalyshev: repooled wdqs1004
17:35 elukey: restart hadoop-yarn-resourcemanager on an-masters as attempt to fix yarn.w.o
07:44 SMalyshev: depooled wdqs1004 to catch it up

2019-06-14

23:23 ejegg: updated payments-wiki from 75abd71cc1 to 79d1822644
23:19 SMalyshev: repooled wdqs1003
23:13 SMalyshev: repooled wdqs2003
23:10 _joe_: set cpufreq governor for mw1348 to performance
19:56 SMalyshev: depooled wdqs2003 to catch up
19:17 SMalyshev: depooled wdqs1003 to catch up
15:56 gehel: repooling wdqs1003, not catching up anyway (high edit load)
15:24 godog: test setting 'performance' governor on ms-be2035 - T210723
14:35 godog: powercycle mw1294, down and no console
13:26 gehel: depooling wdqs1003 to allow it to catch up on lag
13:22 joal@deploy1001: Started restart [analytics/aqs/deploy@fc1d232]: (no justification provided)
12:38 godog: test setting 'performance' governor on ms-be2032 - T210723
11:36 godog: test setting 'performance' governor on ms-be2034 - T210723
10:22 marostegui: Optimize tables on pc2008 - T210725
10:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1077 after recovering from a crash (duration: 00m 49s)
10:14 godog: test setting 'performance' governor on ms-be2031 - T210723
09:44 godog: test setting 'performance' governor on ms-be2037 - T210723
09:43 godog: test setting 'performance' governor on ms-be2033 - T210723
09:28 godog: test setting 'performance' governor on ms-be2038 - T210723
09:26 godog: test setting 'performance' governor on ms-be2016 - T210723
03:57 SMalyshev: repooled wdqs1005
00:11 SMalyshev: depooled wdqs1005 - let it catch up
00:10 SMalyshev: repooled wdqs1006 - caught up

2019-06-13

23:25 SMalyshev: depooled wdqs1006 to let it catch up quicker
18:10 fdans@deploy1001: Finished deploy [analytics/refinery@67b34fe]: retrying deployment of analytics refinery (duration: 00m 19s)
18:10 fdans@deploy1001: Started deploy [analytics/refinery@67b34fe]: retrying deployment of analytics refinery
18:01 fdans@deploy1001: Finished deploy [analytics/refinery@67b34fe]: deploying refinery source 0.0.92 into refinery (duration: 16m 45s)
17:44 fdans@deploy1001: Started deploy [analytics/refinery@67b34fe]: deploying refinery source 0.0.92 into refinery
17:34 bstorm_: T203254 set cpu scaling governor to performance on labstore1004 and labstore1005
16:02 gehel: restart blazegraph on wdqs public cluster completed
15:58 gehel: restart blazegraph on wdqs public cluster
15:36 gehel: restarting blazegraph on wdqs-internal / eqiad (just in case)
08:09 jynus: reloading proxies for wikireplicas to rebalance load
07:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1077 after recovering from a crash (duration: 00m 50s)
00:45 paravoid: setting the CPU governor to performance for ms-be1036 (a while ago)

2019-06-12

18:15 krinkle@deploy1001: Synchronized php-1.34.0-wmf.8/thumb.php: T225197 / 06b631fae5 (duration: 00m 47s)
18:13 krinkle@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/ArticlePlaceholder/includes/: T207235 / a42aa15 (duration: 00m 49s)
16:06 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
15:49 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
15:37 legoktm: re-enabled bawolff's gerrit account
15:14 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-restart (exit_code=97)
14:38 marostegui: Start replication on all threads on labsdb1010 - T222978
14:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1077 after recovering from a crash (duration: 00m 47s)
13:19 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
11:55 godog: swift eqiad-prod: put back ms-be1033 - T223518
10:52 godog: force-upgrade mtail to 3.0.0~rc24.1-1 on wezen - T225604
10:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1077 after recovering from a crash (duration: 00m 47s)
10:18 akosiaris@deploy1001: scap-helm zotero finished
10:18 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
10:17 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
10:17 akosiaris@deploy1001: scap-helm zotero upgrade --dry-run --debug production stable/zotero [namespace: zotero, clusters: eqiad,codfw]
10:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1077 after a crash (duration: 00m 48s)
09:51 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
08:59 hashar: Gracefully stopping Zuul (kill -SIGUSR1) to prepare for the restart of the CI Jenkins T225322
08:41 onimisionipe: pool map2003. reimage and setup is complete - T224395
08:31 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-restart
06:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1077 after a crash (duration: 00m 49s)

2019-06-11

19:24 tzatziki: Removing four (4) files for legal compliance
15:41 gehel: shutting down elastic1029 for investigation - T214283
12:54 godog: swift eqiad-prod: put back ms-be1033 - T223518
11:52 gehel@cumin2001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
10:54 godog: wipe fs on ms-be1033 data partitions - T223518
09:56 gehel@cumin2001: START - Cookbook sre.postgresql.postgres-init
09:20 godog: free up space wrongly allocated onto / with sdc1 umounted on ms-be2018
08:26 gehel: repooling maps200[124]

2019-06-10

19:39 thcipriani: restarting jenkins
19:11 akosiaris: refresh all zotero pods in all clusters
19:11 akosiaris@deploy1001: scap-helm zotero finished
19:11 akosiaris@deploy1001: scap-helm zotero cluster staging completed
19:11 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-staging.yaml staging stable/zotero [namespace: zotero, clusters: staging]
19:11 akosiaris@deploy1001: scap-helm zotero finished
19:10 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
19:10 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
19:10 akosiaris@deploy1001: scap-helm zotero finished
19:10 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
19:10 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
17:55 ottomata: rolling restart of AQS service using scap deploy for new mediawiki_history_snapshot
17:55 otto@deploy1001: Started restart [analytics/aqs/deploy@fc1d232]: (no justification provided)
16:24 marostegui: Power reset db1077 from the idrac T225391
13:18 mvolz@deploy1001: scap-helm citoid finished
13:18 mvolz@deploy1001: scap-helm citoid cluster codfw completed
13:18 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-codfw-values.yaml stable/citoid [namespace: citoid, clusters: codfw]
13:13 mvolz@deploy1001: scap-helm citoid finished
13:13 mvolz@deploy1001: scap-helm citoid cluster eqiad completed
13:13 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-eqiad-values.yaml stable/citoid [namespace: citoid, clusters: eqiad]
13:04 mvolz@deploy1001: scap-helm citoid finished
13:04 mvolz@deploy1001: scap-helm citoid cluster staging completed
13:04 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
05:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 - host crashed (duration: 00m 52s)

2019-06-09

08:30 vgutierrez: rebooting lvs4007 after NIC driver crash

2019-06-08

11:58 godog: stop swift processes on ms-be1033 - T223518
10:46 reedy@deploy1001: Synchronized wmf-config/throttle.php: T225344 (duration: 00m 51s)

2019-06-07

18:56 herron: performing rolling reboots of logstash codfw frontends for security updates
18:22 cstone: Update payments-wiki revision changed from c6c7bbf71e to 75abd71cc1
15:34 godog: bounce rsyslog on wezen - T199406
15:09 elukey: reboot thorium for kernel upgrades
14:00 ema: pool cp3039 w/ ATS backend T222937
13:15 ema: depool cp3039 and reimage as upload_ats T222937
13:04 arturo: aborrero@cumin1001:~ $ sudo cumin "P{R:Systemd::Timer::Job}" "puppet agent --enable && run-puppet-agent" (patch already merged)
13:03 arturo: aborrero@cumin1001:~$ sudo cumin "P{R:Systemd::Timer::Job}" "puppet agent --disable 'arturo merging systemd timer nrpe change'" (19 hosts affected) merging: https://gerrit.wikimedia.org/r/c/operations/puppet/+/514988
11:45 ema: pool cp3043 w/ ATS backend T222937
10:51 jbond42: upload libcpp-hocon0.1.6_0.1.6-1~bpo9+1_amd64.deb to wikimedia-stretch component/facter3
10:45 jbond42: upload libleatherman-data_1.4.0+dfsg-1~bpo9+1_all.deb to wikimedia-stretch component/facter3
10:43 ema: depool cp3043 and reimage as upload_ats T222937
10:09 _joe_: restarting php-fpm on the codfw hosts to pick up the recent changes in opcache
09:59 jbond42: upload libleatherman1.4.0_1.4.0+dfsg-1~bpo9+1_amd64.deb to wikimedia-stretch component/facter3
09:49 jbond42: upload libleatherman1.4.0_1.4.0+dfsg-1~bpo8+1_amd64.deb to wikimedia-jessie component/facter3
09:16 mobrovac@deploy1001: scap-helm mathoid finished
09:16 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
09:16 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
09:16 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
09:00 marostegui: Upgrade x1 codfw hosts in preparation for its failover T220170
08:46 elukey: start the reboot of the Analytics Hadoop's worker nodes for kernel+openjdk upgrades
08:24 marostegui: Upgrade s2 codfw to 10.1.39 in preparation for its codfw failover - T221533
08:19 XioNoX: remove BGP session to AS55658 on cr1-eqsin (left the IXP)
08:12 vgutierrez: upgrading certbot in wikitech-static
07:29 marostegui: Drop unused temporary test tables on db1111 and db1112
05:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2051 from s4 to s2 T221533 (duration: 00m 49s)
00:00 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove unused preference T47877-buster (duration: 00m 47s)
00:00 bstorm_: T224850 repooled labsdb1009 after completing view updates

2019-06-06

23:57 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Specify the fluidsynth paths for TMH MIDI conversion T135597 (duration: 00m 47s)
23:56 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Remove T225183 (duration: 00m 48s)
23:03 jeh: T224850 depooled labsdb1009
22:42 bstorm_: T224850 repooled labsdb1011
21:01 bstorm_: T224850 depooled labsdb1011
20:58 jforrester@deploy1001: Synchronized wmf-config/reverse-proxy.php: Stop setting wgSquidServersNoPurge, MW now uses wgCdnServersNoPurge (duration: 00m 47s)
20:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgSquidMaxage, MW now uses wgCdnMaxAge (duration: 00m 46s)
20:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgUseSquid or using wgSquidServersNoPurge, duplicate existing values (duration: 00m 48s)
20:49 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Drop backwards-compatibility for dataSquidMaxage (duration: 00m 48s)
19:47 herron: performing rolling reboot of eqiad logstash hw for MDS security updates
18:58 jbond42: reimage sarin to stretch
18:39 jbond42: mw1249 - sudo systemctl restart php7.2-fpm.service
18:38 papaul: shutting down backup2001 for 10G nic troubleshooting
18:24 bstorm_: T224850 repooled labsdb1010 after completing view run
18:04 jijiki: Continuing rolling restarts of php-fpm in eqiad
17:30 elukey: restart mcrouter on mw2271 (codfw proxy) to pick up new config changes
15:56 bstorm_: T224850 depooled labsdb1010 for view updates
15:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:05 moritzm: rolling reboot of sessionstore hosts in eqiad for kernel security update
15:02 _joe_: rolling restart of php-fpm on {appservers,api} in eqiad, in groups of 4, staggered by 10 minutes, to pick up the new opcache settings
14:57 bstorm_: T224850 update views on labsdb1012
14:43 moritzm: updating qemu packages on ganeti hosts to deploy support for md_clear/MDS for Ganeti instances
14:43 elukey: restart mcrouter on mw2255 (codfw proxy) to pick up new config changes
14:22 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: fix logspam (duration: 00m 48s)
14:18 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
13:54 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: fix logspam (duration: 00m 47s)
13:44 moritzm: rolling reboot of sessionstore hosts in codfw for kernel security update
13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:36 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
13:35 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.8
13:35 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart-wdqs (exit_code=99)
13:35 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
13:34 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
13:33 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
13:32 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.restart-wdqs (exit_code=0)
13:31 gehel@cumin1001: START - Cookbook sre.wdqs.restart-wdqs
12:44 jbond42: reimage neodymium
12:23 _joe_: running puppet, restarting php-fpm on the canaries to pick up the new opcache size
12:11 ema: cp1075: repool with varnish 5.1.3-1wm10 T224694
12:10 elukey: restart mcrouter on mw2235
12:05 Lucas_WMDE: EU SWAT done
12:04 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Revert "Specify $wgWBRepoSettings['conceptBaseUri']" (gerrit:514700) (duration: 00m 56s)
12:00 ema: cp1075: upgrade varnish to 5.1.3-1wm10 T224694
11:55 lucaswerkmeister-wmde@deploy1001: scap failed: average error rate on 8/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
11:48 Urbanecm: running mwscript namespaceDupes.php --wiki=thwikisource --fix (T216322)
11:47 Urbanecm: running mwscript namespaceDupes.php --wiki=thwikibooks --fix for T216322
11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add new namespaces for several Thai projects (T216322) (gerrit:514678) (duration: 00m 54s)
11:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove unused config variable wgWikibaseEnableSenses (gerrit:514534) (duration: 00m 55s)
11:23 gehel@cumin2001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
11:22 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/CirrusSearch/: SWAT: Fix event validation error for cirrussearch-request event (gerrit:514566) (duration: 01m 06s)
10:55 elukey: restart mcrouter on mw2163 (codfw mcrouter proxy)
10:43 mobrovac@deploy1001: scap-helm mathoid finished
10:43 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
10:43 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
10:43 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
10:30 ema: varnish 5.1.3-1wm10 uploaded to stretch-wikimedia T224694
10:19 elukey: rolling restart of mcrouter on mw1* hosts to pick up config change (batch of 5 hosts, depool/run-puppet/pool)
10:12 elukey: disable puppet on mw1* and mw[2163,2235,2255,2271] as prep step for mcrouter config deploy
10:10 fsero: rolled back last deployment of mathoid to revision 16
09:59 mobrovac@deploy1001: scap-helm mathoid finished
09:59 mobrovac@deploy1001: scap-helm mathoid cluster codfw completed
09:59 mobrovac@deploy1001: scap-helm mathoid cluster eqiad completed
09:59 mobrovac@deploy1001: scap-helm mathoid upgrade production stable/mathoid -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
09:32 moritzm: rebooting mwdebug2002 for some tests
09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:28 moritzm: updating qemu on ganeti2004 for some tests
09:24 gehel@cumin2001: START - Cookbook sre.postgresql.postgres-init
08:38 marostegui: Stop MySQL on db1117:3322 - this will trigger haproxy alerts - T222682
07:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 after upgrade T224852 (duration: 00m 53s)
07:20 marostegui: Stop MySQL on db1121 for upgrade, this will generate lag on labs hosts for s6 - T224852
07:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2046 to s6 master as db2039 will be decommissioned T221533 (duration: 00m 55s)
06:31 marostegui: Start topology changes on s6 codfw to promote db2046 as master - T221533
06:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 for upgrade T224852 (duration: 00m 55s)
06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 after getting its BBU replaced (duration: 00m 54s)
06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced (duration: 01m 01s)
05:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced (duration: 00m 55s)
05:41 marostegui: Upgrade MySQL on s6 codfw hosts in preparation for s6 codfw master failover - T221533
05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after getting its BBU replaced (duration: 00m 55s)
05:18 marostegui: Remove db2042 from tendril and zarcillo T225090
05:18 marostegui: Remove db2042 from tendril and zarcillo
05:14 marostegui: Stop MySQL on db2042 to copy its content to dbprov2001 as a temporary backup - T225090
05:11 marostegui: Disable notifications db2042 - T225090
05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 after getting its BBU replaced T225060 (duration: 00m 56s)

2019-06-05

22:15 chaomodus: restarting gerrit on cobalt due to it being down (seems like Java out of heap space)
20:43 mforns@deploy1001: Finished deploy [analytics/refinery@0660e70]: deploying analytics/refinery up to 0660e70 (duration: 19m 30s)
20:39 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Turn off some FR config T225138 (duration: 00m 54s)
20:25 akosiaris@deploy1001: scap-helm blubberoid finished
20:25 akosiaris@deploy1001: scap-helm blubberoid cluster codfw completed
20:25 akosiaris@deploy1001: scap-helm blubberoid cluster eqiad completed
20:25 akosiaris@deploy1001: scap-helm blubberoid upgrade -f blubberoid-values.yaml production stable/blubberoid [namespace: blubberoid, clusters: eqiad,codfw]
20:23 mforns@deploy1001: Started deploy [analytics/refinery@0660e70]: deploying analytics/refinery up to 0660e70
19:57 hashar: contint1001: docker container prune -f && docker image prune -f # reclaimed 166 MB and 3.4 GB
19:48 marostegui: Check data consistency on db1091 against db1135 - T225060
19:45 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: T225115 (duration: 00m 54s)
17:36 marostegui: Start replication db1091 - T225060
17:32 marostegui: Start MySQL with replication stopped on db1091 - T225060
16:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert user-blocks-change to use eventbus and old schema - T211248 (duration: 00m 54s)
16:22 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: use eventgate-main for 2 events on all wikis - T211248 (duration: 00m 55s)
16:11 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceStreamConfig and switch 2 topics in group0 T222822 (duration: 00m 56s)
16:11 XioNoX: remove BGP to AS38082 on cr4-ulsfo (left the IXP)
15:46 reedy@deploy1001: Scap failed!: Call to mwscript eval.php returned: None
15:44 reedy@deploy1001: Finished scap: Rebuild .8 i18n for FlaggedRevs (duration: 41m 14s)
15:36 moritzm: installing exim4 security updates
15:03 reedy@deploy1001: Started scap: Rebuild .8 i18n for FlaggedRevs
14:24 marostegui: Poweroff db1091 for BBU replacement - T225060
13:57 elukey: restart mcrouter on MediaWiki app/api canaries to pick up new config change (timeouts before marking a memcached shard as TKO from 3 to 10) - T203786
13:56 jijiki: enabling puppet and pooling on mw* canaries
13:17 jynus: start es2,es3 backup on codfw
13:17 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.8
13:03 hashar: restarting Jenkins
12:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1135 (duration: 00m 54s)
12:46 Lucas_WMDE: EU SWAT finished
12:32 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/WikimediaMessages/: SWAT: Fix wikidata copyright message (T224536) (gerrit:514460) (duration: 00m 56s)
11:43 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable the new history page in the advanced mobile contributions mode (T219895) (gerrit:514449) (duration: 00m 56s)
11:27 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: Remove project namespace from flaggedrevs on ruwikisource (T225037) (gerrit:514413) (duration: 00m 54s)
10:57 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.8/extensions/FlaggedRevs: Add ext.flaggedRevs.icons to modules registeration (gerrit:514456) (duration: 00m 57s)
10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1135 (duration: 00m 55s)
10:09 godog: mount sdb3 on ms-be1022 - T225079
09:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1135 with very low weight on s4 (duration: 00m 55s)
09:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool without traffic db1135 into s4 T225060 (duration: 00m 55s)
09:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool without traffic db1135 into s4 T225060 (duration: 00m 56s)
08:42 onimisionipe: removing maps2001 from cassandra cluster. It is going to be reimaged - T224395
08:40 _joe_: rolling restart of php7 on the api servers, to test a different strategy of restarting compared to the appservers.
08:21 _joe_: performing a rolling restart of the php appservers via cumin to test speed and safety of the operations proposed in T224857
08:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:12 moritzm: rebooting pybal-test2001 for tests with new qemu
08:12 ema: pool cp3035 w/ ATS backend T222937
08:12 marostegui: Reboot db1091 T225060
08:05 moritzm: installing qemu security updates on Ganeti hosts
07:45 marostegui: Transfer dbprov1001.eqiad.wmnet:snapshot.s4.2019-06-04--21-37-03.tar.gz to db1135 to provision it on s4 T225060
07:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1091 status (duration: 00m 56s)
07:22 ema: depool cp3035 and reimage as upload_ats T222937
07:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 - host went down (duration: 00m 55s)
06:45 marostegui: Restart MySQL on db2110 to get the binlog format changed to STATEMENT - T220170
06:45 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2090 to s4 codfw master T220170 (duration: 00m 54s)
06:25 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Mimic s4 codfw weights to eqiad T220170 (duration: 00m 55s)
06:17 marostegui: Start topology changes on s4 codfw to replace current master db2051 with db2090 - T220170
06:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1084 into API (duration: 00m 54s)
05:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 after upgrade T224852 (duration: 00m 55s)
05:49 marostegui: Upgrade MySQL on db1084 T224852
05:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 for upgrade T224852 (duration: 01m 06s)
05:31 marostegui: Stop MySQL on db1125 (sanitarium) s2,s4,s6,s7 to upgrade mysql - T224852
05:29 marostegui: Keep compressing tables on labsdb1012 - T222978
05:22 marostegui: Change replication topology on m3 codfw to promote db2065 as codfw master instead of db2042 - T221533
05:07 marostegui: Upgrade MySQL on labsdb1012 - T224852
04:09 onimisionipe: starting postgres slave init on maps2001 - T224395

2019-06-04

23:03 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change log level to debug for PageTriage (duration: 01m 03s)
22:06 eileen: civicrm revision changed from 506ebe2f2a to 5c02e62d6e, config revision is 63438eea43
21:08 jbond42: finished rolling reboots of mw1* servers
21:07 jbond42: finished rolling reboots of mw1* servers
20:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
20:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
20:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
20:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
20:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
20:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
20:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
20:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
20:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
20:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
19:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
19:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
19:48 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
19:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime
19:48 XioNoX: replace logstash.svc.eqiad.wmnet syslog target with syslog.codfw.wmnet on cr4-ulsfo - T224128
19:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
19:41 jbond@cumin1001: START - Cookbook sre.hosts.downtime
19:41 jbond42: reboot mwdebug1002
19:36 jbond42: reboot mwdebug1001
19:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
19:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
19:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
19:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
19:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
19:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
18:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
18:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
18:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
18:38 jbond@cumin1001: START - Cookbook sre.hosts.downtime
18:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
18:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime
18:10 herron: correction — performing rolling reboots of codfw logstash hardware hosts for MDS security updates
18:10 herron: performing rolling reboots of eqiad logstash hardware hosts for MDS security updates
18:06 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
18:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
18:04 bblack: pool cp3045 - T222937
17:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:38 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:33 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:58 legoktm: deleted some gerrit changes
16:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
16:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
16:33 robh@cumin1001: START - Cookbook sre.hosts.decommission
16:32 marostegui: Compress some more tables on labsdb1012 before upgrading the host tomorrow T222978
16:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:14 bblack: repool cp3035 (still varnish-be, but freshly installed!)
16:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:12 jbond42: starting rolling reboots of mw1*
16:09 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3045.esams.wmnet
16:08 bblack: depool cp3045 for reimage - T222937
15:56 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: JADE - T212182 (duration: 00m 53s)
15:55 reedy@deploy1001: Synchronized wmf-config/extension-list: JADE - T212182 (duration: 00m 53s)
15:52 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/Jade: Consistency (duration: 01m 08s)
15:50 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: Configure eventgate-main EventService. No-op in prod. T211248 (duration: 01m 19s)
15:41 bblack: reboot cp3035 post-reimage
15:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Use eventgate-main in beta. No-op in prod. T211248 (duration: 00m 49s)
15:18 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.8
15:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:13 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:13 moritzm: draining ganeti1003 for eventual reboot to MDS-enabled Linux kernel
15:13 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.8 and rebuild l10n cache (duration: 29m 46s)
15:04 moritzm: failover Ganeti master in eqiad to ganeti1001
14:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
14:51 bblack: depool cp3035 for ATS reimage - T222937
14:43 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.8 and rebuild l10n cache
14:41 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.5 [keeping static files] (duration: 01m 38s)
14:39 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 [keeping static files] (duration: 01m 34s)
14:36 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.3 (duration: 11m 02s)
13:53 jbond42: restart mtail on lithium
13:46 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
13:46 fsero@cumin1001: START - Cookbook sre.hosts.decommission
13:30 jbond42: starting rolling reboots of mw1*
13:12 moritzm: draining ganeti1008 for eventual reboot to MDS-enabled Linux kernel
12:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
12:22 Urbanecm: ran mwscript deleteBatch.php --wiki=sawikisource -r 'T214553: deleting useless red
12:13 akosiaris: restart pybal on lvs2003, lvs1015 for sessionstore LVS configuration. T220401
12:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2046 (duration: 00m 46s)
12:04 akosiaris: restart pybal on lvs2006 for sessionstore LVS configuration. T220401
11:40 akosiaris: restart pybal on lvs1015 for sessionstore LVS configuration. T220401
11:39 krinkle@deploy1001: Synchronized php-1.34.0-wmf.7/includes/: T221577 / 1286d131c01886 (duration: 01m 04s)
11:39 jijiki: enabling puppet on mc1*
11:38 Urbanecm: run mwscript namespaceDupes.php --wiki=kuwiktionary --fix (T224327)
11:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Custom namespaces for ku.wiktionary (T224327) (gerrit:514239) (duration: 00m 46s)
11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add localized project logo for sahwikiquote (2/2, T222065) (gerrit:507931) (duration: 00m 47s)
11:34 urbanecm@deploy1001: Synchronized static/images/project-logos/: Add localized project logo for sahwikiquote (1/2, T222065) (gerrit:507931) (duration: 00m 47s)
11:31 jijiki: enabling puppet on mc2*
11:29 Urbanecm: running mwscript namespaceDupes.php --wiki=sawikisource --add-prefix=T214553 --fix (T214553)
11:28 Urbanecm: run mwscript namespaceDupes.php --wiki=thwiki --fix (T216322)
11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add Author namespace in Sanskrit Wikisource (T214553) (gerrit:486221) (duration: 00m 46s)
11:24 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Create new protection levels for dewiktionary (2/2, T216885) (gerrit:495918) (duration: 00m 47s)
11:23 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create new protection levels for dewiktionary (1/2, T216885) (gerrit:495918) (duration: 00m 47s)
11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add editcontentmodel right to the templateeditor group on testwiki (T217499) (gerrit:494016) (duration: 00m 47s)
11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add new namespaces for th.wiki (T216322) (gerrit:491054) (duration: 00m 47s)
11:09 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/: T221577 / 1286d131c01886 (duration: 01m 07s)
11:02 moritzm: draining ganeti1007 for eventual reboot to MDS-enabled Linux kernel
11:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:44 jbond42: mw1* restarts will be delayed until 11:15
10:42 jbond42: will start rolling reboots of mw1* servers at 10:50
09:27 moritzm: draining ganeti1006 for eventual reboot to MDS-enabled Linux kernel
09:25 jijiki: disable puppet on mc* hosts to merge 511963 and 511973
09:01 moritzm: draining ganeti1005 for eventual reboot to MDS-enabled Linux kernel
08:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:32 elukey: remove memcached nutcracker config from mw1* hosts (not used). Changes will be picked up when nutcracker is restarted (after reboots, etc.) - T214275
08:23 moritzm: draining ganeti1004 for eventual reboot to MDS-enabled Linux kernel
08:04 marostegui: Stop MySQL on db2046 to clone db2058 - T221533
08:04 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2046 (duration: 00m 47s)
08:03 elukey: restart hive-server2 on an-coord1001 to pick up new GC/Heap settings
07:35 mobrovac@deploy1001: Finished deploy [restbase/deploy@abcb534]: Use only Proton for PDF rendering - T210651 (duration: 19m 16s)
07:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:21 jmm@cumin2001: START - Cookbook sre.hosts.downtime
07:21 moritzm: draining ganeti1002 for eventual reboot to MDS-enabled Linux kernel
07:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2058 from s4 to s6 (duration: 00m 47s)
07:16 mobrovac@deploy1001: Started deploy [restbase/deploy@abcb534]: Use only Proton for PDF rendering - T210651
06:57 elukey: restart hive metastore on an-coord1001 to apply new GC/heap settings
06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 after upgrade (duration: 00m 48s)
06:21 elukey: restart pdfrender on scb1002 (flapping)
06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 after upgrade (duration: 00m 47s)
05:54 marostegui: Stop MySQL on db2078:m3 - T221533
05:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 after upgrade (duration: 00m 47s)
05:40 marostegui: Stop MySQL on db1091 for MySQL upgrade T224852
05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 for upgrade (duration: 00m 48s)
05:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097 after upgrade (duration: 00m 46s)
05:19 marostegui: Stop MySQL on db1097 for upgrade
05:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097 for upgrade (duration: 00m 47s)
04:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1081 from API (duration: 00m 49s)
01:10 bstorm_: T223406 depooled/repooled labsdb1009 for view updates
00:09 bstorm_: T223406 repooled labsdb1011 after completing view updates

2019-06-03

22:20 bstorm_: T223406 depooled labsdb1011
22:09 bstorm_: T223406 repooled labsdb1010 after completing view updates
21:29 XioNoX: drop all ICMP frag on all routers - T224186
19:57 XioNoX: stop sampling from cr2-eqiad
18:48 XioNoX: Add RPKI validators to all routers - T220669
18:35 hashar: switch most Quibble jobs to node 10 T222406 - https://gerrit.wikimedia.org/r/#/c/integration/config/+/514034/
18:35 XioNoX: drop all ICMP frag on cr1/2-eqiad - T224186
18:17 XioNoX: add routinator 0.4.0 to APT repo - T220669
17:16 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@9e3035c]: Blazegraph version wmf.4 (duration: 11m 29s)
17:05 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@9e3035c]: Blazegraph version wmf.4
16:40 onimisionipe: started osm-import on maps2004 - T224395
16:30 bstorm_: T223406 depooled labsdb1010 for view updates
15:39 bstorm_: T223406 labsdb1012 updated views for actor table changes
14:46 akosiaris: deploy kask in sessionstore kubernetes namespace in eqiad, codfw T220401
14:34 arturo: T221769 reimaging cloudservices1003 to stretch
14:20 vgutierrez: upgrading acme-chief to version 0.17 in acme-chief production instances - T220518
13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:53 moritzm: draining ganeti1001 for eventual reboot to MDS-enabled Linux kernel
13:44 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Drop caption edit counter unlock delay to 0 (duration: 00m 49s)
13:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1138 into s4 API (duration: 00m 48s)
13:19 marostegui: Move db2078:3321 under db2062 T220170
13:03 arturo: add prometheus-pdns-rec-exporter v0.7 to stretch-wikimedia (T224877)
12:56 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on remaining wikis (T188327) (duration: 00m 48s)
12:24 arturo: add prometheus-pdns-exporter v0.4 to stretch-wikimedia (T224877)
11:28 gehel: reboot relforge for microcode + jvm upgrade
11:17 jijiki: Restarting php7.2-fpm in eqiad in batches of 2 for 513949
11:15 Urbanecm: EU SWAT done
11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:513740 Add Wikiprojekti namespace to wgExtraSignatureNamespaces for fiwiki (T224215) (duration: 00m 47s)
11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:503680 Add 5 active namespaces for VisualEditor on en.wikiversity (T220881) (duration: 00m 48s)
11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:513720 Add "Zerrenda" (list) namespace to VisualEditor on euwiki (T224801) (duration: 00m 48s)
10:52 moritzm: upgrading maps servers to new Java security release
10:47 moritzm: upgrading WDQS servers to new Java security release
10:42 vgutierrez: upgrading prometheus-trafficserver-exporter in upload_ats ulsfo instances
10:41 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: gerrit:513972 Bumping portals to master (T128546) (duration: 00m 47s)
10:40 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:513972 Bumping portals to master (T128546) (duration: 00m 49s)
10:36 jijiki: Restarting php7.2-fpm in codfw in batches of 2 for 513949
10:34 moritzm: upgrading Elastic servers to new Java security release
10:26 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@5046f3c]: Update the recommendation API service (duration: 03m 15s)
10:23 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@5046f3c]: Update the recommendation API service
10:03 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=kartotherian
10:02 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=kartotherian
09:48 onimisionipe: depooled maps codfw due to lag and disk issues - T224395
09:46 moritzm: upgrading Druid/Kafka-Jumbo servers to new Java security release (will be picked up by forthcoming MDS reboots)
09:43 moritzm: upgrading AQS servers to new Java security release (will be picked up by forthcoming MDS reboots)
09:33 moritzm: upgrading Hadoop servers to new Java security release (will be picked up by forthcoming MDS reboots)
08:18 ema: cp1077: restart varnish-be
08:17 elukey: manually removed phab_clean_tmp from www-data's crontab on phab1001 to reduce cronspam
08:16 ema: cp1075: restart varnish-be
08:03 marostegui: Stop MySQL on db1064 T223217
08:01 marostegui: Remove db1064 from tendril and zarcillo T223217
07:58 elukey: refresh field list for logstash (via kibana Management -> Index patterns -> etc..)
07:48 marostegui: Repool db1103 after upgrade T224852
07:29 marostegui: Stop MySQL on db1103 (s2 and s4) for upgrade T224852
07:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103 for upgrade (duration: 00m 47s)
07:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1081 into API after upgrade (duration: 00m 48s)
06:50 elukey: roll restart varnishkafka (via puppet) for a config change - T224236
06:46 kartik@deploy1001: scap-helm cxserver finished
06:46 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
06:45 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
06:45 kartik@deploy1001: scap-helm cxserver finished
06:45 kartik@deploy1001: scap-helm cxserver cluster codfw completed
06:45 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
06:44 kartik@deploy1001: scap-helm cxserver finished
06:44 kartik@deploy1001: scap-helm cxserver cluster staging completed
06:44 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
06:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 into API after upgrade (duration: 00m 49s)
06:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1081 after upgrade (duration: 00m 46s)
06:04 marostegui: Stop MySQL on db1081 for upgrade - T224852
06:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 for upgrade (duration: 00m 47s)
05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1019 T213422 (duration: 00m 46s)
05:45 marostegui: Upgrade mariadb on dbstore1004 - T224852
05:17 marostegui: Upgrade MariaDB on codfw hosts in preparation for s4 master failover T217396
05:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1019 T213422 (duration: 00m 46s)
05:05 marostegui: Remove db2037 from tendril and zarcillo T224720
05:04 marostegui: Stop MySQL on db2037 for decommission T224720
04:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1019 T213422 (duration: 00m 51s)

2019-06-02

20:28 onimisionipe: pooled wdqs1007. It caught up on lag
15:24 onimisionipe: depooled wdqs1007 to catch up on lags
15:22 onimisionipe: depool wdqs internal cluster hosts, one at a time, to allow them to catch up on lag
03:09 andrewbogott: restarting pdns-recursor on cloudservices 1003 and 1004 (but not at the same time)

2019-06-01

22:49 krinkle@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/3D/modules/mmv.3d.js: T224812 / bd4fbfddbe1a0 (duration: 01m 07s)

2019-05-31

21:47 aaron@deploy1001: Synchronized wmf-config/db-eqiad.php: Set "secret" field in $wgLBFactoryConf for ChronologyProtector HMACs (duration: 00m 47s)
21:46 aaron@deploy1001: Synchronized wmf-config/db-codfw.php: Set "secret" field in $wgLBFactoryConf for ChronologyProtector HMACs (duration: 00m 50s)
21:10 bblack: cp3034: repool - T222937
20:04 bblack: cp3034: depool for reimage - T222937
18:44 marostegui: Start MySQL on es1019 - T213422
18:34 jgleeson: payments-wiki updated from a76658f0a3 to c6c7bbf71e
17:29 andrewbogott: added jeh to the 'ops' group in ldap
16:20 ariel@deploy1001: Finished deploy [dumps/dumps@fd6100a]: remove orderrevs config option, unneeded now (duration: 00m 03s)
16:20 ariel@deploy1001: Started deploy [dumps/dumps@fd6100a]: remove orderrevs config option, unneeded now
15:05 bblack: cp3039: restart varnish-be for mbox lag (likely induced by 3049's depool for ATS conversion!)
15:00 Krinkle: krinkle@deploy1001: pulling down 6f91b41 for php-1.34-wmf.7/extensions/ORES (without deploy), commit seems test-only
14:59 Krinkle: krinkle@deploy1001: git status in php-1.34-wmf.7/ is dirty (extensions/ORES)
14:52 bblack: pool cp3049 back into service - T222937
14:32 onimisionipe: depool maps2004 (again) - T224395
14:32 elukey: powercycle notebook1003 - host stuck due to user processes, no ssh available, OOM didn't trigger
14:20 _joe_: rolling restart of php-fpm across production to pick up the shorter revalidate frequency for T224491
14:10 bblack: reboot cp3049 - T222937
13:16 bblack: depool cp3049 for reimage - T222937
11:46 jynus: stop and upgrade db2084
11:09 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1099 after maintenance (duration: 00m 48s)
10:54 jynus: depool labsdb1010 for maintenance
10:47 arturo: merging multiple commits to labs/private.git. We now require `puppet-merge --labsprivate` and people may not be yet aware of that
09:28 jynus: stop and upgrade db2073
09:11 jynus: stop and upgrade db2095 (s2, s4, s6, s7)
08:33 jynus: upgrade and restart db2065
08:16 jynus: depool labsdb1011 for maintenance
07:54 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099 with low weight (duration: 00m 49s)
07:43 _joe_: restarting php-fpm on canaries
07:24 _joe_: repooling mw1348
07:24 jynus: upgrade and restart labsdb1009
07:15 _joe_: draining mw1348 from traffic
07:14 jynus: depool labsdb1009 for maintenance
06:55 jynus: upgrade and restart db2058
06:33 _joe_: repooled mw1348
06:21 jijiki: depool mw1348
06:16 _joe_: restarting php-fpm on mw1348
00:08 jgleeson: Updating civicrm from bb4acf3d8a to e028bfcd63

2019-05-30

23:36 XioNoX: remove BGP sessions to starhub on cr4-ulsfo (left the IXP)
22:59 marxarelli: deleted 95 docker images from contint1001, freeing ~ 8G on / cc: T219850
22:59 XioNoX: add terms to drop specific icmp frag packets from cr1/2-eqiad - T224186
22:53 marxarelli: deleting stale docker images from contint1001, cc: T207707 T219850
22:25 mutante: phab2001 / phab1003 - why is 'git status' in /srv/phab/phabricator unclean with lots of file deletions but also not identical
22:24 mutante: phab2001 - scap pull - but it fails with directory /srv/mediawiki not found that's so wrong
22:20 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/WikimediaEvents/: Avoid division by zero warnings T224686 (duration: 00m 49s)
22:19 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/PageTriage/: Fix broken feed - T224693 (duration: 00m 51s)
21:27 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on test2wiki db, based on PageTriageTagsPatch-recreated.sql. T224693, T189929
21:12 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on testwiki db, based on PageTriageTagsPatch-recreated.sql. T224693, T189929
21:11 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on enwiki, based on PageTriageTagsPatch-recreated.sql. T224693, T189929
21:10 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/PageTriage: Bump wgPageTriageCacheVersion T224693 (duration: 00m 51s)
21:07 XioNoX: add RPKI sessions on cr4-ulsfo - T220669
20:39 twentyafterfour: phabricator: restart ssh-phab.service
19:49 mutante: sodium (mirrors) - sudo -u mirror /usr/local/sbin/update-ubuntu-mirror
18:49 Urbanecm: Morning SWAT finished
18:47 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/GrowthExperiments/: gerrit:513300 QuestionPoster: Correctly set timestamp when question is posted (T223338) (duration: 00m 51s)
18:26 mutante: phab1003 - switch 'vcs' user to 'NP' to match phab1001 setup and then /srv/phab/phabricator# ./bin/config set diffusion.ssh-user vcs (T224677)
18:24 XioNoX: bounce eqord-ulsfo interface to try to fix BFD sessions
18:12 Krinkle: Running `php7adm /opcache-free` on mw1348 and mw1321, T224491
18:12 Krinkle: Running `php7adm /opcache-free` on mw1348 and mw1321
18:11 Krinkle: mw1348 (recent api/php72 100% experiment) shows signs of corruption
18:11 Krinkle: mw1321 php7.2 shows signs of corruption for over 2 hours – https://phabricator.wikimedia.org/T224491#5224464
18:03 krinkle@deploy1001: Synchronized wmf-config/arclamp.php: (no justification provided) (duration: 00m 53s)
16:24 bblack: re-pool cp3047 into service as ats-be - T222937
16:04 mutante: phab1001 - removing 2620:0:861:103:10:64:32:186/128 from eth0
16:03 mutante: phab1001 - removing 10.64.32.186/32 from eth0
16:01 mutante: phab1001 - removing git-ssh.wm.org IP from interface - phab1003 - activating IPv6 listen address for git-ssh
15:36 jynus: stop es1019 for maintenance T213422
15:26 cmjohnson1: shutting down db1099 to swap DIMM T221502
15:20 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with full weight; depool es1019 (duration: 00m 52s)
15:19 herron: performing rolling reboots of eqiad kafka main cluster hosts for security updates
15:06 onimisionipe: pooled maps2004 - osm import is complete - T224395
14:44 andrewbogott: reimaging cloudvirtan1001 for T224566
14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:42 andrewbogott: reimaging cloudvirtan1001
14:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:22 bblack: rebooting cp3047 (post-reimage/puppetization for T222937)
14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:00 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:00 jbond@cumin1001: START - Cookbook sre.hosts.downtime
13:57 jijiki: enable puppet on mw* in eqiad
13:44 volans: rm /root/.ssh/known_hosts on cumin[12]001
13:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
13:36 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.7
13:28 jijiki: Enabling puppet on mw*.codfw.net
13:22 zfilipin@deploy1001: Synchronized php-1.34.0-wmf.7/resources/src/jquery/jquery.suggestions.js: SWAT: [[gerrit:513237|jquery.suggestions: Do not show suggestions on prefilled values ([T224524])]] (duration: 00m 58s)
13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1015.eqiad.wmnet
13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1014.eqiad.wmnet
13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1013.eqiad.wmnet
13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1012.eqiad.wmnet
13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1011.eqiad.wmnet
13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1010.eqiad.wmnet
13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1009.eqiad.wmnet
13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1008.eqiad.wmnet
13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1007.eqiad.wmnet
13:08 bblack: cp3047 puppet-disable + depool for reimage to ATS - T222937
13:03 marostegui: Stop MySQL on db1099 for onsite maintenance - T221502
13:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 T221502 (duration: 00m 56s)
13:00 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/tests/phpunit/includes/: T222628 (duration: 01m 06s)
12:58 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/includes/Linker.php: T222628 (duration: 01m 04s)
12:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
12:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
12:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
12:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
12:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
12:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
12:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
12:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:52 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:34 akosiaris: reboot ganeti2003 for kernel upgrades
11:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:20 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:14 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:14 _joe_: freed opcache on mw1281
11:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:07 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:05 Urbanecm: EU SWAT finished
11:04 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: gerrit:Enable abusefilter blocking ability in plwiki (T224617) (duration: 00m 58s)
11:00 jijiki: Disable puppet on mw* servers to merge 507939 - T219150
10:42 jynus: upgrade and restart db1117 (temporary proxy fail for passive host, reduced redundancy for m*)
10:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:22 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:19 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:15 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
10:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:07 jynus: upgrade and restart test-s4 hosts (db1111, db1112)
09:42 jynus: stop and upgrade db1102
09:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
09:31 _joe_: depooling mw1261 for benchmarking for T224491
09:26 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight (duration: 00m 55s)
08:54 jynus: stop and restart db1089 for upgrade
08:50 onimisionipe: maps2001 postgres initialization - T224395
08:44 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 for maintenance (duration: 00m 57s)
08:32 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2087 for maintenance (duration: 01m 00s)
08:10 mobrovac: drop old Parsoid tables from cassandra -- T223998
07:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - T218218 T215956 (duration: 19m 28s)
07:33 _joe_: upgraded service-checker on icinga1001,2
07:21 mobrovac@deploy1001: Started deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - T218218 T215956
00:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2091 - T224393 (duration: 00m 56s)
00:24 mutante: re-enabling puppet on phab1001 now that it does not have the phab role anymore (T221389)
00:17 mutante: rsyncing /srv/repos again. pulling on phab2001 from phab1003 (T221389)

2019-05-29

23:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wikibase sameAs A/B test config, part II (duration: 00m 56s)
23:36 jforrester@deploy1001: sync-file aborted: Remove wikibase sameAs A/B test config, part I (duration: 00m 00s)
23:35 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Remove wikibase sameAs A/B test config, part I (duration: 00m 56s)
23:26 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/AbuseFilter/includes/parser/AbuseFilterTokenizer.php: SWAT AbuseFilter: Tokenizer caching back to APC I8c6a4a95e (duration: 00m 54s)
23:19 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: Replace FR constants with numbers Ia52f644948 (duration: 00m 56s)
23:17 jforrester@deploy1001: Synchronized multiversion/MWScript.php: Mark refreshMessageBlobs.php as a global script (duration: 00m 56s)
23:15 mutante: repooled phab2001-vcs , fixes pybal / lvs alerts
23:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
23:10 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Enable wgSpecialSearchFormOptions on production Wikidata T55652 (duration: 00m 57s)
23:01 mutante: phab2001 - same issue with tin.eqiad.wmnet still showing up when first trying to git clone
22:52 mutante: miscweb2001 - a2dismod mpm_event ; systemctl restart apache2 to fix php7.0 dependency issue
22:50 mutante: miscweb2001 - when first trying to git pull iegreview - still tries to resolve 'tin.eqiad.wmnet' which is long gone. fix is still to manually edit /srv/deployment/iegreview/iegreview-cache/cache/.git/config
22:46 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Hot-deploy T224634 to fix CirrusSearch for extension registration (duration: 00m 57s)
21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
21:47 mutante: installing OS on miscweb2001 VM failed at grub install step :( T224323
21:47 mutante: sign puppet cert request for phab2001 after reinstall (for some reason it needed me to connect to console and hit enter, reimage script itself was stuck)
20:54 mutante: creating new ganeti VM miscweb2001.codfw.wmnet with same specs as krypton.eqiad.wmnet (T224323)
20:35 arlolra: Updated Parsoid to 8546c79 (T219927, T211125)
20:35 ejegg: updated payments-wiki from 332aaa96e2 to 45b73e7749
20:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@6caac43]: Updating Parsoid to 8546c79 (duration: 07m 46s)
20:20 arlolra@deploy1001: Started deploy [parsoid/deploy@6caac43]: Updating Parsoid to 8546c79
20:10 bblack: pool cp3044 (esams cache_upload ats-be) - T222937
19:46 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/Collection/: Replace missing wfCollectionSuggestAction (duration: 00m 57s)
19:45 XioNoX: enable cr1-codfw:et-0/2/1 - T224511
19:45 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/: Replace missing wfCollectionSuggestAction (duration: 01m 01s)
19:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
19:32 mutante: phab2001 - reinstalling with stretch - upgrade from jessie (T190568)
19:09 XioNoX: enable cr1-codfw:et-0/2/0 - T224511
18:37 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3044.esams.wmnet
17:44 XioNoX: enable cr1-codfw:et-0/0/1 - T224511
17:13 XioNoX: enable cr1-codfw:et-0/0/0 - T224511
17:02 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:501926 Change arwiki default user preferences, part 3/3 (T220186) (duration: 00m 56s)
17:00 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: gerrit:501926 Change arwiki default user preferences, part 2/3 (T220186) (duration: 00m 56s)
16:59 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:501926 Change arwiki default user preferences, part 1/3 (T220186) (duration: 00m 56s)
16:48 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:512942 Revert: Hardcode korean help desk config (duration: 00m 56s)
16:45 sbisson@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/GrowthExperiments/includes/HelpPanel.php: SWAT: gerrit:512941 Prevent parsing of GEHelpPanelHelpDeskTitle from accessing the session (duration: 00m 56s)
16:42 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/GrowthExperiments/includes/HelpPanel.php: SWAT: gerrit:512940 Prevent parsing of GEHelpPanelHelpDeskTitle from accessing the session (duration: 01m 00s)
16:32 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/GrowthExperiments/includes/HelpPanel/QuestionRecord.php: SWAT: gerrit:512950 Revert: Fix phan job: ignore line using JsonSerializable (duration: 00m 57s)
16:08 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
15:55 jynus: upgrade and restart db2087
15:11 moritzm: draining ganeti2008 for eventual reboot to pick up MDS-enabled kernel
15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:06 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on group 1 (T188327) (duration: 00m 57s)
14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
14:54 moritzm: draining ganeti2007 for eventual reboot to pick up MDS-enabled kernel
14:51 XioNoX: `request chassis fpc online slot 0` on cr1-codfw - T224511
14:48 XioNoX: `request chassis fpc offline slot 0` on cr1-codfw - T224511
14:47 XioNoX: disable et- interfaces on cr1-codfw - T224511
14:45 andrewbogott: reimaging cloudcontrol1003 T221770
14:34 moritzm: draining ganeti2006 for eventual reboot to pick up MDS-enabled kernel
14:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
14:32 andrewbogott: powering off cloudcontrol1003 as one last check to see what explodes before I reimage it
14:30 _joe_: installing the new service checker on restbase in eqiad
14:29 _joe_: installing new service checker version on restbase in codfw
14:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:20 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
13:58 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
13:58 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
13:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
13:48 urandom: decommissioning restbase1015-c -- T223976
13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:19 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.7 (duration: 00m 58s)
13:18 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.7
13:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
13:12 Urbanecm: mwscript emptyUserGroup.php --wiki=fawiki 'uploader' finished (T221441)
13:06 andrewbogott: stopping openstack services on cloudcontrol1003 in anticipation of a re-image
13:03 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
13:02 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
13:02 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
13:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
13:01 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
13:01 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
13:00 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
12:42 Zppix: [12:27:02] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:41 Zppix: [12:27:02] jbond@cumin1001 START - Cookbook sre.hosts.downtime
12:40 Zppix: [12:23:06] <jijiki> Rolling restart pdfrender on scb*
12:39 Zppix: [12:20:49] jbond@cumin1001 START - Cookbook sre.hosts.downtime
12:38 Zppix: [12:11:55] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:38 Zppix: [12:11:54] jbond@cumin1001 START - Cookbook sre.hosts.downtime
12:37 Zppix: [12:01:54] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:36 Zppix: [12:01:54] jbond@cumin1001 START - Cookbook sre.hosts.downtime
12:36 Zppix: [12:00:21] marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Remove db2037 from config as it will be decommissioned T221533 (duration: 00m 56s)
12:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
12:34 Zppix: [11:59:19] marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Remove db2037 from config as it will be decommissioned T221533
12:33 Zppix: [11:58:16] <arturo> T221770 icinga downtime cloudcontrol1003.wikimedia.org for upcoming rebuild as stretch
12:32 Zppix: [11:57:57] aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:32 Zppix: [11:57:55] aborrero@cumin1001 START - Cookbook sre.hosts.downtime
12:31 Zppix: [11:55:54] <Urbanecm> EU SWAT finished, maintenance script emptyUserGroup.php still running in separate tmux session
12:31 Zppix: [11:55:11] urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: gerrit:511849 Set wgLocaltimezone for euwiki to Europe/Berlin (T224091) (duration: 00m 57s)
12:30 Zppix: [11:55:10] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
12:29 Zppix: [11:55:09] jbond@cumin1001 START - Cookbook sre.hosts.downtime
11:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:471260 RSS: Update URLs to the old Wikimedia Foundation blog to point to the new site (T208458) (duration: 00m 57s)
11:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:46 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:46 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
11:45 Urbanecm: Started mwscript emptyUserGroup.php --wiki=fawiki 'uploader' (T221441)
11:44 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: gerrit:505228 Remove uploader user group from fawiki and merge it with autoconfirmed, part 2 (T221441) (duration: 00m 55s)
11:43 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:505228 Remove uploader user group from fawiki and merge it with autoconfirmed, part 1 (T221441) (duration: 00m 55s)
11:40 Urbanecm: Purged angwikibooks HD logos
11:38 urbanecm@deploy1001: Synchronized static/images/project-logos/: gerrit:512433 Add HD logo for angwikibooks, logo files (T150618) (duration: 00m 56s)
11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:512478 Enable transwiki import between sqwiki and sqwikiquote (T221234) (duration: 00m 56s)
11:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:30 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:509130 Enable Advanced Mobile Contributions Overflow menu (T223883) (duration: 00m 57s)
11:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:512488 Remove bureaucrat protection level for all Serbian projects (T217005) (duration: 00m 57s)
11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:512487 Fix Serbian projects wgRestrictionLevels (T217005) (duration: 00m 57s)
11:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:506892 Add namespace aliases on zhwiktionary (T222024) (duration: 00m 57s)
11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:59 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
10:57 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2087 for maintenance (duration: 01m 11s)
10:57 Urbanecm: deleteBatch.php for srwikinews finished (T212346)
10:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:33 mobrovac@deploy1001: Finished deploy [restbase/deploy@92591a7] (dev-cluster): Switch to OpenAPI v3 (duration: 03m 36s)
10:29 mobrovac@deploy1001: Started deploy [restbase/deploy@92591a7] (dev-cluster): Switch to OpenAPI v3
09:51 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
09:45 _joe_: uploading a new service-checker version to jessie-wikimedia
09:18 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
08:51 moritzm: draining ganeti2002 for eventual reboot to pick up MDS-enabled kernel
08:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:31 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:31 moritzm: draining ganeti2001 for eventual reboot to pick up MDS-enabled kernel
07:42 mobrovac: decommission restbase1015-b -- T223976
07:40 godog: ms-be2043 start sdd rebuild - T222654
07:03 jijiki: restarting pdfrender on scb1003

2019-05-28

23:19 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/TimedMediaHandler/includes/ApiTimedText.php: T224522 Fix fatal in ApiTimedText following redirect pages (duration: 00m 56s)
23:17 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/TimedMediaHandler/includes/handlers/TextHandler/TextHandler.php: T224367 Fix regression in subtitles for non-English sites on Commons videos (duration: 00m 57s)
23:17 bstorm_: T221339 completed view updates on labsdb1009 without depooling
23:16 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/TimedMediaHandler/includes/handlers/TextHandler/TextHandler.php: T224367 Fix regression in subtitles for non-English sites on Commons videos (duration: 00m 56s)
23:14 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/TimedMediaHandler/includes/ApiTimedText.php: T224522 Fix fatal in ApiTimedText following redirect pages (duration: 00m 58s)
23:11 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: FlaggedRevisions: Copy in rest of the config, for static registration I77d70519f Id0cd2e18c (duration: 00m 56s)
23:10 bstorm_: T221339 repooled labsdb1011
23:06 jforrester@deploy1001: Synchronized wmf-config/throttle.php: Remove expired throttle rules I4ba3d489 (duration: 00m 55s)
23:06 bstorm_: T221339 depooled labsdb1011 and updated views
23:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T55652 Enable wgSpecialSearchFormOptions on testwikidata (duration: 00m 56s)
22:49 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Fix order of edit tabs for multi-tabs on SET wikis T223793 (duration: 00m 57s)
22:28 cstone_: Re-enabled fundraising thank you mail job
22:25 mutante: cp3034 - sudo -i varnish-backend-restart
22:18 cstone_: Updated fundraising civicrm from 21afd001b6 to bb4acf3d8a
22:14 mutante: cp3035 - varnish-backend-restart
22:13 bstorm_: repooled labsdb1010
22:09 mutante: cp3034 - restart varnish backend
22:09 XioNoX: restart varnish backend on cp3039
22:02 cstone_: Disabled fundraising thank you mail job
21:46 bstorm_: depool labsdb1010 for view updates
21:38 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@5a69072]: Deploy GUI & Blazegraph update (duration: 14m 37s)
21:35 urandom: decommissioning restbase1015-a -- T223976
21:24 smalyshev@deploy1001: Started deploy [wdqs/wdqs@5a69072]: Deploy GUI & Blazegraph update
21:23 ebernhardson: restart elasticsearch on cloudelastic1001 to test sanely sized readahead on /dev/dm-0
21:11 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
20:58 mutante: phab1003 / phab2001 - removing 'apache restart' from root's crontab (gerrit:512977) (T187790)
20:28 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Update caption edit target counts (duration: 00m 57s)
19:17 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
19:15 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1064 from config as it will be decommissioned T223217 (duration: 00m 55s)
19:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1064 from config as it will be decommissioned T223217 (duration: 00m 56s)
19:02 marostegui: Reboot db2091 for full OS and MySQL upgrade - T224393
18:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgMediaInfoEnableFilePageDepicts, no longer read (duration: 00m 57s)
18:51 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Add forwards-compatibility for dataCdnMaxAge (duration: 01m 00s)
18:11 marostegui: Start mysql for s2 and s4 on db2091 T224393
17:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
17:48 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:42 moritzm: rebooting yubiauth* servers for kernel update
17:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
17:35 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@0735c45]: Update mobileapps to ab67b78 (duration: 05m 56s)
17:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:34 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:29 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@0735c45]: Update mobileapps to ab67b78
17:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:41 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:35 hoo: Ran scap pull on mw1240 (curl -H 'Host: www.wikidata.org' … mw1240.eqiad.wmnet/wiki/Special:SetEntitySchemaLabelDescriptionAliases/E10/en returned 404)
16:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:20 Lucas_WMDE: lucaswerkmeister-wmde@mw1271:~$ scap pull
16:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
16:15 moritzm: rearmed keyholder on deploy2001 following reboot
16:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
16:09 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
16:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
16:04 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:04 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:56 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:54 papaul: shutting down db2091 for firmware upgrade
15:53 godog: put back wrongly-replaced sdf on ms-be2043 - T222654
15:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:42 Lucas_WMDE: Extension:EntitySchema deployment finished successfully
15:38 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=wikidatawiki
15:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:512909 Enable extension EntitySchema in production (duration: 00m 56s)
15:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:34 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema/: gerrit:512911 Steal maintenance script user (duration: 00m 58s)
15:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:17 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=testwikidatawiki
15:17 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/EntitySchema/: gerrit:512912 Steal maintenance script user – forgot `git submodule update` before previous sync (duration: 00m 57s)
15:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/EntitySchema/: gerrit:512912 Steal maintenance script user (duration: 00m 59s)
15:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:01 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
14:57 jbond42: reboot ms-be2016
14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:36 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
14:30 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.7
14:10 herron: beginning rolling reboots of codfw kafka-main cluster for security updates
14:10 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache (duration: 34m 22s)
14:04 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
13:50 _joe_: hhvm restarted on mwdebug1001
13:48 _joe_: stopping hhvm on mwdebug1001 for testing
13:39 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
13:35 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache
13:32 gilles@deploy1001: Finished deploy [performance/asoranking@60369cc]: T224388 (duration: 00m 03s)
13:31 gilles@deploy1001: Started deploy [performance/asoranking@60369cc]: T224388
13:31 gilles@deploy1001: deploy aborted: T224388 (duration: 00m 01s)
13:31 gilles@deploy1001: Started deploy [performance/asoranking@1c60db1]: T224388
13:24 urandom: decommissioning restbase1014-c -- T223976
13:23 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
12:55 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
12:51 gilles@deploy1001: Finished deploy [performance/asoranking@1c60db1]: T224388 (duration: 00m 04s)
12:50 gilles@deploy1001: Started deploy [performance/asoranking@1c60db1]: T224388
12:40 gilles@deploy1001: Finished deploy [performance/asoranking@157c25f]: T224388 (duration: 00m 06s)
12:40 gilles@deploy1001: Started deploy [performance/asoranking@157c25f]: T224388
12:13 raynor: EU SWAT done
12:11 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:512743 Disable the rdf2latex Collection portlet format (T224433) (duration: 00m 55s)
12:00 raynor: EU SWAT re-opened
11:58 Lucas_WMDE: EU SWAT done
11:54 Lucas_WMDE: ^ error, no change to wiki
11:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=testwikidatawiki
11:52 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema/: SWAT: gerrit:512689 Add maintenance script to create preexisting Schemas + gerrit:512717 Small maintenance script adjustments (duration: 00m 54s)
11:48 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema: SWAT: gerrit:512677 Skip configured IDs (duration: 00m 57s)
11:43 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:511753 Add a list of IDs to skip in production (duration: 00m 54s)
11:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config: SWAT: gerrit:510204 Add feature flag config for breaking Wikibase API change (T223300) (duration: 00m 54s)
11:31 Urbanecm: Ran namespaceDupes.php for urwikibooks, urwikiquote, urwiktionary and aswikisource
11:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:512426 Use underscores instead of spaces in wgMetaNamespace(Talk) for several projects (T223039) (duration: 00m 54s)
11:25 arturo: merging change to the puppet sudo module https://gerrit.wikimedia.org/r/c/operations/puppet/+/508311
11:18 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: gerrit:512422 Add abusefilter-modify-restricted to abusefilter group on plwiki (T224308) (duration: 02m 36s)
10:54 zfilipin@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_4182265560" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 03m 00s)
10:51 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache
10:48 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.3 [keeping static files] (duration: 01m 32s)
10:45 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 [keeping static files] (duration: 06m 06s)
09:32 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Allow MW to honour the X-Request-Id header if set - T201409 (duration: 01m 12s)
09:28 moritzm: installing php5 security updates
09:00 moritzm: installing ffmpeg security updates
08:58 gehel: rebooting wdqs nodes for kernel upgrade
08:54 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - T219148 (duration: 01m 21s)
08:52 jiji@deploy1001: Started deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - T219148
08:52 moritzm: uploaded ffmpeg 3.2.14-1~deb9u1+wmf3 to component/vp9 of stretch-wikimedia (rebase of our vp9-row-mt backport to the latest stretch-security ffmpeg update)
08:47 vgutierrez: uploaded acme-chief 0.17 to apt.wikimedia.org (buster) - T220518 T213820
08:40 volans: T224448 sudo cumin -b 15 -p 95 'R:git::clone' 'run-puppet-agent -q --failed-only'
08:29 volans: restarting gerrit due to stuck threads - T224448
07:17 moritzm: uploaded ffmpeg 3.2.14-1~deb9u1+wmf1 to component/vp9 of stretch-wikimedia (rebase of our vp9-row-mt backport to the latest stretch-security ffmpeg update)
07:02 mobrovac: decommission restbase1014-b -- T223976
06:40 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 20% of anonymous users to PHP7.2 - T219150 (duration: 00m 51s)
00:38 urandom: decommissioning restbase1014-a -- T223976

2019-05-27

23:19 thcipriani: gerrit back after restarting due to T224448
23:10 thcipriani: restarting gerrit due to active threads being stuck behind a sendemail thread.
22:52 gilles@deploy1001: Finished deploy [performance/asoranking@bacfc37]: T224388 (duration: 00m 05s)
22:52 gilles@deploy1001: Started deploy [performance/asoranking@bacfc37]: T224388
22:19 gilles@deploy1001: Finished deploy [performance/asoranking@d0c156e]: T224388 (duration: 00m 05s)
22:19 gilles@deploy1001: Started deploy [performance/asoranking@d0c156e]: T224388
20:19 gilles@deploy1001: Finished deploy [performance/asoranking@61039f1]: (no justification provided) (duration: 00m 06s)
20:19 gilles@deploy1001: Started deploy [performance/asoranking@61039f1]: (no justification provided)
18:41 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/libs/rdbms: 66556bf37e8 / T223310, T223978 (duration: 00m 50s)
18:06 krinkle@deploy1001: Synchronized errorpages/: 4ffcbfc2ba3 (duration: 00m 48s)
17:56 andrewbogott: re-imaging cloudservices1004 in order to make sure our apt magic is working properly
17:37 andrewbogott: refreshing puppet-compiler facts
16:40 volans: removed unreferenced files in /etc/dhcp/ on install[12]002
16:34 mobrovac: decommission restbase1013-c - T223976
15:40 akosiaris: initialize termbox namespace on eqiad/codfw/staging kubernetes clusters T220402
15:36 akosiaris: initialize sessionstore namespace on eqiad/codfw/staging kubernetes clusters T220401
13:03 godog: swift eqiad-prod: ms-be1033 weight to 0 - T223518
11:33 onimisionipe: starting osm initial import on maps2004 - T224395
10:35 mobrovac: decommission restbase1013-b - T223976
10:31 onimisionipe: rebooting maps2004 - cassandra unit failed and got stuck
09:59 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - T219148 (duration: 01m 09s)
09:58 jiji@deploy1001: Started deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - T219148
09:52 _joe_: disabling puppet on mw1261, running some tests for T223180
08:52 arturo: 1 day downtime systemd check for cloudcontrol1003
08:27 jiji@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2091 - T224393 (duration: 00m 49s)
08:03 gehel: depool maps2004 - T224395
07:05 gehel: running nodetool repair on maps2004 - T224395
04:23 gilles@deploy1001: Finished deploy [performance/asoranking@61039f1]: (no justification provided) (duration: 00m 28s)
04:23 gilles@deploy1001: Started deploy [performance/asoranking@61039f1]: (no justification provided)
02:59 urandom: decommissioning restbase1013-a -- T223976

2019-05-26

20:39 urandom: decommissioning restbase1012-c -- T223976
14:09 urandom: decommissioning restbase1012-b -- T223976
13:37 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/debug: T187147 / 2be7aa4bc4af36 (duration: 00m 51s)
08:01 mobrovac: decommission restbase1012-a - T223976

2019-05-25

22:41 urandom: decommissioning restbase1011-c -- T223976
22:00 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/Linker.php: T222628 / c735a545df3a (duration: 00m 51s)
19:12 andrewbogott: reimaging cloudservices1004 with Stretch
13:46 urandom: decommissioning restbase1011-b -- T223976
12:28 godog: bounce thumbor on thumbor1002
12:21 godog: bounce thumbor on thumbor1002
11:48 _joe_: restarted thumbor-instances on thumbor1001
09:20 mobrovac: decommission restbase1011-b - T223976
04:56 ariel@deploy1001: Finished deploy [dumps/dumps@61114e0]: add namespaces param only once for abstracts with lang variants (duration: 00m 07s)
04:56 ariel@deploy1001: Started deploy [dumps/dumps@61114e0]: add namespaces param only once for abstracts with lang variants
00:30 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTarget.js: Hot-deploy T224319 for VisualEditor switching and auto-restore (duration: 00m 50s)

2019-05-24

21:56 urandom: decommissioning restbase1011-a -- T223976
16:34 XioNoX: add routinator package to reprepro/APT - T220669
15:44 urandom: decommissioning restbase1010-c -- T223976
15:30 XioNoX: disable bgp to telia on cr1-codfw for X-connect investigation - T222967
15:01 jbond42: upload python{,3}-statsd.3.2.1-2 to jessie-wikimedia
14:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/libs/objectcache/: d262078b1 / T220470 (duration: 01m 06s)
11:45 hoo: Updated the Wikidata property suggester with data from the 2019-05-13 JSON dump and applied the T132839 workarounds
11:32 jbond42: [actually] rebooting prometheus1004 now
11:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:23 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
11:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:23 jbond42: rebooting prometheus1004
10:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:56 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:56 jbond42: rebooting prometheus2003
10:25 jbond42: rebooting prometheus2004
10:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
10:09 mobrovac: decommission restbase1010-b - T223976
07:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
07:32 moritzm: rebooting labweb* for kernel security update
07:05 mobrovac: restbase-dev1006 force-stop the cassandra instances, fsync exception during decomm - T224260
06:47 moritzm: bounced ferm on mw2286, wasn't correctly started after reboot
06:45 mobrovac: restbase-dev1006 decommission cass-b - T224260
06:43 _joe_: disable notifications in icinga for restbase-dev1006 T224260
06:40 mobrovac: restbase-dev1006 decommission cass-a - T224260
06:39 mobrovac: restbase-dev1006 stop restbase - T224260
06:38 mobrovac: restbase-dev1006 puppet disabled - T224260
06:26 mobrovac@deploy1001: Finished deploy [restbase/deploy@b153f5d] (dev-cluster): Remove Parsoid fallback and rate-limit stashing (duration: 05m 41s)
06:20 mobrovac@deploy1001: Started deploy [restbase/deploy@b153f5d] (dev-cluster): Remove Parsoid fallback and rate-limit stashing
06:20 mobrovac@deploy1001: Finished deploy [restbase/deploy@b153f5d]: Remove Parsoid fallback and rate-limit stashing - T215956 T224055 (duration: 21m 30s)
06:17 marostegui: Stop MySQL on db2078:m1 to clone db2062 - T220170
06:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to new hosts T220170 (duration: 00m 48s)
05:58 mobrovac@deploy1001: Started deploy [restbase/deploy@b153f5d]: Remove Parsoid fallback and rate-limit stashing - T215956 T224055
05:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2062 from config T220170 (duration: 00m 48s)
05:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2062 from config T220170 (duration: 00m 49s)
05:30 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
00:32 XioNoX: remove lvs1001-5 bgp sessions from cr1/2-eqiad - T224223
00:27 XioNoX: remove term protect-old-lvs-servers from cr1/2-eqiad - T224223
00:20 urandom: decommissioning restbase1010-a -- T223976
00:04 ebernhardson@deploy1001: Finished scap: php-1.34.0-wmf.6/extensions/CirrusSearch/includes/ T223738 Consider searching out of limits an error (duration: 21m 32s)

2019-05-23

23:43 ebernhardson@deploy1001: Started scap: php-1.34.0-wmf.6/extensions/CirrusSearch/includes/ T223738 Consider searching out of limits an error
23:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup VII–X, InitialiseSettings (duration: 00m 48s)
23:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup VII–X, CommonSettings (duration: 00m 47s)
23:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup VI, InitialiseSettings (duration: 00m 47s)
22:59 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup VI, CommonSettings (duration: 00m 48s)
22:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup V, InitialiseSettings (duration: 00m 47s)
22:56 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup V, CommonSettings (duration: 00m 47s)
22:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup IV, InitialiseSettings (duration: 00m 47s)
22:51 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup IV, CommonSettings (duration: 00m 48s)
22:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup III, InitialiseSettings (duration: 00m 47s)
22:47 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup III, CommonSettings (duration: 00m 48s)
22:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup II, InitialiseSettings (duration: 00m 48s)
22:43 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup II, CommonSettings (duration: 00m 48s)
22:39 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup I, InitialiseSettings (duration: 00m 47s)
22:37 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup I, CommonSettings (duration: 00m 48s)
22:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wmgUseClusterSquid, never varied, no longer used (duration: 00m 48s)
22:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop reading wmgUseClusterSquid, never varied (duration: 00m 47s)
22:25 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T104148 Duplicate …Squid variables into …Cdn ahead of MW renaming, part 3 (duration: 00m 47s)
22:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T104148 Duplicate …Squid variables into …Cdn ahead of MW renaming, part 2 (duration: 00m 48s)
22:23 jforrester@deploy1001: Synchronized wmf-config/reverse-proxy.php: T104148 Duplicate …Squid variables into …Cdn ahead of MW renaming, part 1 (duration: 00m 48s)
22:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223793 Drop wmgVisualEditorSingleEditTabSecondaryEditor and wmgVisualEditorSecondaryTabs from InitialiseSettings (duration: 00m 48s)
22:17 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T223793 Read wmgVisualEditorIsSecondaryEditor in CommonSettings (duration: 00m 48s)
22:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223793 Add wmgVisualEditorIsSecondaryEditor to InitialiseSettings (duration: 00m 49s)
19:48 ejegg: updated payments-wiki from 786d76e212 to 332aaa96e2
18:54 urandom: decommissioning restbase1009-c -- T223976
16:13 twentyafterfour: restarting phd on phab1003 to pick up new php module config
15:57 moritzm: rebooting furud/flerovium for kernel updates
15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
15:33 ottomata: rolling restart of swift-proxy to apply creation of analytics_admin account
15:31 hashar@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Hardcode korean help desk config - T224224 (duration: 00m 48s)
15:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:31 jbond42: reboot thumbor2004
15:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:02 jbond42: reboot thumbor2003
14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:57 jbond42: reboot thumbor2002
14:51 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:51 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
14:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:50 jbond42: reboot thumbor2001
14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:43 jbond42: reboot thumbor1004
14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:36 jbond42: reboot thumbor1003
14:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:28 jbond42: reboot thumbor1002
14:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1001.eqiad.wmnet
14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:21 jbond@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
13:56 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Echo: SWAT: [[gerrit:512070|Don't add CommentStoreComment as plaintext params]] (duration: 00m 50s)
13:55 urandom: decommissioning restbase1009-b -- T223976
13:41 bblack: stopped pybal on lvs1001-6 - T224223
13:25 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.6
13:00 godog: swift eqiad-prod: ms-be1033 weight to 1500 - T223518
12:04 moritzm: powercycling mw2268 (stuck after reboot)
11:50 jbond42: will shortly start rolling reboots of thumbor servers
11:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
11:34 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:34 jmm@cumin1001: START - Cookbook sre.hosts.downtime
11:23 moritzm: rebooting auth1002 for kernel update
11:21 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:21 jmm@cumin1001: START - Cookbook sre.hosts.downtime
10:51 Amir1: Deploying EntitySchema to testwikidatawiki is done
10:50 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki/php-1.34.0-wmf.5$ mwscript sql.php --wiki=wikidatawiki extensions/EntitySchema/sql/EntitySchema.sql (T216955)
10:50 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:511844|deploy WikibaseSchema to test (T216956)]] (duration: 00m 56s)
10:44 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki/php-1.34.0-wmf.5$ mwscript sql.php --wiki=testwikidatawiki extensions/EntitySchema/sql/EntitySchema.sql (T216956)
10:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1080 (duration: 00m 57s)
10:15 _joe_: restarted php7.2-fpm on mw1261 to assess the effect of a larger APCu shm size T223180
10:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:00 moritzm: rebooting remaining mw servers in codfw (sans mcrouter proxies for now)
10:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:51 hashar@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection: Rename wfAjaxCollectionGetItemList() T224093 (duration: 00m 57s)
09:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 into API (duration: 00m 55s)
09:22 godog: bounce rsyslog on lithium - listener stuck /T199406
09:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:10 moritzm: rebooting scb servers in eqiad
09:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1080 (duration: 00m 55s)
08:29 marostegui: Upgrade MySQL and kernel on db1080
08:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 00m 55s)
08:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:26 moritzm: rebooting scb servers in codfw
07:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1080 (duration: 00m 56s)
07:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
07:33 moritzm: rebooting swift frontends in eqiad
07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1136 (duration: 00m 53s)
07:11 marostegui: Stop MySQL on db1117:3323 to clone db1128 T222682
06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1136 (duration: 00m 55s)
06:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2065 from config as it will be moved to m3 to replace db2042 (duration: 00m 55s)
06:28 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2065 from config as it will be moved to m3 to replace db2042 (duration: 00m 56s)
06:14 mobrovac: start ruwiki dumps to fill the new parsoid tables - T215956
05:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2070 as m5 codfw master - T221533 (duration: 00m 54s)
05:29 marostegui: Promote db2070 to m5 codfw master instead of db2037 - T221533
05:20 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db2107 status - will be the new master (duration: 00m 54s)
05:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1136 into s7 T222682 (duration: 00m 55s)
05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1136 into s7 T222682 (duration: 00m 55s)
04:57 mobrovac: decommission restbase1009-a - T223976
04:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 00m 55s)
04:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 (duration: 00m 58s)
04:24 mobrovac: start nl, pt, pl wiki dumps to fill the new parsoid tables - T215956
03:50 twentyafterfour: m3 database activity levels look like they have returned to normal
03:48 twentyafterfour: puppet runs cleanly on phab1003
03:39 mutante: phab1003 - disabling puppet; /etc/php/7.2/fpm/conf.d# ln -s /etc/php/7.2/mods-available/ldap.ini 20-ldap.ini ; systemctl restart php7.2-fpm
03:27 twentyafterfour: restarted php-fpm on phab1003
02:56 mutante: phab1001 - removing community_metrics and project_changes cron jobs to avoid duplicate mails
02:51 mutante: phab1003 - chown -R phd /srv/repos/
02:41 twentyafterfour: downtimed the systemd state on phab1001 for 1 year
02:35 mutante: phabricator - going read-write again
02:24 twentyafterfour: manually started aphlict on phab1003
02:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1003-vcs.eqiad.wmnet
02:04 mutante: puppetmaster1001 - sudo -i conftool-merge
01:52 twentyafterfour: phabricator is now served by phab1003 though still in read-only mode for a bit longer
01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1003-vcs.eqiad.wmnet
01:49 mutante: puppetmaster1001 - conftool-merge
01:41 eileen: civicrm revision changed from e6e846708f to 21afd001b6, config revision is 87e78d3eac
01:37 mutante: depooled phab1001-vcs from git-ssh via conftool
01:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab1001-vcs.eqiad.wmnet
01:33 mutante: run puppet on mx1001/mx2001 - switch mail route for phab to phab1003
01:30 mutante: switched from phab1001 to phab1003 - applied on cp1008 varnish canary first
01:28 twentyafterfour: stopping phd on phab1001
01:18 mutante: phabricator going readonly momentarily
01:09 twentyafterfour: extended phab downtime in icinga, actual downtime hasn't started yet, prep work taking longer than expected
00:52 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@e040c6c]: Deploy GUI update (duration: 09m 54s)
00:45 mutante: phab1003 - rsyncing /srv/repos from phab1001
00:42 smalyshev@deploy1001: Started deploy [wdqs/wdqs@e040c6c]: Deploy GUI update
00:33 ejegg: updated payments-wiki from fa005a0640 to 786d76e212

2019-05-22

23:30 twentyafterfour: scheduling downtime for phabricator from 0:00 to 1:00 utc
23:10 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511889/ (duration: 00m 55s)
22:18 mdholloway: mobileapps rolled back deployment (again) due to occasional references endpoint timeouts
22:17 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724, take 2 (duration: 07m 19s)
22:15 foks: reset user email and password for Nv8200pa
22:09 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724, take 2
22:09 mdholloway: mobileapps rolled back deployment due to endpoint check failure (not the same one as before); retrying momentarily
22:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724 (duration: 03m 25s)
22:08 foks: reset user email and password for DarkKyoushu
22:05 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to fcf3724
21:51 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/includes/resourceloader/MessageBlobStore.php: T222539 / 734b3d84f7 (duration: 00m 56s)
21:47 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/resourceloader/MessageBlobStore.php: T222539 / 3cb01cc73ce9 (duration: 00m 56s)
21:41 urandom: decommissioning restbase1008-c -- T223976
20:46 mdholloway: mobileapps rolled back deployment due to endpoint check failures
20:43 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298, take 2 (duration: 04m 19s)
20:39 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298, take 2
20:38 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298 (duration: 02m 41s)
20:35 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@07632e1]: Update mobileapps to b058298
19:26 jforrester@deploy1001: Finished scap: Re-build i18n and re-scap everything for i18n issues for T224116 T224124 T220731 (duration: 32m 55s)
18:53 jforrester@deploy1001: Started scap: Re-build i18n and re-scap everything for i18n issues for T224116 T224124 T220731
18:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/FlaggedRevs: Hot-deploy reverting FlaggedRevs config for T224116 T224124 (duration: 00m 58s)
18:17 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/UrlShortener/modules/ext.urlShortener.special.js: Fix i18n/command mix-up Ic99cf063a (duration: 01m 00s)
17:38 bblack: repool cp3046 as esams cache_upload ats-be node - T222937
17:06 urandom: decommissioning restbase1008-b -- T223976
16:17 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 to 1.34.0-wmf.5 T224116 T224124 # T220731
15:11 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1002.wikimedia.org
15:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:08 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1002.wikimedia.org
15:07 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
15:04 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:04 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:04 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
15:00 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
14:58 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:58 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
14:57 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org
14:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:54 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org
14:49 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=nescio.wikimedia.org
14:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:45 jbond@cumin1001: conftool action : set/pooled=no; selector: name=nescio.wikimedia.org
14:42 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=maerlant.wikimedia.org
14:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:35 jbond@cumin1001: conftool action : set/pooled=no; selector: name=maerlant.wikimedia.org
14:17 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4002.wikimedia.org
14:14 hashar: 1.34.0-wmf.6 deployed to group1 with the exception of cawikinews due to T224116
14:14 mobrovac: start it, es wiki dumps (fr and de completed) to fill the new parsoid tables - T215956
14:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns4002.wikimedia.org
14:09 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4001.wikimedia.org
14:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:02 marostegui: Stop MySQL on db2078 for upgrade
13:58 bblack: depool cp3046 for reimage to ats-be - T222937
13:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:58 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:57 moritzm: rebooting swift frontends in codfw
13:46 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5002.wikimedia.org
13:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
13:43 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5002.wikimedia.org
13:42 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5001.wikimedia.org
13:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
13:35 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5001.wikimedia.org
13:27 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/templates/: T224092 (duration: 00m 58s)
13:13 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.6 (duration: 00m 54s)
13:06 urandom: decommissioning restbase1008-a -- T223976
12:39 marostegui: Stop replication on db2048 (s1 codfw master) to rebuild revision table - this will generate lag on codfw - T224017
12:35 bblack: cp3035: restarting varnish backend
12:34 marostegui: Stop replication on db1080 to rebuild revision table - T224017
12:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 to rebuild revision table T224017 (duration: 00m 55s)
11:30 Amir1: EU SWAT is done
11:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:503342|Remove constraint-suggestions beta feature (T220609)]] (duration: 00m 57s)
11:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:509878|Add configuration for EntitySchema ShExSimpleUrl (T223120)]] (duration: 00m 56s)
11:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:511674|[SDC] Enable depicts qualifiers on testcommons]] (duration: 00m 57s)
10:01 vgutierrez: restarting varnish-backend on cp3039
09:52 mobrovac: start the en, fr and de wiki dumps again to populate the new parsoid table - T215956
09:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - T215956 (duration: 27m 07s)
09:42 marostegui: Stop MySQL on db2078:m5 to clone db2070 - T221533
09:16 mobrovac@deploy1001: Started deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - T215956
08:52 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2070 from s1 to m5 (duration: 00m 55s)
08:51 marostegui@deploy1001: sync-file aborted: Move db2070 from s1 to m5 (duration: 00m 03s)
08:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 (duration: 00m 56s)
08:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1086 into API (duration: 00m 56s)
08:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 (duration: 00m 55s)
07:41 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Tackle s8 codfw weights T220170 (duration: 00m 55s)
07:36 mobrovac: decommission restbase1007-c - T223976
07:24 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Tackle s4 codfw weights T220170 (duration: 01m 06s)
07:23 marostegui: Restart MySQL on db2090 to change binlog format T220170
06:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2040 from config T224079 (duration: 00m 55s)
06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2040 from config T224079 (duration: 00m 56s)
06:13 marostegui: Remove db2040 from zarcillo and tendril - T224079
06:01 marostegui: Stop MySQL on db2040 - T224079
05:42 marostegui: Stop MySQL on db1086 to clone db1136
05:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 (duration: 00m 55s)
05:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2118 and db2120 into s7 T222772 (duration: 00m 55s)
05:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2118 and db2120 into s7 T222772 (duration: 00m 55s)
05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1118 from s1 api and pool db1134 instead T224017 (duration: 00m 57s)
04:41 gilles: purging ruwiki and eswiki to make them get the new origin trial tokens
04:39 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Renew origin trial tokens (duration: 00m 57s)
03:22 legoktm: removed 2fa for T224075
01:46 aaron@deploy1001: Synchronized php-1.34.0-wmf.5/includes/specials/SpecialWatchlist.php: 68eeaa5 (duration: 00m 57s)
01:22 aaron@deploy1001: Synchronized php-1.34.0-wmf.6/includes/specials/SpecialWatchlist.php: 447bf50 (duration: 00m 57s)

2019-05-21

23:47 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511668/ (duration: 00m 57s)
23:34 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511667/ (duration: 00m 56s)
22:56 mutante: ms-be2034 - degraded systemd state was cleared and originally caused by " failed Session 72587 of user debmonitor"
22:56 mutante: ms-be2034 - sudo systemctl reset-failed
22:51 urandom: decommissioning restbase1007-b -- T223976
21:35 ejegg: updated payments-wiki from d5ef5ad067 to fa005a0640
21:21 mutante: re-enabling puppet on mc1* hosts
20:43 mutante: re-enabling puppet on all hosts using memcached class - except mc1*
20:31 mutante: mc2019 - stopping memcached and letting puppet restart it to confirm no issues after switching to systemd::service
20:20 mutante: disabling puppet on all servers using class memcached (57)
20:06 tzatziki: removing (another) two files for legal compliance
19:43 tzatziki: removing two files for legal compliance
19:12 thcipriani: gerrit back on 2.15.13
19:09 thcipriani: restart gerrit for 2.15.13 update
19:08 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (cobalt, restart incoming) (duration: 00m 20s)
19:08 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (cobalt, restart incoming)
19:06 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (gerrit 2001 only) (duration: 00m 11s)
19:06 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (gerrit 2001 only)
18:50 bblack: repooling cp1085 frontends (weren't meant to be depooled)
18:38 bblack: re-pooling eqiad front edge traffic (onto new LVSes from T184293 )
18:36 XioNoX: update lvs static routes on cr1/2-eqiad - T184293
18:06 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 (turning on HA queues)
17:59 bblack: rebooting lvs1016 in attempt to clear interface config issues - T224027
17:51 XioNoX: add BGP sessions to AS202053 in esams
17:31 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1016, bringing back pybal in "secondary" role for all 3 traffic classes (high-traffic1, high-traffic2, low-traffic), no traffic shift expected (again, after merging last-minute fixup https://gerrit.wikimedia.org/r/c/operations/puppet/+/511759 )
17:25 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1016, bringing back pybal in "secondary" role for all 3 traffic classes (high-traffic1, high-traffic2, low-traffic), no traffic shift expected
17:24 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1006, basically no-op
17:21 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1015, bringing back pybal in primary role, shifting traffic to lvs1015
17:20 bblack: eqiad LVS: low-traffic (all internal services): disable pybal on lvs1016 + lvs1015, shifting traffic to lvs1006
17:18 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/includes/CollectionHooks.php: Fix paths (duration: 00m 56s)
17:17 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1005, basically no-op
17:15 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1002, bringing back pybal in backup role, no traffic shift
17:13 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1014, bringing back pybal in primary role, shifting traffic to lvs1014
17:11 bblack: eqiad LVS: high-traffic2 (upload): disable pybal on lvs1014 + lvs1002, shifting traffic to lvs1005
17:09 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1004, basically no-op
17:07 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1001, bringing back pybal in backup role, no traffic shift
17:06 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1013, bringing back pybal in primary role, shifting traffic to lvs1013
17:04 bblack: eqiad LVS: high-traffic1 (text): disable pybal on lvs1013 + lvs1001, shifting traffic to lvs1004
16:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:55 jbond42: rebooting wtp1046-1048
16:55 bblack: starting Eqiad LVS re-arrangement shortly - T184293 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/511717 (eqiad front edge is still depooled from public traffic)
16:50 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:50 jbond42: rebooting wtp1043-1045
16:46 mutante: rebooting phab1003 (non-prod)
16:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:44 jbond42: rebooting wtp1040-1042
16:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:39 jbond42: rebooting wtp1037-1039
16:26 mobrovac: truncate "others_T_parsoid".data
16:25 mobrovac: restbase truncate "commons_T_parsoid".data
16:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:24 jbond42: rebooting wtp1033-1034
16:18 mobrovac: restbase truncate "enwiki_T_parsoid".data
16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:16 jbond42: rebooting wtp1031-1032
16:10 mobrovac: restbase truncate "wikipedia_T_parsoid".data
16:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:09 jbond42: rebooting wtp1029-1030
16:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:01 jbond42: rebooting wtp1027-1028
15:56 urandom: decommissioning restbase1007-a -- T208087
15:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:54 jbond42: rebooting wtp1025-1026
15:45 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found, rb1007 (duration: 02m 43s)
15:42 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found, rb1007
15:42 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found (duration: 02m 40s)
15:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:40 jbond42: rebooting wtp2019-2020
15:39 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found
15:38 mobrovac@deploy1001: Finished deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found, take #2 (duration: 00m 45s)
15:38 mobrovac@deploy1001: Started deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found, take #2
15:37 mobrovac@deploy1001: Finished deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found - T215956 (duration: 07m 10s)
15:37 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Moving to 10% of users on php7 T219150 (duration: 00m 57s)
15:32 XioNoX: enable BGP to telia on cr1-codfw - T222967
15:30 mobrovac@deploy1001: Started deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found - T215956
15:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:23 jbond42: rebooting wtp2017-2018
15:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:13 jbond42: rebooting wtp2015-2016
15:10 XioNoX: disable BGP to telia on cr1-codfw - T222967
15:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:05 jbond42: rebooting wtp2013-2014
15:02 crusnov@deploy1001: Finished deploy [netbox/deploy@3091b51]: deploy new version of ganeti-netbox sync - T220422 (duration: 00m 55s)
15:01 crusnov@deploy1001: Started deploy [netbox/deploy@3091b51]: deploy new version of ganeti-netbox sync - T220422
14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:57 jbond42: rebooting wtp2011-2012
14:57 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.6
14:50 jbond42: rebooting wtp2009-2010
14:50 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:44 jbond42: rebooting wtp2007-2008
14:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:37 jbond42: rebooting wtp2005-2006
14:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:31 jbond42: rebooting wtp2003-2004
14:27 hashar@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.6 and rebuild l10n cache # T220731 (duration: 48m 09s)
14:26 volans: restarting wikibugs
14:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
14:25 elukey@cumin1001: START - Cookbook sre.hosts.decommission
14:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:13 jbond42: rebooting wtp2001-2002
13:50 bblack: rebooting lvs1013,14,15 for verification
13:39 hashar@deploy1001: Started scap: testwiki to php-1.34.0-wmf.6 and rebuild l10n cache # T220731
13:37 hashar@deploy1001: Pruned MediaWiki: 1.34.0-wmf.1 (duration: 02m 12s)
13:36 hashar: scap clean --verbose --delete 1.34.0-wmf.1 # T220731
13:29 hashar: scap clean --verbose --delete 1.33.0-wmf.25 # T220731
13:25 godog: swift eqiad-prod: start depool ms-be1033 - T223518
13:24 hashar: Applied security patches to 1.34.0-wmf.6 # T220731
13:24 hashar: Applied security patches to 1.34.0-wmf.6
13:23 bblack: rebooting lvs1013 (possibly a few times, debugging startup issues)
13:20 hashar: scap prep 1.34.0-wmf.6 # T220731
13:11 hashar: Updated plugins on https://releases-jenkins.wikimedia.org/
13:09 hashar: Restarting Jenkins T224002
12:45 hashar: Cutting branch wmf/1.34.0-wmf.6 # T220731
12:22 volans: restarting Icinga on icinga1001 to pick up new open files limits
12:08 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148 (duration: 00m 54s)
12:07 jiji@deploy1001: Started deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - T219148
11:59 mobrovac: started dewiki dumps - T215956
11:58 mobrovac: started frwiki dumps - T215956
11:46 mobrovac: started enwiki dumps - T215956
11:27 Amir1: EU SWAT is done
11:27 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Revert "Switch off php7 for investigation of production instabilities" (gerrit:511658) (duration: 00m 50s)
11:20 volans: restarting Icinga on icinga2001 (passive server) to pick up new open file limits
11:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime
11:17 jbond42: reboot wtp1025.eqiad.wmnet
11:10 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Define wmgUseEntitySchema (T221651), part II (gerrit:505816) (duration: 00m 49s)
11:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Switch Parsoid to simple k/v bucket - T215956 (duration: 25m 50s)
11:08 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Define wmgUseEntitySchema (T221651), part I (gerrit:505816) (duration: 00m 50s)
11:07 godog: swift codfw-prod: remove ms-be201[345] - T221068
10:59 _joe_: rolling restart of php7.2-fpm across the fleet to pick up a config change
10:44 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Switch Parsoid to simple k/v bucket - T215956
10:39 jijiki: updating prometheus-mcrouter-exporter on mw* servers
10:26 godog: pool new restbase hosts - T219404
10:20 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1019.eqiad.wmnet
09:49 moritzm: updated buster netboot image to daily image from 20190521
09:26 moritzm: reimaging graphite2001 to buster for some d-i tests
08:58 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2104 as candidate master and as API (duration: 00m 51s)
08:56 marostegui: Stop MySQL on db2041 as it will be decommissioned T223950
06:59 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Turning off php7 sampling for investigation in T223952 (duration: 00m 53s)
06:55 elukey: reboot of stat100[4,5,6,7] and notebook100[3,4] for kernel upgrades
06:31 marostegui: Stop mariadb on db2104 to convert it to s2 candidate master
06:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2104 (duration: 00m 51s)
05:50 marostegui: Remove db2041 from tendril and zarcillo - T223950
05:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2041 for decommissioning T223950 (duration: 00m 51s)
05:42 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2041 for decommissioning T223950 (duration: 00m 51s)
05:16 marostegui: Stop MySQL on db2040
05:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2040 (duration: 00m 50s)
05:14 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2114 into s6 - T222772 (duration: 00m 50s)
05:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2114 into s6 - T222772 (duration: 00m 51s)
03:36 urandom: bootstrapping restbase1027-c -- T219404
00:47 urandom: bootstrapping restbase1027-b -- T219404
00:05 aaron@deploy1001: Synchronized php-1.34.0-wmf.5/includes/libs/objectcache/APCUBagOStuff.php: 982299d (duration: 00m 54s)

2019-05-20

21:07 ejegg: updated payments-wiki from 8397ccf9cc to d5ef5ad067
19:20 mobrovac: bootstrap restbase1027-a - T219404
18:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/includes/Linker.php: T222857 / Iecc2140fabd3 (duration: 00m 54s)
16:43 onimisionipe: rolling reboot of maps eqiad to pick kernel upgrades
16:38 mobrovac: bootstrap restbase1026-c - T219404
15:26 onimisionipe: rebooting codfw maps to pick up kernel upgrades
15:26 marostegui: Stop replication on labsdb1011 to start compressing tables - T222978
15:13 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on group 0 (T188327) (duration: 00m 55s)
14:54 bblack: rebooting lvs1013, lvs1014, lvs1015 (not in active service, yet)
14:43 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148 (duration: 00m 55s)
14:42 jiji@deploy1001: Started deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - T219148
14:21 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011
14:14 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1010
13:58 mobrovac: bootstrap restbase1026-b - T219404
12:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 50s)
11:44 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:44 fsero@cumin1001: START - Cookbook sre.hosts.downtime
11:28 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:28 fsero@cumin1001: START - Cookbook sre.hosts.downtime
11:21 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:21 fsero@cumin1001: START - Cookbook sre.hosts.downtime
11:17 mobrovac: bootstrap restbase1026-a - T219404
11:16 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:15 fsero@cumin1001: START - Cookbook sre.hosts.downtime
11:01 arturo: icinga downtime toolschecker for 3h for T223332
10:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:43 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:511398) (duration: 00m 49s)
10:42 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:511398) (duration: 00m 50s)
10:27 moritzm: rebooting contint1001 for kernel update
10:25 hashar: contint1001: docker image prune -f | Total reclaimed space: 7.115GB | T207707
10:20 hashar: Stopped Zuul gracefully
10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:18 fsero: puppet reenabled certs renewed - T221346
10:08 fsero: rolling over certs into mcrouter proxies codfw - T221346
10:03 fsero: rolling over certs into mcrouter proxies eqiad - T221346
09:42 marostegui: Remove db2036 from tendril and zarcillo - T223885
09:39 marostegui: Stop MySQL on db2036 T223885
09:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2036, going to be decommissioned T223885 (duration: 00m 49s)
09:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2036, going to be decommissioned T223885 (duration: 00m 49s)
09:36 fsero: rolling over new certs to all mcrouter hosts except proxys - T221346
09:26 fsero: continue to rolling over new certs - T221346
09:01 fsero: disabling puppet on mcrouter hosts for regenerating certs - T221346
08:49 moritzm: installing atftpd security updates
08:43 mobrovac: bootstrap restbase1025-c - T219404
08:38 moritzm: installing samba security updates
08:36 moritzm: installing ghostscript security updates on jessie
08:25 moritzm: installing cups-filter security updates on jessie (prerequisite for ghostscript security update)
07:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 48s)
07:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2046 (duration: 00m 50s)
06:25 elukey: rebuild and upload memkeys 20181031-1 to stretch-wikimedia
06:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 49s)
06:20 elukey: upgrade memkeys to version 20181031-1 on all the mc* hosts (was deployed only on a few of them) - T208376
06:11 mobrovac: bootstrap restbase1025-b - T219404
06:00 elukey: powercycle analytics1071 - soft lockups error messages in the dmesg
05:51 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010
05:42 marostegui: Reload haproxy on dbproxy1010 and dbproxy1011 to repool labsdb1009 and restore original weights
05:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1126 into s8, db1134 into s1 T222682 (duration: 00m 49s)
05:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1126 into s8, db1134 into s1 T222682 (duration: 00m 49s)
05:12 marostegui: Stop MySQL on db2046
05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2046 (duration: 00m 50s)
05:07 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2038 (duration: 00m 49s)
05:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2038 (duration: 00m 55s)
02:42 cdanis: cdanis@cp1075.eqiad.wmnet ~ % sudo -i varnish-backend-restart

2019-05-19

20:16 ariel@deploy1001: Finished deploy [dumps/dumps@4febe0c]: for abstract dumps, skip any processing of pages not in main namespace (duration: 00m 03s)
20:16 ariel@deploy1001: Started deploy [dumps/dumps@4febe0c]: for abstract dumps, skip any processing of pages not in main namespace
17:51 mobrovac: bootstrap restbase1025-a - T219404
13:26 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: T223734: Depool cloudelastic100[12] (duration: 00m 49s)
12:37 reedy@deploy1001: Synchronized wmf-config/interwiki-labs.php: update (duration: 00m 57s)
10:32 reedy@deploy1001: Synchronized wikiversions-labs.json: T223770 (duration: 00m 48s)
10:31 reedy@deploy1001: Synchronized dblists/all-labs.dblist: T223770 (duration: 00m 51s)
10:12 mobrovac: bootstrap restbase1024-c - T219404
09:59 ebernhardson: eqiad psi elasticsearch high disk watermark to 89% to allow unallocated shard to initialize
09:56 ebernhardson: eqiad psi elasticsearch low disk watermark to 79% to allow unallocated shard to initialize
08:13 jijiki: varnish-backend-restart on cp1087
06:56 mobrovac: bootstrap restbase1024-b - T219404
05:09 marostegui: varnish-backend-restart on cp1081

2019-05-18

23:53 bblack: rebooting lvs1015 for interface changes
22:44 bblack: imaging lvs1013-lvs1015
21:01 bblack: depooling eqiad public front edge in authdns
19:18 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/Collection/templates/CollectionSuggestTemplate.php: T223742 / 89bd434 (duration: 00m 49s)
19:16 mobrovac: bootstrap restbase1024-a - T219404
18:50 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: T222146 / 9385b2dd66 (duration: 00m 50s)
16:53 mobrovac: bootstrap restbase1023-c - T219404
15:57 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/TimedMediaHandler/includes/handlers/WebMHandler/WebMHandler.php: T223445 / a9df59c59d7a30 (duration: 00m 51s)
14:59 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: whitespace is srs (duration: 00m 49s)
14:56 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Copy in default config (duration: 01m 04s)
13:51 urandom: bootstrapping restbase1023-b - T219404
05:41 mobrovac: bootstrap rb1023-a - T219404
02:37 urandom: bootstrapping restbase1022-c - T219404

2019-05-17

23:55 urandom: bootstrapping restbase1022-b - T219404
23:11 foks: removing one file for legal compliance
15:20 hashar@deploy1001: Synchronized php-1.34.0-wmf.5/includes/api/ApiUpload.php: Revert "Always validate uploads over api" - T223448 (T222994 T223446) (duration: 01m 00s)
15:18 hashar: Deploying hotfix https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/510924/ . Should restore upload of large files on commons and other wikis # T223448 (poke T222994 T223446)
14:51 mobrovac: bootstrap restbase1022-a - T219404
14:43 fsero: reenabling puppet on mcrouter hosts for T221346, checks are in place; if there is any alert for cert expiration and mcrouter, this is the source :)
14:17 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1098 & db1131 after maintenance (duration: 00m 49s)
14:09 fsero: second round of setting up cert check, disabling puppet on mcrouter hosts T221346
12:58 mobrovac: bootstrap restbase1021-c - T219404
10:59 mobrovac: bootstrap restbase1021-b - T219404
09:27 godog: swift remove ms-be101[345] from rings - T220590
09:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1083 (duration: 00m 48s)
08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
08:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
08:24 fsero: reenabling puppet after reverting T221346
08:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1083 (duration: 00m 59s)
07:57 fsero: disabling puppet on mcrouter hosts for T221346
07:12 marostegui: Compress s7 on labsdb1012 T222978
06:36 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2111 and db2113 into s5 T222772 (duration: 00m 49s)
06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2111 and db2113 into s5 T222772 (duration: 00m 50s)
05:19 marostegui: Stop MySQL on db1083 to clone db1134
05:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 (duration: 00m 50s)
05:00 mobrovac: bootstrap restbase1021-a - T219404

2019-05-16

21:02 Jeff_Green: authdns-update to switch payments.wikimedia.org back to eqiad cluster
19:24 onimisionipe: pooling elastic2038 - shards are properly balanced across nodes
18:31 onimisionipe: depooling elastic2038 to investigate more
17:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:26 jbond42: reboot ores1007-1009
17:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:15 jbond42: reboot ores1005-1006
17:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:10 jbond42: reboot ores1003-1004
17:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
17:05 jbond42: reboot ores1001-1002
17:00 jbond42: reboot orespoolcounter[12]002
16:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:53 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:53 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
16:53 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:52 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
16:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:51 jbond42: reboot orespoolcounter[12]001
16:44 jbond42: reboot ores2008-2009
16:38 jbond42: will first reboot ores2006-2007
16:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:36 jbond42: reboot ores2006-2009
16:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
16:28 jbond42: reboot ores2003-2005
16:22 XioNoX: add BGP session to Hetzner in AMS-IX
16:19 akosiaris: switch all etcd* kubestagetcd* servers from "drbd" ganeti disk template to "plain" ganeti disk template
16:17 jbond42: reboot ores2001-2002
16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
16:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:59 akosiaris: build service-checker OCI container 0.0.2 with 0.1.5 service-checker version T220401
15:49 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/CirrusSearch/includes/InterwikiSearcher.php: Hot-deploy CirrusSearch interwiki no result UBN T223449 (duration: 00m 49s)
15:45 marostegui: Drop the following databases from tendril to recreate them with the right user: db1127, db1129, db1130, db1131, db1137, db1138
15:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/includes/specials/pagers/ContribsPager.php: Hot-deploy Contribs getNamespaceInfo UBN fix T223440 (duration: 00m 53s)
15:25 aborrero@puppetmaster1001: conftool action : set/pooled=yes; selector: name=labweb1001.wikimedia.org,service=labweb
15:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
15:02 jbond42: rebooting aqs1009
14:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:54 jbond42: rebooting aqs1008
14:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:45 jbond42: rebooting aqs1007
14:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:34 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:34 jbond42: rebooting aqs1006
14:28 jbond42: rebooting aqs1005
14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:18 moritzm: powercycling mw2199, stuck during reboot
14:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:07 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
14:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
14:07 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
14:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
13:57 marostegui: and recreate the following hosts in tendril: db2103,db2104,db2105,db2106,db2107,db2108,db2109,db2110,db2111,db2112,db2113,db2115,db2116,db2117,db2119 T222772
13:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
13:39 cmjohnson1: replacing pdu in rack B5 eqiad
13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.5
13:00 arturo: labweb1001 depooled
12:59 mobrovac: bootstrap restbase1020-c - T219404
12:21 godog: stop swift and rsync on ms-be10[16,17,18,32,33] for eqiad B5 pdu replacement - T223126
12:03 jynus: stop and shutdown db1098,db1131,db1139 T223126
11:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
11:54 moritzm: rebooting mw app servers in codfw for kernel update
11:32 hoo@deploy1001: Synchronized wmf-config/extension-list: Add EntitySchema to extension-list (T221650) (duration: 00m 56s)
11:22 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098 & db1131 for maintenance (duration: 00m 57s)
11:00 arturo: T223148 downtime cloudvirt[1014,1028].eqiad.wmnet and labweb1001.wikimedia.org for 8 hours
11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
10:50 godog: bootstrap restbase1020-b - T219404
10:27 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148 (duration: 01m 07s)
10:26 jiji@deploy1001: Started deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - T219148
08:52 akosiaris: upgrade mathoid to statsd_exporter 0.9 T220709
08:48 akosiaris@deploy1001: scap-helm mathoid finished
08:48 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
08:48 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
08:48 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
08:47 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
08:37 godog: bootstrap restbase1020-a - T219404
08:32 elukey: depool/restart-nutcracker-pool mw1293/1313 - T214275
08:22 elukey: depool/restart-nutcracker-pool mw1238 - T214275
08:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1104 (duration: 00m 56s)
07:57 moritzm: installing linux 4.9.168-1+deb9u2~deb8u1 kernel on jessie hosts (no reboots, just installing the new package)
07:45 moritzm: removed intel-microcode 3.20180807a from jessie-wikimedia (superseded by newer version in security.debian.org, which doesn't get picked up by apt due to the higher apt priority of jessie-wikimedia)
07:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1104 into API (duration: 00m 56s)
07:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1104 (duration: 00m 57s)
06:59 moritzm: installing intel-microcode updates
05:34 elukey: roll restart of nutcracker on mw2* to pick up new config changes (no more memcached config) - T214275
05:33 marostegui: Stop MySQL on db1104 to clone db1126
05:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 (duration: 00m 56s)
05:18 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2106, db2110, db2119 into s4 - T222772 (duration: 00m 56s)
05:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2106, db2110, db2119 into s4 - T222772 (duration: 00m 58s)
02:27 onimisionipe: pooling elastic2038 after unbanning - T217398

2019-05-15

22:16 mutante: phab1003 - start ssh-phab service after adding service IPs
22:01 eileen: civicrm update - lost the commit versions but 5.13.4 release
21:47 mutante: phab1003 - ip -6 addr del 2620:0:861:ed1a::3:16/128 dev lo - remove extra service IP for phab's separate sshd, duplicated with phab1001 (T190568)
21:24 jforrester@deploy1001: Synchronized wmf-config/MetaContactPages.php: Add movecomsignup contact page on meta T218363 (duration: 00m 56s)
21:23 eileen: civicrm revision changed from 7d3ef1f2ae to c69c6e2e6a, config revision is a099f13a55
21:00 fdans@deploy1001: Finished deploy [analytics/refinery@ffa4931]: deploying analytics refinery (duration: 15m 31s)
20:45 tgr@deploy1001: Finished deploy [proton/deploy@9373c42]: Add gistcdn.githack.com to host blacklist (T213362) (duration: 02m 41s)
20:45 fdans@deploy1001: Started deploy [analytics/refinery@ffa4931]: deploying analytics refinery
20:42 tgr@deploy1001: Started deploy [proton/deploy@9373c42]: Add gistcdn.githack.com to host blacklist (T213362)
20:20 robh: rebooting cloudvirt1015 into dell hardware tests per T220853
20:18 arlolra@deploy1001: Finished deploy [parsoid/deploy@8f28977]: Updating Parsoid to 6658cad (duration: 06m 23s)
20:12 arlolra@deploy1001: Started deploy [parsoid/deploy@8f28977]: Updating Parsoid to 6658cad
19:42 hashar: group 1 promoted to 1.34.0-wmf.5 apparently without any issue # T220730
19:03 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.5 (duration: 00m 58s)
19:02 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.5
18:38 andyrussg@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/CentralNotice/: Revert CentralNotice (duration: 01m 00s)
17:32 thcipriani: deploy1001:sudo -u www-data /usr/local/bin/foreachwiki extensions/WikimediaMaintenance/refreshMessageBlobs.php
17:19 onimisionipe: unban elastic2038 from shard allocation - T217398
17:19 XenoRyet: updated civicrm from 4b6d569383 to 7d3ef1f2ae
17:09 elukey: powerup elastic2038 (was down for maintenance)
17:01 godog: bootstrap restbase1019-c - T219404
16:58 bstorm_: T212972 updated all views on labsdb1012
16:50 elukey: restart Hadoop HDFS namenodes on an-master100[1,2] to pick up new settings
16:40 urandom: bootstrap restbase1019-c - T219404
16:28 elukey: restart nutcracker on mw2240 to pick up the new config (no more memcached settings)
16:26 bstorm_: T212972 updated all views on labsdb1009
16:17 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223166 (duration: 00m 56s)
16:16 reedy@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/WikimediaEvents/: T219128 (duration: 01m 13s)
16:14 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/WikimediaEvents/: T219128 (duration: 01m 06s)
16:03 jynus: disable puppet on all production databases
15:21 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: T222980 (duration: 00m 57s)
14:28 andrewbogott: repooling labweb1002
14:16 andrewbogott: depooling labweb1002 to test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509916/
14:15 godog: bootstrap restbase1019-b - T219404
13:21 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on testwikis and mediawikiwiki (T188327) (duration: 00m 57s)
12:22 Lucas_WMDE: EU SWAT done
12:20 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: SWAT: VisualEditorHooks: Use isVisualAvailable() when changing tabs/editsections (gerrit:510217) + DesktopArticleTarget.init: Allow veaction=edit to override namespace settings (T221892) (gerrit:510218) (duration: 01m 15s)
12:20 akosiaris: depool esams, network issues
11:47 akosiaris@deploy1001: scap-helm mathoid finished
11:47 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
11:46 akosiaris@deploy1001: scap-helm mathoid upgrade --wait -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
11:41 akosiaris@deploy1001: scap-helm citoid finished
11:41 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
11:41 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
11:32 akosiaris@deploy1001: scap-helm citoid finished
11:32 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
11:31 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
11:31 godog: bootstrap restbase1019-a - T219404
11:29 akosiaris: upgrade to statsd_exporter 0.9 for citoid T220709
11:27 akosiaris@deploy1001: scap-helm citoid finished
11:27 akosiaris@deploy1001: scap-helm citoid cluster staging completed
11:27 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
10:31 elukey: superset.wikimedia.org moved to analytics-tool1004 (Buster + python 3.7 + Superset 0.32 upgrade)
10:27 moritzm: installing linux 4.9.168-1+deb9u2 kernel on stretch hosts (no reboots, just installing the new package)
10:04 elukey@deploy1001: Finished deploy [analytics/superset/deploy@9cdb9c5]: Superset 0.32 - update pyhive dependency (duration: 00m 26s)
10:04 elukey@deploy1001: Started deploy [analytics/superset/deploy@9cdb9c5]: Superset 0.32 - update pyhive dependency
09:33 hashar: Disable CI castor cache system since the instance is being migrated. Some / most CI jobs might have failed for the last 20 minutes or so T223148
08:45 elukey@deploy1001: Finished deploy [analytics/superset/deploy@31c2c30]: Superset 0.32 (duration: 00m 26s)
08:44 elukey@deploy1001: Started deploy [analytics/superset/deploy@31c2c30]: Superset 0.32
08:36 elukey: stop superset on analytics-tool1003 as prep step for the migration to the new host - T212243
08:31 moritzm: rebooting mw2164
07:33 elukey: restart nutcracker on mw2245 to pick up config changes (removal of memcached config)
07:29 elukey: powercycle an-worker1094 (OEM event occurred, checking if temporary)
07:21 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove the php7 beta feature T219128 (duration: 00m 59s)
06:24 elukey: force remount of /mnt/hdfs on stat1007 - fuse hdfs stuck
01:40 eileen: process control updated - omnigroupmember.load re-enabled
01:39 eileen: civicrm revision changed from 5024c968ed to 4b6d569383, config revision is a099f13a55

2019-05-14

20:44 herron@deploy1001: Finished deploy [logstash/plugins@7fb8843]: adding logstash-filter-truncate plugin (duration: 00m 07s)
20:43 herron@deploy1001: Started deploy [logstash/plugins@7fb8843]: adding logstash-filter-truncate plugin
20:41 herron@deploy1001: Finished deploy [logstash/plugins@7fb8843]: (no justification provided) (duration: 00m 01s)
20:41 herron@deploy1001: Started deploy [logstash/plugins@7fb8843]: (no justification provided)
20:13 chaomodus: restarting gerrit on cobalt to pick up metrics export changes
19:37 herron: adding logstash filter truncate plugin to prod logstash collectors
19:28 gehel: shutting down elastic2038 for memory replacement - T217398
19:25 gehel: ban elastic2038 from elasticsearch cluster for memory replacement - T217398
18:21 mutante: mwmaint1002 - deleting /root/home-mwmaint2001 to save space - confirmed we have bacula backups of home on mwmaint2001
17:55 mutante: elastic2029 - enable puppet agent - was disabled without reason and nobody seems to have logged in recently
17:54 mutante: elastic2038 - restart nagios-nrpe-server - attempt to fix "CHECK_NRPE STATE UNKNOWN" for a single check
17:32 mutante: contint1001 - mkdir /srv/zuul-logs ; mv /var/log/zuul/debug.log* /srv/zuul-logs/ to prevent CI running out of disk again (T207707)
17:22 mbsantos@deploy1001: Finished deploy [proton/deploy@881b22b]: Update chromium-render to 8cc96e7 make timeout handler more robust (T217724) (duration: 02m 23s)
17:20 mbsantos@deploy1001: Started deploy [proton/deploy@881b22b]: Update chromium-render to 8cc96e7 make timeout handler more robust (T217724)
16:30 jynus: stop replication and start table recompression on labsdb1009 T222978
16:22 godog: statsd_exporter 0.9 upgrade on thumbor - T220709
16:04 gilles@deploy1001: Finished deploy [performance/coal@5a32eb2]: T221401 (duration: 00m 06s)
16:04 gilles@deploy1001: Started deploy [performance/coal@5a32eb2]: T221401
15:56 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/includes/ApiVisualEditor.php: Hot-deploy VE unset variable fix T223281 (duration: 00m 55s)
15:51 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/VisualEditor/includes/ApiVisualEditor.php: Hot-deploy VE unset variable fix T223281 (duration: 00m 57s)
15:49 crusnov@deploy1001: Finished deploy [netbox/deploy@81059c6]: Deploy new reqs for reports (duration: 00m 55s)
15:49 crusnov@deploy1001: Started deploy [netbox/deploy@81059c6]: Deploy new reqs for reports
15:43 jynus: reload haproxy config @ dbproxy1010, dbproxy1011
15:38 XioNoX: re-activate bgp to telia on cr1-codfw - T222967
15:33 XioNoX: deactivate bgp to telia on cr1-codfw - T222967
15:19 papaul: shutting down elastic2038 for memory replacement
15:14 hashar: mw1263: scap pull
14:53 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.5
14:50 moritzm: rebooting mw1263 for kernel update
14:47 hashar@deploy1001: Finished scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache (duration: 62m 47s)
14:07 _joe_: apt-get clean on mwmaint1002
13:44 hashar@deploy1001: Started scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache
13:44 godog: rearm keyholder on deploy and cumin hosts
13:27 hashar@deploy1001: Finished scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache (duration: 14m 39s)
13:12 hashar: train delay, I forgot to sync 1.34.0-wmf.5
13:12 hashar@deploy1001: Started scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache
12:37 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: Hot-deploy T223023 fix I1b35b28e42 for mobile VE edit section switches (duration: 00m 54s)
12:10 moritzm: rebooting mw2164 for kernel update
11:33 hashar@deploy1001: Pruned MediaWiki: 1.33.0-wmf.24 (duration: 03m 20s)
11:30 hashar: Deleting 1.33.0-wmf.24 from deploy1001 # T220730
11:28 kart_: EU-Mid day SWAT Done.
11:25 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:508818 Decrease idwiki MT threshold for publishing (T222782) (duration: 00m 51s)
11:23 hashar@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.23 (duration: 14m 31s)
11:23 jbond42: cumin1001 ~ % sudo cumin A:all '/usr/local/sbin/run-puppet-agent --failed-only'
11:18 jbond42: enable puppet, issue fixed: https://gerrit.wikimedia.org/r/c/operations/puppet/+/510131
11:12 ema: pool cp3036 reimaged to ATS T222937
11:09 hashar: Deleting 1.33.0-wmf.23 from deploy1001 # T220730
11:09 jbond42: disable puppet
10:58 hashar: scap prep 1.34.0-wmf.5 # T220730
10:16 hashar: Cutting branches for 1.34.0-wmf.5
10:01 ema: depool cp3036 and reimage as upload_ats T222937
09:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2034 from config T219493 (duration: 00m 49s)
09:53 marostegui@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
09:52 marostegui: Remove db2034 from tendril and zarcillo - T219493
09:51 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2034 from config T219493 (duration: 00m 50s)
09:34 jynus: restart apache on ununpentium
09:29 marostegui: Parsercache deployment window FINISHED
09:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Deploy second parsercache key change everywhere after deploying it in batches first T210725 (duration: 00m 50s)
09:15 godog: statsd_exporter 0.9 upgrade on ores - T220709
09:02 godog: statsd_exporter 0.9 upgrade on logstash - T220709
08:53 jynus: failing connections over dbproxy1006 to dbproxy1001
07:48 moritzm: installing bind security updates for stretch (only client-side tools/libraries in use)
06:45 ema: cp-ats: upgrade trafficserver to 8.0.3-1wm2
06:20 ema: cp4021: upgrade trafficserver to 8.0.3-1wm2
06:15 ema: upload trafficserver 8.0.3-1wm2 to stretch-wikimedia
06:02 marostegui: Deploy parsercache change to eqiad canaries - T210725
06:01 marostegui: Lock wmf-config deployment on deploy1001 to slowly change parsercache key on eqiad - T210725
06:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Change parsercache on codfw T210725 (duration: 00m 54s)
01:55 mutante: re-scheduled nginx / HTTP availability icinga checks
01:42 mutante: cumin -b 6 'R:git::clone' 'run-puppet-agent -q --failed-only'
01:37 mutante: restarting Gerrit to apply 2 config changes - disable DNS reverse lookup (gerrit:508127) & list projects from index (gerrit:508892) - removes blockers for 2.16 upgrade (T200739)
00:32 mutante: restarting wikibugs because it left some channels

2019-05-13

20:29 ejegg: updated payments-wiki from 6e0172bac3 to 8397ccf9cc
20:24 halfak@deploy1001: Finished deploy [ores/deploy@c17a1a2]: T202202 (duration: 04m 16s)
20:20 halfak@deploy1001: Started deploy [ores/deploy@c17a1a2]: T202202
20:19 ariel@deploy1001: Finished deploy [dumps/dumps@941d374]: lbzip2 decompression for 7z file production for big wikis (duration: 00m 03s)
20:19 ariel@deploy1001: Started deploy [dumps/dumps@941d374]: lbzip2 decompression for 7z file production for big wikis
20:04 halfak@deploy1001: Started deploy [ores/deploy@c17a1a2]: T202202
18:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync: re-enabling all eventgate-analytics monolog events - T222962 (duration: 00m 49s)
18:28 ejegg: updated SmashPig standalone deploy 22b6982 Try turning off WSDL caching for Adyen
18:25 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: T222954 (duration: 00m 49s)
18:19 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-enabling all eventgate-analytics monolog events - T222962 (duration: 00m 50s)
18:17 ottomata: re-enabling all eventgate-analytics monolog events - T222962
18:12 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223006 T222740 T222044 (duration: 00m 49s)
18:07 otto@deploy1001: scap-helm eventgate-analytics finished
18:07 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
18:07 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/eqiad-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: eqiad]
18:05 otto@deploy1001: scap-helm eventgate-analytics finished
18:05 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
18:04 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
18:03 fsero: deleting eventgate-analytics-production releases on codfw
18:01 otto@deploy1001: scap-helm eventgate-analytics finished
18:01 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:01 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/staging-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: staging]
17:57 fsero: deleting eventgate-analytics and eventgate-analytics-staging releases on staging
17:41 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: retry - disabling all eventgate-analytics monolog events for eventgate chart migration - T222962 (duration: 00m 50s)
17:11 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: disabling all eventgate-analytics monolog events for eventgate chart migration - T222962 (duration: 00m 50s)
17:10 ottomata: disabling all eventgate-analytics monolog events for eventgate chart migration - T222962
16:14 Amir1: removing tokipona language terms from items using maintenance script (T200432)
16:00 andrewbogott: reimaging cloudvirt1024 (for the last time I hope)
14:33 otto@deploy1001: Synchronized wmf-config/ProductionServices.php: no-op in prod - Configure eventgate services in beta (duration: 00m 49s)
14:32 otto@deploy1001: Synchronized wmf-config/LabsServices.php: no-op in prod - Configure eventgate services in beta (duration: 00m 49s)
14:05 moritzm: uploaded puppet 4.8.2-5+wmf1 to component/puppetdb4 for apt.wikimedia.org/stretch-wikimedia (T219803)
14:00 elukey: roll restart of aqs on aqs1* to pick up new druid settings
13:50 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -b8 'ms-fe2*' 'run-puppet-agent'
13:46 moritzm: updating puppet on deployment-puppetmaster03 to 4.8.2-5+wmf1 (T219803)
13:39 akosiaris: bump eventgate-analytics chart to 0.0.36. Renames nodejs GC stats to microseconds and bumps the biggest bucket to 100ms. T220709
13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-staging-values.yaml staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
13:36 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on all wikis (T188327) (duration: 00m 50s)
13:30 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -b8 'ms-be2*' 'run-puppet-agent'
13:29 cdanis: swift codfw-prod: deploy I1035824d
13:25 moritzm: uploaded puppetdb 4.4.0-1~wmf2 to component/puppetdb4 for apt.wikimedia.org/stretch-wikimedia (T219803)
13:07 akosiaris: bump cxserver chart to 0.0.7. Renames nodejs GC stats to microseconds and bumps the biggest bucket to 100ms. T220709
13:06 akosiaris@deploy1001: scap-helm cxserver finished
13:06 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
13:06 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
13:06 akosiaris@deploy1001: scap-helm cxserver finished
13:06 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
13:06 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
13:06 akosiaris@deploy1001: scap-helm cxserver finished
13:06 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
13:05 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
13:04 arturo: install libjs-jquery from stretch in cloudnet servers T222862
13:03 arturo: enable puppet in cloudvirt1024 to refresh some apt config T222862
12:50 moritzm: updating puppetdb on deployment-puppetdb02 to 4.4.0-1~wmf2 (T219803)
12:36 cdanis: root@ms-be2013.codfw.wmnet ~ # umount /srv/swift-storage/sda1 && mount /srv/swift-storage/sda1 && umount /srv/swift-storage/sdb1 && mount /srv/swift-storage/sdb1
12:36 krinkle@deploy1001: Synchronized php-1.34.0-wmf.4/resources/src/startup/startup.js: I76a2c8d52fa (duration: 00m 51s)
12:33 cdanis: root@ms-be2013.codfw.wmnet ~ # mount /srv/swift-storage/sdf1
12:25 cdanis: cdanis@ms-be2015.codfw.wmnet ~ % sudo umount /srv/swift-storage/sdl1 && sudo mount /srv/swift-storage/sdl1
12:25 cdanis: cdanis@ms-be2015.codfw.wmnet ~ % sudo umount /srv/swift-storage/sdf1 && sudo mount /srv/swift-storage/sdf1
12:18 cdanis: cdanis@ms-be2015.codfw.wmnet /var/log % sudo mount /srv/swift-storage/sda1
12:08 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/Wikibase/lib/includes/Formatters/CachingKartographerEmbeddingHandler.php: T223085 (duration: 00m 50s)
11:59 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/composer.json: T215746 (duration: 00m 49s)
11:58 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/vendor/: T215746 (duration: 01m 30s)
11:43 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: T222639 (duration: 00m 52s)
11:04 ema: cp-ats rolling restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509456/
10:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/includes/http/HttpRequestFactory.php: T222935 Hot-deploy fix for HttpRequestFactory (duration: 00m 50s)
10:38 jbond42: update puppet5 and facter3 in eqiad
10:17 vgutierrez: rebooting cloudvirt1024 - T209707
09:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 T217396 (duration: 00m 49s)
09:33 hashar: Upgrading Zuul 2.5.1-wmf7 -> 2.5.1-wmf9 T105474
07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully pool db1130 (s5) and db1138 (s4) T222682 (duration: 00m 50s)
07:08 elukey: slow roll restart of celery on ores* nodes to allow cores to be generated upon segfault - T222866
07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) T222682 (duration: 00m 50s)
06:53 moritzm: installing ghostscript security updates
06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) T222682 (duration: 00m 49s)
06:09 marostegui: Compress s2, s6 and s7 on labsdb1012 - T222978
05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) T222682 (duration: 00m 49s)
05:41 marostegui: Optimize tables on pc2007
05:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1130 into s5 and db1138 into s4 T222682 (duration: 00m 49s)
05:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1130 into s5 and db1138 into s4 T222682 (duration: 00m 51s)

2019-05-12

15:32 elukey: rollback python-kafka on eventlog1002 to 1.4.1-1~stretch1 - T222941
12:14 elukey: restart eventlogging on eventlog1002 - all processors stuck due to kafka python (T222941)
05:31 marostegui: Disable notifications for db1116:s8 Slave LAG check as this is a snapshot source

2019-05-11

18:26 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 57s)
06:37 elukey: restart eventlogging on eventlog1002 - huge kafka consumer lag accumulated (T222941)
02:01 mutante: actinium - low disk space - apt-get clean - gzip /var/log/squid3/access.log.1

2019-05-10

18:58 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -b 15 -p 95 '*' 'run-puppet-agent -q --failed-only'
18:51 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -b 15 -p 95 '*' 'run-puppet-agent -q --failed-only'
18:49 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin '*' 'enable-puppet "Puppet breakages on all hosts -- cdanis"'
18:39 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin '*' 'disable-puppet "Puppet breakages on all hosts -- cdanis"'
16:50 reedy@deploy1001: Synchronized dblists/: Update size related dblists (duration: 00m 49s)
16:31 ebernhardson: drop archive indices from cloudelastic
16:11 ariel@deploy1001: Finished deploy [dumps/dumps@70e8498]: look for dumpstatus json file per wiki run (duration: 00m 05s)
16:11 ariel@deploy1001: Started deploy [dumps/dumps@70e8498]: look for dumpstatus json file per wiki run
16:05 ejegg: moved adyen smashpig job runner to frdev1001
15:25 _joe_: wiped opcache clean on all api, appservers
15:05 cdanis: cdanis@mw1239.eqiad.wmnet ~ % sudo php7adm /opcache-free
15:05 Krinkle: fix opcache krinkle@mw1268:~$ scap pull
15:04 cdanis: cdanis@mw1268.eqiad.wmnet ~ % sudo php7adm /opcache-free
15:03 Krinkle: ran 'scap pull' on mw1239.eqiad.wmnet to fix opcache corruption
14:56 jbond42: uploaded zuul_2.5.10-wmf9 to jessie-wikimedia
14:54 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: T99740 / d9dbecad9c7b (duration: 00m 51s)
14:33 akosiaris@deploy1001: scap-helm eventgate-analytics finished
14:32 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
14:32 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f lala.yaml staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
14:30 akosiaris@deploy1001: scap-helm eventgate-analytics finished
14:30 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
14:30 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
14:30 akosiaris@deploy1001: scap-helm eventgate-analytics finished
14:30 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
14:30 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
13:30 ema: pool cp3038 w/ ATS backend T222937
12:19 ema: depool cp3038 and reimage as upload_ats T222937
11:52 jbond42: (un)load edac kernel modules on elastic1029 to test resetting counters
11:04 jbond42: restart refinery-eventlogging-saltrotate on an-coord1001
10:30 moritzm: installing symfony security updates
09:17 jynus: disabling replication lag alerts for backup source hosts on s1, s4, s8 T206203
07:14 moritzm: uploaded linux-meta 1.21 for jessie-wikimedia (pointing to the new -9 ABI introduced with the 4.9.168 kernel)
07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1100 into API (duration: 00m 50s)
06:55 ema: swift-fe: rolling restart to enable ensure_max_age T222937
06:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 into API (duration: 00m 50s)
06:27 ema: ms-fe1005: pool with ensure_max_age T222937
06:26 ariel@deploy1001: Finished deploy [dumps/dumps@6f9a5a4]: remove sleep between incr dumps of wikis (duration: 00m 05s)
06:26 ariel@deploy1001: Started deploy [dumps/dumps@6f9a5a4]: remove sleep between incr dumps of wikis
06:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 (duration: 00m 50s)
06:18 ema: ms-fe1005: depool and test ensure_max_age T222937
06:09 _joe_: depooling mw1261 for tests
05:41 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2105 db2109 into s3 T222772 (duration: 00m 49s)
05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2105 db2109 into s3 T222772 (duration: 00m 52s)
05:40 elukey: execute kafka preferred-replica-election on kafka-jumbo1001 as an attempt to rebalance traffic (1002 seems to have been handling far more than the others for some days)
05:32 elukey: restart eventlogging daemons on eventlog1002 - kafka consumer errors in the logs, some lag built over time
05:08 marostegui: Stop MySQL on db1100
05:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 (duration: 00m 50s)
04:56 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2112 (duration: 00m 51s)
00:15 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@e13facb]: Downgrade LDF server back for T222471 (duration: 00m 37s)
00:14 smalyshev@deploy1001: Started deploy [wdqs/wdqs@e13facb]: Downgrade LDF server back for T222471

2019-05-09

23:52 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625: Dont write to private wikis on cloudelastic (duration: 00m 50s)
23:48 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/CirrusSearch/: T220819 Uniquely identify connections in connection pool (duration: 00m 58s)
23:43 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/CirrusSearch/: T220625 Limit the clusters archive index is written to (duration: 00m 59s)
23:41 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/Wikibase/view/resources/jquery/wikibase/jquery.wikibase.entityselector.js: T172937 T222346 Revert Close entityselector after selecting exact match (duration: 00m 51s)
23:24 chaomodus: spicerack upgraded to 0.0.25 on cumin1001 and cumin2001
22:58 volans: uploaded spicerack_0.0.25-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
22:57 bawolff: Manually cleared extdistributor cache T188692
22:50 mutante: labweb1001/labweb1002 - remove "runJob" cron job from www-data's crontab, it is already also a systemd timer and puppet was meant to remove it (T222917)
21:27 foks: change user email for Melamrawy (WMF)@global
21:23 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikipediaAppCaptionEditCounter (T222211) (duration: 00m 52s)
19:56 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.4
19:28 XioNoX: renumber mr1-esams<->cr2-knams link to 91.198.174.224/31 - T211254
19:24 XioNoX: renumber mr1-esams<->cr1-esams link to 91.198.174.240/31 - T211254
18:22 XioNoX: simplify filter analytics-in4 term mysql-dbstore on cr1/2-eqiad
16:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Restore original weight on db1084 (duration: 00m 59s)
16:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1081 (duration: 01m 13s)
15:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1081 (duration: 01m 01s)
15:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 (duration: 01m 00s)
15:37 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2112 (duration: 00m 59s)
15:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 (duration: 00m 56s)
15:20 marostegui: Stop mysql on db2112 for onsite work
15:16 otto@deploy1001: scap-helm eventgate-main finished
15:16 otto@deploy1001: scap-helm eventgate-main cluster eqiad completed
15:16 otto@deploy1001: scap-helm eventgate-main install -n main -f main/eqiad-values.yaml stable/eventgate [namespace: eventgate-main, clusters: eqiad]
15:13 otto@deploy1001: scap-helm eventgate-main finished
15:13 otto@deploy1001: scap-helm eventgate-main cluster codfw completed
15:13 otto@deploy1001: scap-helm eventgate-main install -n main -f main/codfw-values.yaml stable/eventgate [namespace: eventgate-main, clusters: codfw]
15:12 papaul: shutting down db2114 for main board replacement
14:53 otto@deploy1001: scap-helm eventgate-main finished
14:52 otto@deploy1001: scap-helm eventgate-main cluster staging completed
14:52 otto@deploy1001: scap-helm eventgate-main upgrade main -f main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
14:48 moritzm: removing unused uwsgi packages from scb* hosts
14:13 otto@deploy1001: scap-helm eventgate-main finished
14:13 otto@deploy1001: scap-helm eventgate-main cluster staging completed
14:13 otto@deploy1001: scap-helm eventgate-main upgrade main -f main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
13:34 bblack: recdns: wiping dyna.wikimedia.org from pdns-recursors
13:13 fsero: running authdns-update for new docker-registry T221101
12:49 fsero: switching traffic from old-registry to new registries registry[12]00[12] - T221101
12:01 _joe_: reenabling puppet across the fleet
11:57 jbond42: all puppetmasters and puppetdbs should be restored
11:55 jbond42: clean up old source files sudo cumin A:puppetmaster 'rm /etc/apt/sources.list.d/component-facter3.list /etc/apt/sources.list.d/component-puppet5.list'
11:49 volans: updated netbox statuses for decommissioning and spare hosts according to T222352
11:23 jbond42: running sudo apt-get install puppet-master=4.8.2-5~bpo8+1 puppet-master-passenger=4.8.2-5~bpo8+1 on labtestpuppetmaster2001
11:19 jbond42: running sudo apt-get install facter=2.4.6-1 puppet=4.8.2-5 puppet-master puppet-master-passenger on labpuppetmaster1001
11:18 jbond42: starting puppetdb on puppetdb2001
11:15 jbond42: run sudo apt-get install puppetdb on puppetdb2001
11:14 jbond42: ran the following on puppetdb2001 sudo apt-get install facter=2.4.6-1 puppet=4.8.2-5
11:14 jbond42: ran the following on puppetmaster200{1,2} sudo apt-get install facter=2.4.6-1 puppet=4.8.2-5 puppet-master puppet-master-passenger
11:04 _joe_: disabling puppet across the fleet
11:02 volans: stopped ircecho to avoid spam
10:43 marostegui: Stop MySQL on db1081
10:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 57s)
10:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give API traffic to db1129 (new host on s2) (duration: 00m 57s)
10:15 _joe_: restarting low-traffic pybals in codfw, eqiad
10:05 akosiaris: restart proton on proton1001. Host Out of memory T214975
09:57 ariel@deploy1001: Finished deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more (retry) (duration: 00m 06s)
09:57 ariel@deploy1001: Started deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more (retry)
09:54 ariel@deploy1001: Finished deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more (duration: 00m 06s)
09:54 ariel@deploy1001: Started deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more
09:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1129 (new host on s2) (duration: 00m 57s)
09:29 fsero@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=docker-registry,name=codfw
09:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 56s)
09:12 godog: bounce rsyslog on lithium
09:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 56s)
08:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 56s)
08:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 57s)
08:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 (duration: 00m 55s)
08:23 elukey: upload uwsgi 2.0.14+20161117-3+deb9u2+wmf1 packages to stretch-wikimedia - T212697
08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1129 with low weight on s2 - T222682 (duration: 00m 56s)
08:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 (duration: 00m 56s)
08:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db1129, db2104, db2107, db2108 T222772 T222682 (duration: 00m 57s)
08:06 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db1129, db2104, db2107, db2108 T222772 T222682 (duration: 00m 59s)
07:54 moritzm: installing jquery security updates for stretch
07:50 elukey: roll restart HDFS masters on an-master100[1,2] to pick up new logging settings
07:23 moritzm: installing twitter-bootstrap3 security updates
06:53 _joe_: restarted nagios-nrpe-server on proton1001
05:58 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify disk status for db2103, db2112, db2116 (duration: 00m 58s)
05:29 marostegui: Stop replication on db2098:s2
05:25 marostegui: Stop MySQL on db1076
05:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 (duration: 00m 57s)
05:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2103, db2112 and db2116 into s1 T222772 (duration: 01m 41s)
05:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2103, db2112 and db2116 into s1 T222772 (duration: 01m 22s)
04:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 59s)
00:57 twentyafterfour: stopped phd, now running `puppet agent --test` manually on phab1001
00:08 twentyafterfour: phabricator upgrade successful
00:04 twentyafterfour: starting phabricator deployment, momentary downtime expected (~1 minute)

2019-05-08

23:06 krinkle@deploy1001: Synchronized php-1.34.0-wmf.3/includes/specials/SpecialWatchlist.php: T218511 / I423874 (duration: 00m 57s)
23:00 krinkle@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/CirrusSearch/includes/Hooks.php: T219342 / 164a7c1 (duration: 00m 59s)
22:20 ejegg: re-enabled fundraising jobs
22:15 ejegg: updated SmashPig standalone install from 78b92b7fef to 88fd9650ec
22:14 ejegg: disabled fundraising jobs for SmashPig update
22:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wmgUseAdvancedSearch, no longer read; drop rcenhancedfilters from BF whitelist (duration: 00m 57s)
22:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Unconditionally load AdvancedSearch everywhere, the config is always true (duration: 00m 57s)
22:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Beta Feature config cleanup: doc change plus drop advancedsearch and templatewizard-betafeature (duration: 00m 57s)
21:58 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/includes/ApiVisualEditor.php: UBN T209599 ApiVisualEditor: Fix use of getBlockInfo() (duration: 00m 57s)
21:52 niharika29@deploy1001: Synchronized php-1.34.0-wmf.4/tests/phpunit/: Fix Block::newLoad for IPv6 range blocks - follow-up to Ie8bebd8 T222246 (duration: 01m 09s)
21:50 niharika29@deploy1001: Synchronized php-1.34.0-wmf.4/includes/Block.php: Fix Block::newLoad for IPv6 range blocks - follow-up to Ie8bebd8 T222246 (duration: 00m 59s)
21:49 niharika29@deploy1001: sync aborted: php-1.34.0-wmf.4/includes/Block.php Fix Block::newLoad for IPv6 range blocks - follow-up to Ie8bebd8 T222246 (duration: 00m 03s)
21:49 niharika29@deploy1001: Started scap: php-1.34.0-wmf.4/includes/Block.php Fix Block::newLoad for IPv6 range blocks - follow-up to Ie8bebd8 T222246
20:12 thcipriani: restarting gerrit due to threads stuck behind sendemail
20:10 gehel: upgrade to nodejs 10 for maps completed - T210704
20:08 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps1001 (T215852) (duration: 00m 20s)
20:08 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps1001 (T215852)
20:07 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps1001 (T215852) (duration: 00m 24s)
20:07 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps1001 (T215852)
19:58 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]004 (T215852) (duration: 00m 58s)
19:57 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]004 (T215852)
19:56 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]004 (T215852) (duration: 00m 59s)
19:55 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]004 (T215852)
19:47 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]003 (T215852) (duration: 00m 54s)
19:46 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]003 (T215852)
19:46 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]003 (T215852) (duration: 00m 56s)
19:45 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]003 (T215852)
19:35 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator/kartotherian node 10 build into maps[12]002 (T215852) (duration: 01m 12s)
19:33 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator/kartotherian node 10 build into maps[12]002 (T215852)
19:32 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy tilerator node 10 build into maps[12]002 (T215852) (duration: 00m 57s)
19:31 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy tilerator node 10 build into maps[12]002 (T215852)
19:26 gehel: continue upgrade to nodejs 10 for maps - T210704
19:22 thcipriani@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.4 (duration: 01m 48s)
19:21 cdanis: swift codfw-prod: deploy I59c88aed T221068
19:20 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.4
19:01 cdanis: T221904 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'ms-be2*[4,7].codfw.wmnet' 'for DISK in /sys/block/sd*/queue/scheduler ; do echo cfq > $DISK ; done'
18:09 mutante: restarting gerrit to apply logging changes (gerrit:508391)
17:58 bblack: public authdns: deploying the big DYNA/CNAME change in https://gerrit.wikimedia.org/r/c/operations/dns/+/507399
17:44 jforrester@deploy1001: Synchronized wmf-config/extension-list: Re-sort extension-list (prod no-op) (duration: 00m 56s)
17:42 jforrester@deploy1001: Synchronized wmf-config/env.php: Clean-up: Allow for running outside the cluster for local testing (no-op for prod) (duration: 00m 56s)
17:22 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Retry: Enable WikimediaEditorTasks on Beta commonswiki (duration: 00m 57s)
17:15 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable WikimediaEditorTasks on Beta commonswiki (duration: 00m 57s)
16:55 otto@deploy1001: scap-helm eventgate-main finished
16:55 otto@deploy1001: scap-helm eventgate-main cluster staging completed
16:55 otto@deploy1001: scap-helm eventgate-main upgrade main -f main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
16:08 gehel: restart tileratorui on maps2001 - T222801
15:59 jynus: restart db2117 after first puppet run
15:56 mforns@deploy1001: Finished deploy [analytics/refinery@698f213]: deploying analytics-refinery up to 698f213 with source=v0.0.89 (duration: 15m 38s)
15:52 gehel: reset authentication on cassandra / maps / codfw - T222801
15:40 mforns@deploy1001: Started deploy [analytics/refinery@698f213]: deploying analytics-refinery up to 698f213 with source=v0.0.89
15:19 moritzm: installing ruby-i18n security updates
15:14 moritzm: installing rails security updates
15:04 XioNoX: fix typo on asw2-ulsfo<->cr2-ulsfo interface (Xlink2 instead of Xlink1)
14:21 otto@deploy1001: scap-helm eventgate-main finished
14:21 otto@deploy1001: scap-helm eventgate-main cluster staging completed
14:21 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
14:18 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps2001 (T215852) (duration: 00m 27s)
14:17 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps2001 (T215852)
14:14 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps2001 (T215852) (duration: 00m 27s)
14:14 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps2001 (T215852)
14:05 fsero@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
14:03 gehel: starting upgrade to nodejs 10 for maps - T210704
13:50 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
13:18 ema: cp3035: restart varnish-be
12:07 kart_: EU-Midday SWAT done.
12:06 kartik@deploy1001: Synchronized php-1.34.0-wmf.3: SWAT: Log warning and show error on empty username (T222529) (gerrit:508559) (duration: 07m 29s)
11:56 akosiaris@deploy1001: scap-helm cxserver finished
11:56 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
11:56 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
11:56 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml staging stable/cxserver [namespace: cxserver, clusters: codfw]
11:54 akosiaris: bump prometheus-statsd-exporter for cxserver to 0.0.5 T220709
11:54 akosiaris@deploy1001: scap-helm cxserver finished
11:54 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
11:54 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
11:29 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add publish restrictions config for enwiki (T217237) (gerrit:495677) (duration: 00m 58s)
11:06 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148 (duration: 01m 30s)
11:05 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - T219148
10:17 _joe_: restarted pybal on lvs1016 to pick up changes for T222705
10:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1131 in s6 T222682 (duration: 00m 57s)
09:51 _joe_: restarted proton on proton1001
09:50 _joe_: restarted pybal on lvs1006 to pick up changes for T222705
09:49 _joe_: restarted pybal on lvs2003 to pick up changes for T222705
09:45 marostegui: Stop replication on db2097:3311
09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1131 in s6 T222682 (duration: 01m 07s)
09:26 _joe_: restarting pybal on lvs2006 to pick up changes for T222705 (3/3)
09:24 elukey: install uwsgi-core_2.0.14+20161117-3+deb9u2+wmf1 on netmon2001 to test a uwsgi bug fix - T212697
09:12 _joe_: restarting pybal on lvs2006 to pick up changes for T222705 (2/3)
08:57 _joe_: restarting pybal on lvs2006 to pick up changes for T222705
08:56 godog: upload prometheus-statsd-exporter 0.9.0+ds1-1 to stretch-wikimedia T220709
08:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1131 into s6 with low weight T222682 (duration: 00m 51s)
08:48 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1131 into s6 with low weight T222682 (duration: 00m 53s)
08:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1093 (duration: 00m 58s)
08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1093 (duration: 00m 58s)
07:49 marostegui: Stop replication s1 on db2102
07:45 elukey: install uwsgi-core_2.0.14+20161117-3+deb9u2+wmf1 on netmon1002 to test a uwsgi bug fix - T212697
07:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some API traffic to db1093 (duration: 00m 57s)
07:41 vgutierrez: upgrading pybal to version 1.15.6 in lvs1001 - T222705
07:40 godog: bounce prometheus on bast3002 to finalize migration
07:37 vgutierrez: upgrading pybal to version 1.15.6 in lvs1004 - T222705
07:33 vgutierrez: upgrading pybal to version 1.15.6 in lvs1002 - T222705
07:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2115 into x1 T222772 (duration: 00m 56s)
07:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2115 into x1 T222772 (duration: 01m 09s)
07:26 vgutierrez: upgrading pybal to version 1.15.6 in lvs1005 - T222705
07:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some weight to db1093 (duration: 00m 56s)
07:21 vgutierrez: upgrading pybal to version 1.15.6 in lvs1016 - T222705
07:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1127 and db1137 into x1 T222682 (duration: 00m 56s)
07:14 vgutierrez: upgrading pybal to version 1.15.6 in lvs1006 - T222705
07:13 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1127 and db1137 into x1 T222682 (duration: 01m 03s)
07:04 vgutierrez: upgrading pybal to version 1.15.6 in lvs2001 - T222705
07:02 vgutierrez: upgrading pybal to version 1.15.6 in lvs2004 - T222705
06:58 vgutierrez: upgrading pybal to version 1.15.6 in lvs2002 - T222705
06:51 vgutierrez: upgrading pybal to version 1.15.6 in lvs2005 - T222705
06:42 vgutierrez: upgrading pybal to version 1.15.6 in lvs2003 - T222705
06:36 vgutierrez: upgrading pybal to version 1.15.6 in lvs3001 - T222705
06:32 vgutierrez: upgrading pybal to version 1.15.6 in lvs3003 - T222705
06:29 elukey: restart uwsgi-netbox on netmon1002 after the daily segfault (upon restart)
06:29 vgutierrez: upgrading pybal to version 1.15.6 in lvs3002 - T222705
06:24 vgutierrez: upgrading pybal to version 1.15.6 in lvs3004 - T222705
06:20 marostegui: Stop MySQL on db2096
06:19 vgutierrez: upgrading pybal to version 1.15.6 in lvs4005 - T222705
06:16 vgutierrez: upgrading pybal to version 1.15.6 in lvs4006 - T222705
06:13 vgutierrez: upgrading pybal to version 1.15.6 in lvs4007 - T222705
06:07 vgutierrez: upgrading pybal to version 1.15.6 in lvs5001 - T222705
06:02 vgutierrez: upgrading pybal to version 1.15.6 in lvs5002 - T222705
05:59 vgutierrez: upgrading pybal to version 1.15.6 in lvs5003 - T222705
05:48 vgutierrez: upgrading pybal to version 1.15.6 in lvs2006 - T222705
05:25 marostegui: Stop MySQL on db1093
05:01 marostegui: Optimize tables on pc1007
05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1007 (duration: 00m 59s)

2019-05-07

23:31 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T220625 Configure wgCirrusSearchPrivateClusters (duration: 00m 58s)
22:06 ppchelko@deploy1001: Finished deploy [restbase/deploy@8f5859f]: Do not cache html if stash was requested T215956 (duration: 18m 12s)
21:48 ppchelko@deploy1001: Started deploy [restbase/deploy@8f5859f]: Do not cache html if stash was requested T215956
21:47 ppchelko@deploy1001: deploy aborted: Do not cache html if stash was requested T215956 (duration: 00m 12s)
21:47 ppchelko@deploy1001: Started deploy [restbase/deploy@d91ee4c]: Do not cache html if stash was requested T215956
21:46 mutante: deploy1001 - re-enabled puppet - deployment can go ahead
21:06 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -m async -p80 -b10 'C:profile::mediawiki::php and *.codfw.wmnet' 'run-puppet-agent' 'systemctl reload php7.2-fpm.service'
20:43 mutante: gerrit2001 - restarting apache.. failed
20:38 ejegg: updated payments-wiki from 558427f731 to 6e0172bac3
20:31 mutante: gerrit2001 - temp disabling puppet - testing apache rewrites for T218844 on non-prod host
20:14 mutante: deploy1001 - temp disabled puppet - debugging issue with apache-fast-test script
19:52 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.4
19:42 thcipriani@deploy1001: Finished scap: testwiki to 1.34.0-wmf.4 and rebuild l10n cache (duration: 28m 55s)
19:13 thcipriani@deploy1001: Started scap: testwiki to 1.34.0-wmf.4 and rebuild l10n cache
19:04 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.22 (duration: 02m 15s)
18:50 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.21 (duration: 08m 48s)
18:38 mutante: LDAP - adding awight to 'wmde' group (T222538)
18:08 mutante: restarting icinga via web UI button
17:45 thcipriani: starting branchcut for train (1.34.0-wmf.4)
17:31 arturo: rebooting cloudvirt1024 to test interfaces configuration
16:59 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
16:39 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
16:38 arturo: rebooting cloudvirt1024 to test interfaces configuration
16:05 fsero: created eventgate-main tokens - T218346
16:05 fsero: created eventgate-main tokens
15:47 fsero: creating eventgate-main namespace on k8s clusters
15:38 vgutierrez: uploaded pybal 1.15.6 to apt.wikimedia.org (stretch && jessie)
15:21 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/CirrusSearch/maintenance/forceSearchIndex.php: T222641: Cirrus maint script handle ancient logging rows (duration: 00m 52s)
14:53 cdanis: pool mw1271
14:53 cdanis: pool mw1256
14:44 cdanis: cdanis@mw1256.eqiad.wmnet ~ % sudo php7adm /opcache-free
14:43 cdanis: cdanis@mw1271.eqiad.wmnet ~ % sudo php7adm /opcache-free
14:40 vgutierrez: uploaded pybal 1.15.5 to apt.wikimedia.org (stretch && jessie)
14:26 _joe_: repooling mw1320
14:25 _joe_: resetting opcache on mw1320
14:13 vgutierrez: uploaded pybal 1.15.4 to apt.wikimedia.org (stretch)
14:12 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1256.eqiad.wmnet
14:12 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1271.eqiad.wmnet
14:09 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1320.eqiad.wmnet
14:09 cdanis: depool mw1320
14:07 otto@deploy1001: scap-helm eventgate-analytics finished
14:07 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
14:07 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/eqiad-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: eqiad]
14:02 otto@deploy1001: scap-helm eventgate-analytics finished
14:02 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
14:02 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
14:01 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
14:01 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
13:59 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
13:58 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
13:57 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
13:50 vgutierrez: uploaded prometheus-trafficserver-exporter 0.2.3 to apt.wikimedia.org (stretch) - T221217
13:45 marostegui: Stop MySQL and poweroff db1093 for BBU replacement - T222127
13:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093 for BBU replacement T222127 (duration: 00m 51s)
13:37 otto@deploy1001: scap-helm eventgate-analytics finished
13:37 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
13:37 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
13:37 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
13:17 cdanis: T221904 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -m async -b5 'ms-be1*' 'run-puppet-agent -q' 'systemctl restart swift-object-replicator' 'systemctl restart swift-object-auditor'
13:08 ema: sudo ipmitool -I lanplus -H cp2009.mgmt.codfw.wmnet -U root mc reset cold T222459
13:07 ema: sudo ipmitool -I lanplus -H "cp2009.mgmt.codfw.wmnet" -U root -E chassis power cycle T222459
13:02 cdanis: T221904 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -m async -b5 'ms-be2*' 'run-puppet-agent -q' 'systemctl restart swift-object-replicator' 'systemctl restart swift-object-auditor'
12:45 jynus: remove dbstore1001, dbstore2001, dbstore2002 from tendril and zarcillo T220002
12:09 marostegui: Stop Replication on db1140:3320 to provision db1127 and db1137 T222682
11:16 hashar: Downgraded Zuul back to 2.5.1-wmf7 # T105474 T140297
11:08 hashar: Upgraded Zuul and it is broken. So downgrading back :-(
10:51 hashar: Gracefully stopping Zuul for upgrade
10:46 mlitn@deploy1001: Finished scap: SDC: Enable Depicts in UploadWizard on Commons (duration: 22m 45s)
10:40 ema: libvmod-uuid 1.4-1 uploaded to stretch-wikimedia T221977
10:23 mlitn@deploy1001: Started scap: SDC: Enable Depicts in UploadWizard on Commons
10:16 hashar: contint1001: upgrading python-pbr from 0.8.2-1 to 1.10.0-1, pin no longer needed with recent Zuul # T218559
10:16 hashar: contint1001, contint2002: rm /etc/apt/preferences.d/python_pbr.pref /etc/apt/preferences.d/python-pbr.pref # T218559
10:08 jbond42: upload zull_2.5.1-wmf8 package to jessie-wikimedia
09:51 godog: test statsd-exporter 0.9 upgrade on deployment-imagescaler03 - T220709
09:47 jbond42: restart pdfrender on scb1004 - T174916
08:51 arturo: T222685 remove facter from jessie-wikimedia/openstack-mitaka-jessie
08:39 ema: repool cp1083 T222620
07:59 moritzm: updating base-files from recent stretch point release
07:51 mobrovac@deploy1001: Finished deploy [restbase/deploy@d91ee4c]: Remove section functionality from the REST API - T216636 (duration: 24m 46s)
07:27 godog: upgrade prometheus on bast3002 - T187987
07:26 mobrovac@deploy1001: Started deploy [restbase/deploy@d91ee4c]: Remove section functionality from the REST API - T216636
07:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@d91ee4c] (dev-cluster): Remove section functionality from the REST API (duration: 03m 02s)
07:21 marostegui: Optimize tables on pc1010
07:18 mobrovac@deploy1001: Started deploy [restbase/deploy@d91ee4c] (dev-cluster): Remove section functionality from the REST API
06:59 moritzm: updating firmware-bnx2x (from stretch point release, this is a NOP, the source package firmware-nonfree was updated for various Wifi chipsets we don't use, doublechecked by comparing check sums for old and new bnx2x firmware)
06:44 elukey: restart uwsgi-netbox on netmon1002 after segfault
05:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2045 to codfw x1 master T219493 (duration: 00m 55s)
05:12 marostegui: Change topology on x1 codfw to promote db2045 to master T219493
02:12 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use Preprocessor_Hash unconditionally (duration: 00m 52s)
00:53 mutante: install2002 - disabling puppet, live hacking DHCP config for db2103 to not serve installer via http to debug install issue for T221532 which seems like T190424#4548003
00:38 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTarget.js: Hot-deploy fix for visual diffs on mobile in non-section mode T222489 (duration: 00m 53s)
00:32 ejegg: disabled fundraising scheduled jobs for CiviCRM maintenance

2019-05-06

23:25 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/503546/ (duration: 00m 50s)
22:46 crusnov@deploy1001: Finished deploy [netbox/deploy@0061190]: Deploy new version of ganeti-netbox sync. (duration: 03m 53s)
22:43 RoanKattouw: Running refreshMessageBlobs.php on all wikis for T222539
22:42 crusnov@deploy1001: Started deploy [netbox/deploy@0061190]: Deploy new version of ganeti-netbox sync.
21:59 mutante: LDAP - remove 'sukhe' from 'nda' and add to 'wmf' instead (T221990)
21:24 cdanis: experimenting with different disk scheduler on ms-be2014 -- cdanis@ms-be2014.codfw.wmnet ~ % for D in /sys/block/sd*/queue/scheduler ; echo cfq | sudo tee $D
21:15 godog: swift codfw-prod: push up-to-date rings, mistakenly pushed earlier an older version
19:48 gehel: rolling restart of cassandra on maps* for config change
19:47 RoanKattouw: Running recomputeNotifCounts.php --notif-types=login-success on all Echo wikis for T220762
19:31 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -m async -b4 'ms-be1*' 'run-puppet-agent --enable "cdanis rollout I369f9b29"' 'systemctl restart swift-object-replicator'
19:22 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -m async -b4 'ms-be2*' 'run-puppet-agent --enable "cdanis rollout I369f9b29"' 'systemctl systemctl restart swift-object-replicator'
19:01 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Begin homepage experiment on cswiki and kowiki (T221266) (duration: 00m 51s)
18:47 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/: Remove link to pageviews tool when no data available (T222405) (duration: 00m 52s)
18:32 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/skins/MinervaNeue/includes/menu/Definitions.php: Harden Definitions::insertCommunityPortal() method (T222407) (duration: 00m 53s)
18:30 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'ms-be*' 'disable-puppet "cdanis rollout I369f9b29"'
18:24 jynus: restart and upgrade db1116
18:14 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Set $wgOresFrontendBaseUrl (T219396) (duration: 00m 51s)
17:53 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
17:52 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
17:19 elukey: restart netbox on netmon1002 as test
17:11 jynus: restart dbprov* hosts, in sequence, for kernel upgrade
16:42 jynus: restart db1114 mysql for upgrade testing
16:38 andrewbogott: re-imaging cloudvirt1024
16:34 jynus: restart db2102 mysql for upgrade testing
16:11 hashar: CI queue drained. Should be working fine again now
15:57 hashar: CI / Zuul is being slowed down and being investigated
15:48 moritzm: updating firmware-bnx2x (from stretch point release, this is a NOP, the source package firmware-nonfree was updated for various Wifi chipsets we don't use, doublechecked by comparing check sums for old and new bnx2x firmware)
15:37 moritzm: updating firmware-bnx2 (from stretch point release, this is a NOP, the source package firmware-nonfree was updated for various Wifi chipsets we don't use, doublechecked by comparing check sums for old and new bnx2 firmware)
15:35 papaul: shutting down elastic2038 for DIMM swap
15:30 moritzm: updating base-files from recent stretch point release
15:14 ema: pool cp4026 w/ ATS backend T219967
14:57 godog: capture strace / core for rsyslog on wezen / lithium and restart - T199406
14:42 ema: powercycle cp1083
14:41 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1083.eqiad.wmnet
14:35 godog: swift eqiad-prod: finish decom ms-be101[45] - T220590
14:25 moritzm: installing vips security updates
14:19 ema: depool cp4026 and reimage as upload_ats T219967
14:11 otto@deploy1001: scap-helm eventgate-analytics finished
14:11 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
14:11 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
14:09 hashar: CI workflow fixed by reverting a change deployed around 10:00 UTC # T222614
14:03 ema: cp3038: restart varnish-be
13:56 otto@deploy1001: scap-helm eventgate-analytics finished
13:56 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
13:56 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/staging-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: staging]
13:54 moritzm: installing zziplib security updates
13:52 hashar: CI does not run sometimes for some reason ... https://phabricator.wikimedia.org/T222614 :(
13:22 moritzm: installing audiofile security updates
13:20 moritzm: installing unzip security updates
12:43 moritzm: installing rsync security updates
12:24 moritzm: installing golang security updates on jessie
11:44 Amir1: EU SWAT is done
11:40 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Suggestion Constraint Status on Wikidata (gerrit:508303) (duration: 00m 52s)
11:32 arturo: reverting puppet change to the sudo module
11:17 arturo: merging puppet change to the sudo module https://gerrit.wikimedia.org/r/c/operations/puppet/+/507376
10:59 ema: manual puppet-merge $sha on failed puppetmasters https://phabricator.wikimedia.org/P8477
10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:508302) (duration: 00m 51s)
10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:508302) (duration: 00m 52s)
10:05 arturo: upgrade udev in cloudservices2002-dev
09:59 arturo: T222148 upgrade udev & libudev1 on cloudvirt[1001-1003,1005].eqiad.wmnet
09:35 elukey: restart netbox on netmon1002 (trying to reproduce the segfault) - T212697
09:03 godog: upgrade labmon1001 to prometheus 2 - T187987
06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some API traffic to db1093 (duration: 00m 52s)
05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some weight to db1093 (duration: 00m 58s)
04:08 ariel@deploy1001: Finished deploy [dumps/dumps@b4b7733]: reduce sleep time more between wikis for incrs (duration: 00m 05s)
04:08 ariel@deploy1001: Started deploy [dumps/dumps@b4b7733]: reduce sleep time more between wikis for incrs

2019-05-05

14:42 elukey: restart pdfrender on scb1004
03:10 chaomodus: fyi scb* flapping on some endpoints seems to be just noise, there is high load from mobileapi but things appear to be operating normally otherwise, several boxes are in the process of checking md which may account for service lags
02:40 andrewbogott: restarting mariadb on cloudservices1003

2019-05-04

22:20 reedy@deploy1001: Synchronized docroot/mediawiki/xml/index.html: Add extra xml namespace links (duration: 01m 06s)
10:38 ariel@deploy1001: Finished deploy [dumps/dumps@26b52ef]: misc small fixes, reduce sleep time for incr wikis (duration: 00m 09s)
10:38 ariel@deploy1001: Started deploy [dumps/dumps@26b52ef]: misc small fixes, reduce sleep time for incr wikis

2019-05-03

23:50 thcipriani: gerrit back
23:49 thcipriani: gerrit restart due to threads piling up
22:09 XioNoX: clear v4 BGP to AS17451 on cr1-eqsin/cr4-ulsfo
17:16 arturo: T222148 aborrero@labstore1005:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
17:15 arturo: T222148 aborrero@labstore1004:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
17:11 arturo: T222148 aborrero@labpuppetmaster1002:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
17:10 arturo: T222148 aborrero@labpuppetmaster1001:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
17:09 arturo: T222148 aborrero@labtestpuppetmaster2001:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
17:08 arturo: T222148 drop libudev1 from openstack-mitaka-jessie/jessie-wikimedia (related to T216497)
17:07 arturo: T222148 drop udev from openstack-mitaka-jessie/jessie-wikimedia (related to T216497)
15:02 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=parsoid,dc=codfw
15:02 _joe_: repooling the wtp* servers depooled in codfw for load testing
14:56 _joe_: repool mw1275
13:49 jijiki: Restart nrpe on proton1001
12:26 gehel: replaying 30 minutes of eqiad search traffic on codfw - T221121
12:21 ema: cp3038: varnish-backend-restart
11:10 _joe_: purging opcache on mw1275
10:47 ema: pool cp4025 w/ ATS backend T219967
10:43 jbond42: T220380 remove zull_2.5.0-8-gcbc7f62-wmf4jessie1 from jessie-wikimedia/thirdparty
10:42 jbond42: T220380 upload zull_2.5.1-wmf7 to jessie-wikimedia
10:25 jijiki: Depool mw1275
10:02 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/WikibaseLexemeCirrusSearch/: Fix reference to classes that moved (T222347) (gerrit:507847) (duration: 00m 55s)
09:49 ema: depool cp4025 and reimage as upload_ats T219967
09:49 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=parsoid,dc=codfw,name=wtp201[3-4].*
09:21 gehel: ban elastic2038 from elastic clusters pending memory issue investigation - T217398
08:47 ema: pool cp4024 w/ ATS backend T219967
08:27 jynus: starting table recompression on new backup source hosts on eqiad and codfw (stop replication) T220572
07:45 ema: depool cp4024 and reimage as upload_ats T219967
07:16 ema: cp1089: varnish-backend-restart
05:32 _joe_: restarting varnish backend on cp1077
05:05 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=parsoid,dc=codfw,name=wtp201[5-6].*
04:57 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=parsoid,dc=codfw,name=wtp20(1[7-9]|20).*
04:55 _joe_: progressively depooling parsoid servers in codfw to assess load tolerance
00:32 mutante: powercycling elastic2038
00:10 XioNoX: remove static route to 208.80.155.128/25 on cr1/2-eqiad - T193496
00:06 mutante: restarting gerrit to pick up config changes for 2 mail threads and lower timeout (gerrit:507852, gerrit:507853)

2019-05-02

22:10 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/MobileFrontend/resources/dist/mobile.editor.overlay.js: Hot-deploy T222229 to fix VE switching on MobileFrontend (duration: 00m 52s)
21:21 thcipriani: gerrit back
21:20 ejegg: updated payments-wiki from aa8dad50e7 to 558427f731
21:19 thcipriani: gerrit restart to pick up config changes: https://gerrit.wikimedia.org/r/504973/ and https://gerrit.wikimedia.org/r/507858/
21:00 crusnov@deploy1001: Finished deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - T222351 (duration: 01m 48s)
20:58 crusnov@deploy1001: Started deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - T222351
20:58 crusnov@deploy1001: Finished deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - T222351 (duration: 00m 33s)
20:57 crusnov@deploy1001: Started deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - T222351
19:41 ejegg: updated CiviCRM from 01c4d15c9a to 5024c968ed
19:40 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/resources/src/mediawiki.widgets/mw.widgets.SearchInputWidget.js: Hot-deploy T222329 fix part 2 (duration: 00m 50s)
19:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/includes/widget/SearchInputWidget.php: Hot-deploy T222329 fix part 1 (duration: 00m 53s)
19:31 James_F: Shuffled 1.34.0-wmf.3 security patch cee0e569f4 for T222324 into the tip of the upstream branch now it's merged; no-op
19:27 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.3
19:03 mutante: phab2001 - apt-get autoremove ..removes a single python package not needed anymore
19:00 mutante: phab1001 - upgrading PHP packages on prod phab server
18:59 jynus: restart dbstore1001 for upgrade
18:33 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/: Don't fatal on deleted pages in 'recent questions' (T222206) (duration: 01m 01s)
18:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable cirrussearch-request logging to eventgate-analytics on all wikis (T214080) (duration: 00m 58s)
18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable SpecialHomepage on cswiki and kowiki (T221266) (duration: 00m 58s)
18:09 mutante: phab1001 - install package upgrades for bash and cron
17:46 sbassett: Deployed patch for T222324 (1.34.0-wmf.3)
17:45 arlolra@deploy1001: Finished deploy [parsoid/deploy@414387b]: Updating Parsoid to 9786781 (duration: 05m 45s)
17:39 arlolra@deploy1001: Started deploy [parsoid/deploy@414387b]: Updating Parsoid to 9786781
16:42 gehel: replaying 30 minutes of eqiad search traffic on codfw - T221121
16:10 jynus: restarted dbproxy1005 haproxy, weird connection issue
15:42 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Re-enable account creation on wikitech (duration: 00m 57s)
15:40 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Invalidate user sessions upon blocking on wikitech (duration: 00m 59s)
15:15 chasemp: add dsharpe to content admin on wikitech for user blocking
12:42 jynus: stopping several instances at dbstore1001 to clone them to db1139/40 T220572
12:06 ema: swift-proxy rolling restart T222071
12:01 ema: restart swift-proxy on ms-fe1005 T222071
10:37 ariel@deploy1001: Finished deploy [dumps/dumps@53c9f22]: speed up adds-changes dumps by generating index.html less often. temp sleep 120 (duration: 00m 15s)
10:36 ariel@deploy1001: Started deploy [dumps/dumps@53c9f22]: speed up adds-changes dumps by generating index.html less often. temp sleep 120
10:04 ema: pool cp4023 w/ ATS backend T219967
09:41 jynus: testing backups on db2102 (increased network and disk usage) T220572
09:07 jynus: reboot db2102 T220572
09:02 ema: depool cp4023 and reimage as upload_ats T219967
09:02 godog: rollout rsyslog upgrade 8.1901.0-1~bpo9+wmf1 to eqiad
08:55 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 5% of anonymous users to PHP7.2 - T219150 (duration: 01m 03s)
08:49 jijiki: Sending more traffic to PHP7.2 - T219150
04:28 andrewbogott: upgraded mediawiki on wikitech-static to 1.32.1
04:25 kart_: Updated cxserver to 2019-05-02-040910-production (T222305)
04:23 andrewbogott: apt-get upgrade on wikitech-static
04:18 kartik@deploy1001: scap-helm cxserver finished
04:18 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
04:18 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
04:16 kartik@deploy1001: scap-helm cxserver finished
04:16 kartik@deploy1001: scap-helm cxserver cluster codfw completed
04:16 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
04:15 kartik@deploy1001: scap-helm cxserver finished
04:15 kartik@deploy1001: scap-helm cxserver cluster staging completed
04:15 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
00:35 eileen: civicrm revision changed from 3414657d36 to 01c4d15c9a, config revision is 2119df9495

2019-05-01

23:35 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/: Drop RENDER_NOW for impact module images (T222223) (duration: 01m 04s)
23:19 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625 Start writing to cloudelastic for group0 (duration: 01m 05s)
22:07 mutante: LDAP - adding jaufrecht to wmf (T222214)
21:57 ebernhardson: start importing group2 to cloudelastic in parallel with group1
21:18 ebernhardson: start importing group1 into cloudelastic from mwmaint1002
20:15 halfak@deploy1001: Finished deploy [ores/deploy@52e9759]: T222121 (duration: 14m 03s)
20:01 halfak@deploy1001: Started deploy [ores/deploy@52e9759]: T222121
19:17 thcipriani@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.3 (duration: 01m 53s)
19:15 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.3
17:59 elukey: force remount of /mnt/hdfs on notebook1003 (fuse hdfs got stuck)
17:43 joal@deploy1001: Finished deploy [analytics/refinery@682ab7c]: Regular analytics weekly train - Second try after space freed (duration: 03m 15s)
17:40 joal@deploy1001: Started deploy [analytics/refinery@682ab7c]: Regular analytics weekly train - Second try after space freed
17:27 joal@deploy1001: Finished deploy [analytics/refinery@682ab7c]: Regular analytics weekly train (duration: 25m 18s)
17:02 joal@deploy1001: Started deploy [analytics/refinery@682ab7c]: Regular analytics weekly train
16:58 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625 Start writing to cloudelastic from testwiki (duration: 01m 01s)
16:52 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.QuestionPosterDialog.js: SWAT: Ensure text exists before logging enter-question-text action (gerrit:507598) (duration: 01m 00s)
16:48 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: SWAT: Re-use timestamp for section header and question storage (gerrit:507593) (duration: 01m 01s)
16:41 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: SWAT: Re-use timestamp for section header and question storage (gerrit:507593) (duration: 01m 01s)
16:23 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Mentorship.js: SWAT: Mentorship module: Add data-link-id to mentor's talkpage link (gerrit:507580) (duration: 01m 01s)
16:17 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable cirrussearch-request logging to eventgate-analytics for group1 wikis (gerrit:507550) (duration: 01m 00s)
15:58 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Re-enable password reset on wikitech (duration: 00m 58s)
14:54 reedy@deploy1001: Synchronized wmf-config/wikitech.php: propagate blocks to gerrit (duration: 00m 57s)
14:52 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add new logging channel for wikitech (duration: 00m 58s)
13:57 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209572 Disable Reporting API endpoint (duration: 00m 59s)
13:31 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209572 Enable Feature Policy Reporting origin trial (duration: 01m 01s)
13:28 jbond42: update puppet and facter on esams
12:53 gehel: start recording 30 minutes of traffic from elasticsearch eqiad - T221121
11:27 gilles: T216499 T216594 T216598 mwscript purgeList.php ruwiki --all --verbose
11:22 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 T216598 T216594 Renew origin trial tokens for ruwiki (duration: 01m 14s)
01:01 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@5d619e4]: Update spec x-amples (duration: 03m 58s)
00:57 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@5d619e4]: Update spec x-amples
00:30 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481 (duration: 00m 04s)
00:30 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec

2010s

2020s