Server Admin Log/Archive 37

2019-04-30

23:56 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 05s)
23:56 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
23:49 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481 (duration: 00m 04s)
23:49 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481
23:35 ariel@deploy1001: Finished deploy [dumps/dumps@d715ea0]: determine page ranges of content output files by cumul revision length as well as rev count (duration: 00m 03s)
23:35 ariel@deploy1001: Started deploy [dumps/dumps@d715ea0]: determine page ranges of content output files by cumul revision length as well as rev count
23:18 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 05s)
23:18 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
23:07 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481 (duration: 00m 05s)
23:07 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207481
22:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b140f]: Parsoid: Use the new stash tables for old revisions - T215956 (duration: 23m 56s)
21:57 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b140f]: Parsoid: Use the new stash tables for old revisions - T215956
21:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b140f] (dev-cluster): Parsoid: use the new stashing tables for old revisions too (duration: 03m 22s)
21:52 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b140f] (dev-cluster): Parsoid: use the new stashing tables for old revisions too
21:44 sbassett: Deployed patch for T222038 (1.34.0-wmf.1 and 1.34.0-wmf.3)
21:44 sbassett: Deployed patch for T222036 (1.34.0-wmf.1 and 1.34.0-wmf.3)
21:13 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.3
21:10 mutante: netmon1002 - apt-get remove --purge php7.0* ; apt-get install php-common php-pear (pending upgrades) | netmon2001: apt autoremove
21:06 mutante: netmon2001 - apt-get install php-common php-pear (pending upgrades)
21:03 mutante: netmon2001 - apt-get remove --purge php7.0*
21:03 mutante: librenms - switch from PHP 7.0 to PHP 7.2 successful now. reverted manual changes for debugging on netmon1002
20:29 thcipriani@deploy1001: Finished scap: testwiki to 1.34.0-wmf.3 and rebuild l10n cache (duration: 31m 17s)
20:21 mutante: netmon1002 - loading PHP 7.2 module to debug issue for librenms. librenms very short downtime
19:58 thcipriani@deploy1001: Started scap: testwiki to 1.34.0-wmf.3 and rebuild l10n cache
19:56 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.20 (duration: 02m 07s)
19:47 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.19 (duration: 02m 24s)
19:44 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4360316]: Redeploy GUI for fixes T222133, T222129, T222181, T222182 (duration: 09m 17s)
19:44 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 (duration: 02m 25s)
19:43 mutante: switched netmon1002/netmon2001 from PHP 7.0 to 7.2 but reverted because LibreNMS still had an issue with it
19:40 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 10m 11s)
19:35 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4360316]: Redeploy GUI for fixes T222133, T222129, T222181, T222182
19:27 otto@deploy1001: scap-helm eventgate-analytics finished
19:27 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
19:27 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
19:26 otto@deploy1001: scap-helm eventgate-analytics finished
19:26 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
19:26 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
19:25 otto@deploy1001: scap-helm eventgate-analytics finished
19:25 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
19:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
19:24 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
19:24 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:40 cdanis: running puppet on ms-be201[3,5] to bump replication concurrency T221068
18:24 cdanis: running puppet on ms-be2014 to bump replication concurrency T221068
18:09 thcipriani: start branchcut for 1.34.0-wmf.3
17:16 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@1f09e44]: Update mobileapps to 142ba30 (T217837) (duration: 04m 16s)
17:11 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@1f09e44]: Update mobileapps to 142ba30 (T217837)
16:57 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 09s)
16:57 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
16:52 arturo: merging change to `profile::base` and `::raid` https://gerrit.wikimedia.org/r/c/operations/puppet/+/507357 related to T221225
16:36 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207706 (duration: 00m 11s)
16:36 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - T207706
16:27 XioNoX: upgrade librenms to 1.51
16:26 jbond42: upgrade puppet and facter in eqsin
16:04 ema: pool cp4022 w/ ATS backend T219967
15:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
15:58 robh@cumin1001: START - Cookbook sre.hosts.decommission
15:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
15:58 robh@cumin1001: START - Cookbook sre.hosts.decommission
15:45 elukey: restart hadoop hdfs namenodes on an-master100[1,2] to pick up new logging settings - T220702
15:18 jynus: stop s8 instance on dbstore2001 for cloning to db2100 T220572
15:09 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 1% of anonymous users to PHP7.2 - T219150 (duration: 00m 54s)
14:58 jbond42: enable-puppet "T220987: global kafaka log shipping - staged rollout (jbond)"
14:56 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'bast3002*' 'run-puppet-agent --enable "filippo prometheus"'
14:49 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'labmon1001*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
14:44 jijiki: Sending 1% of anonymous users to PHP7.2 - T219150
14:43 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'bast5001*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
14:26 jbond42: disable-puppet "T220987: global kafaka log shipping - staged rollout (jbond)"
14:24 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'prometheus2004*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
14:17 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'prometheus2003*' 'run-puppet-agent --enable "staged rollout T222105 by cdanis"'
14:15 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo enable-puppet 'cdanis testing original query.max-samples T222105'
13:29 cdanis: cdanis@prometheus1004.eqiad.wmnet ~ % sudo systemctl restart prometheus@ops.service
13:28 ema: depool cp4022 and reimage as upload_ats T219967
13:20 arturo: reverting sudo puppet module changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/507317
13:16 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo systemctl restart prometheus@ops.service
13:15 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo disable-puppet 'cdanis testing original query.max-samples T222105'
13:08 cdanis: OOMed the eqiad ops prometheus @ prometheus1003
13:02 cdanis: OOMed the eqiad ops prometheus @ prometheus1004
12:47 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo run-puppet-agent --enable "staged rollout T222105 by cdanis"
12:41 arturo: merging a sudo puppet module change
12:39 cdanis: cdanis@prometheus1004.eqiad.wmnet ~ % sudo run-puppet-agent --enable "staged rollout T222105 by cdanis"
12:34 elukey: moved /home to /srv/home (more space in a dedicated partition) on stat1005
12:32 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'R:prometheus::server' 'disable-puppet "staged rollout T222105 by cdanis"'
11:27 Lucas_WMDE: EU SWAT done
11:22 mlitn@deploy1001: Synchronized wmf-config/CommonSettings.php: Allow cross-site requests from mobile domains (duration: 00m 52s)
11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:507032|Serialize empty lists as objects on Commons (T138104) (duration: 00m 54s)
11:12 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:507031|Serialize empty lists as objects on Wikidata (T138104) (duration: 00m 55s)
11:08 gilles@deploy1001: Finished deploy [performance/navtiming@d6756c0]: T221848 Proper fix for partitions_for_topic in python-kafka > 1.4.4 (duration: 00m 05s)
11:08 gilles@deploy1001: Started deploy [performance/navtiming@d6756c0]: T221848 Proper fix for partitions_for_topic in python-kafka > 1.4.4
11:02 ema: cp3038 mbox lag, restarting varnish-be
10:55 kart_: Updated cxserver to 2019-04-30-055331-production (T219412)
10:49 santhosh@deploy1001: scap-helm cxserver finished
10:49 santhosh@deploy1001: scap-helm cxserver cluster codfw completed
10:49 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
10:48 santhosh@deploy1001: scap-helm cxserver finished
10:48 santhosh@deploy1001: scap-helm cxserver cluster eqiad completed
10:48 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
10:45 santhosh@deploy1001: scap-helm cxserver finished
10:45 santhosh@deploy1001: scap-helm cxserver cluster staging completed
10:45 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
10:32 godog: rollout rsyslog upgrade to 8.1901.0-1~bpo9+wmf1 in codfw
10:32 arturo: T222060 reimaged labtestservices2003 as stretch spare system
10:32 arturo: T222057 reimaged labtestvirt2003 as spare system
10:12 godog: rollout rsyslog upgrade to 8.1901.0-1~bpo9+wmf1 in eqsin / ulsfo / esams
10:08 jynus: stop s7 and x1 instances on dbstore2* for cloning T220572
09:31 fsero@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=docker-registry,service=docker-registry
09:26 fsero: creating lvs endpoints for docker registry - T221101
09:02 elukey: roll restart hdfs namenodes on an-master100[1,2] to pick up new settings - T220702
08:22 godog: bounce prometheus on bast4002 after backfill has finished - T187987
08:11 gilles@deploy1001: Finished deploy [performance/navtiming@8f135ac]: T221848 Default to partition 0 when no partition is found (duration: 00m 05s)
08:11 gilles@deploy1001: Started deploy [performance/navtiming@8f135ac]: T221848 Default to partition 0 when no partition is found
08:11 gilles@deploy1001: deploy aborted: T221848 Defalt to partition 0 when no partition is found (duration: 00m 00s)
08:11 gilles@deploy1001: Started deploy [performance/navtiming@8f135ac]: T221848 Defalt to partition 0 when no partition is found
07:53 gilles@deploy1001: Finished deploy [performance/navtiming@e900152]: T221848 add more logging around startup (duration: 00m 05s)
07:53 gilles@deploy1001: Started deploy [performance/navtiming@e900152]: T221848 add more logging around startup
07:29 moritzm: installing systemd updates for jessie
07:24 marostegui: Remove labservices1001 and labservices1002 from tendril T221857
05:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1093's status (duration: 00m 51s)
05:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db1093's status (duration: 00m 55s)
04:26 mutante: LDAP - remove user pirroh from group nda (T222085 and cross-validate-accounts demands consistency)
02:23 mutante: analytics1050 - systemctl start mcelog ... it was failed like recently on analytics1052 (T212219 ?)
02:09 tgr@deploy1001: Synchronized wmf-config/db-eqiad.php: SWAT: gerrit:507237|depool db1093 (duration: 00m 54s)
01:30 mutante: contint2001..then contint1001 - deleting /etc/zuul/wikimedia and letting puppet re-clone it (gerrit:507070) (T218844)
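
The entries in this log follow a loose "HH:MM actor[@host]: message" shape. The following is a minimal sketch, assuming that shape holds, of how a single entry could be split into its fields for searching or filtering. The names `Entry`, `ENTRY_RE` and `parse_entry` are hypothetical and not part of any log tooling; lines that do not start with a timestamp are returned as `None` rather than guessed at.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One log entry; host is None when the actor has no @host suffix."""
    time: str
    actor: str
    host: Optional[str]
    message: str


# Assumed shape: "HH:MM actor[@host]: message" (hypothetical pattern).
ENTRY_RE = re.compile(r"^(\d{2}:\d{2}) ([^\s:@]+)(?:@([^\s:]+))?: (.*)$")


def parse_entry(line: str) -> Optional[Entry]:
    """Split one log line into fields, or return None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return Entry(*match.groups())


if __name__ == "__main__":
    sample = "21:44 sbassett: Deployed patch for T222038 (1.34.0-wmf.1 and 1.34.0-wmf.3)"
    print(parse_entry(sample))
```

Date headings and blank lines simply yield `None`, so a caller can track the current date separately while iterating over the file.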

2019-04-29

23:59 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (5/5) (duration: 00m 52s)
23:58 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (4/5) (duration: 00m 52s)
23:56 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (3/5) (duration: 00m 50s)
23:55 ebernhardson@deploy1001: Synchronized wmf-config/LabsServices.php: T220625 Add cloudelastic servers to wgCirrusSearchClusters (2/5) (duration: 00m 52s)
23:54 ebernhardson@deploy1001: Synchronized tests/: T220625 Add cloudelastic servers to wgCirrusSearchClusters (1/5) (duration: 00m 53s)
23:34 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@65796ad]: New deploy with GUI fix (duration: 31m 04s)
23:33 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221154: Add static.inaturalist.org to $wgCopyUploadDomains for Commons (duration: 00m 54s)
23:03 smalyshev@deploy1001: Started deploy [wdqs/wdqs@65796ad]: New deploy with GUI fix
21:13 mutante: restarting gerrit
21:10 mutante: cobalt (gerrit) upgrading openjdk 8 minor version
20:40 arlolra: Updated Parsoid to c9dab9d (T106578, T113194, T205338, T219072, T219938, T221384, T219943)
20:37 XioNoX: add BGP session to AS4922 in eqiad
20:37 RoanKattouw: Deployed patch for T222014
20:26 arlolra@deploy1001: Finished deploy [parsoid/deploy@7859b58]: Updating Parsoid to c9dab9d (duration: 06m 36s)
20:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw127[5-9].eqiad.wmnet
20:19 arlolra@deploy1001: Started deploy [parsoid/deploy@7859b58]: Updating Parsoid to c9dab9d
20:18 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw127[5-9].eqiad.wmnet
20:18 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw127[0-4].eqiad.wmnet
20:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw127[0-4].eqiad.wmnet
20:08 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw126[5-9].eqiad.wmnet
19:59 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw126[5-9].eqiad.wmnet
19:52 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw126[1-4].eqiad.wmnet
19:44 thcipriani: gerrit back
19:44 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw126[1-4].eqiad.wmnet
19:44 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw125[4-8].eqiad.wmnet
19:43 thcipriani: gerrit restart for https://gerrit.wikimedia.org/r/327763 T221026
19:39 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw125[4-8].eqiad.wmnet
19:39 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw125[0-3].eqiad.wmnet
19:36 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw125[0-3].eqiad.wmnet
19:35 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[5-9].eqiad.wmnet
19:32 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw124[5-9].eqiad.wmnet
19:31 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[0-4].eqiad.wmnet
19:26 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw124[0-4].eqiad.wmnet
19:26 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[0-4].eqiad.wmnet
19:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw123[8-9].eqiad.wmnet
19:21 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw123[8-9].eqiad.wmnet
19:20 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw123[0-5].eqiad.wmnet
19:17 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw123[0-5].eqiad.wmnet
19:07 otto@deploy1001: sync-file aborted: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080 (duration: 00m 02s)
19:05 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080 (duration: 00m 53s)
19:01 ottomata: deploying config change to enable cirrussearch-request logging to eventgate-analytics for group0 wikis - T214080
18:59 RoanKattouw: Deployed patch for T221739
18:45 otto@deploy1001: scap-helm eventgate-analytics finished
18:45 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
18:45 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f analytics/eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
18:44 otto@deploy1001: scap-helm eventgate-analytics finished
18:44 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
18:44 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f analytics/eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
18:42 catrope@deploy1001: Synchronized static/images/project-logos/: Change wikimaniawiki logo to Wikimania 2019 version (T221829) (duration: 00m 54s)
18:41 otto@deploy1001: scap-helm eventgate-analytics finished
18:41 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:41 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw122[8-9].eqiad.wmnet
18:37 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw122[8-9].eqiad.wmnet
18:37 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Serialize empty lists as objects on Test Commons (T138104) (duration: 00m 54s)
18:34 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw122[1-6].eqiad.wmnet
18:33 otto@deploy1001: scap-helm eventgate-analytics finished
18:33 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:33 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:30 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Serialize empty lists as objects on Test Wikidata (T138104) (duration: 00m 53s)
18:29 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw122[1-6].eqiad.wmnet
18:26 otto@deploy1001: scap-helm eventgate-analytics finished
18:26 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:26 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:22 Jeff_Green: authdns-update for T221475
18:21 catrope@deploy1001: Synchronized docroot/noc: Publish throttle-analyze at noc (T187894) (duration: 00m 53s)
18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add www4.bibl.ulaval.ca to wgCopyUploadsDomains (T220704) (duration: 00m 53s)
17:35 Jeff_Green: authdns-update to deploy T214525
17:15 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@9273213]: Blazegraph upgrade for new LDF version and GUI updates (duration: 06m 58s)
17:08 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@9273213]: Blazegraph upgrade for new LDF version and GUI updates
16:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC UW config cleanup: Drop wmgMediaInfoEnableUploadWizardDepicts from IS (duration: 00m 53s)
16:34 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SDC UW config cleanup: Switch to wmgMediaInfoEnableUploadWizardStatements in CS (duration: 00m 53s)
16:33 jforrester@deploy1001: sync-file aborted: SDC UW config cleanup: Switch to wmgMediaInfoEnableUploadWizardStatements in CS (duration: 00m 01s)
16:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC UW config cleanup: Add wmgMediaInfoEnableUploadWizardDepicts to IS (duration: 00m 53s)
16:28 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable feature flag for depicts in UW on Test Commons (duration: 00m 53s)
15:40 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update WikimediaEditorTasks counter config (T221951) (duration: 00m 58s)
14:49 herron: added uid=sukhe,ou=people,dc=wikimedia,dc=org to nda ldap group T221990
13:56 jbond42: rolling security updates for imagemagick
13:45 fsero: DNS: creating docker-registry.svc.(eqiad|codfw).wmnet RRs
13:17 jbond42: rolling security updates for libpng
12:46 godog: resume rollout rsyslog 8.1901.0-1 to jessie hosts - T219764
12:07 jynus: stop dbstore2002:s3 and dbstore2001:s5 for cloning to db2098/99 T220572
11:56 kart_: EU-Midday SWAT done. Thanks.
11:56 kartik@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/ContentTranslation: SWAT: 506971|Change the way we calculate total unmodified MT (T221930) (duration: 00m 56s)
11:30 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 505765|Add namespace "Aldono" at eo.wiktionary (T221525) (duration: 00m 54s)
11:21 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 506939| (T222018) (duration: 00m 53s)
11:14 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 506860|Allow admins to add or remove patroller group at enwikivoyage (T222008) (duration: 00m 55s)
09:27 joal@deploy1001: Finished deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) - bis (duration: 28m 19s)
09:13 jynus: stop dbstore2002:s4 for cloning to db2099 T220572
08:59 joal@deploy1001: Started deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) - bis
08:39 godog: begin migration of bast4002 to prometheus v2 - T187987
08:38 joal@deploy1001: Finished deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) (duration: 15m 38s)
08:33 elukey: restart keyholder on deploy1001 + rearm keys
08:28 elukey: restart keyholder-proxy on deploy1001 (attempt to see if new analytics scap settings got applied)
08:25 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable unicode overrides table for php 7.2 T219279 (duration: 00m 53s)
08:25 jynus: stop dbstore2001:s2 for cloning to db2098 T220572
08:23 oblivian@deploy1001: Synchronized wmf-config/Php72ToUpper.php: Adding unicode overrides table for php 7.2 T219279 (duration: 00m 54s)
08:23 joal@deploy1001: Started deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy)
07:58 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2045 from s8 to x1 T219493 (duration: 00m 55s)
07:47 marostegui: Stop mysql on db2034 (lag will happen on x1 codfw) - T219493
07:44 marostegui: Stop replication on db2034 (x1 master) for maintenance - T219493
07:13 moritzm: updated stretch netboot image for 9.9 point release

2019-04-28

17:46 jiji@cumin1001: conftool action : set/pooled=no; selector: name=cp3037.esams.wmnet
17:46 jijiki: Depooling cp3037 - server and mgmt is unreachable
14:55 James_F: Updated trwiki's MediaWiki:Common.css to not over-ride the logo.
14:53 James_F: Manually purged the trwiki logos from Varnish as part of updating them for 2 year anniversary.
14:47 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki.png: trwiki: Update logo for 2 year anniversary, part III (duration: 00m 53s)
14:45 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki-1.5x.png: trwiki: Update logo for 2 year anniversary, part II (duration: 00m 53s)
14:44 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki-2x.png: trwiki: Update logo for 2 year anniversary, part I (duration: 00m 55s)

2019-04-27

17:44 elukey: restart pdfrender on scb1002 (alert flapping)
12:37 jynus: correcting last log, stopping dbstore2002:s1 to clone it to db2097 T220572
12:37 jynus: stopping dbstore2002:s6 to clone it to db2097 T220572
00:11 foks: reset passwords for FritzSolms@global and Seanhood@global

2019-04-26

20:15 foks: changing email and password for "Lemon martini@global"
19:38 foks: changing password for JDiPierro@global
19:21 bblack: varnish-backend-restart on cp4026, evidence of artificial 503s from mbox lag behavior, probably related to the semi-abuse client doing odd 404 traffic to ulsfo that's triggering bugs in swift's rewrite.py ....
19:04 foks: changing password for Subinsebastien
17:50 mutante: analytics1052 - reported broken systemd state in Icinga - service mcelog was in state failed - systemctl start mcelog - (T212219 ?)
16:18 jynus: stop s6 mariadb instance on dbstore2001 T220572
15:34 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on canary hosts: thumbor1001 ms-fe1005 ms-be1013 scb1001 restbase1007
15:05 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on canary hosts: ores1001.yaml wtp1025.yaml rdb1006.yaml
14:18 marostegui: Set pc1004-1006 and pc2004-2006 as unracked on netbox - T209858 T210969
13:17 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on canary hosts: mw1311.yaml, mx2001 & dubnium
12:52 ema: cp4025: restart varnish-be due to mbox lag
12:50 jijiki: Restarting hhvm on mw1288
12:48 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on mc1019, maps1001 and logstash1007
12:45 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=cache_upload,name=cp4021.ulsfo.wmnet,dc=ulsfo
12:44 ema: pool cp4021 w/ ATS backend T219967
12:20 ema: repool cp3030 after directors.frontend.vcl testing T219967
12:09 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on canary hosts: elastic1017, ganeti2001, analytics1042
11:26 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on lvs4007, dns2001 and multatuli
11:16 jbond42: upgrade puppet 4=> 5 and facter 2 => 3 on bast4002, aqs1004 and conf2001
10:28 moritzm: restarting Parsoid on wtp1025 for glibc update
10:19 ema: depool cp3030 for testing T219967
09:48 marostegui: Remove labtestservices2001 from tendril - T218022
09:11 moritzm: restarting AQS on aqs1004 for glibc update
08:42 elukey: restart pdfrender on scb1003 (alert flapping)
08:21 moritzm: uploaded php-xdebug 2.7.0+wmf1 for component/php72 (T221923)
07:20 moritzm: installing glibc updates on a number of analytics hosts
04:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3316 T221782 (duration: 00m 56s)
00:31 eileen: civicrm revision changed from 88736c7c11 to 34027da7df, config revision is 2119df9495

2019-04-25

23:41 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
23:41 robh@cumin1001: START - Cookbook sre.hosts.decommission
23:41 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
23:41 robh@cumin1001: START - Cookbook sre.hosts.decommission
23:39 eileen: civicrm revision changed from 519fe8028e to 88736c7c11, config revision is 2119df9495 - deployed patch to start recording payment_processor_id on recurring
22:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
22:56 robh@cumin1001: START - Cookbook sre.hosts.decommission
22:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
22:56 robh@cumin1001: START - Cookbook sre.hosts.decommission
21:31 andrewbogott: stopping nova services on labnet1001/1002
21:26 andrewbogott: revoking M5 grants as per https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/506428/4/modules/role/templates/mariadb/grants/production-m5.sql.erb and https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/506345/3/modules/role/templates/mariadb/grants/production-m5.sql.erb
21:12 tgr: T221516 running mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'FoldDownPro' 'MichaelOBFDP'
19:45 mobrovac@deploy1001: Finished deploy [restbase/deploy@7187e0c]: Bump HTML content version in docs, remove Parsoid stash fall-back and start logging all sections requests - T221432 T215956 T216636 (duration: 20m 04s)
19:25 mobrovac@deploy1001: Started deploy [restbase/deploy@7187e0c]: Bump HTML content version in docs, remove Parsoid stash fall-back and start logging all sections requests - T221432 T215956 T216636
19:24 mobrovac@deploy1001: Finished deploy [restbase/deploy@7187e0c] (dev-cluster): Bump HTML content version in docs and remove Parsoid stash fall-back (duration: 03m 10s)
19:21 mobrovac@deploy1001: Started deploy [restbase/deploy@7187e0c] (dev-cluster): Bump HTML content version in docs and remove Parsoid stash fall-back
18:32 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:506316|Cleanup old EchoCrossWikiBetaFeature (2/2) (duration: 00m 53s)
18:31 sbisson@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: gerrit:506316|Cleanup old EchoCrossWikiBetaFeature (1/2) (duration: 00m 54s)
18:24 sbisson@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/includes/EventLogging/SpecialHomepageLogger.php: SWAT: gerrit:506210|EventLogging: Make namespace int, use enum for impact module state (duration: 00m 54s)
16:58 XioNoX: add analytics firewall filter term schema to cr1/2-eqiad - T221690
16:57 XioNoX: reorganize analytics firewall filters terms (description) on cr1/2-eqiad
16:34 moritzm: rolling restart of Cassandra on restbase1016-1018 to pick up Java security update
16:27 andrewbogott: repooled labweb1002
15:49 andrewbogott: depooling labweb1002 for easier debugging on labweb1001
15:09 thcipriani: gerrit back
15:07 thcipriani: gerrit restart to pickup new cache config changes
14:56 jynus: syncing facts for puppet compiler
14:51 jynus: update backup grants for dbprov1* on source dbs
12:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3316 T221782 (duration: 00m 53s)
12:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 T221782 (duration: 00m 53s)
11:55 Lucas_WMDE_: EU SWAT done
11:54 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Kartographer/: SWAT: gerrit:506363|Support data-mw="interface" also in staticframe (T221439) (duration: 00m 54s)
11:46 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/WikibaseQualityConstraints: SWAT: gerrit:505764|Remove beta feature for constraint suggestions (T220609) (duration: 00m 56s)
11:43 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/WikibaseQualityConstraints: SWAT: gerrit:505763|Enable constraint suggestions for everyone (T220609) (duration: 00m 59s)
11:10 Lucas_WMDE_: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/namespaceDupes.php --wiki=cswikisource --fix
11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:506134 Create new namespace "Edice" for cswikisource (T221697) (duration: 00m 54s)
09:57 moritzm: installing multipath-tools update from stretch point release
09:49 moritzm: installing libcgroup security updates
08:30 moritzm: installing php5 security updates
08:08 jynus: update statistics grants for dbprov1* on tendril
07:56 moritzm: installing gnutls security updates
07:01 marostegui: Run compare.py for main tables between db2045 and db2080 T220170
06:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s8 codfw - T220170 (duration: 00m 54s)
06:14 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2080 after onsite maintenance to upgrade BIOS and firmware - T216240 (duration: 00m 54s)
06:04 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2079 to s8 codfw master T220170 (duration: 00m 52s)
05:47 marostegui: Start changing topology to make db2079 s8 codfw master - T220170
05:28 marostegui: Deploy schema change on db1103:3314 to fix revision table partitioning and indexing - T221782
05:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 T221782 (duration: 00m 54s)
afk: updated fundraising CiviCRM from 468f85e524 to 519fe8028e
00:12 maxsem@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/CirrusSearch/includes/Maintenance/AnalysisConfigBuilder.php: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CirrusSearch/+/506209/ (duration: 00m 54s)
00:00 eileen: process-control config revision is 0098b7a118 - adjust dedupe rule

2019-04-24

22:46 mutante: icinga-downtime -h ms-be2034 -r swift-rebalancing -d 86400
22:19 mutante: deploying varnish/trafficserver change to cover www.wikiba.se (not prod yet)
22:19 mutante: icinga-downtime -h ms-be2039 -r swift-rebalancing -d 86400
21:31 mutante: icinga-downtime -h ms-be2038 -r swift-rebalancing -d 86400
20:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@8a6b6fc]: Parsoid storage simplification step 1: switch Parsoid stashing to simple key/value - T215956 (duration: 20m 39s)
20:21 mobrovac@deploy1001: Started deploy [restbase/deploy@8a6b6fc]: Parsoid storage simplification step 1: switch Parsoid stashing to simple key/value - T215956
20:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@8a6b6fc] (dev-cluster): Switch Parsoid stashing to simple key/value (duration: 04m 18s)
19:57 mobrovac@deploy1001: Started deploy [restbase/deploy@8a6b6fc] (dev-cluster): Switch Parsoid stashing to simple key/value
18:47 mutante: pooled mw1297 as a new API server (T192457)
18:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1297.eqiad.wmnet,cluster=api_appserver
18:45 mutante: mw1297 - scap pull
18:17 mutante: sudo icinga-downtime -h ms-be2031 -r swift-rebalancing -d 86400
17:52 mutante: contint1001 - for logfile in $(find /var/log/zuul/ ! -name "*.gz"); do gzip $logfile; done to get more disk space (T207707)
17:33 mutante: contint1001 - apt-get clean for 1% more disk space
17:23 mutante: proton1001 - restarting proton service - low RAM caused facter/puppet fails (https://tickets.puppetlabs.com/browse/PUP-8048) freed memory and fixed puppet run (cc: T219456 T214975)
16:33 catrope@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/: Fix exceptions in Homepage logging (duration: 00m 56s)
15:52 herron: performing rolling restart of pybal on low-traffic eqiad/codfw lvs hosts
15:32 jijiki: Restarting php7.2-fpm on mw2* in codfw for 505383 and T211488
15:00 herron: switching kibana lvs to source hash scheduler
14:41 jijiki: restart pdfrender on scb1002
14:28 godog: begin rollout of rsyslog 8.1901.0-1 to jessie hosts - T219764
13:38 marostegui: Poweroff db2080 for onsite maintenance - T216240
13:01 jijiki: Restarting php7.2-fpm on mw13* for 505383 and T211488
12:36 jijiki: restarting pdfrender on scb1004
12:23 moritzm: rolling restart of Cassandra on restbase/eqiad to pick up Java security update
11:59 jijiki: Restarting php7.2-fpm on mw12* for 505383 and T211488
11:45 gehel: restarting relforge for jvm upgrade
11:33 jbond42: security update ghostscript on scb jessie servers
11:25 jijiki: Restarting php7.2-fpm on mw-canary for 505383 and T211488
11:23 ladsgroup@deploy1001: Finished deploy [ores/deploy@060fc37]: (no justification provided) (duration: 16m 18s)
11:07 ladsgroup@deploy1001: Started deploy [ores/deploy@060fc37]: (no justification provided)
10:28 akosiaris@deploy1001: scap-helm cxserver finished
10:28 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
10:28 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
10:23 jijiki: Restarting php-fpm on mw1238 for 505383 and T211488
09:58 moritzm: installing rsync security updates on jessie
08:44 moritzm: rolling restart of Cassandra on restbase/codfw to pick up Java security update
08:29 godog: swift eqiad-prod: start decom for ms-be101[45] - T220590
08:17 godog: bounce prometheus on bast5001 after migration and backfill
08:04 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
08:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
08:02 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
08:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
06:41 marostegui: Optimize tables on pc1010
06:38 elukey: restart pdfrender on scb1003
06:37 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2082 (duration: 00m 52s)
06:22 marostegui: Upgrade db2082
06:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2079, depool db2082 (duration: 00m 55s)
06:18 marostegui: Upgrade db2081
06:10 marostegui: Upgrade db2079
06:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2086, depool db2079 (duration: 00m 53s)
05:55 marostegui: Upgrade db2086
05:55 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2083 and depool db2086 (duration: 00m 52s)
05:38 marostegui: Upgrade db2080 and db2083
05:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2080 and db2083 (duration: 00m 54s)
03:45 SMalyshev: repooled wdqs1003, it's good now
01:26 eileen: jobs restarted process-control config revision is ef6d4761e5
01:06 eileen: civicrm revision changed from 31982324b8 to 468f85e524, config revision is 13b9eefe7b
01:02 eileen: process-control config revision is 13b9eefe7b
00:29 mutante: mw1297 - rebooting for nutcracker issue
00:28 mutante: mw1297 - scap pull
00:08 mutante: DNS - add initiatives.wikimedia.org (and initiatives.m) for campaign wiki requested at T167375
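Note on the 17:52 contint1001 entry above: the logged loop word-splits the output of find, so a path containing whitespace would be broken into several arguments, and without -type f the directory itself is also handed to gzip (which skips it with a warning). A minimal sketch of a more defensive variant, assuming a POSIX find and gzip. The scratch directory and file names are hypothetical stand-ins for /var/log/zuul/, and compress_logs is an illustrative helper name, not something taken from the log:

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for /var/log/zuul/.
logdir="$(mktemp -d)"
printf 'plain log\n' > "$logdir/zuul.log"
printf 'name with a space\n' > "$logdir/debug log.1"
printf 'already compressed\n' | gzip > "$logdir/old.log.gz"

# compress_logs DIR: gzip every regular file under DIR that does not
# already end in .gz. find passes the paths to gzip directly, so names
# containing whitespace stay intact.
compress_logs() {
    find "$1" -type f ! -name '*.gz' -exec gzip {} +
}

compress_logs "$logdir"
ls "$logdir"
```

On the real host the call would presumably be compress_logs /var/log/zuul/, run with sufficient privileges.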

2019-04-23

23:51 mutante: mw1297 - initial puppet run - will show up in Icinga in a little while but not pooled yet.. all the things are being installed right now
23:48 ejegg: updated payments-wiki (inactive cluster) from 7a312e371a to aa8dad50e7
23:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Logger.js: SWAT GrowthExperiments: Fix validation errors due to state= (duration: 00m 53s)
23:38 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/includes/EventLogging/SpecialHomepageLogger.php: SWAT GrowthExperiments: Fix EventLogging errors (duration: 00m 53s)
23:25 mutante: generating mcrouter certs for appservers, added mw1297.eqiad.wmnet (T192457)
23:23 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/languages/Language.php: SWAT T219728 Add support for new Japanese era name 'Reiwa' (duration: 00m 52s)
23:20 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: SWAT T221668 VisualEditor: Restore external paste sanitization of DOM elements (duration: 00m 55s)
23:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T221521 Add autoreviewer to wgRestrictionLevels on ptwikinews (duration: 00m 54s)
22:35 XioNoX: push firewall rule to pfw3-eqiad - T221475
22:33 XioNoX: push firewall rule to pfw3-codfw - T221475
21:54 reedy@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/ORES/includes/Specials/SpecialORESModels.php: T221696 (duration: 00m 55s)
21:43 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
21:43 robh@cumin1001: START - Cookbook sre.hosts.decommission
21:33 thcipriani: restarting gerrit to pickup config changes
20:55 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@51b4728]: Deploy new Updater fix for constraints (T221407) (duration: 13m 03s)
20:43 andrewbogott: updating designate pools on cloudservices1003 and 1004 using eqiad1_pool_config.yml template from the puppet repo
20:42 smalyshev@deploy1001: Started deploy [wdqs/wdqs@51b4728]: Deploy new Updater fix for constraints (T221407)
20:26 urandom: dropping disused restbase keyspaces -- T221530
19:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
19:57 robh@cumin1001: START - Cookbook sre.hosts.decommission
19:32 mutante: webperf* - running puppet to git pull docroot
19:11 thcipriani: gerrit restart
18:59 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/MassMessage: c640195 (duration: 00m 56s)
18:09 SMalyshev: depool wdqs1003 to let it catch up
18:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
18:03 robh@cumin1001: START - Cookbook sre.hosts.decommission
18:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
18:02 robh@cumin1001: START - Cookbook sre.hosts.decommission
17:43 jijiki: Restarting memcached on mc1029 - T208844
17:26 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@78985fb]: Update mobileapps to 6d3a422 (T201382 T217837) (duration: 04m 06s)
17:22 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@78985fb]: Update mobileapps to 6d3a422 (T201382 T217837)
16:55 jijiki: Depool thumbor2004 for 505759 and pool back - T187765
16:54 gehel: restart wdqs for jvm upgrade
16:49 jijiki: Depool thumbor1004 for 505759 and pool back - T187765
16:43 jijiki: Depool thumbor2003 for 505759 and pool back - T187765
16:40 jijiki: Depool thumbor1003 for 505759 and pool back - T187765
16:38 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable api-request logging to eventgate-analytics for all wikis - T214080 (duration: 00m 53s)
16:33 ottomata: proceeding to enable api-request eventgate-analytics logging for all wikis
16:31 herron: added jfishback to wmf ldap group T221660
16:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
16:12 robh@cumin1001: START - Cookbook sre.hosts.decommission
16:07 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: set wglocaltimezone for sqwikiquote T221627 (duration: 00m 54s)
15:28 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable Depicts functionality on Commons (duration: 00m 54s)
14:27 jijiki: Depool thumbor2002 for 505759 and pool back - T187765
14:21 jijiki: Depool thumbor1002 for 505759 and pool back - T187765
14:16 jijiki: Depool thumbor2001 for 505759 and pool back - T187765
14:14 jijiki: Depool thumbor1001 for 505759 and pool back - T187765
14:07 jijiki: Disable puppet on thumbor* to merge 505759
13:54 ema: depool cp4021 and reimage as upload_ats T219967
13:17 jijiki: Restart nagios-nrpe-server on prometheus1003
12:15 godog: swift eqiad-prod: fully decom ms-be1013 - T220590
11:59 moritzm: installing clamav security updates on fermium
11:56 kart_: EU-Midday SWAT is done.
11:54 kart_: 'SWAT: gerrit:505059 deployment-prep: Use new poolcounter instance, gerrit:505060 deployment-prep: Use new ms-fe host.'
11:53 kartik@deploy1001: Synchronized wmf-config/LabsServices.php: SWAT: gerrit:505643 (duration: 00m 53s)
11:45 jijiki: Stop xenon-log, excimer-log and apache on mwlog*
11:43 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:505643 Turn off logging for CitationUsage and CitationUsagePageLoad (T213969) (duration: 00m 53s)
11:29 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix undefined variable from last SWAT (duration: 00m 54s)
11:27 moritzm: installing clamav security updates on mendelevium (OTRS host)
11:18 kartik@deploy1001: Synchronized wmf-config: SWAT: gerrit:505220 Use higher unmodified MT threshold for Indonesian Wikipedia (T221353) (duration: 00m 57s)
10:44 moritzm: uploaded ferm 2.4-1+wmf2+deb10u1 to buster-wikimedia (T153468)
09:23 godog: upgrade prometheus to v2 on bast5001, previous metrics will not be available until migration and backfill are complete - T187987
09:19 elukey: dumping Kafka consumer offsets' history on logstash1012 for T221202
09:00 fdans@deploy1001: Finished deploy [analytics/refinery@0d63671]: deploying changes to pageview definition brought in refinery source 0.0.87 (duration: 14m 09s)
08:54 fsero: synchronizing old docker_registry content into new one - T221101
08:46 fdans@deploy1001: Started deploy [analytics/refinery@0d63671]: deploying changes to pageview definition brought in refinery source 0.0.87
08:14 moritzm: removing debmonitor entries for labvirt* hosts
08:06 moritzm: installing wget security updates on jessie
07:27 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Set wgPriorityHintsRatio (duration: 00m 52s)
06:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T136427 (duration: 00m 57s)
05:52 elukey: powercycle wtp2019 - no ssh, mgmt console stuck
05:16 marostegui: Deploy schema change on x1 master - lag will appear on x1 slaves - T136427
05:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T136427 (duration: 00m 54s)

2019-04-22

18:46 gilles@deploy1001: Synchronized php-1.34.0-wmf.1/includes/media/ThumbnailImage.php: T216499 Only apply high priority hint half the time (duration: 00m 53s)
18:22 XioNoX: Add k8s BGP neighbors on cr1/2-eqiad - T220822
18:15 XioNoX: Add k8s BGP neighbors on cr1/2-codfw - T220822
08:47 marostegui: finished maintenance window on dbstore1003 and dbstore1005
08:37 marostegui: Upgrade dbstore1005
07:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1099 (duration: 00m 54s)
07:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
07:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
06:40 marostegui: Upgrade dbstore1003
06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
05:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
05:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1099 (duration: 00m 54s)
05:26 marostegui: Stop MySQL and reboot db1099 to see if memory errors clear up T221502
05:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 T221502 (duration: 01m 15s)

2019-04-21

05:19 marostegui: Clean up some space on webperf2001 - T221508

2019-04-20

08:12 _joe_: depooling mw1261,mw1312 wikidata (at least) not working
07:58 jijiki: Pool thumbor1001
07:52 jijiki: depool thumbor1001, switch back to nginx - T187765
07:50 _joe_: restarting php-fpm on mw1312, mw1261 to test the new settings over the weekend

2019-04-19

23:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2245.codfw.wmnet,cluster=api_appserver
23:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2244.codfw.wmnet,cluster=api_appserver
23:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2150.codfw.wmnet,service=nginx,cluster=jobrunner
22:55 mutante: mw2244,mw2245,mw2150 - scap pull
22:53 mutante: mw2244,mw2245,mw2150 - rebooting for known nutcracker issue after first install
22:47 mutante: furud - remounted /mnt/hdfs for T221483
21:42 mutante: mw2150,mw2244,mw2245: initial puppet run, added to mw roles
19:38 otto@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: No-op - enabling cirrussearch-request logging in beta (duration: 00m 52s)
19:37 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: No-op - enabling cirrussearch-request logging in beta (duration: 00m 53s)
19:36 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: No-op - prep for enabling cirrussearch-request logging in beta (duration: 00m 53s)
16:20 bblack: wikipedia.org CNAME TTLs increase to 4H - https://gerrit.wikimedia.org/r/c/operations/dns/+/505249 - T208263
16:18 ejegg: rolled back payments-wiki from eb3d0f35de to aa8dad50e7
15:55 reedy@deploy1001: Synchronized php-1.34.0-wmf.1/includes/logging/LogFormatter.php: T220767 (duration: 00m 53s)
15:54 bblack: restart pybal on lvs1016 (eqiad primary) for eventschemas service add
15:54 reedy@deploy1001: Synchronized php-1.34.0-wmf.1/includes/Linker.php: T220767 (duration: 00m 55s)
15:50 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=schema.*
15:42 bblack: restart pybal on lvs2003 (codfw primary) for eventschemas service add
15:39 bblack: restart pybal on lvs2006 (codfw backup) for eventschemas service add
15:32 bblack: restarting pybal on lvs1006 (eqiad backup) for eventschema service add
14:59 volans: uploaded spicerack_0.0.23-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
12:59 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 T216598 Enable Priority Hints and Element Timing on eswiki (duration: 00m 56s)
08:45 akosiaris: restart gerrit to pick up https://gerrit.wikimedia.org/r/504981
06:39 elukey: roll restart of druid daemons on druid100[1-3] to pick up new jvm settings

2019-04-18

23:16 mobrovac: evening SWAT completed
23:10 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes: (no justification provided) (duration: 00m 54s)
23:10 ejegg: updated payments-wiki from aa8dad50e7 to eb3d0f35de
23:07 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wikimania years namespaces to wgNamespacesWithSubpages - T220950 (duration: 00m 53s)
23:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
23:00 robh@cumin1001: START - Cookbook sre.hosts.decommission
22:40 ejegg: updated payments-wiki from aa8dad50e7 to 2f7cd8f195
22:14 mutante: LDAP - adding 'ldoan' and 'schang' to 'wmf' (T221118)
22:01 XioNoX: remove asw2-a-eqiad license keys for troubleshooting
21:58 ejegg: rolled back payments-wiki to aa8dad50e7
21:55 mutante: LDAP - adding rosalie-wmde to group 'wmde' (T220691)
21:52 ejegg: updated payments-wiki from aa8dad50e7 to 2f7cd8f195
21:28 mutante: puppetmaster1001 - mcrouter_generate_certs --generate
21:18 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (cobalt) (duration: 00m 10s)
21:18 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (cobalt)
21:17 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (gerrit2001) (duration: 00m 11s)
21:17 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (gerrit2001)
21:14 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
21:14 robh@cumin1001: START - Cookbook sre.hosts.decommission
20:56 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.1 refs T220726
20:52 cdanis: root@icinga1001.wikimedia.org /var/lib/icinga # for DOWNTIME in $(fgrep -B12 'comment=mobrovac: temp stop JQ for T221368 - cdanis@cumin1001' retention.dat | grep -A13 servicedowntime | grep downtime_id | cut -d= -f2); do printf "[%lu] DEL_SVC_DOWNTIME;%u\n" $(date +%s) $DOWNTIME ; done > rw/icinga.cmd
20:40 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Translate/utils/MessageUpdateJob.php: Translate jobs: Remove problematic Job::$params assignments, dir 2/2 - T221368 (duration: 01m 00s)
20:39 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Translate/tag: Translate jobs: Remove problematic Job::$params assignments, dir 1/2 - T221368 (duration: 01m 01s)
20:32 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'scb*' 'enable-puppet "mobrovac: temp stop JQ for T221368"'
20:31 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@71941b1]: Ignore Kafka disconnect errors (duration: 00m 51s)
20:30 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@71941b1]: Ignore Kafka disconnect errors
19:36 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cookbook sre.hosts.downtime -r "mobrovac: temp stop JQ for T221368" 'scb*'
19:36 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
19:36 cdanis@cumin1001: START - Cookbook sre.hosts.downtime
19:29 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'scb*' 'disable-puppet "mobrovac: temp stop JQ for T221368" && systemctl stop cpjobqueue'
19:17 mobrovac@deploy1001: Started restart [cpjobqueue/deploy@922cbc0]: Bounce CP4JQ, lots of transport broken failures - T221368
19:11 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes/EventFactory.php: Remove the use of page titles in JobExecutor, file 2/2 - T221368 (duration: 00m 59s)
19:10 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes/JobExecutor.php: Remove the use of page titles in JobExecutor, file 1/2 - T221368 (duration: 01m 01s)
18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
18:47 robh@cumin1001: START - Cookbook sre.hosts.decommission
18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
18:47 robh@cumin1001: START - Cookbook sre.hosts.decommission
18:41 mutante: mw2150 - reimaging, not in confctl
18:02 dzahn@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw2151.codfw.wmnet,cluster=jobrunner,service=nginx
17:49 mutante: mw2151 - scap pull
17:46 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes/JobExecutor.php: Default to a dummy title for invalid titles - T221368 (duration: 01m 01s)
17:20 twentyafterfour@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/AbuseFilter/includes/: sync https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/504863 (duration: 01m 00s)
16:20 bblack: Experimental DNS-level changes deploying for wikipedia.org domain - if wikipedia.org DNS problems appear, revert https://gerrit.wikimedia.org/r/c/operations/dns/+/504588 - T208263
16:17 XioNoX: remove peering to 63199 in eqsin (down for 1 month, no reply to emails)
16:13 XioNoX: rollback dhcp option 82 test from asw2-b-eqiad
14:55 fsero: synchronizing docker_registry_codfw swift container from docker_registry
14:40 XioNoX: push firewall change to pfw3-eqiad - T221278
13:30 jbond42: rolling updates of ruby2.1 on jessie
13:08 elukey: roll restart of cassandra on aqs* to pick up new openjdk upgrades
13:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
12:58 reedy@deploy1001: rebuilt and synchronized wikiversions files: group1 back to .25
12:36 anomie: Ran `php7adm /opcache-free` on mw1274 to test a theory related to T221347. The log entries related to that task stopped immediately.
12:30 gehel: restarting blazegraph + updater on wdqs* for jvm upgrade
12:22 moritzm: installing Java security updates on restbase-dev hosts (along with Cassandra restarts)
12:21 gehel: restarting blazegraph + updater on wdqs1009 / wdqs1010 for jvm upgrade
12:19 moritzm: installing Java security updates on WDQS autodeploy/test hosts
10:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
10:40 jmm@cumin2001: START - Cookbook sre.hosts.downtime
10:35 moritzm: installing rails security updates on jessie hosts
10:21 moritzm: installing jasper updates on jessie hosts
09:44 akosiaris: update grafana service/ dashboard to have user, system, throttled CPU metrics under the CPU saturation row
09:41 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216597 Run CPU benchmark for all samples on eswiki/ruwiki (duration: 01m 06s)
09:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
09:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
08:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
08:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime
08:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
08:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime
08:53 elukey: reboot kafka10[12-23] (old Analytics cluster) for kernel + openjdk upgrades
08:23 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
08:14 moritzm: installing libssh2 security updates on jessie
08:01 moritzm: restarting mw1261-mw1265 to pick up new libssh2
07:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
07:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
07:53 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2004.codfw.wmnet
07:28 moritzm: installing libssh2 security updates
07:19 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
06:58 moritzm: restarting icinga on icinga1001 (T196336)
06:37 moritzm: rolling reboots of Swift backends in eqiad for combined kernel/glibc/OpenSSL update
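Note on the 20:52 icinga1001 entry above: the logged loop turns each downtime_id found in retention.dat into an Icinga external command of the form `[timestamp] DEL_SVC_DOWNTIME;downtime_id` and writes the result to rw/icinga.cmd. A minimal sketch of just the formatting step, assuming the IDs have already been extracted. The helper name emit_del_svc_downtime, the sample IDs and the fixed timestamp are hypothetical, and the output goes to stdout rather than to the command pipe:

```shell
#!/bin/sh
# emit_del_svc_downtime TIMESTAMP ID...: print one DEL_SVC_DOWNTIME
# external command per downtime ID, all stamped with TIMESTAMP.
emit_del_svc_downtime() {
    now="$1"
    shift
    for id in "$@"; do
        printf '[%s] DEL_SVC_DOWNTIME;%s\n' "$now" "$id"
    done
}

# Hypothetical values; the logged command used $(date +%s) as the timestamp.
commands="$(emit_del_svc_downtime 1555620720 4711 4712)"
printf '%s\n' "$commands"
```

Printing the commands first makes it possible to review them before redirecting the same output to the command file.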

2019-04-17

22:46 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/includes/: I3a50508178159 (duration: 01m 21s)
22:40 XioNoX: push firewall change to pfw3-codfw - T221278
22:28 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Score/: Id58156cfca805 / T219342 (duration: 01m 03s)
21:30 XioNoX: enable option-82 on asw2-b:cloud-hosts1-b-eqiad vlan
21:10 thcipriani: gerrit back
21:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4dcb851]: Gerrit update (cobalt -- restart incoming) (duration: 00m 10s)
21:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4dcb851]: Gerrit update (cobalt -- restart incoming)
21:06 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4dcb851]: Gerrit update (gerrit2001 only) (duration: 00m 11s)
21:06 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4dcb851]: Gerrit update (gerrit2001 only)
19:14 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.1 refs T220726 (duration: 01m 49s)
19:13 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.1 refs T220726
18:04 thcipriani: gerrit back
18:01 thcipriani: gerrit restart for https://gerrit.wikimedia.org/r/504611/
17:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable Wikidata federation on Commons again T214075 (duration: 01m 00s)
17:20 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling EventGate api-request logging on group1 wikis (duration: 01m 00s)
17:18 mutante: LDAP - added 'brennen' to group 'gerritadmin' (T218858)
17:18 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/OATHAuth/: UBN T221257 train un-blocker (duration: 01m 02s)
17:09 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Echo/includes/formatters/: Notifications: Revert 7121b9c4 per I8f9a6a19ba (duration: 01m 01s)
16:49 tzatziki: deleting three files for legal compliance
16:47 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/WikibaseMediaInfo/: SDC: Various fixes T218922 T221071 T221110 T221123 (duration: 01m 02s)
16:41 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/autoload.php: Update to point to new maintenance scripts (duration: 01m 00s)
16:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/maintenance/language/generateUpperCharTable.php: Maintenance script for _joe_ (duration: 00m 59s)
16:38 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/maintenance/language/generateUcfirstOverrides.php: Maintenance script for _joe_ (duration: 01m 00s)
16:21 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/languages/Language.php: T219279 Ability to set wgOverrideUcfirstCharacters part 1 try two (duration: 01m 00s)
16:18 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/includes/DefaultSettings.php: T219279 Ability to set wgOverrideUcfirstCharacters part 1b (duration: 01m 03s)
16:13 jforrester@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
16:11 XioNoX: set fasw-c-eqiad:ge-[0-1]/0/17 in admin vlan - T221232
16:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T220434 Deploy Partial blocks to Chinese Wikipedia (duration: 01m 02s)
14:37 ariel@deploy1001: Finished deploy [dumps/dumps@dcf04a0]: fix up paths for 1.34_wmf.1 for AbstractFilter (duration: 00m 04s)
14:36 ariel@deploy1001: Started deploy [dumps/dumps@dcf04a0]: fix up paths for 1.34_wmf.1 for AbstractFilter
14:35 otto@deploy1001: scap-helm eventgate-analytics finished
14:35 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
14:35 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
14:34 otto@deploy1001: scap-helm eventgate-analytics finished
14:34 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
14:34 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
14:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
14:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime
13:56 otto@deploy1001: scap-helm eventgate-analytics finished
13:56 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
13:56 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
13:52 elukey: upgrading hadoop cdh distribution to 5.16.1 on all the Hadoop-related nodes - T218343
13:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:48 elukey@cumin1001: START - Cookbook sre.hosts.downtime
13:48 godog: reimage prometheus2004 - T187987
12:57 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1004.eqiad.wmnet
12:44 godog: bounce prometheus instances on prometheus[12]003 after https://gerrit.wikimedia.org/r/c/operations/puppet/+/499742
12:33 moritzm: running some ferm tests on graphite2002
12:10 godog: briefly stop all prometheus on prometheus1003 to finish metrics rsync - T187987
11:39 Lucas_WMDE: EU SWAT done
11:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:504380 Enable suggestion constraint status on testwikidata (T221108, T204439) (duration: 01m 01s)
10:58 volans@deploy1001: Finished deploy [debmonitor/deploy@f049b3b]: Deploy Debmonitor v0.1.9 (duration: 01m 00s)
10:57 volans@deploy1001: Started deploy [debmonitor/deploy@f049b3b]: Deploy Debmonitor v0.1.9
10:40 moritzm: installing Java security updates on kafka/analytics cluster
09:17 godog: swift eqiad-prod continue ms-be1013 decom - T220590
09:09 elukey: restart eventlogging on eventlog1002 due to errors in processors and consumer lag accumulated after the last Kafka Jumbo roll restart
08:47 godog: reimage prometheus1004 - T187987
08:38 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 fully (duration: 01m 00s)
08:29 moritzm: installing ghostscript security updates
07:51 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/NavigationTiming: T216597 Event timing support (duration: 01m 01s)
07:45 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216597 Enable Event Timing origin trial on ruwiki and eswiki (duration: 01m 04s)
07:21 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 with low load (duration: 01m 18s)
07:07 moritzm: rolling reboots of Swift backends in codfw for combined kernel/glibc/OpenSSL update

2019-04-16

23:42 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Return CirrusSearch to standard execution against eqiad cluster (duration: 01m 00s)
23:37 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/CirrusSearch/includes/: Fix fatals on malformed search queries against overridden clusters (duration: 01m 06s)
22:42 thcipriani: gerrit back
22:39 thcipriani: restarting gerrit for configuration update https://gerrit.wikimedia.org/r/504448
22:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T165795 Give bureaucrats the usermerge right (duration: 00m 59s)
22:20 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/NewUserMessage/includes/NewUserMessage.php: Disable onLocalUserCreated for known bot accounts (duration: 01m 01s)
22:17 mobrovac@deploy1001: Finished deploy [restbase/deploy@f1c767d]: mobile-sections simplification: use the key/value bucket only - T215960 (duration: 20m 02s)
22:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T165795 Enable the UserMerge extension for clean-up on wikitech (duration: 01m 00s)
21:57 mobrovac@deploy1001: Started deploy [restbase/deploy@f1c767d]: mobile-sections simplification: use the key/value bucket only - T215960
21:56 eileen: civicrm revision changed from 1bc1570967 to 31982324b8, config revision is e5a7908330
21:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@f1c767d] (dev-cluster): mobile-sections simplification: use the key/value bucket only (duration: 05m 24s)
21:50 mobrovac@deploy1001: Started deploy [restbase/deploy@f1c767d] (dev-cluster): mobile-sections simplification: use the key/value bucket only
21:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.1 refs T220726
21:24 andrewbogott: deleting 'eqiad' endpoint in keystone
21:21 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.1 refs T220726 (duration: 36m 47s)
21:09 XioNoX: add wpao to wmf/ops in LDAP - T221142
21:02 cdanis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1280.eqiad.wmnet
20:59 otto@deploy1001: scap-helm eventgate-analytics finished
20:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:55 andrewbogott: removing keystone endpoints for the 'eqiad' region
20:45 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.1 refs T220726
20:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@dfca9e6]: Use the simplified key/value bucket - T215960 (duration: 19m 52s)
20:43 otto@deploy1001: scap-helm eventgate-analytics finished
20:42 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:42 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:23 mobrovac@deploy1001: Started deploy [restbase/deploy@dfca9e6]: Use the simplified key/value bucket - T215960
20:19 ariel@deploy1001: Finished deploy [dumps/dumps@796ccb5]: use safe_load yaml and getReplicaServer.php, cleanup symlinks once per job only (duration: 00m 04s)
20:19 ariel@deploy1001: Started deploy [dumps/dumps@796ccb5]: use safe_load yaml and getReplicaServer.php, cleanup symlinks once per job only
20:11 mobrovac@deploy1001: Finished deploy [restbase/deploy@dfca9e6] (dev-cluster): Use the simplified key/value bucket (duration: 05m 24s)
20:05 mobrovac@deploy1001: Started deploy [restbase/deploy@dfca9e6] (dev-cluster): Use the simplified key/value bucket
20:04 otto@deploy1001: scap-helm eventgate-analytics finished
20:04 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:04 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
19:59 otto@deploy1001: scap-helm eventgate-analytics finished
19:59 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
19:59 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
19:56 gehel: restarting cassandra on maps* for config change - T221055
19:49 otto@deploy1001: scap-helm eventgate-analytics finished
19:49 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
19:49 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
19:48 otto@deploy1001: scap-helm eventgate-analytics finished
19:48 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
19:48 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set main_app.debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
19:11 twentyafterfour: twentyafterfour@deploy1001:/srv/mediawiki-staging$ scap prep 1.34.0-wmf.1
19:07 bblack: restarting varnish backend on cp1083
19:04 bblack: restarting varnish backend on cp1085
18:55 cdanis: cdanis@cp1085.eqiad.wmnet ~ % sudo -i depool
18:53 otto@deploy1001: scap-helm eventgate-analytics finished
18:53 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:53 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set main_app.profiling_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:46 twentyafterfour: branching 1.34.0-wmf.1 refs T220726
18:25 otto@deploy1001: scap-helm eventgate-analytics finished
18:25 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set wmfdebug_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:14 cmjohnson1: powering off mw1280 to replace DIMM
18:08 mutante: restbase2007, restbase2008 - re-enabled puppet which was disabled with reason 'decom'ed' but actually needed to run to decom after they had moved to role::spare::system (T208087)
17:56 reedy@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/WikimediaIncubator/: T220623 (duration: 00m 53s)
17:47 herron: beginning rolling ELK upgrade to 5.6.15
17:46 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: no-op preparatory change (T221107) (gerrit:504386) (duration: 00m 52s)
17:36 arturo: toolforge k8s reallocation (from nova-network to neutron) is causing troubles with IRC bots, expect missing entries in the SAL
17:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
17:28 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
17:27 andrewbogott: restarting rabbitmq on cloudcontrol1003
17:26 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1280.eqiad.wmnet,cluster=api_appserver
17:25 arturo: rebooted cloudnet1003
17:24 gehel: force initialization of unassigned shards on elasticsearch eqiad
17:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op preparatory change (T221108) (gerrit:504374) (duration: 00m 52s)
16:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikibaseQualityConstraints/maintenance/ImportConstraintEntities.php --wiki=testwikidatawiki --config-format=wgConf | tee T221108.php
16:53 mutante: bast2001 - shutdown -h now - decom'ed (T219492)
16:48 mutante: puppet node clean bast2001.wikimedia.org ; puppet node deactivate bast2001.wikimedia.org ; it showed up in Icinga again despite running decom cookbook (T219492)
16:47 otto@deploy1001: scap-helm eventgate-analytics finished
16:47 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:47 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set wmfdebug_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:44 otto@deploy1001: scap-helm eventgate-analytics finished
16:44 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:44 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:43 jynus: upgrading and shutting down db1078 T219115
16:41 jynus: disabling notifications on db1078 T219115
16:37 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 (duration: 00m 52s)
15:36 arturo: reimaging cloudnet2002-dev because of role name change
15:21 otto@deploy1001: scap-helm eventgate-analytics finished
15:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:20 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.28 -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:19 otto@deploy1001: scap-helm eventgate-analytics finished
15:19 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:19 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:18 otto@deploy1001: scap-helm eventgate-analytics finished
15:18 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:18 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:16 elukey: roll restart kafka on kafka-jumbo100[1-6] to pick up openjdk upgrades
14:58 gehel: manual data transfer from wdqs1008 to wdqs1009 - T220830
14:56 ema: swift-fe-eqiad: nginx reload for new TLS certificate T204245
14:53 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
14:52 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
14:51 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1005.eqiad.wmnet
14:45 ema: test https://gerrit.wikimedia.org/r/504340 on ms-fe1005 T204245
14:30 ema: swift-fe-codfw: nginx reload for new TLS certificate T204245
14:22 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
14:21 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
14:20 elukey: roll restart of all the druid daemons on druid100[1-6] to pick up new openjdk updates
14:17 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2005.codfw.wmnet
14:07 jijiki: Pooling thumbor1001
14:04 ema: test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/504331/ on ms-fe2005 T204245
14:01 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe2005.codfw.wmnet
14:01 jijiki: Depooling thumbor1001
13:58 jijiki: Disable puppet on thumbor1001 for ~24h to serve traffic via haproxy - T187765
13:54 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
13:53 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
13:52 jijiki: Enable puppet on thumbor*
13:42 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
13:41 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
13:39 gehel: resetting cookbooks repo on cumin1001 (local changes)
13:34 jijiki: Disabling puppet on thumbor* to merge 504284
13:13 ema: cp-ats: upgrade fifo-log-demux to 0.2 and restart services
13:10 ema: fifo-log-demux 0.2 uploaded to stretch-wikimedia
13:03 arturo: T220095 renaming/reimaging labtestcontrol2003 as cloudcontrol2003-dev
12:58 moritzm: installing ghostscript update on thumbor1001
12:54 gehel: cleanup redundant prometheus-elasticsearch units on elasticsearch servers
12:52 godog: swift eqiad-prod continue ms-be1013 decom - T220590
12:17 moritzm: installing OpenSSL 1.0.2 updates on cp* Varnish hosts
12:07 arturo: rebooting cloudvirt200[123]-dev because of deep changes in config
11:18 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgWikibaseMusicalNotationLineWidthInches to config (T218191) (duration: 00m 52s)
11:10 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Revert "WikibaseClient: Conditionally enable mapframe support" (T218051) (duration: 00m 51s)
11:08 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable signatures in 2019: NS (ID 128) for wikimaniawiki (T221062) (duration: 00m 52s)
10:49 gilles: T221065 eswiki purge finished
10:45 moritzm: installing libjs-bootstrap updates from Stretch point release
10:21 gilles: T221065 mwscript purgeList.php eswiki --all --verbose on mwmaint1002
10:21 moritzm: installing xapian-core update from stretch point release
10:18 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T221065 Set up origin trials on Spanish Wikipedia mobile site (duration: 00m 52s)
09:59 jijiki: Enabling puppet again on dbproxy* and thumbor*
09:51 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Reduce db1078 load (duration: 00m 53s)
09:37 jijiki: Disabling puppet on dbproxy* and thumbor* to merge 502972
09:26 fsero: [late logging] swift container-to-container synchronization enabled between docker_registry_eqiad and docker_registry_codfw swift containers at 08:15:00 UTC
09:05 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
09:05 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=nginx
09:05 ema: cp1076: repool varnish-fe pointing to Varnish T213263
08:57 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
08:57 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=nginx
08:57 ema: cp1076: depool varnish-fe in preparation of traffic switchback to Varnish T213263
08:40 hoo: Updated the Wikidata property suggester with data from the 2019-04-08 JSON dump and applied the T132839 workarounds
08:33 moritzm: rebooting ms-be1020 for combined kernel/glibc/OpenSSL update
08:01 moritzm: rebooting Swift frontends in codfw for combined kernel/glibc/OpenSSL security updates
07:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
07:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
07:50 ema: cp2002: repool varnish-fe pointing to Varnish T213263
07:47 moritzm: rebooting Swift frontends in eqiad for combined kernel/glibc/OpenSSL security updates
07:45 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
07:45 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
07:45 ema: cp2002: depool varnish-fe in preparation of traffic switchback to Varnish T213263
07:36 marostegui: Upgrade db2093
07:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
07:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
07:32 ema: cp2005: repool varnish-fe pointing to Varnish T213263
07:25 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
07:25 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
07:25 ema: cp2005: depool varnish-fe in preparation of traffic switchback to Varnish T213263
07:11 moritzm: upgrading Java on Hadoop/Kafka/Jumbo/Druid clusters
05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 31s)
01:46 aaron@deploy1001: Synchronized php-1.33.0-wmf.25/includes/parser/Parser.php: 73529ae6c5ffb6 (duration: 00m 53s)
00:34 onimisionipe: pooled maps2003 - postgres init complete!
00:33 krinkle@deploy1001: Synchronized wmf-config/profiler.php: I7589aa153 (duration: 00m 52s)
00:33 urandom: creating new restbase schema -- T221031

2019-04-15

23:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
23:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
23:20 cdanis: cdanis@icinga1001.wikimedia.org ~ % sudo systemctl restart tcpircbot-logmsgbot.service
23:17 bd808: scap: SWAT: wikitech: Use cn:caseExactMatch: as account search filter (T165795) (gerrit:497423)
20:59 thcipriani: gerrit back
20:57 gehel: shutting down blazegraph and updater on wdqs1010, waiting for data reimport
20:55 thcipriani: gerrit restart to pick up gc log changes incoming
20:37 arlolra: Updated Parsoid to 83c17fc9
20:23 Amir1: the ores deployment is over
19:49 XioNoX: export BGP communities (prepend x3 outside asia) to AS3491 in eqsin
19:46 mutante: bromine/vega: rm /etc/rsyncd.conf ; systemctl stop rsync (clean up old rsync config gerrit:503961)
19:45 XioNoX: update (and add) AS3491 BGP communities in eqsin
18:58 XioNoX: update mr1-* security policies - T219384
18:41 onimisionipe: depooling maps2003 for postgres init
18:40 onimisionipe: pooling maps2002 - postgres init complete
18:39 Amir1: Morning SWAT is done
18:35 shdubsh: logstash1009: disabling puppet and testing logstash config
18:09 mutante: LDAP - adding legoktm and qchris to gerritadmin group (T219086)
17:45 anomie: Backporting fix for T220991
17:41 akosiaris: force puppet agent run on maps* after moving config-vars.yaml file for kartotherian, tilerator, tileratorui T220982
17:33 mutante: LDAP - re-adding 'pbj' to 'nda' group, extended access until May 6th, transparency report contractor
17:23 mutante: wikibugs - qdel'ed jobs and restarted another time, make it rejoin
17:17 onimisionipe: wdqs deployment is complete! For some reason I don't know, scap did not log here
17:17 herron: restarted logstash on logstash1007
17:15 mutante: restarted wikibugs because it stopped talking
16:08 onimisionipe: pooling maps2001 - postgres reinit is complete
15:55 Reedy: changed /srv/mediawiki/docroot/wikimedia.org to a symlink to standard-docroot
15:53 XioNoX: add cloud-in4 firewall filter to codfw - T211921
15:31 onimisionipe: restarting prometheus-wmf-elasticsearch-exporter-9* on all elastic nodes
15:30 onimisionipe: restarting prometheus-wmf-elasticsearch-exporter-9200 on all elastic nodes
15:28 _joe_: systemctl reset-failed on ms-be1027, debmonitor session
15:24 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T219871)
14:55 gehel: deploying tilerator to maps1001 to validate deployment is working - T220982
14:55 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T219871)
14:43 _joe_: running apply-config-tilerator on maps1001
14:40 _joe_: running apply-config-karthoterian on maps1001
14:22 cdanis: T220982 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps1*' 'sudo chmod -R a+r /srv/deployment/tilerator /srv/deployment/kartotherian'
14:21 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps1*' "disable-puppet 'bad permissions - T220982 - cdanis'"
14:18 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps*' 'sudo chmod -R a+r /srv/deployment/tilerator /srv/deployment/kartotherian'
14:18 gehel: resetting permissions on maps servers for /srv/deployment/kartotherian and /srv/deployment/tilerator
14:04 moritzm: rebooting ms-fe1005 for combined kernel/glibc/OpenSSL update
13:57 jbond42: upgrading puppet 4 -> 5 and facter 2 -> 3 on mediawiki::canary_appserver, mediawiki::appserver::canary_api and cache::cache roles
13:56 gehel: restart tilerator / kartotherian on all maps servers for openssl update
13:55 godog: start ms-be1013 decom - T220590
13:42 godog: reboot ms-be1013
13:09 moritzm: installing wget security updates on trusty hosts
12:59 moritzm: restarting archiva on archiva1001 for OpenJDK security update
12:50 moritzm: restarting Apache on matomo1001 to pick up OpenSSL update
12:14 moritzm: rolling restart of HHVM/Apache on deployment servers to pick up OpenSSL update
11:59 fsero: pointing boron docker builds to the new registry temporarily (docker builds on boron might fail)
11:35 Amir1: EU swat is done
11:26 moritzm: rolling restart of HHVM/Apache on labweb* to pick up OpenSSL update
09:58 moritzm: installing openssl1.0 security updates
09:18 gehel: unbanning elastic1029 from cluster
08:58 moritzm: updating mediawiki servers in eqiad to version 1.8.1 of the PHP extension for wikidiff
08:29 onimisionipe: increase wal_keep_segments on codfw maps master
08:19 moritzm: updating mediawiki servers in codfw to version 1.8.1 of the PHP extension for wikidiff
07:50 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/initSiteStats.php --wiki=hywwiki --active (T220936)
05:31 marostegui: Upgrade db1100
05:07 marostegui: powercycle mw1280 (crashed)

2019-04-14

06:10 ebernhardson: unban elastic1027 from eqiad-psi
05:36 ebernhardson: unbanning elastic1027 after about half the shards left and load dropped
05:31 ebernhardson: ban elastic1027 from elasticsearch-psi in eqiad
04:59 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1027 due to 100% cpu for last 30+ minutes

2019-04-13

18:46 godog: 3h downtime for cloudvirt1015
15:58 ebernhardson: restart elasticsearch on elastic1027
15:34 shdubsh: restart recommendation_api on scb1001
15:33 shdubsh: restart recommendation_api on scb2001
10:46 onimisionipe: depooling maps2001 for postgres init
08:05 gehel: repooling wdqs1008 - data transfer completed - T220830
00:32 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/includes/: Idc19cc29764a / T220854 - hot fix (duration: 05m 37s)

2019-04-12

21:16 Krinkle: scap was unable to sync to 1 apache (connect to host cloudweb2001-dev.wikimedia.org port 22: Connection timed out)
21:10 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/ImageMap/includes/ImageMap.php: I0ee84f059da / T217087 (duration: 05m 12s)
19:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
19:27 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
19:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
19:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
18:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
18:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
17:17 onimisionipe: depooling maps2002 for postgres init
17:16 onimisionipe: repooling maps2001 - postgres init is complete
16:14 elukey: install ifstat on all the mc1* hosts for network bandwidth investigation
15:56 gehel: starting data transfer from wdqs1008 to wdqs1009 - T220830
15:32 thcipriani: gerrit back
15:29 thcipriani: gerrit restart incoming
14:29 onimisionipe: depool maps2001 for postgres initialization
13:24 akosiaris: re-enable puppet across the fleet. Patch merged, recovery storm coming
13:18 akosiaris: disable puppet across the fleet to avoid incoming puppet alert storm
12:57 marostegui: Purge old rows and optimize tables on spare host pc1010 T210725
12:53 urandom: decommissioning cassandra-c, restbase2008 -- T208087
12:49 gehel: rolling restart of cassandra on maps* for jvm upgrade
12:22 arturo: T220095 disable icinga checks for labtestcontrol2003
12:16 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220807 Reduce cawiki survey sampling rate (duration: 05m 11s)
11:56 moritzm: upgrading app server canaries to version 1.8.1 of the PHP wikidiff extension (HHVM already deployed) T203069
11:46 moritzm: upgrading acmechief hosts to latest buster state
11:44 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T220807 Oversample navtiming on cawiki and commonswiki (duration: 05m 14s)
11:37 Trey314159: reindexing Greek, Turkish, and Irish wikis on elastic@eqiad and elastic@codfw complete (T217806)
11:19 moritzm: installed Java security updates on relforge* hosts
11:10 moritzm: installing Java security updates on remaining maps hosts
10:32 arturo: T219626 reimaging cloudcontrol2001-dev
10:13 elukey: matomo updated to 3.9.1 on matomo1001 + deb upload to wikimedia-stretch - T218037
09:53 moritzm: updated mwdebug1001 to php-wikidiff 1.8.1
09:37 moritzm: updated mwdebug1002 to php-wikidiff 1.8.1
09:30 volans: reset mgmt card on labtestcontrol2003 - T220783
09:07 moritzm: added the wikimedia repository key to the stretch build chroot on boron, fixes builds using the PHP72/SPICERACK hooks
09:05 arturo: T218021 disable icinga checks for labtestcontrol2001
08:35 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/NavigationTiming/modules/ext.navigationTiming.js: T220788 Fix veaction === null case (duration: 00m 54s)
08:02 moritzm: updated ssacli in thirdparty/hwraid component for stretch to 3.30-13.0 T220787
07:12 marostegui: Manually install ssacli on db2[097|098|099|100|101|102] T220787 T220572
07:04 moritzm: synced ssacli to thirdparty/hwraid components for jessie/stretch T220787
01:00 mutante: puppet cert clean, puppet node clean, puppet node deactivate on cloudnet2001-dev.codfw.wmnet (T218025)
00:25 tstarling@deploy1001: Synchronized wmf-config/profiler.php: increase excimer max depth (duration: 00m 53s)
00:02 ejegg: updated fundraising CiviCRM from 24b968b1f9 to 1bc1570967

2019-04-11

23:57 urandom: decommissioning cassandra-b, restbase2008 -- T208087
22:15 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/WikibaseMediaInfo/resources/: Hot-deploy fix for WBMI variable cache miss T220665 (duration: 00m 55s)
20:46 mutante: deleting job of wikibugs-phab-listener in an attempt to restart it
19:47 cdanis: cdanis@mwdebug1001.eqiad.wmnet ~ % sudo systemctl stop hhvm && sudo rm /var/cache/hhvm/fcgi.hhbc.sq3 && sudo systemctl start hhvm
19:39 twentyafterfour: mediawiki error rate seems to be back to normal after deploying 1.33.0-wmf.25, the new branch looks stable refs T206679
18:55 mutante: disabling puppet on hosts using class 'confd' to safely deploy gerrit:456317
18:55 Trey314159: reindexing Greek, Turkish, and Irish wikis on elastic@eqiad and elastic@codfw (T217806)
18:01 onimisionipe: increase replication factor on maps codfw cluster
17:45 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@5394b59] (stretch): Insert maps2001 into stretch environment (duration: 00m 22s)
17:45 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@5394b59] (stretch): Insert maps2001 into stretch environment
17:22 mbsantos@deploy1001: Finished deploy [proton/deploy@5cb8bbe]: Update chromium-renderer to 8988283 (T213362, T216191, T212322) (duration: 01m 33s)
17:21 mbsantos@deploy1001: Started deploy [proton/deploy@5cb8bbe]: Update chromium-renderer to 8988283 (T213362, T216191, T212322)
16:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:48 otto@deploy1001: scap-helm eventgate-analytics finished
15:47 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:47 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:42 otto@deploy1001: scap-helm eventgate-analytics finished
15:42 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:42 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:36 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@13d9ebb] (stretch): Update stretch instance with latest code (duration: 00m 22s)
15:35 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@13d9ebb] (stretch): Update stretch instance with latest code
15:23 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op comment update (gerrit:503008) (duration: 01m 00s)
15:06 cdanis@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
14:53 paravoid: rebooting labnet1002
14:49 vgutierrez: uploaded acme-chief 0.16 to apt.wikimedia.org (buster) - T207461
14:47 urandom: decommissioning cassandra-a, restbase2008 -- T208087
14:46 akosiaris: cxserver Add garbage collection graphs under saturation. T205911
14:18 Amir1: Deployment of Url shortener is done now
14:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy UrlShortener to metawiki, let's get the party started (T108557, T44085) (duration: 01m 00s)
12:49 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=maps,name=maps2001.codfw.wmnet
12:20 kartik@deploy1001: scap-helm cxserver finished
12:19 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
12:19 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
12:16 kartik@deploy1001: scap-helm cxserver finished
12:16 kartik@deploy1001: scap-helm cxserver cluster codfw completed
12:15 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
12:12 kartik@deploy1001: scap-helm cxserver finished
12:12 kartik@deploy1001: scap-helm cxserver cluster staging completed
12:12 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
11:40 zeljkof: EU SWAT finished
11:39 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Increase musical notation datatype string length limit (T218767) (gerrit:500692) (duration: 01m 02s)
11:37 akosiaris@deploy1001: scap-helm cxserver finished
11:36 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
11:36 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
11:30 onimisionipe: removing maps2002 from cassandra cluster due to dead node error
10:46 moritzm: upgrading remaining app servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
10:39 hashar: Upgrading CI Jenkins
10:21 volans: forcing puppet run on A:cp-upload_codfw
10:15 gehel: remove maps2001 from new cassandra cluster - T198622
10:10 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
09:57 elukey: roll restart druid-coordinator/overlord on druid100[4-6] to pick up new jvm settings
09:01 moritzm: upgrading deployment servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
08:20 moritzm: upgrading remaining job runners to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
08:19 elukey: roll restart of druid-broker/historical on druid100[4-6] to pick up new settings
06:33 moritzm: uploaded jenkins 2.164.2 to apt.wikimedia.org (stretch-wikimedia / thirdparty/ci)
06:32 moritzm: uploaded jenkins 2.164.2 to apt.wikimedia.org (jessie-wikimedia / thirdparty)
06:24 moritzm: upgrading remaining API Servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
05:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove s3 read only T219115 (duration: 00m 36s)
05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover s3 master eqiad from db1078 to db1075 T219115 (duration: 00m 36s)
05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s3 on read-only T219115 (duration: 00m 37s)
05:00 marostegui: Starting s3 failover from db1078 to db1075 - T219115
04:32 marostegui: Disable puppet on db1078 and db1075 T219115
04:18 marostegui: Start topology changes to move s3 slaves under db1075 T219115
04:14 marostegui: Disable GTID on s3 hosts - https://phabricator.wikimedia.org/T219115
00:45 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/PageTriage/: UBN Fix for pageTriage and ORES T220649 (duration: 01m 04s)
00:12 twentyafterfour: deploying phabricator upgrade

2019-04-10

20:43 urandom: decommissioning cassandra-c, restbase2007 -- T208087
20:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert - Enabling api-request logging via eventgate-analytics for group1 wikis - T214080 (duration: 01m 00s)
19:48 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling api-request logging via eventgate-analytics for group1 wikis - T214080 (duration: 00m 59s)
19:42 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.25 refs T206679 (duration: 01m 48s)
19:40 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.25 refs T206679
19:28 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.33.0-wmf.25 refs T206679
19:26 XioNoX: enable sampling on cr2-eqiad external links, outbound
19:17 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.20 [keeping static files] (duration: 02m 18s)
19:14 ejegg: updated fundraising CiviCRM from d0e44a9e51 to 24b968b1f9
19:08 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.19 [keeping static files] (duration: 02m 22s)
17:44 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 [keeping static files] (duration: 02m 22s)
16:58 chaomodus: restarted nagios-nrpe-server on proton1001 (it died due to OOM)
16:51 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=druid1004.eqiad.wmnet
16:01 elukey: restart brokers on druid100[3-6] - locking after segments get deleted
15:46 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/includes/parser/DateFormatter.php: Ib2b3fb / T220563 (duration: 01m 00s)
15:28 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/includes/media/ThumbnailImage.php: T216499 Only apply high priority hint half the time (duration: 00m 59s)
15:26 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 everywhere (duration: 00m 21s)
15:26 oblivian@deploy1001: Started deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 everywhere
15:24 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/Score/: UBN Revert Score changes that broke VE T220465 (duration: 01m 01s)
15:19 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 (duration: 00m 13s)
15:19 oblivian@deploy1001: Started deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0
15:01 fsero: pooled back mwdebug200[1,2] T219989
15:00 fsero: repooling mwdebug2002
15:00 jijiki: Enable puppet on thumbor1001, switch back to nginx, pool thumbor1004 - T187765
14:57 fsero: repooling mwdebug2001
14:20 hashar: CI processing was a bit slower than usual over the past couple hours or so. It should be slightly faster now T220606
14:13 joal@deploy1001: Finished deploy [analytics/aqs/deploy@fc1d232]: Deploying per-page limits for druid-endpoints (duration: 14m 41s)
13:58 joal@deploy1001: Started deploy [analytics/aqs/deploy@fc1d232]: Deploying per-page limits for druid-endpoints
13:47 fsero: resizing disk on mwdebug2002 T219989
13:42 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on group0 (T188327) (duration: 01m 00s)
13:19 marostegui: Deploy schema change on aawiki aawikibooks aawiktionary abwiki abwiktionary acewiki advisorswiki advisorywiki adywiki afwiki on x1 - T136427
12:41 urandom: decommissioning cassandra-b, restbase2007 -- T208087
12:40 hashar: contint2001: stopped puppet and zuul-merger for debugging
12:17 jbond42: rolling security update of systemd on stretch systems
12:07 Amir1: EU swat is done
12:07 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Prep work for deploying UrlShortener extension (T108557), part II (duration: 01m 00s)
12:05 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Prep work for deploying UrlShortener extension (T108557), part I (duration: 01m 00s)
11:46 dcausse: elasticsearch search cluster: reindexing zh-min-nan wikis (T219533)
10:55 moritzm: upgrading nodejs on analytics-tool1002 to latest node 10 version from component/node10
10:46 gilles: T220265 setZoneAccess on all wikis finished
10:40 akosiaris: upgrade kubernetes-node on kubestage1002 (staging cluster) to 1.12.7-1 T220405
10:33 moritzm: upgrading nodejs on aqs* to latest node 10 version from component/node10
10:25 fsero: resizing disk on mwdebug2001 T219989
10:17 akosiaris: upload kubernetes_1.12.7-1 to apt.wikimedia.org/stretch-wikimedia component main T220405
10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 T217453 (duration: 00m 59s)
10:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 T217453 (duration: 01m 03s)
09:59 moritzm: upgrading labweb hosts (wikitech) to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
09:51 akosiaris: upgrade kubernetes-node on kubestage1001 (staging cluster) to 1.12.7-1 T220405
09:50 moritzm: upgrading snapshot hosts to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
09:40 akosiaris: upgrade kubernetes-master on neon (staging cluster) to 1.12.7-1 T220405
09:40 akosiaris: upgrade kubernetes-master on neon (staging cluster) to 1.12.7-1
09:05 moritzm: upgrading job runners mw1299-mw1311 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
08:56 elukey: restart druid-broker on druid100[4-6] - stuck after attempted datasource delete action
08:46 godog: roll-restart swift frontends - T214289
08:36 elukey: update thirdparty/cloudera packages to cdh 5.16.1 for jessie/stretch-wikimedia - T218343
08:26 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@f7518bb] (stretch): Insert maps2003 into stretch environment (duration: 00m 22s)
08:26 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@f7518bb] (stretch): Insert maps2003 into stretch environment
08:12 gilles: T220265 foreachwiki extensions/WikimediaMaintenance/filebackend/setZoneAccess.php --backend local-multiwrite
07:22 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@efd5bd5]: Revert "Bifurcate imageinfo queries to improve performance" (T220574) (duration: 04m 05s)
07:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@efd5bd5]: Revert "Bifurcate imageinfo queries to improve performance" (T220574)
07:12 onimisionipe: depooling maps200[34] to increase cassandra replication factor - T198622
07:09 jijiki: Rolling restart thumbor service
07:08 jijiki: Upgrading python-thumbor-wikimedia on Thumbor servers to 2.4-1+deb9u1
06:59 marostegui: Deploy schema change on x1 master, with replication, lag will happen on x1 T217453
06:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool x1 slaves T217453 (duration: 01m 13s)
05:52 _joe_: setting both mwdebug200{1,2} to pooled = inactive to remove them from scap dsh list and allow deployments, T219989
05:12 _joe_: same on mwdebug2001
05:08 _joe_: removing hhvm cache on mwdebug2002
00:37 Krinkle: last scap sync-file failed to mwdebug2002.codfw and mwdebug2001.codfw due to insufficient disk space
00:20 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/resources/src/startup/: I3b9f1a13379a / Ie9db60e417cca (duration: 01m 01s)

2019-04-09

23:14 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 [keeping static files] (duration: 06m 03s)
22:31 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.33.0-wmf.25 refs T206679 (duration: 39m 59s)
22:19 chaomodus: uploaded python-pynetbox to apt.wikimedia.org/stretch-wikimedia (T217072)
22:13 mobrovac@deploy1001: Finished deploy [restbase/deploy@c0a2977]: Bring RB on restbase20(19|20) up to date - T208087 (duration: 02m 32s)
22:11 mobrovac@deploy1001: Started deploy [restbase/deploy@c0a2977]: Bring RB on restbase20(19|20) up to date - T208087
21:57 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.33.0-wmf.25 refs T206679
21:48 urandom: decommissioning cassandra-a, restbase2007 -- T208087
19:46 herron: added myself to ldap group cn=archiva-deployers,ou=groups,dc=wikimedia,dc=org
19:10 twentyafterfour: branching 1.33.0-wmf.25
18:53 crusnov@deploy1001: Finished deploy [netbox/deploy@018d83e]: Minor fix to Netbox-Ganeti sync script (duration: 00m 52s)
18:52 crusnov@deploy1001: Started deploy [netbox/deploy@018d83e]: Minor fix to Netbox-Ganeti sync script
18:50 thcipriani: gerrit back
18:48 thcipriani: gerrit restart
18:48 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@43d2d2e]: Gerrit update (cobalt) -- restart incoming (duration: 00m 10s)
18:47 thcipriani@deploy1001: Started deploy [gerrit/gerrit@43d2d2e]: Gerrit update (cobalt) -- restart incoming
18:46 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@43d2d2e]: Gerrit update (gerrit2001 only) (duration: 00m 10s)
18:46 thcipriani@deploy1001: Started deploy [gerrit/gerrit@43d2d2e]: Gerrit update (gerrit2001 only)
18:42 volans: restart icinga on icinga1001 - T196336
18:38 cdanis: T196336 cdanis@icinga1001$ sudo systemctl restart nsca
18:27 crusnov@deploy1001: Finished deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - T215229 (duration: 00m 57s)
18:26 crusnov@deploy1001: Started deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - T215229
18:11 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 03s)
18:11 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
18:07 urandom: bootstrapping cassandra-c, restbase2020 -- T208087
17:58 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 02s)
17:58 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
17:56 elukey: restart keyholder-agent on deploy1001 to pick up new settings for analytics (+ arm all the keys)
17:42 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 04s)
17:42 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
17:42 elukey: restart keyholder-proxy.service on deploy1001 as attempt to reload perms for the analytics_deploy key
17:37 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 10s)
17:37 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
17:19 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@b04c397]: Update mobileapps to 3edfcad (T220045 T219411 T219667) (duration: 03m 50s)
17:15 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@b04c397]: Update mobileapps to 3edfcad (T220045 T219411 T219667)
17:14 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.24/includes/export/WikiExporter.php: deploy https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502537/1 (duration: 00m 51s)
17:09 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.24/includes/export/XmlDumpWriter.php: deploy https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502538/1 (duration: 00m 52s)
17:04 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/includes/specials/SpecialUploadStash.php: T220265 Add support for X-Swift-Secret to upload stash (duration: 00m 53s)
17:03 twentyafterfour: deploying https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502538/1 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502537/1
17:01 arturo: T220426 reimaging+renaming labtestnet2002 to cloudweb2001-dev
16:49 otto@deploy1001: scap-helm eventgate-analytics finished
16:49 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
16:49 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
16:46 otto@deploy1001: scap-helm eventgate-analytics finished
16:46 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
16:46 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
16:45 otto@deploy1001: scap-helm eventgate-analytics finished
16:45 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:45 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:41 herron: performing rolling restart of kafka main brokers and eventbus instances in eqiad to pick up security updates
16:32 otto@deploy1001: scap-helm eventgate-analytics finished
16:32 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:32 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:28 jijiki: Restarting thumbor service on thumbor1001
16:26 jijiki: Upgrading thumbor1001 to python-thumbor-wikimedia_2.4-1+deb9u1
16:18 jijiki: Uploading python-thumbor-wikimedia_2.4-1+deb9u1 to component/thumbor in stretch-wikimedia
15:05 moritzm: uploaded jenkins 2.164.1 for stretch-wikimedia/thirdparty/ci
15:04 moritzm: uploaded jenkins 2.164.1 for jessie-wikimedia/thirdparty
14:42 ejegg: updated payments-wiki from 15bcb3d1a6 to aa8dad50e7
14:10 ema: reboot lvs2010 with systemd 232 T209707
14:09 godog: bootstrapping cassandra-b, restbase2020 -- T208087
13:19 godog: bounce rsyslog on wezen
13:11 fsero: building envoy docker image
13:07 jbond42: rolling security updates of systemd on canary systems
12:35 godog: bounce rsyslog on lithium
12:13 elukey: powercycle logstash1012 - no ssh, no mgmt console available, seems completely stuck
12:10 jbond42: remove facter2.4 from wikimedia-buster
11:27 moritzm: upgrading API servers mw1276-mw1290 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
11:07 akosiaris: pool both DCs for newly created swift.recovery.wmnet RR
11:07 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=.*,dnsdisc=swift
11:00 ema: rebooting lvs2010 with systemd 241-1~bpo9+1 T209707
10:57 moritzm: updated buster installer to daily build from 9th of April
10:09 godog: bootstrapping cassandra-a, restbase2020 -- T208087
10:07 moritzm: rebooting stat1005 for some tests again
09:49 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/NavigationTiming: T220476 Add originCountry to paintTiming context (duration: 00m 54s)
09:46 moritzm: rebooting stat1005 for some tests
08:47 akosiaris: switch swift to be accessed from varnish+ats active/active rw
08:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove old comment from db1089 (duration: 00m 51s)
08:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2069 (duration: 00m 50s)
08:10 marostegui: Upgrade db2069
08:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2069 (duration: 00m 51s)
07:52 moritzm: upgrading app servers mw1319-mw1333 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Deploy parsercache key change everywhere T210725 (duration: 00m 53s)
07:37 moritzm: installing samba security updates
07:21 marostegui: Change parsercache keys on mw[1230-1235,1238-1239] - T210725
07:10 jijiki: Depool thumbor1004 for testing - T187765
07:09 marostegui: Change parsercache keys on mw[1221-1229] - T210725
07:03 marostegui: Change parsercache keys on mw[1280-1289] - T210725
06:51 dcausse: elasticsearch search cluster: reindex all spaceless languages in eqiad and codfw (T219533)
06:47 moritzm: installing libav security updates
06:39 marostegui: Change parsercache keys on mw[1260-1269] - T210725
06:30 marostegui: Change parsercache keys on mw[1270-1279] - T210725
06:01 marostegui: Deploy parsercache key change on canaries only - T210725
03:23 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/ExternalGuidance/extension.json: Id04a3a / T219841 (duration: 00m 52s)
03:16 onimisionipe: depooled maps2003 - T219849
02:47 onimisionipe: restarting tilerator on maps2003 - T219849
02:40 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/ExternalGuidance/extension.json: I8614f6 / T219841 (duration: 00m 53s)
01:27 eileen: civicrm revision changed from dfe89516b3 to d0e44a9e51, config revision is 2bcbf44521
00:45 urandom: bootstrapping cassandra-c, restbase2019 -- T208087
00:07 ebernhardson@deploy1001: Synchronized wmf-config/: T218716: Migrate configs to WikibaseCirrusSearch (duration: 00m 51s)

2019-04-08

23:57 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T218954: Enable WBCS search on commons too (duration: 00m 50s)
23:45 ebernhardson@deploy1001: Synchronized wmf-config: T218954: Disable wbcs dispatching query builder on commons (3/3) (duration: 00m 52s)
23:41 ebernhardson@deploy1001: Synchronized wmf-config: T218954: Disable wbcs dispatching query builder on commons (3/3) (duration: 00m 51s)
23:33 ebernhardson@deploy1001: Synchronized wmf-config/Wikibase.php: T218954: Disable wbcs dispatching query builder on commons (2/3) (duration: 00m 52s)
23:10 ebernhardson@deploy1001: Synchronized wmf-config/: T218954: Disable wbcs dispatching query builder on commons (1/3) (duration: 00m 52s)
22:45 XioNoX: rollback enable sampling on cr2-eqiad external links
22:29 XioNoX: enable sampling on cr2-eqiad external links
22:18 XioNoX: enable sampling on eqiad Telia transit link
22:04 jforrester@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: WBMI T220277 (duration: 00m 57s)
22:01 XioNoX: pfw firewall rules update - T217355
20:49 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667) (duration: 07m 55s)
20:41 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667)
20:24 urandom: bootstrapping cassandra-b, restbase2019 -- T208087
20:08 bearND: mobileapps deploy failed on canary (Check 'endpoints' failed). Rolled back canary.
20:08 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667) (duration: 02m 10s)
20:05 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@c7fa522]: Update mobileapps to cdb9928 (T220045 T219411 T219667)
19:59 marxarelli: promotion of 1.33.0-wmf.24 to all wikis completed. error rates nominal aside from usual timeouts. cc: T206678, T220037
19:51 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.24
19:48 marxarelli: promoting 1.33.0-wmf.24 to all wikis. cc: T220037, T206678
19:41 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 46s)
19:41 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
19:35 marxarelli: starting promotion of 1.33.0-wmf.24 to group1
18:45 Lucas_WMDE: Morning SWAT done
18:31 bblack: deploying wiktionary CNAME experiment - https://phabricator.wikimedia.org/T208263#5094712
18:27 mobrovac@deploy1001: Finished deploy [restbase/deploy@9cf5364]: Lower AQS rate limits and fix recommendation-api spec - T219910 T220221 (duration: 21m 14s)
18:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable eventgate-analytics api-request logging for group0 wikis - T214080 (duration: 00m 56s)
18:24 mobrovac: restart pdfrender on scb2001 - T174916
18:13 otto@deploy1001: scap-helm eventgate-analytics finished
18:13 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
18:12 otto@deploy1001: scap-helm eventgate-analytics finished
18:12 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
18:10 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
18:09 otto@deploy1001: scap-helm eventgate-analytics finished
18:09 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:09 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:06 mobrovac@deploy1001: Started deploy [restbase/deploy@9cf5364]: Lower AQS rate limits and fix recommendation-api spec - T219910 T220221
17:50 arturo: T220129 renaming labtestmetal2001.codfw.wmnet to clouddb2001-dev.codfw.wmnet
17:42 XioNoX: add swift term to cr1/2-eqiad - T220081
17:14 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@c30a540]: GUI updates, Updater with redirect fix and Blazegraph with XSS fix (duration: 11m 17s)
17:03 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@c30a540]: GUI updates, Updater with redirect fix and Blazegraph with XSS fix
16:59 mobrovac@deploy1001: Finished deploy [mobileapps/deploy@64f09a0]: Force-deploy to scb1001 to test the config perms (duration: 00m 16s)
16:59 mobrovac@deploy1001: Started deploy [mobileapps/deploy@64f09a0]: Force-deploy to scb1001 to test the config perms
16:55 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Replace needed WikimediaEditorTasks Beta Cluster config (T220153) (duration: 00m 58s)
16:31 urandom: bootstrapping cassandra-a, restbase2019 -- T208087
15:35 herron: aborting ores to logstash kafka logging pipeline switchover for now. puppet applied only to ores2009, reverting now
15:19 herron: switching ores to logstash kafka logging pipeline (via temporary puppet disable and rolling puppet agent runs)
15:09 jijiki: Pool mw2206 - T215415
14:55 papaul: powering down mw2206 for DIMM replacement
14:49 otto@deploy1001: Finished deploy [analytics/refinery@7fa6fb7]: deploying oozie article recommender for baho (duration: 18m 35s)
14:45 papaul: powering down elastic2048 for disk replacement
14:30 otto@deploy1001: Started deploy [analytics/refinery@7fa6fb7]: deploying oozie article recommender for baho
14:17 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on test wikis and mediawikiwiki (T188327) (duration: 00m 59s)
14:06 jijiki: Temporarily serve thumbor traffic on thumbor1001 via haproxy - T187765
13:41 moritzm: upgrading job runners in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
12:31 hashar: contint2001: upgraded python-pbr 0.8.2-1 -> 1.10.0-1 # T218559
12:25 moritzm: upgrading API servers in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
12:06 arturo: reboot cloudvirt1009 to clean some ACPI errors in dmesg
12:03 arturo: T219776 puppet node deactivate labtestnet2003.codfw.wmnet
12:00 hashar: contint1001 upgraded zuul to 2.5.1-wmf6 # T208426
11:53 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: WikibaseClient: Conditionally enable mapframe support (T218051) (duration: 00m 58s)
11:48 hashar: contint2001: stopping zuul-server, it is not meant to be running there
11:41 hoo@deploy1001: Synchronized wmf-config/abusefilter.php: Enable blocking feature of AbuseFilter in zh.wikipedia (T210364) (duration: 00m 58s)
11:25 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create uploader user group for thwiki (T216615) (duration: 00m 58s)
11:12 jijiki: Restarted thumbor services after librsvg upgrade
11:11 fsero: upgrading envoy to 1.9.1 T215810
10:42 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:502190) (duration: 00m 58s)
10:41 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:502190) (duration: 00m 59s)
10:34 moritzm: upgrading app servers in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
10:23 jijiki: Running debdeploy to upgrade librsvg
09:43 gehel: force allocation of 3 unassigned shards on elasticsearch / cirrus / eqiad
09:30 arturo: T219776 puppet node clean labtestnet2003.codfw.wmnet
09:20 volans: restarting icinga on icinga1001 - T196336
08:45 moritzm: upgrading API servers mw1221-mw1235 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
08:34 akosiaris@deploy1001: scap-helm zotero finished
08:34 akosiaris@deploy1001: scap-helm zotero cluster staging completed
08:34 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-staging.yaml --reset-values staging stable/zotero [namespace: zotero, clusters: staging]
08:32 akosiaris@deploy1001: scap-helm zotero finished
08:32 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
08:32 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
08:32 akosiaris: lower CPU, memory limits for zotero pods. Set 1 cpu, 700Mi. This should help the pods to recover faster in some cases. The old memory leak issues we used to have seem to be no longer present
08:31 akosiaris@deploy1001: scap-helm zotero finished
08:31 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
08:31 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
08:17 godog: delete fundraising folder from public grafana - T219825
08:01 godog: bounce grafana after https://gerrit.wikimedia.org/r/c/operations/puppet/+/501519
07:59 moritzm: upgrading mw1266-mw1275 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
07:59 moritzm: upgrading mw1266-mw1255 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T217453 (duration: 00m 58s)
07:19 marostegui: Deploy schema change on the first 10 wikis - T217453
07:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T217453 (duration: 00m 59s)
07:02 moritzm: installing wget security updates
07:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T143763 (duration: 00m 58s)
06:34 _joe_: restarted netbox, SIGSEGV on HUP-induced reload
05:20 marostegui: Deploy schema change on x1 master with replication, there will be lag on x1 slaves T143763
05:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T219777 T143763 (duration: 01m 30s)

2019-04-07

off: restarted icinga on icinga2001
06:34 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=codfw
06:23 _joe_: deleting zotero pods with high memory watermark in codfw
06:03 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=zotero,name=codfw

2019-04-06

10:09 gilles: Purging ruwiki namespaces > 0

2019-04-05

23:10 thcipriani: revert some recent problematic gerrit acl changes
22:46 chaomodus: restarted pdfrender on scb1002 T174916
21:45 hashar: thcipriani restarted Gerrit. CI works again # T220243
21:37 thcipriani: restarting gerrit
21:30 hashar: CI / Zuul is no longer processing events / T220243
17:29 thcipriani: gerrit back on 2.15.11
17:27 thcipriani: restart gerrit
17:26 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 on cobalt (restart incoming) (duration: 00m 11s)
17:26 thcipriani@deploy1001: Started deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 on cobalt (restart incoming)
17:25 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 (on gerrit2001 only) (duration: 00m 10s)
17:25 thcipriani@deploy1001: Started deploy [gerrit/gerrit@a4e66d4]: Gerrit back to 2.15.11 (on gerrit2001 only)
17:19 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/includes/diff/TextSlotDiffRenderer.php: Ia326c6 / T220217 (duration: 01m 02s)
17:12 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/includes/diff/TextSlotDiffRenderer.php: Ia326c6 / T220217 (duration: 01m 00s)
16:02 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/includes/jobqueue/jobs/RefreshLinksJob.php: Ib1ac31365f9c / T220037 (duration: 00m 59s)
15:58 ejegg: re-enabled recurring donations queue consumer
15:57 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/NavigationTiming/: I6b23be / T220156 (duration: 01m 00s)
15:51 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/GlobalBlocking/includes/specials/: I5843cd181ca7d (duration: 01m 02s)
15:08 ejegg: upgraded fundraising CiviCRM from 3c55850631 to 83478013a8
15:01 ejegg: disabled recurring donation queue consumer
14:55 papaul: powering down restbase2019 and 2020 for relocation
13:53 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
13:45 akosiaris: repool eqiad for all kubernetes services T217426
13:45 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=citoid
13:45 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=cxserver
13:45 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mathoid
13:44 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=blubberoid
13:44 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=zotero
13:41 arturo: T220203 reimage labtestnet2002 as spare in stretch
13:36 arturo: T220101 disable active icinga checks for cloudcontrol2002-dev
13:35 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
13:35 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
13:35 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
13:35 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
12:50 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=99)
12:49 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
12:48 jijiki: Restarting pybal on lvs1016 and lvs2003 for 496382
12:43 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
12:43 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
12:43 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
12:43 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
12:33 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
12:33 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
12:32 akosiaris: depool eqiad for all kubernetes services T217426
12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid
12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=cxserver
12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=blubberoid
12:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=zotero
12:31 akosiaris: repool codfw for all kubernetes services T217426
12:30 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
12:30 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
12:29 akosiaris: repool codfw for all kubernetes services
12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=citoid
12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=cxserver
12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mathoid
12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=blubberoid
12:29 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=zotero
12:18 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
12:18 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
12:15 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
12:15 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
12:12 bblack: repool esams
12:04 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
11:53 bblack: esams depooled in DNS
11:37 jijiki: Restarting pybal on lvs1006 and lvs2006 for 496382
11:27 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
10:57 arturo: updating puppet catalog compiler facts
10:42 elukey: restart druid broker on druid100[5,6] - exceptions in the logs after old datasource removal
10:41 elukey: restart druid broker on druid1004 - exceptions in the logs after old datasource removal
10:10 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
10:10 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
09:27 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
09:27 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
09:26 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
09:26 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
08:57 akosiaris: depool codfw kubernetes apps from discovery in preparation for upgrade
08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=citoid
08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=cxserver
08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=mathoid
08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=blubberoid
08:57 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=zotero
08:55 arturo: T220101 reimaging+renaming labtestservices2002 to cloudservices2002-dev
08:43 akosiaris: upgrade kubernetes staging cluster to 1.11.9
08:32 elukey: roll restart of aqs on aqs100* to pick up new druid settings
08:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1075 (duration: 00m 59s)
08:06 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
07:51 elukey: restart gerrit on cobalt (timeouts and general slowdown)
07:34 jijiki: Repooling thumbor1004 until we replace its memory - T215411
07:18 moritzm: upgrading mw1262-mw1265 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 (T203069)
06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 (duration: 00m 57s)
06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 (duration: 01m 00s)
05:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 with low weight (duration: 00m 58s)
05:15 marostegui: Fully upgrade and reboot db1075
05:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 (duration: 00m 59s)
04:49 gilles: T216594 Start purge of namespace 0 on ruwiki
02:27 eileen: update civicrm revision changed from 7560af93df to 3c55850631, config revision is 9ad5ef3e15
00:09 bd808@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: wikitech: Lock LDAP accounts when users are blocked (gerrit:497866), Disable Phabricator accounts when blocked on wikitech (T168692) (gerrit:501123) 2/2 (duration: 00m 57s)
00:07 bd808@deploy1001: Synchronized wmf-config/wikitech.php: SWAT: wikitech: Lock LDAP accounts when users are blocked (gerrit:497866), Disable Phabricator accounts when blocked on wikitech (T168692) (gerrit:501123) (duration: 00m 59s)

2019-04-04

23:52 bd808@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/LdapAuthentication: SWAT: Also set an LDAP password policy on Block (T168692) (gerrit:501412) (duration: 01m 01s)
23:38 bd808@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add smn and sms to wmgExtraLanguageNames (T220118) (gerrit:501393) (duration: 01m 02s)
21:22 XioNoX: renumber AS58587 to AS10075 in eqsin
21:17 bblack: DNS deploying https://gerrit.wikimedia.org/r/c/operations/dns/+/500731 which can affect resolution of our CNAME records. If dns-related issues, can revert at will!
21:09 herron: restarting eqiad ELK stack for security updates
20:45 marxarelli: promotion of 1.33.0-wmf.24 rolled back to group0 and holding. cc: T206678, T220037
20:41 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert "group2/group1 wikis to 1.33.0-wmf.24"
20:36 marxarelli: rolling back again following still high rates of DBTransactionError (avg ~ 800/min)
20:16 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.24
20:11 marxarelli: promoting 1.33.0-wmf.24 to all wikis
20:11 marxarelli: error rates look good after proper syncs and re-deploy. cc: T220037
20:06 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/Citoid/modules/ve.ui.Citoid.init.js: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Citoid/+/501114 (duration: 00m 58s)
20:04 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationPlugin.php: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LdapAuthentication/+/500994 (duration: 00m 57s)
20:03 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationHooks.php: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LdapAuthentication/+/500994 (duration: 00m 58s)
20:02 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthentication.php: sync for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/LdapAuthentication/+/500994 (duration: 00m 58s)
19:58 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/EventBus/includes/JobExecutor.php: syncing JobExecutor changes (duration: 00m 58s)
19:55 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 47s)
19:53 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
19:51 marxarelli: re-deploying to group1 after proper syncs
19:47 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/Citoid/modules/ve.ui.Citoid.init.js: (no justification provided) (duration: 00m 59s)
19:46 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/EventBus/includes/JobExecutor.php: (no justification provided) (duration: 00m 58s)
19:45 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationPlugin.php: (no justification provided) (duration: 00m 58s)
19:44 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthenticationHooks.php: (no justification provided) (duration: 00m 59s)
19:43 dduvall@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/LdapAuthentication/LdapAuthentication.php: (no justification provided) (duration: 00m 59s)
19:19 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.33.0-wmf.24"
19:13 marxarelli: large spike in DBTransactionError errors. rolling back. cc: T220037
19:12 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 46s)
19:10 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
19:06 marxarelli: fetch/rebase looks good, incorporates fixes for T220037, T219510. deploying
19:03 marxarelli: preparing to promote 1.33.0-wmf.24 to group1
18:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable partial blocks on frwiki, plwiki (T219327, T219218) (duration: 00m 58s)
18:23 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES RCFilters on eswikiquote (T219160) (duration: 01m 02s)
18:13 moritzm: restarted apache on people.wikimedia.org to pick up OpenSSL update
17:59 bstorm_: stopped postgresql on labsdb1006.eqiad.wmnet and moved the database master functionality (and all rsyncs) to clouddb1003.clouddb-services.eqiad.wmflabs
17:59 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@922cbc0]: Switch to new logging infrastructure T211125 (duration: 04m 03s)
17:55 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@922cbc0]: Switch to new logging infrastructure T211125
17:47 ppchelko@deploy1001: Finished deploy [changeprop/deploy@f69dc9c]: Switch to new logging infrastructure T211125 (duration: 01m 44s)
17:45 ppchelko@deploy1001: Started deploy [changeprop/deploy@f69dc9c]: Switch to new logging infrastructure T211125
17:33 jynus: stopping replication on dbstore2001:s8 for backup testing T206203
17:29 jynus: killing ongoing backup at dbprov2002, stuck
17:28 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
17:10 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
16:31 herron: beginning rolling kafka restarts on kafka200[123] for security updates
16:01 herron: repooling kafka2003 eventbus
15:59 mutante: wikivoyage-old.org domain has been retired and deactivated (T219867, T81727)
15:56 herron: depooling kafka2003 for eventbus security updates
15:55 herron: repooling kafka2002 eventbus
15:52 herron: depooling kafka2002 for eventbus security updates
15:52 herron: pooling kafka2001 eventbus
15:42 herron: depooling kafka2001 for eventbus security updates
15:38 moritzm: rolling restart of proton to pick up openssl security update
15:03 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
14:59 moritzm: installing libdatetime-timezone-perl updates
14:24 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=cxserver,cluster=scb,name=scb.*
14:24 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=cxserver,cluster=scb,name=scb.*
14:23 jijiki: Depooling scb* from service cxserver traffic
13:46 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
13:46 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 37s)
13:29 jbond42: restart of gerrit apache service will occur at 13:40
13:28 volans: upgraded spicerack to 0.0.22 on cumin[12]001
13:27 volans: uploaded spicerack_0.0.22-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
13:23 moritzm: upgrading mw1261 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 / wikidiff 1.8.1
13:20 jijiki: Stopped all citoid services from scb* - 494215
13:15 jbond42: restart of phabricator apache service will occur at 14:25
12:46 moritzm: uploaded HHVM 3.18.5+dfsg-1+wmf8+deb9u2 to apt.wikimedia.org/stretch-wikimedia
12:10 arturo: T219626 reimaging cloudcontrol2001-dev again
11:43 moritzm: upgrading HHVM on mwdebug servers in eqiad along with update to hhvm-wikidiff 1.8.1
11:35 moritzm: uploaded nodejs 10.15.2~dfsg-1+wmf1 to the component/node10 component of apt.wikimedia.org/stretch-wikimedia (updated to latest 10.x release and a change to ensure zlib binary compat with NodeSource) (T215562)
11:34 Amir1: EU SWAT is done
11:32 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Add mediawiki.org to the URL shortener whitelist (gerrit:500976) (duration: 00m 58s)
11:28 jbond42: rolling security updates for apache on jessie
11:25 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ReferencePreviews beta feature on de- and ar-wiki (T218766) (gerrit:498371) (duration: 01m 00s)
11:21 arturo: T219626 reimaging cloudcontrol2001-dev again
11:08 arturo: drop python-psutil from jessie-wikimedia/openstack-mitaka-jessie, related to T219626
10:56 moritzm: uploaded hhvm-wikidiff 1.8.1 to apt.wikimedia.org/stretch-wikimedia (source package is named php-wikidiff2 for legacy reasons) (T203069)
10:21 arturo: T219626 reimaging cloudcontrol2001-dev again
10:01 moritzm: installing openssl1.0 security updates on stretch-based DB hosts
08:36 moritzm: rolling restart of parsoid to pick up OpenSSL security update
08:06 moritzm: uploaded Apache 2.4.10-10+deb8u14+wmf1 to apt.wikimedia.org/jessie-wikimedia (latest jessie security update rebased with our local patches)
05:39 marostegui: Stop MySQL on db2033 for decommission - T219493
05:32 marostegui: Remove db2033 from tendril and zarcillo - T219493
05:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2033 for decommission T219493 (duration: 00m 59s)
05:18 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2033 for decommission T219493 (duration: 00m 59s)
04:58 marostegui: Deploy schema change on labswiki for the job table - T219887
00:40 chaomodus: restart pdfrender on scb1003 - T174916

2019-04-03

23:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Flow beta feature on zhwikisource (T219588) (duration: 00m 58s)
23:50 catrope@deploy1001: Synchronized dblists/flow.dblist: Enable Flow on zhwikisource (T219588) (duration: 00m 57s)
23:38 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments homepage EventLogging on testwiki (duration: 00m 59s)
23:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure GrowthExperiments homepage tutorial pages on cswiki, kowiki, viwiki (dark deploy) (duration: 00m 59s)
23:18 catrope@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure GrowthExperiments homepage on testwiki (duration: 01m 01s)
21:32 elukey: start hadoop-hdfs-namenode on an-master1002 after outage due to big job hitting HDFS
20:40 gehel: excluding elastic2048 from cluster and depooling - T220038
20:29 arlolra: Updated Parsoid to 0b3bb10 (T219337)
20:20 arlolra@deploy1001: Finished deploy [parsoid/deploy@4f740e3]: Updating Parsoid to 0b3bb10 (duration: 05m 44s)
20:14 arlolra@deploy1001: Started deploy [parsoid/deploy@4f740e3]: Updating Parsoid to 0b3bb10
20:09 marxarelli: 1.33.0-wmf.24 is holding at group0 following rollback. filed T220037. cc: T206678
19:56 marxarelli: log correction group1 reverted to 1.33.0-wmf.23
19:56 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 to 1.33.0-wmf.24
19:55 marxarelli: 111,185 and counting DBTransactionError for jobrunner.discovery.wmnet
19:53 marxarelli: rolling back group1
19:53 marxarelli: massive spike in DBTransactionError ([{exception_id}] {exception_url} Wikimedia\Rdbms\DBTransactionError from line 246 of /srv/mediawiki/php-1.33.0-wmf.24/includes/libs/rdbms/lbfactory/LBFactory.php: RefreshLinksJob::runForTitle: transaction round 'RefreshLinksJob::run' already started.)
19:51 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 49s)
19:50 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
19:34 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@50b2af9]: Deploy new Updater for more cache-friendly update strategy (duration: 10m 54s)
19:23 smalyshev@deploy1001: Started deploy [wdqs/wdqs@50b2af9]: Deploy new Updater for more cache-friendly update strategy
18:14 thcipriani: gerrit back on 2.15.12
18:12 thcipriani: restarting gerrit for 2.15.12 update
18:11 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on cobalt (restart to follow) (duration: 00m 11s)
18:11 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on cobalt (restart to follow)
18:09 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on gerrit2001 only (duration: 00m 11s)
18:09 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on gerrit2001 only
17:57 elukey: restart hadoop-hdfs-namenode on an-master1001 as precautionary measure after the outage (currently standby)
17:44 herron: shortly postponing restarts of eventbus and kafka services for security updates due to unrelated firefighting - repooling kafka1001
17:19 elukey: restart hadoop-hdfs-namenode on an-master1002 after forced shutdown due to errors
17:14 herron: depooling kafka1001 to restart eventbus and kafka services for security updates
17:04 Lucas_WMDE: EU SWAT done
17:04 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/namespaceDupes.php --wiki=srwiki --fix # T214428 – 0 pages to fix, 0 links to fix, Looks good!
17:03 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule (T220001) (gerrit:500987) (duration: 00m 58s)
17:00 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/EventBus: SWAT: Incorrect order of calls in createPageDeleteEvent. (gerrit:500959) (duration: 00m 59s)
16:51 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
16:44 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
16:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/namespaceDupes.php --wiki=idwiktionary --fix # T218796 – 41 links to fix, 41 were resolvable, Looks good!
16:36 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add namespace "Lampiran" at id.wiktionary (T218796) (gerrit:499530) (duration: 00m 59s)
16:29 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Draft namespace on srwiki (T214428) (gerrit:500761) (duration: 01m 00s)
16:22 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add three domains at wgCopyUploadDomains (T216886, T219075) (gerrit:500154) (duration: 01m 00s)
16:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: Remove namespace 104 from FlaggedRevs configuration for arwiki (T217507) (gerrit:500153) (duration: 01m 00s)
15:18 volans: shutdown ms-be2026 for firmware upgrade - T219854
15:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
15:16 volans@cumin1001: START - Cookbook sre.hosts.downtime
15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on wikitech for T215525
15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 8 wikis for T215525
15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 7 wikis for T215525
15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 6 wikis for T215525
15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 5 wikis for T215525
15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 4 wikis for T215525
15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on remaining section 3 wikis for T215525
15:00 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 2 wikis for T215525
14:59 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on section 1 wikis for T215525
14:56 anomie@deploy1001: Synchronized php-1.33.0-wmf.24/maintenance/includes/MigrateActors.php: Backporting fix from gerrit:500754 (duration: 01m 01s)
14:55 anomie@deploy1001: Synchronized php-1.33.0-wmf.23/maintenance/includes/MigrateActors.php: Backporting fix from gerrit:500754 (duration: 01m 01s)
14:18 marostegui: Stop replication on pc2007 for testing - T210725
14:03 andrewbogott: restarting rabbitmq on cloudcontrol1003
13:59 andrewbogott: restarting neutron-l3-agent on cloudnet1003 and cloudnet1004
13:46 andrewbogott: restarting neutron-metadata-agent on cloudnet1003
13:44 gilles@deploy1001: Synchronized php-1.33.0-wmf.23/includes/media/MediaTransformOutput.php: T216499 Identify images that should have had high importance (duration: 00m 59s)
13:34 moritzm: reverting dbmonitor2001 to deb8u12+wmf1 build
13:02 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
13:01 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
12:49 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
12:45 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
12:42 arturo: T219626 reimaging cloudcontrol2001-dev
12:31 mutante: restarting gerrit service to apply change 498431
11:25 Amir1: EU SWAT is done
11:16 jbond42: rolling security updates for apache
10:29 mutante: planet1001/2001 - apt autoremove un-required packages
10:27 mutante: planet1001/2001 - upgrade apache2, openssh, locales, rsyslog ..
10:25 arturo: updating puppet compiler facts
10:19 volans: upgraded spicerack to 0.0.21 on cumin[12]001
10:17 volans: uploaded spicerack_0.0.21-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
09:56 marostegui: Alter empty job table on s6 primary master - T219887
09:55 moritzm: upgrading beta to hhvm wikidiff 1.8.1 (T203069)
09:54 mutante: running mysql select queries on m3-slave to get data from phabricator conpherence as requested by andre
09:45 moritzm: removed labtestnet2003.codfw.wmnet from debmonitor (T219776)
09:29 ema: cp-ats-codfw: test ATS rolling restart T213263
09:27 marostegui: Drop wikishared.wikimedia_editor_tasks_entity_description_exists table from x1 T219963
09:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool s8 sanitarium master (duration: 00m 56s)
09:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool s8 sanitarium master (duration: 01m 00s)
08:35 jynus: merging change on network constants (firewall operation)
08:23 marostegui: Restart mysql on sanitarium hosts db1124 db1125 db2094 db2095 - T218302
08:18 marostegui: Stop replication on db2082 and db1087 (s8 sanitarium masters) T218302
08:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool s8 sanitarium master (duration: 00m 57s)
08:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool s8 sanitarium master (duration: 00m 58s)
08:09 moritzm: installing new apache packages on mw1261
07:53 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/includes/media/ThumbnailImage.php: T216499 Only apply high priority hint half the time (duration: 00m 58s)
07:51 moritzm: installing new apache packages on mwdebug
07:42 marostegui: Reboot db1115 - tendril and dbtree will be down
07:40 marostegui: Disable event scheduler on db1115 before restarting - tendril is stuck
07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 T219493 (duration: 00m 57s)
07:25 marostegui: Deploy schema change on db1073, labtestwiki - T219887
07:09 marostegui: Stop replication in sync on db1120 and db2034 (x1 codfw master) - T219493
07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1120 T219493 (duration: 01m 13s)
06:04 _joe_: restart varnish backend on cp1085, causing unavailability
05:57 marostegui: Fix data drifts on bnwikisource on x1 - T219493
05:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 59s)
05:23 marostegui: Upgrade pc1007
05:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1007 for upgrade (duration: 01m 00s)

2019-04-02

23:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT enwiki: Restrict move-categorypages to +extendedmover/+sysop/+bot T219261 (duration: 00m 58s)
23:30 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Add new WMCS IP range to wgRateLimitsExcludedIps T167432 (duration: 00m 57s)
23:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Enable SandboxLink for rowiki T219855 (duration: 00m 56s)
23:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Add 'depicts' statements to search index on testcommons (duration: 00m 59s)
21:27 andrewbogott: rebooting labservices1001
21:16 andrewbogott: rebooting labservices1002
20:54 andrewbogott: restarting pdns and pdns-recursor on labservices1001 and 1002 in hopes of getting those machines to act a bit less sluggish
20:23 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/skins/Vector/includes/: I6e04b512d / T219864 (duration: 00m 59s)
20:20 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/skins/Vector/includes/: I6e04b512d / T219864 (duration: 01m 00s)
20:16 marxarelli: 1.33.0-wmf.24 successfully deployed to group0. error rates look normal (T206678)
20:07 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Group0 to 1.33.0-wmf.24
19:57 dduvall@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.24 and rebuild l10n cache (duration: 44m 20s)
19:12 dduvall@deploy1001: Started scap: testwiki to php-1.33.0-wmf.24 and rebuild l10n cache
18:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, full deploy T211125 (duration: 20m 49s)
18:22 marxarelli: cutting mediawiki branch 1.33.0-wmf.24 (T206678)
18:22 marxarelli: cutting mediawiki branch 1.33.0-wmf.24
18:20 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, full deploy T211125
18:20 ppchelko@deploy1001: deploy aborted: Kafka logging pipeline, full deploy T211125 (duration: 00m 03s)
18:20 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, full deploy T211125
18:09 ppchelko@deploy1001: Finished deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, canary on restbase2010 T211125 (duration: 02m 33s)
18:06 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7]: Kafka logging pipeline, canary on restbase2010 T211125
17:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@2cb53a7] (dev-cluster): Kafka logging pipeline, dev cluster only T211125 (duration: 03m 25s)
17:56 ppchelko@deploy1001: Started deploy [restbase/deploy@2cb53a7] (dev-cluster): Kafka logging pipeline, dev cluster only T211125
17:51 ppchelko@deploy1001: Finished deploy [restbase/deploy@3dcf328]: Upgrade swagger to v3, attempt 2, T218218 (duration: 20m 47s)
17:37 ejegg: updated payments-wiki-staging from 793bce1a5f to 15bcb3d1a6
17:30 ppchelko@deploy1001: Started deploy [restbase/deploy@3dcf328]: Upgrade swagger to v3, attempt 2, T218218
17:30 ppchelko@deploy1001: Finished deploy [restbase/deploy@3dcf328] (dev-cluster): Upgrade swagger to v3, attempt 2, T218218 (duration: 03m 02s)
17:27 ppchelko@deploy1001: Started deploy [restbase/deploy@3dcf328] (dev-cluster): Upgrade swagger to v3, attempt 2, T218218
16:47 XioNoX: - replacing accepted-prefix-limit with prefix-limit in eqsin - T211730
16:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@6026ad1]: Switch to swagger 3 T218218 (duration: 04m 52s)
16:39 ppchelko@deploy1001: Started deploy [restbase/deploy@6026ad1]: Switch to swagger 3 T218218
16:36 XioNoX: - replacing accepted-prefix-limit with prefix-limit on esams - T211730
16:12 XioNoX: - replacing accepted-prefix-limit with prefix-limit on cr2-eqiad - T211730
16:02 mutante: T194174 - bump. started alerting again 2 days ago
16:00 mutante: icinga - schedule (30d) downtime for kubernetes operational latencies alerts (T219696) on kubernetes1004
15:57 arturo: T219626 reimaging cloudcontrol2001-dev again
15:55 mutante: scandium - systemctl start parsoid-vd, which was in failed state (T201366)
15:55 herron: beginning rolling upgrade of codfw ELK cluster to 5.6.15 T219571
15:52 mutante: icinga - re-enabling notifications for scandium. setup task is resolved yet systemd is alerting, notifications should no longer be turned off (T201366)
15:39 XioNoX: repool eqsin - T219847
15:32 jbond42: add cpp-hocon 0.1.6 to jessie-wikimedia/backports
15:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: VE: Enable mobile section editing A/B test on all remaining wikis T219564 (duration: 00m 51s)
15:07 moritzm: stopped/disabled ipmievd on cumin2001
14:54 jbond42: add leatherman 1.4 to jessie-wikimedia/backports
13:44 anomie@mwmaint1002: Fixing empty values for 'target_author_actor' in log_search on test wikis and mediawikiwiki for T215525
13:24 volans: reboot ms-be2026 to see if that fixes the controller - T219854
13:23 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
13:23 volans@cumin1001: START - Cookbook sre.hosts.downtime
13:20 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
13:20 volans@cumin1001: START - Cookbook sre.hosts.downtime
13:20 jynus: updating puppet compiler facts
12:11 arturo: icinga downtime toolschecker for 1 month T219243
12:07 hashar: contint1001: compressing some MediaWiki debugging logs under /srv/jenkins/builds # T219850
11:42 moritzm: restarting parsoid on wtp1025 to pick up openssl update
11:33 hashar: contint1001: cleaning Docker containers #T219850
11:23 Amir1: EU SWAT is done
11:22 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:499777 Add the urlshortener-manage-url right and enable it for stewards (T133109), Part I (duration: 00m 51s)
11:21 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: gerrit:499777 Add the urlshortener-manage-url right and enable it for stewards (T133109), Part I (duration: 00m 53s)
11:14 akosiaris: T217715 Update mathoid, citoid, cxserver, eventgate grafana dashboards to use the new recording rules for the quantiles
11:14 jbond42: add cmake 3.6.2 to jessie-wikimedia/backports
11:02 jbond42: add rapidjson 1.1.0 to jessie-wikimedia/backports
10:47 jbond42: add catch 1.10 to jessie-wikimedia/backports
10:42 jbond42: add strip-nondeterminism 0.034 to jessie-wikimedia/backports
10:39 jbond42: add dh-autoreconf 12 to jessie-wikimedia/backports
10:30 jbond42: add debhelper 10.2.5 and dh-systemd 10.2.5 to jessie-wikimedia/backports
10:08 elukey: manually purge varnishkafka graphite alert's URL as an attempt to avoid a flapping alert - T219842
09:14 arturo: T219776 finally reimaging cloudnet2003-dev.codfw.wmnet (was labtestnet2003)
09:03 _joe_: uploaded patched version of bootstrap-vz to account for jessie-updates vanishing (T219683)
08:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 T219777 T143763 (duration: 00m 53s)
08:50 marostegui: Execute schema change on db1069 x1 master with replication enabled on the following small wikis: aawiki aawikibooks aawiktionary abwiki abwiktionary acewiki advisorswiki advisorywiki adywiki afwiki T143763
08:20 marostegui: Compress wikishared.urlshortcodes table on x1, directly on the master with replication (table has 1 row) - T219777
08:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 T219777 T143763 (duration: 00m 53s)
08:13 moritzm: installing debdeploy updates on remaining hosts in eqiad/codfw
08:05 moritzm: installing openssl1.0 security updates
07:52 moritzm: removed labvirt1008 from debmonitor (T216661)
06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 (duration: 00m 50s)
06:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1120 (duration: 00m 52s)
06:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 (duration: 00m 52s)
06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 (duration: 00m 54s)
06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1008 (duration: 00m 53s)
05:58 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@2a090ef]: New version for T219778 (duration: 00m 19s)
05:58 oblivian@deploy1001: Started deploy [docker-pkg/deploy@2a090ef]: New version for T219778
05:55 marostegui: Upgrade pc1008
05:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1008 (duration: 00m 56s)
04:14 onimisionipe: restarted tilerator on maps200[1-3] - connection refused
01:18 XioNoX: replacing accepted-prefix-limit with prefix-limit on cr1-eqiad - T211730
01:14 XioNoX: replacing accepted-prefix-limit with prefix-limit in eqord - T211730
00:52 XioNoX: depool eqsin due to Telia eqsin-codfw link outage
00:40 XioNoX: replacing accepted-prefix-limit with prefix-limit in [co|eq]dfw - T211730
00:25 XioNoX: replacing accepted-prefix-limit with prefix-limit on all ulsfo peers - T211730
00:19 XioNoX: replacing accepted-prefix-limit with prefix-limit on one ulsfo peer - T211730
00:06 XioNoX: jnt push to msw switches

2019-04-01

23:54 shdubsh: restarting kafka on kafka-jumbo1004
23:47 shdubsh: restarting kafka on kafka-jumbo1003
23:36 shdubsh: restart kafka on kafka-jumbo1002
23:28 shdubsh: restart kafka on kafka-jumbo1001
23:16 XioNoX: jnt push to csw2-esams
22:52 XioNoX: restart pdfrender on scb1003 - T174916
21:44 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Remove kowiki spam mitigations T212679 (duration: 00m 54s)
21:28 XioNoX: Push AS specific policy-statements to cr1/2-eqsin v4 peers - T211930
21:11 dcausse: elasticsearch search cluster: reindex spaceless languages (T219533)
19:48 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Renew Priority Hints origin trial token (duration: 00m 54s)
19:48 bblack: authdns2001 (ns1) upgrade gdnsd -> 3.1.0
18:58 XioNoX: re-set ulsfo-codfw ospf cost to previous default - T219591
18:52 shdubsh: restart mjolnir-kafka-msearch on relforge1002 to adopt new logging config
18:44 dcausse: Morning SWAT done
18:42 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T219268: [cirrus] Use bm25 similarity for all wikis (duration: 00m 51s)
18:33 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T210381: [cirrus] Cleanup transitional states (duration: 00m 53s)
18:22 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: gerrit:498913 ExternalGuidance: Allow google translate hosts as known services (T218948) (duration: 00m 53s)
18:18 bblack: multatuli (ns2) upgrade gdnsd -> 3.1.0
18:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:499999 Add tmpSerializeEmptyListsAsObjects Wikibase repo config (T138104) (duration: 00m 54s)
17:55 XioNoX: remove asw2-c-eqiad:et-3/1/2 from disabled interfaces - T218059
17:31 bblack: authdns1001 (ns0) upgrade gdnsd -> 3.1.0
17:22 bblack: upgrade gdnsd -> 3.1.0 (wmf2) on cp1099 (authdns test)
17:21 bblack: uploading gdnsd-3.1.0-1~wmf2 to stretch-wikimedia
17:15 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@115a6bf]: Added more endpoint, GUI updates and new bot pattern (duration: 12m 10s)
17:07 arturo: restart dhcp server in install2002 to release old lease for labtestnet2003
17:03 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@115a6bf]: Added more endpoint, GUI updates and new bot pattern
16:32 vgutierrez: slowly reenabling puppet in cache text cluster - T213705
16:28 bblack: upgrade gdnsd -> 3.1.0 on cp1099 (authdns test)
16:25 bblack: uploading gdnsd-3.1.0-1~wmf1 to stretch-wikimedia
16:15 arturo: T219776 reimaging + renaming labtestnet2003 into cloudnet2003-dev
16:13 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet
16:07 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet
16:05 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2023.codfw.wmnet
15:57 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2023.codfw.wmnet
15:56 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3042.esams.wmnet
15:49 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3042.esams.wmnet
15:48 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4032.ulsfo.wmnet
15:43 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4032.ulsfo.wmnet
15:42 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5007.eqsin.wmnet
15:30 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5007.eqsin.wmnet
15:24 vgutierrez: disable puppet in the cache text cluster - T213705
15:09 Amir1: mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki=hywwiki --baseName hywwiki --cluster (eqiad|codfw)
14:59 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Cleanup: Remove obsolete WikimediaEditorTasks beta cluster prefs (duration: 00m 50s)
14:44 moritzm: rolling out debdeploy 0.0.99.10 for jessie, buster, stretch systems
14:42 moritzm: restarting superset on analytics-tool1004 to pick up latest Python
14:41 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=hywwiki --force --sysop Ladsgroup
14:37 ladsgroup@deploy1001: Synchronized langlist: (no justification provided) (duration: 00m 50s)
14:35 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 50s)
14:33 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: T212597 (duration: 00m 51s)
14:32 Amir1: wikiadmin@10.64.32.136(hywwiki)> update text set old_text = 'DB://cluster25/1';
14:18 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
14:11 moritzm: uploaded debdeploy 0.0.99.10 to apt.wikimedia.org (jessie, stretch, buster)
14:07 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 52s)
14:04 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5007.eqsin.wmnet
13:57 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5007.eqsin.wmnet
13:56 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5001.eqsin.wmnet
13:50 hashar: Reverted CI Jenkins jobs to Quibble 0.0.28 # T219647
13:47 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5001.eqsin.wmnet
13:26 mvolz@deploy1001: scap-helm citoid finished
13:26 mvolz@deploy1001: scap-helm citoid cluster codfw completed
13:26 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-codfw-values.yaml stable/citoid [namespace: citoid, clusters: codfw]
13:23 mvolz@deploy1001: scap-helm citoid finished
13:23 mvolz@deploy1001: scap-helm citoid cluster eqiad completed
13:23 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-eqiad-values.yaml stable/citoid [namespace: citoid, clusters: eqiad]
13:12 mvolz@deploy1001: scap-helm citoid finished
13:12 mvolz@deploy1001: scap-helm citoid cluster staging completed
13:12 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
13:11 hashar: Upgraded CI Jenkins jobs to Quibble 0.0.30 # T219647
13:09 jbond42: rolling security update of tshark
12:24 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@46ba982]: Rollback - third time is the charm (duration: 00m 43s)
12:23 oblivian@deploy1001: Started deploy [docker-pkg/deploy@46ba982]: Rollback - third time is the charm
12:08 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@0c32dc1]: Rollback to 1.0.0, T219778 (duration: 00m 18s)
12:08 oblivian@deploy1001: Started deploy [docker-pkg/deploy@0c32dc1]: Rollback to 1.0.0, T219778
12:02 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@UNKNOWN]: Rollback to 1.0.0, T219778 (duration: 00m 34s)
12:02 oblivian@deploy1001: Started deploy [docker-pkg/deploy@UNKNOWN]: Rollback to 1.0.0, T219778
11:58 Lucas_WMDE: EU SWAT done
11:57 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikibaseLexeme: SWAT: gerrit:500237 Fix GrammaticalFeatureListWidget (T219134, T219734) (duration: 01m 00s)
11:53 moritzm: uploaded logstash/kibana/elasticsearch 5.6.15 to component thirdparty/elastic56
11:52 moritzm: uploaded logstash/kibana/elasticsearch to component thirdparty/elastic56
11:51 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:500393 Add unwatchedpages permission to rollbacker and patroller at zhwiki (T219285) (duration: 00m 52s)
11:41 zfilipin@deploy1001: Synchronized static/images/project-logos/: SWAT: gerrit:499210 Correct logos for the Gujarati Wikipedia (T219373) (duration: 00m 52s)
11:34 zfilipin@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: gerrit:497236 Enable logging of private filters on commonswiki (T218527) (duration: 00m 50s)
11:25 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:498818 Revert "Revert "Remove $wgAbuseFilterRuntimeProfile"" (T191039) (duration: 00m 51s)
11:17 zfilipin@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: gerrit:498817 Revert "Revert "Remove $wgAbuseFilterProfile"" (T191039) (duration: 00m 52s)
11:16 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@0c32dc1]: Upgrade to 1.1.2 (duration: 01m 08s)
11:15 oblivian@deploy1001: Started deploy [docker-pkg/deploy@0c32dc1]: Upgrade to 1.1.2
11:00 jbond42: halt rolling updates of tshark until after SWAT
10:48 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: gerrit:500410 Bumping portals to master (T128546) (duration: 00m 50s)
10:47 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:500410 Bumping portals to master (T128546) (duration: 00m 52s)
10:42 jbond42: rolling security update of tshark
10:32 _joe_: pruning old images on boron
10:31 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@7ef5ca3]: Upgrade to 1.1.2 (duration: 00m 26s)
10:31 oblivian@deploy1001: Started deploy [docker-pkg/deploy@7ef5ca3]: Upgrade to 1.1.2
10:27 arturo: T219626 reimaging cloudcontrol2001-dev
09:09 moritzm: installing Chromium security updates on proton* (tested the new release in deployment-prep)
08:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2033 (duration: 00m 51s)
08:09 marostegui: Deploy testing schema change on enwiki.echo_event on db2033 and upgrade mysql - T143961
07:54 ariel@deploy1001: Finished deploy [dumps/dumps@7abb6c8]: get db user/passwd via mw maint script (duration: 00m 03s)
07:54 ariel@deploy1001: Started deploy [dumps/dumps@7abb6c8]: get db user/passwd via mw maint script
07:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2033 (duration: 00m 51s)
06:28 _joe_: pushing wikimedia-jessie:{20190401,latest} to docker-registry.w.o T219580
06:27 _joe_: installing new bootstrap-vz on boron T219580
05:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 (duration: 00m 50s)
05:08 marostegui: Deploy schema change on db1077, this will generate lag on s3 on labs
05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 (duration: 00m 53s)

2019-03-31

06:57 marostegui: Remove old files from dbstore1001 to clean up the disk space warning

2019-03-30

03:39 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/ImageMap/includes/ImageMap.php: I1387825f25e / T217087 (duration: 00m 52s)
03:16 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/skins/Vector/includes/templates/index.mustache: I0d6e036b65da0 / T219359 / i18n regression (duration: 00m 54s)

2019-03-29

22:06 bstorm_: stopped database services on labsdb1004 and labsdb1005
21:01 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3) (duration: 05m 14s)
20:55 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3)
20:49 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3) (duration: 03m 13s)
20:46 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 3)
20:35 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 2) (duration: 03m 30s)
20:31 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (part 2)
20:30 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers (duration: 00m 30s)
20:29 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Elasticsearch 6 fixes for content-type headers
18:41 ejegg: updated payments-wiki from 4b49bb7333 to 793bce1a5f
15:51 XioNoX: repool ulsfo - T219591
15:48 XioNoX: bump ulsfo-codfw ospf link cost to 1000 - T219591
15:14 _joe_: pruning old images and containers on boron
15:00 mutante: ldap-eqiad-replica02 - running out of disk - apt-get clean - gzipping /var/log/debug
13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
13:06 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
13:05 ema: cp2002/cp2005: repool varnish-fe for user traffic T213263
12:55 thcipriani: gerrit running on 2.15.11
12:53 thcipriani: restarting gerrit to finish rollback to 2.15.11
12:52 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 on cobalt -- restart of gerrit incoming (duration: 00m 11s)
12:52 thcipriani@deploy1001: Started deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 on cobalt -- restart of gerrit incoming
12:51 moritzm: removing php 7.0 packages from snapshot1008, dumps are only using 7.2 (T218193)
12:50 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 (gerrit2001 only) (duration: 00m 10s)
12:50 thcipriani@deploy1001: Started deploy [gerrit/gerrit@670ddb8]: Gerrit (back) to version 2.15.11 (gerrit2001 only)
12:47 moritzm: upgrading snapshot1008 to component/php72 (T218193)
12:46 moritzm: upgrading snapshot1005-1007/1009 to component/php72 (T218193)
12:23 ema: rolling ATS restarts to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/500011/ T213263
11:45 mutante: cobalt - systemctl restart gerrit
10:36 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
10:36 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
10:35 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
10:35 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
09:37 mutante: restarting zuul on contint1001
09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
09:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
08:36 godog: depool ulsfo as precaution -- link repair in progress
08:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1110 (duration: 00m 50s)
07:58 gilles@deploy1001: Synchronized php-1.33.0-wmf.23/includes/media/MediaTransformOutput.php: T216499 Only apply high priority half the time (duration: 00m 50s)
07:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1110 (duration: 00m 51s)
07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1110 (duration: 00m 50s)
07:19 vgutierrez: reenabling puppet in acme-chief clients after verifying NOOP in netmon2001
07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1110 (duration: 01m 06s)
07:11 vgutierrez: disabling puppet in acme-chief clients to merge I437b91 safely
07:06 marostegui: Upgrade db1110
07:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1110 (duration: 00m 49s)
07:01 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216598 T216594 Element Timing for Images and Layout Stability on ruwiki (duration: 00m 51s)
06:56 marostegui: Remove tools section from tendril by doing: update shards set display='0' where name='tools'; T216749
06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 (duration: 00m 49s)
06:41 marostegui: Upgrade pc1009
06:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1009 (duration: 00m 50s)
06:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 (duration: 00m 50s)
05:49 marostegui: Disable notifications on labsdb1004 and labsdb1005 - T216749
05:47 marostegui: Remove labsdb1004 and labsdb1005 from tendril - T216749
05:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 (duration: 00m 52s)
00:18 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/includes/api/ApiStashEdit.php: I35213d83a0 (duration: 00m 49s)
00:16 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I8887ce013a8 (duration: 00m 51s)
00:00 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: I24a5469dbfd0 / T216206 for testwikidatawiki (duration: 00m 50s)

2019-03-28

23:54 krinkle@deploy1001: Synchronized wmf-config/Wikibase.php: Ib9d617 (duration: 00m 50s)
23:53 krinkle@deploy1001: Synchronized wmf-config/WikibaseSearchSettings.php: Ib9d617 (duration: 00m 51s)
23:14 bstorm_: completed setting up clouddb1003 as the replica of labsdb1006 (osm)
22:13 bd808@deploy1001: Finished deploy [striker/deploy@2f62c43]: Fixes for error pages and repo creation (T176325) (duration: 00m 59s)
22:12 bd808@deploy1001: Started deploy [striker/deploy@2f62c43]: Fixes for error pages and repo creation (T176325)
22:11 XioNoX: add AS specific policy-statements to cr1-eqsin v6 transits - T211930
21:51 thcipriani: restarting gerrit
21:18 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [Wikimania] Enable VisualEditor in the 2019 namespace T218645 (duration: 00m 50s)
21:16 XioNoX: add AS specific policy-statements to cr2-eqsin v6 transits - T211930
21:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [Wikitech] Enable VisualEditor in extra namespaces (duration: 00m 50s)
20:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: VisualEditor: Enable mobile section editing A/B test on 10 Wikipedias T218851 T218939 (duration: 00m 50s)
20:29 moritzm: restarting Gerrit on cobalt to effect new Java security update
19:47 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikimediaEditorTasks on wikidatawiki (duration: 00m 52s)
19:39 mdholloway: created table wikimedia_editor_tasks_entity_description_exists on wikidatawiki
19:19 marxarelli: 1.33.0-wmf.23 deployed for all wikis (T206677)
19:09 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.23
18:45 bstorm_: switching replica for osmdb to clouddb1003 VM from labsdb1007
18:42 addshore@deploy1001: Synchronized wmf-config/db-labs.php: BETA ONLY db-labs (duration: 00m 57s)
18:35 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: wikibase.php, define sharedCacheKeyGroup (duration: 00m 57s)
18:32 jforrester@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/ProofreadPage/includes/Index/IndexContent.php: ProofreadPage: Fix AbuseFilter UBN T219514 (duration: 00m 57s)
18:17 jforrester@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/AdvancedSearch/: AdvancedSearch: Fix two UBNs T219455 T219539 (duration: 00m 59s)
18:03 ejegg: updated payments-wiki from 6661655e37 to 4b49bb7333
17:46 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ff9d424]: Deploy logging @cee: prefixing bugfix (duration: 03m 24s)
17:43 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ff9d424]: Deploy logging @cee: prefixing bugfix
16:39 XioNoX: enable cr2-codfw:xe-5/0/0 (to cr2-eqdfw)
16:36 mutante: wikitech-static - changing [renewalparams] authenticator = to 'apache' from 'standalone' (installer = was already apache) (T214640)
16:36 jbond42: move python3-requests and python3-urllib3 from jessie-wikimedia backports to component/kube2proxy
16:33 XioNoX: disable cr2-codfw:xe-5/0/0 (to cr2-eqdfw)
16:00 akosiaris: poweroff sessionstore2001 for a re-racking
15:15 mutante: wikitech-static - removing acme-setup cron jobs from root's crontab. this was used before the switch to certbot, is unrelated and added to confusion and maybe the problem (T214640)
15:07 otto@deploy1001: scap-helm eventgate-analytics finished
15:07 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
15:07 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
15:07 otto@deploy1001: scap-helm eventgate-analytics finished
15:07 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
15:07 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
15:06 otto@deploy1001: scap-helm eventgate-analytics finished
15:06 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:06 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
14:46 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@4deeb04]: Partition htmlCacheUpdate topic, final cleanup stage T219159 (duration: 00m 52s)
14:45 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@4deeb04]: Partition htmlCacheUpdate topic, final cleanup stage T219159
14:32 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@3a8a889]: Partition htmlCacheUpdate topic, step 2 T219159 (duration: 00m 53s)
14:31 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@3a8a889]: Partition htmlCacheUpdate topic, step 2 T219159
14:07 gehel: reindexing changes from '2019-03-26T12:00:00Z' to '2019-03-28T12:00:00Z' into cirrus / elasticsearch - T218878
13:59 gehel: restarting elasticsearch on elastic2050 to validate JVM upgrade
13:57 moritzm: upgrading Java on elasticsearch hosts
13:50 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2004.codfw.wmnet
13:49 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
13:22 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@c120b38]: Partition htmlCacheUpdate topic, explicitly exclude htmlCacheUpdate T219159 (duration: 00m 48s)
13:21 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@c120b38]: Partition htmlCacheUpdate topic, explicitly exclude htmlCacheUpdate T219159
13:14 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@17285f8]: Partition htmlCacheUpdate topic, step 1 T219159 (duration: 01m 46s)
13:12 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@17285f8]: Partition htmlCacheUpdate topic, step 1 T219159
12:20 moritzm: removing php 7.0 packages from snapshot1005-1007/1009, dumps are only using 7.2 (T218193)
12:13 jbond42: move git from jessie-wikimedia backports repo to components/ci
12:02 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:499756 Revert "SDC: Enable both new-style and old-style Wikibase federation on Commons" (T219450) (duration: 00m 57s)
11:54 moritzm: upgrading snapshot1005-1007/1009 to component/php72 (T218193)
11:53 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Revert T212597
11:51 ladsgroup@deploy1001: Synchronized dblists: Revert T212597 (duration: 00m 58s)
11:27 ladsgroup@deploy1001: Synchronized dblists: T212597 (duration: 00m 56s)
11:01 godog: test copying prometheus metrics on bast3002
10:54 gehel: restarting elasticsearch-psi on elastic20[35,36,53] (shards stuck in recovery) - T218878
10:22 gehel: restarting elasticsearch on elastic20[34,36,50] (shards stuck in recovery) - T218878
10:15 addshore@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/Wikibase/lib: T219452 gerrit:499738 Revert: Use enableModuleContentVersion() for Wikibase\lib\SitesModule (duration: 01m 06s)
10:11 gehel: restarting elasticsearch-omega on elastic2050 (shards stuck in recovery) - T218878
09:56 gehel: restarting elasticsearch-omega on elastic2031 (shards stuck in recovery) - T218878
09:42 gehel: restarting elasticsearch on elastic20[28,29,41] (shards stuck in recovery) - T218878
09:37 gehel: restarting elasticsearch-psi on elastic20[39,40] (shards stuck in recovery) - T218878
09:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 (duration: 00m 56s)
09:28 gehel: restarting elasticsearch on elastic20[25,27] (shards stuck in recovery) - T218878
09:19 gehel: restarting elasticsearch-omega on elastic20[38,50] (shards stuck in recovery) - T218878
09:14 godog: install rsyslog 8.1901.0-1~bpo8+wmf1 on phab1001 and copper
09:09 gehel: restarting elasticsearch-omega on elastic2050 (shards stuck in recovery) - T218878
09:06 gehel: restarting elasticsearch-psi on elastic20[35,36,53] (shards stuck in recovery) - T218878
09:00 gehel: restarting elasticsearch-psi on elastic2036 (shards stuck in recovery) - T218878
08:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 (duration: 00m 55s)
08:43 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2007 after upgrade (duration: 00m 57s)
08:38 gehel: retry shard allocation on elasticsearch codfw all clusters (curl -k -XPOST 'https://localhost:9243/_cluster/reroute?pretty&explain=true&retry_failed') - T218878
08:37 gehel: retry shard allocation on elasticsearch codfw (curl -k -XPOST 'https://localhost:9243/_cluster/reroute?pretty&explain=true&retry_failed')
08:33 elukey: move hadoop yarn configuration from hdfs back to zookeeper - T218758
08:32 marostegui: Upgrade pc2007
08:31 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2007 for upgrade (duration: 00m 56s)
08:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2009 after upgrade (duration: 00m 57s)
08:12 marostegui: Upgrade pc2009
08:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2009 for upgrade (duration: 00m 57s)
08:10 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
08:07 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
07:32 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool pc2008 after upgrade (duration: 00m 57s)
07:22 marostegui: Upgrade pc2008
07:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool pc2008 for upgrade (duration: 00m 57s)
07:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clean up old unused entries (duration: 01m 04s)
06:27 marostegui: Deploy schema change on s3 codfw, lag will be generated on s3 codfw.
05:39 marostegui: Restart apache on phab1001 - phabricator is down
02:50 chaomodus: restarted pdfrender on scb1004 in order to attempt to address flapping errors
01:45 XioNoX: add AS specific policy-statements to cr2-eqsin (but don't apply them yet) - T211930
01:20 XioNoX: progressive jnt push to standardize cr*
01:15 XioNoX: remove sandbox-out6 filter from all routers
00:56 XioNoX: jnt push to standardize asw*
00:32 XioNoX: jnt push to standardize mr1-*
00:21 krinkle@deploy1001: Synchronized php-1.33.0-wmf.23/includes/api/ApiStashEdit.php: Ic357dbfcd9ab / T203786 (duration: 00m 57s)

2019-03-27

23:46 mholloway-shell@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikimediaEditorTasks: Fix: Pass database name to the NameTableStore constructor (duration: 00m 57s)
23:34 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:499400 Load WikibaseLexemeCirrusSearch on Wikidata T216206 (duration: 00m 58s)
23:25 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:499399 Load WikibaseLexemeCirrusSearch on test.wikidata.org T216206 (duration: 00m 59s)
22:51 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 31s)
22:51 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
22:47 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 04s)
22:47 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
22:45 krinkle@deploy1001: Synchronized wmf-config/profiler.php: I8c7f8c / T176916 (duration: 00m 59s)
22:36 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 34s)
22:35 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
22:30 niharika29@deploy1001: Finished deploy [scholarships/scholarships@9db232d]: Update wikimania-scholarships; includes fix for broken privacy policy link (duration: 00m 02s)
22:30 niharika29@deploy1001: Started deploy [scholarships/scholarships@9db232d]: Update wikimania-scholarships; includes fix for broken privacy policy link
22:21 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns (duration: 00m 31s)
22:21 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4fd1022]: Deploy new bot patterns
21:59 chaomodus: restarting proton1001 to upgrade ram
21:58 chaomodus: restarting proton1002 to upgrade ram
21:57 chaomodus: restarting proton2001 in order to upgrade ram
21:54 chaomodus: restarting proton2002 in order to upgrade ram
21:25 dcausse@deploy1001: Synchronized wmf-config/Wikibase.php: T219448 (duration: 00m 55s)
21:25 eileen: civicrm revision changed from 67b8405b60 to 7560af93df, config revision is 5a0cbb3c7d (was actually before the process control one)
21:24 eileen: process-control config revision is e1bc772c89
21:17 chaomodus: restarted proton on proton1001 in response to memory exhaustion and cpu peg
21:07 milimetric@deploy1001: Finished deploy [analytics/refinery@fdd21a4]: non-deploy changes and two new oozie jobs (duration: 11m 48s)
20:55 milimetric@deploy1001: Started deploy [analytics/refinery@fdd21a4]: non-deploy changes and two new oozie jobs
20:29 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update WikimediaEditorTasks config for DB location split (duration: 00m 57s)
20:23 mholloway-shell@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikimediaEditorTasks: Update DB utils to handle counts and suggestion DBs in different locations (duration: 00m 58s)
20:14 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
20:14 mholloway-shell@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/WikimediaEditorTasks: Fix: Use READ_LOCKING when evaluating whether to update targets_passed (duration: 00m 58s)
20:04 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
20:03 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
19:48 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
19:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
19:43 herron: removed queued wikidata notification messages for a***a@w**gm**ster.** on mx1001 to address gmail excessive volume rate limiting
19:32 jijiki: restarting pdfrender on scb1001
19:30 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
19:27 marxarelli: (resent; originally @ 1916) dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.23
19:23 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
19:18 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.23 (duration: 01m 45s)
19:14 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
18:48 thcipriani: restarting gerrit process
18:12 jynus: update grants on db1115 for new provisioning hosts on codfw T218336
18:10 elukey: interface::rps applied to all the mc10XX hosts - T203786
17:41 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
17:41 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
17:10 ema: fermium: /usr/local/sbin/disable_list wikimetrics T211835
16:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T214075 SDC: Enable Wikidata federation on Commons (duration: 00m 57s)
16:38 elukey: mc20XX and mc1022 have interface::rps enabled - T203786
16:28 jforrester@deploy1001: Synchronized php-1.33.0-wmf.23/extensions/GlobalPreferences/includes/GlobalPreferencesFactory.php: Hot-fix T219380 GlobalPreferences: Allow modifiedPrefs to be set even if no UI control (duration: 00m 58s)
16:18 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SDC: Use feature flag for enabling depicts in UW (duration: 00m 57s)
16:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Add feature flag for enabling depicts in UW (duration: 00m 57s)
15:56 jbond42: bastion reboots complete
15:56 ariel@deploy1001: Finished deploy [dumps/dumps@88ddd76]: ability to use lbzip2 for meta-history compression (duration: 00m 03s)
15:56 ariel@deploy1001: Started deploy [dumps/dumps@88ddd76]: ability to use lbzip2 for meta-history compression
15:44 jbond42: rebooting bast2001.wikimedia.org in 5 minutes
15:44 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
15:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
15:42 jbond42: rebooting bast2002.wikimedia.org in 5 minutes
15:38 jbond42: rebooting bast1002.wikimedia.org in 5 minutes
15:34 jbond42: rebooting bast4002.wikimedia.org in 5 minutes
15:30 jbond42: rebooting bast5001.wikimedia.org in 5 minutes
15:24 jbond42: rebooting iron.wikimedia.org in 5 minutes
15:22 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
15:21 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
15:19 elukey: slowly rolling out interface::rps to all the mcXXXX nodes - T203786
14:52 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
14:45 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
14:44 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
14:13 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
14:12 marostegui: Sanitize hywwiki on db1124:3313 T212625
14:11 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
14:05 godog: roll-restart logstash to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/498417
13:38 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
13:35 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
13:35 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
13:11 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 57s)
12:42 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 00m 58s)
12:41 Amir1: scap sync-file dblists
12:30 Amir1: mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=mediawikiwiki hyw wikipedia hywwiki hyw.wikipedia.org
12:25 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
12:23 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
12:15 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
11:47 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
11:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
11:37 mdholloway: created wikimedia_editor_tasks_entity_description_exists table on testwikidatawiki
11:28 _joe_: SWAT done
11:24 oblivian@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/WikimediaEvents: SWAT: Backport Use a cookie to persist the seed for php7 a/b test to .22 T216676 (duration: 00m 58s)
11:20 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: gerrit:498770 Throttle rule for The Art and Feminism Edit-a-thon in Taiwan (T219113) (duration: 00m 59s)
11:14 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: gerrit:499287 Clean the throttles up (T219311) (duration: 00m 57s)
11:10 dcausse: elasticsearch search cluster: setting cluster.routing.allocation.disk.watermark.flood_stage to 100% on omega/psi/chi@eqiad (T219364)
11:08 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: gerrit:499231 Add throttle rule for Czech editathon (T219291) (duration: 00m 58s)
11:06 dcausse: elasticsearch search cluster: setting "index.blocks.read_only_allow_delete" to null on all indices in omega/psi/chi@eqiad (T219364)
11:04 mutante: re-enabled puppet on logstash1007 through 1011 - then on logstash*
11:00 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
10:57 godog: upgrade rsyslog to 8.1903.0-3~bpo8+wmf1 on cobalt to test imfile file rotation fix - T214176
10:53 mutante: enabling and running puppet on logstash1007
10:49 mutante: disabling puppet on logstash* via cumin
10:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3312 (duration: 00m 58s)
10:20 godog: upgrade rsyslog to 8.1903.0-3~bpo8+wmf1 on phab1001 to test imfile file rotation fix - T214176
09:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3312 (duration: 00m 56s)
09:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1074 (duration: 00m 57s)
09:41 marostegui: Upgrade db2092
09:06 vgutierrez: puppet reenabled in acme-chief clients - T207295
09:01 marostegui: Deploy schema change on db1074, this will generate lag on labsdb hosts for s2
09:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1074 (duration: 00m 57s)
08:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1076 (duration: 00m 54s)
08:33 vgutierrez: disabling puppet in acme-chief clients to safely get rid of old TLS material - T207295
08:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 (duration: 00m 57s)
08:17 godog: bounce rsyslog on phab* - apache access logs stopped at ~6.30 today
08:09 godog: bounce rsyslog on cobalt - apache access logs stopped at ~6.30 today
08:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1122 (duration: 00m 57s)
07:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 (duration: 00m 58s)
06:57 SMalyshev: depooled wdqs1005 to catch up
06:56 SMalyshev: repooled wdqs1004
06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3312 (duration: 00m 58s)
06:02 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Change one parsercache key on codfw - T210725 (duration: 00m 57s)
05:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 (duration: 01m 10s)
00:56 SMalyshev: depooled wdqs1004 to catch up
00:55 SMalyshev: repooled wdqs1006

2019-03-26

23:37 SMalyshev: repooled wdqs2003
23:12 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: T216206 : sync noop labs config: Actually load WBCS-Lexeme extension before trying to use it (duration: 00m 57s)
22:12 gehel: freezing and unfreezing writes to elasticsearch codfw
21:47 SMalyshev: depool wdqs2003 to catch it up
21:32 ebernhardson: manually thaw search.svc.codfw.wmnet:9643
21:31 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikimediaEditorTasks on testwikidatawiki (duration: 00m 57s)
21:22 mdholloway: created new db tables for WikimediaEditorTasks in x1
21:00 SMalyshev: depooled wdqs1006 to see if it'd catch up better
20:19 marxarelli: correction: group0 to 1.33.0-wmf.23
20:15 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.0
20:08 ejegg: updated payments-wiki from f42910460b to 6661655e37
19:58 dduvall@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.23 and rebuild l10n cache (duration: 37m 59s)
19:20 dduvall@deploy1001: Started scap: testwiki to php-1.33.0-wmf.23 and rebuild l10n cache
19:18 marxarelli: scap clean failure due to T218783. train is rolling without cleanup
19:17 jynus: reloading db2095 mariadb instances to reload and check filters
19:13 jynus: reloading db2094 mariadb instances to reload and check filters
19:07 dduvall@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 10s)
19:04 jynus: reloading db1125 mariadb instances to reload and check filters
18:49 marxarelli: branch 1.33.0-wmf.23 was cut successfully (T206677)
18:24 jynus: reloading db1124 mariadb instances to reload and check filters
18:21 marxarelli: starting branch cut for 1.33.0-wmf.23 (T206677)
18:09 thcipriani: gerrit back on version 2.15.12, upgrade complete.
18:05 thcipriani: restarting gerrit on cobalt for update to 2.15.12
18:05 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
18:05 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on cobalt (duration: 00m 15s)
18:04 thcipriani@deploy1001: Started deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on cobalt
18:03 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on gerrit2001 only (duration: 00m 11s)
18:03 thcipriani@deploy1001: Started deploy [gerrit/gerrit@d3d2134]: Gerrit to 2.15.12 on gerrit2001 only
18:01 thcipriani: starting gerrit 2.15.12 upgrade
17:45 otto@deploy1001: scap-helm eventgate-analytics finished
17:45 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
17:45 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
17:43 otto@deploy1001: scap-helm eventgate-analytics finished
17:43 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
17:43 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
17:41 otto@deploy1001: scap-helm eventgate-analytics finished
17:41 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
17:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
17:39 otto@deploy1001: scap-helm eventgate-analytics finished
17:39 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
17:39 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
17:38 otto@deploy1001: scap-helm eventgate-analytics finished
17:38 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
17:38 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
17:38 arlolra: Updated Parsoid to f58c3d1 (T219023)
17:38 otto@deploy1001: scap-helm eventgate-analytics finished
17:38 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
17:38 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
17:33 otto@deploy1001: scap-helm eventgate-analytics finished
17:33 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
17:33 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
17:31 otto@deploy1001: scap-helm eventgate-analytics finished
17:31 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
17:31 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
17:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@395a214]: Updating Parsoid to f58c3d1 (duration: 06m 51s)
17:21 arlolra@deploy1001: Started deploy [parsoid/deploy@395a214]: Updating Parsoid to f58c3d1
17:14 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
17:13 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
17:12 otto@deploy1001: scap-helm eventgate-analytics finished
17:12 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
17:12 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
17:10 otto@deploy1001: scap-helm eventgate-analytics finished
17:10 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
17:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
17:06 otto@deploy1001: scap-helm eventgate-analytics finished
17:06 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
17:06 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
17:03 otto@deploy1001: scap-helm eventgate-analytics finished
17:03 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
17:03 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:59 otto@deploy1001: scap-helm eventgate-analytics finished
16:59 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:59 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:59 otto@deploy1001: scap-helm eventgate- upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-, clusters: staging]
16:59 otto@deploy1001: scap-helm eventgate- upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-, clusters: staging]
16:58 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
16:58 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
16:57 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
16:57 otto@deploy1001: scap-helm eventgate-analytics finished
16:57 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:57 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:31 gilles@deploy1001: Finished deploy [performance/asoranking@9a1e5ef]: (no justification provided) (duration: 00m 52s)
16:30 gilles@deploy1001: Started deploy [performance/asoranking@9a1e5ef]: (no justification provided)
16:07 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
16:07 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
16:05 robh: decom of labtestvirt200[12] started via T218023
15:45 otto@deploy1001: scap-helm eventgate-analytics finished
15:45 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:45 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.16 -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:44 otto@deploy1001: scap-helm eventgate-analytics finished
15:44 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:44 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.16 -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:43 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
15:43 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 52 -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:43 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
15:40 otto@deploy1001: scap-helm eventgate-analytics finished
15:40 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:40 otto@deploy1001: scap-helm eventgate-analytics upgrade --help [namespace: eventgate-analytics, clusters: staging]
15:34 otto@deploy1001: scap-helm eventgate-analytics finished
15:34 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:34 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --dry-run --debug stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:32 otto@deploy1001: scap-helm eventgate-analytics finished
15:32 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:32 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:31 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
15:31 otto@deploy1001: scap-helm eventgate-analytics finished
15:31 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:31 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:27 otto@deploy1001: scap-helm eventgate-analytics finished
15:27 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:27 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:20 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
15:20 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
15:08 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
15:07 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
14:01 jbond42: rolling update of passenger on puppet masters
13:35 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
13:35 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
13:06 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
13:04 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
12:58 gehel@cumin2001: START - Cookbook sre.elasticsearch.e6-upgrade
11:42 Amir1: EU SWAT is done
11:40 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/lib/maintenance/populateSitesTable.php --wiki=wikimaniawiki --force-protocol https (T217730)
11:39 Amir1: wikiadmin@db1078.eqiad.wmnet(wikimaniawiki)> DELETE FROM sites; and site_identifiers
11:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:498440 Set $wmgWikibaseSiteGroup for wikimaniawiki (T217730) (duration: 00m 49s)
11:22 elukey: temporarily install ifstat on mc1022 + tmux session to log in/out bandwidth usage every 1s for T203786
11:20 ladsgroup@deploy1001: Synchronized wmf-config/throttle.php: SWAT: gerrit:498949 Throttle rule for Wikimedia Hackathon 2019 (T213869), try II (duration: 00m 49s)
11:11 ladsgroup@deploy1001: Synchronized wmf-config/throttle.php: SWAT: gerrit:498949 Throttle rule for Wikimedia Hackathon 2019 (T213869) (duration: 00m 51s)
10:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3312 (duration: 00m 49s)
09:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105:3312 (duration: 00m 50s)
09:54 marostegui: Upgrade db2071
09:42 marostegui: Upgrade db2070
09:15 jijiki: Restarting pdfrender on scb1001
09:09 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1004.eqiad.wmnet
09:05 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1003.eqiad.wmnet
08:09 marostegui: Deploy schema change on s2 codfw master, this will generate lag on codfw s2
07:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1119 (duration: 00m 49s)
06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1119 (duration: 00m 50s)
06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106 (duration: 00m 52s)
06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 (duration: 00m 51s)
06:02 marostegui: Deploy schema change on db1106, this will generate lag on s1 on labs hosts

2019-03-25

23:20 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T219234 Turn on Elastica logging channel (duration: 00m 51s)
22:32 krinkle@deploy1001: Synchronized docroot/wikipedia.org/speed-tests/Banksy.enwiki.872156204: T185446 (duration: 00m 49s)
21:44 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T216206 Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables, sub-part b (duration: 00m 49s)
21:43 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216206 Deploy WikibaseLexemeCirrusSearch: Part 1 - set up variables, sub-part a (duration: 00m 50s)
21:40 XioNoX: apply transport-in4 filter to cr1/2-eqiad - T190090
21:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T218715 Enable WBCS on Testcommons too (duration: 00m 50s)
20:32 ebernhardson: T218994 set various deprecation channels on all six cirrus elasticsearch clusters to ERROR
19:54 dcausse: elasticsearch search cluster: SET "logger.org.elasticsearch.common.logging.DeprecationLogger" to "ERROR" to psi/omega@eqiad (T218994)
19:48 dcausse: elasticsearch search cluster: SET "logger.org.elasticsearch.deprecation.index.query.functionscore.ScoreFunctionBuilder" to "ERROR" to chi/psi/omega@eqiad (T218994)
19:40 volans: restart icinga on icinga1001 to reset modified attributes
19:37 dcausse: morning SWAT done
19:33 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] switch all wikis to eqiad (elastic 6.5.4) (duration: 00m 50s)
19:21 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T192254 (duration: 00m 49s)
19:13 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: T218260 (duration: 00m 49s)
19:06 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@1cbf290]: Roll update to mjolnir-bulk-daemon es6 handling of super_detect_noop (duration: 03m 27s)
19:02 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@1cbf290]: Roll update to mjolnir-bulk-daemon es6 handling of super_detect_noop
18:46 dcausse@deploy1001: Synchronized wmf-config/flaggedrevs.php: revert T217507 (duration: 00m 49s)
18:43 ebernhardson: restart mjolnir-kafka-msearch-daemon across cirrus elasticsearch servers
18:41 dcausse@deploy1001: scap failed: average error rate on 6/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
18:32 dcausse@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/FlaggedRevs/: T218949: Fix reject changes when user is partially blocked (duration: 00m 51s)
18:27 dcausse@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: T192135 (duration: 00m 50s)
18:15 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: T211622: Enforce 8 char password length requirements for non-privileged users (duration: 00m 50s)
17:24 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@281aaf8]: New build for blazegraph and updater plus GUI updates (duration: 10m 31s)
17:24 elukey: restart pdfrender on scb1004
17:14 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@281aaf8]: New build for blazegraph and updater plus GUI updates
17:11 ebernhardson: restart mjolnir-kafka-msearch-daemon on relforge100[12]
17:10 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T218878: [cirrus] switch low volume wikis to eqiad (elastic 6.5.4) (duration: 00m 49s)
16:56 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@2fb5038]: Ship new logging support code via new simplified virtualenv deployment (duration: 09m 52s)
16:47 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@2fb5038]: Ship new logging support code via new simplified virtualenv deployment
16:28 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ba09eb5]: Ship new logging support code via new simplified virtualenv deployment (duration: 09m 10s)
16:19 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ba09eb5]: Ship new logging support code via new simplified virtualenv deployment
16:19 hashar: updating Jenkins plugins and restarting
16:16 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@6eda7d8]: Ship new logging support code via new simplified virtualenv deployment (duration: 02m 38s)
16:13 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@6eda7d8]: Ship new logging support code via new simplified virtualenv deployment
15:48 XioNoX: remove 2nd AS7568 router in Equinix Singapore
15:21 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@ddf26d0]: Ship new logging support code (duration: 01m 29s)
15:20 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@ddf26d0]: Ship new logging support code
15:00 jbond42: updating passenger on rhodium
14:29 andrewbogott: updating slapd indexes on seaborgium, serpens, ldap-eqiad-replica01, ldap-eqiad-replica02 for 498396
13:52 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
13:52 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=nginx
13:52 ema: cp1076: repool varnish-fe, frontend misses served by cp-ats T213263
13:41 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
13:41 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=nginx
13:41 ema: cp1076: depool varnish-fe and point it to cp-ats T213263
13:28 mutante: planet - manually updating en version since new monitoring check warned it wasn't current (T203208)
13:17 mutante: mwmaint1002 - manually running tor_exit_node cron command and test with PHP 7.2
12:48 mutante: reloading icinga config
12:15 Lucas_WMDE: EU SWAT finished
12:08 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Move 0.1% of anonymous users to php7 T212828 (duration: 00m 49s)
12:07 moritzm: installing openssl1.0 security updates on stretch
12:00 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Remove $wgAbuseFilterRuntimeProfile" (T191039) (gerrit:498814) (duration: 00m 51s)
11:48 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove $wgAbuseFilterRuntimeProfile (T191039) (gerrit:486470) (duration: 00m 49s)
11:46 ema: cp-ats-codfw: upgrade trafficserver to 8.0.3-1wm1
11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/Wikibase/repo: SWAT: Revert "OutputPageBeforeHTML: do nothing for non entity pages" (T218907) (gerrit:498354) (duration: 01m 06s)
11:26 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2003.codfw.wmnet
11:23 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2004.codfw.wmnet
11:23 godog: switch codfw prometheus from prometheus2003 to prometheus2004
11:19 ema: cp-ats-eqiad: upgrade trafficserver to 8.0.3-1wm1
11:18 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: switching to use of the local proxy for search in php7 (duration: 00m 50s)
11:16 oblivian@deploy1001: Synchronized wmf-config/LabsServices.php: switching to use of the local proxy for search in php7 (duration: 00m 50s)
11:09 ema: trafficserver 8.0.3-1wm1 uploaded to stretch-wikimedia
10:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1083 (duration: 00m 48s)
10:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
10:40 gehel: disable deprecation warnings on elasticsearch eqiad - T218994
10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:498800) (duration: 00m 49s)
10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:498800) (duration: 00m 49s)
10:27 moritzm: installing Java security updates on Hadoop/Druid test cluster
10:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
10:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
10:07 moritzm: installing ntfs-3g security updates
10:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1083 (duration: 00m 49s)
09:42 moritzm: uploaded openjdk 8u212-b01-1~deb8u1 to apt.wikimedia.org/jessie-wikimedia/main
09:34 marostegui: Upgrade db2062
09:24 hashar: contint1001: manually compressing Zuul log files sudo -u zuul gzip --best /var/log/zuul/*.log.????-??-??
09:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083+ (duration: 00m 49s)
09:18 marostegui: Upgrade db2055
09:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 (duration: 00m 49s)
09:10 mutante: contint1001 - restarting zuul
08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105:3311 (duration: 00m 49s)
08:08 vgutierrez: reenabling puppet in openldap servers
08:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1118 (duration: 00m 49s)
07:58 vgutierrez: disable puppet and downtime host in icinga for labtestservices2001 - T218022
07:40 vgutierrez: disable puppet in production openldap servers before merging https://gerrit.wikimedia.org/r/498776
07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1118 (duration: 00m 49s)
06:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1118 after mysql upgrade (duration: 00m 50s)
06:45 marostegui: Stop MySQL on db1118 for upgrade
06:44 marostegui: Deploy schema change on s1 codfw master, this will generate lag on codfw
06:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1118 for schema change and upgrade (duration: 00m 54s)
04:31 chaomodus: restarted pdfrender on scb1003 to try to help flapping

2019-03-24

15:00 jijiki: Restart pdfrender on scb1002 and scb1004

2019-03-23

13:02 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix WikimediaEditorTasks Beta Cluster DB config, take 2 (duration: 00m 50s)
12:36 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix WikimediaEditorTasks Beta Cluster DB config (duration: 00m 52s)

2019-03-22

22:13 bd808: Restarted uwsgi-striker on labweb1002
22:12 bd808: Restarted uwsgi-striker on labweb1001
20:14 otto@deploy1001: scap-helm eventgate-analytics finished
20:14 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
20:14 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
20:04 otto@deploy1001: scap-helm eventgate-analytics finished
20:04 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
20:04 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
19:59 ejegg: updated payments-wiki-staging from 31647bc97e to f42910460b
19:57 otto@deploy1001: scap-helm eventgate-analytics finished
19:57 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
19:57 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
19:55 otto@deploy1001: scap-helm eventgate-analytics finished
19:55 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
19:55 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
19:52 otto@deploy1001: scap-helm eventgate-analytics finished
19:52 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
19:52 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics --set main_app.extra_kafka_conf= [namespace: eventgate-analytics, clusters: staging]
19:52 otto@deploy1001: scap-helm eventgate-analytics finished
19:52 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
19:52 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics --set main_app.extra_kafka_conf={} [namespace: eventgate-analytics, clusters: staging]
19:46 otto@deploy1001: scap-helm eventgate-analytics finished
19:46 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
19:46 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
19:39 otto@deploy1001: scap-helm eventgate-analytics finished
19:39 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
19:39 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
19:36 otto@deploy1001: scap-helm eventgate-analytics finished
19:36 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
19:36 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:53 otto@deploy1001: scap-helm eventgate-analytics finished
18:53 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:53 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:41 krinkle@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/Collection/: I2c4f5d / T217835 (duration: 00m 52s)
18:21 otto@deploy1001: scap-helm eventgate-analytics finished
18:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:21 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:16 otto@deploy1001: scap-helm eventgate-analytics finished
18:16 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:16 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:13 tzatziki: removing 5 files for legal compliance
18:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
17:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:06 jijiki: Restart ferm on db2096
15:58 James_F: UBN hot-deploy for T218918: Only load latest revision in MessageCache::loadFromDB
15:26 gehel: restarting elasticsearch on elastic1046 for logging configuration change - T218994
14:34 mutante: scandium - apt-get remove --purge php* ; apt autoremove ; letting puppet reinstall php 7.2 one more time using mediawiki::profile::php now
14:33 gehel: upgrading to elasticsearch-curator 5.6.0 on all elasticsearch nodes (including logstash) - T218991
11:22 ema: lvs1002: bounce pybal to clear backends health icinga warning T218133
11:18 ema: lvs1005: bounce pybal to clear backends health icinga warning T218133
10:24 mutante: scandium - apt autoremove
10:20 mutante: scandium - manually removing all php* packages to let puppet reinstall 7.2 instead of 7.0
10:05 ema: cp2005: repooled, serving traffic via ATS T213263
10:00 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
10:00 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
09:48 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
09:48 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
09:47 ema: cp2005: depool varnish-fe in preparation of traffic switch to ATS T213263
09:42 moritzm: rebooting pool counters in codfw to pick up SSBD-enabled qemu
09:04 elukey: start tcpdump on mc1022 to gather traffic for analysis
06:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1094 (duration: 00m 50s)
06:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 49s)
06:05 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2096 after onsite maintenance (duration: 00m 51s)
01:31 bd808: labweb: upgraded mariadb packages installed on labweb100[12]
01:19 bd808@deploy1001: Finished deploy [striker/deploy@b4bcd08]: Update python wheels (duration: 01m 00s)
01:18 bd808@deploy1001: Started deploy [striker/deploy@b4bcd08]: Update python wheels
00:54 bd808: Striker down following upgrade. scap3 did not rebuild venv as expected. Manually resolved, but now having mysql library issues.
00:47 Krinkle: krinkle@mwmaint1002 Fixing corrupt 'log_params' field of kawiki.logging row where log_id=1021367; T93110
00:36 bd808@deploy1001: Finished deploy [striker/deploy@c4726e3]: Django upgrade and various bug fixes (T192487, T182142, T176325, T217932) (duration: 01m 15s)
00:34 bd808@deploy1001: Started deploy [striker/deploy@c4726e3]: Django upgrade and various bug fixes (T192487, T182142, T176325, T217932)
00:32 James_F: SWAT done, 12 minutes ago.
00:20 jforrester@deploy1001: Finished scap: SWAT: Full scap for i18n rebuild for 498259 and 498113 (duration: 24m 49s)

2019-03-21

23:57 gtirloni: downtimed systemd check in labweb1001/1002 (T218935)
23:56 jforrester@deploy1001: Started scap: SWAT: Full scap for i18n rebuild for 498259 and 498113
23:53 gtirloni: downtimed systemd check in labweb1001 (T210818)
23:32 jforrester@deploy1001: Synchronized php-1.33.0-wmf.22/extensions/ContentTranslation/api/ApiQueryContentTranslationSuggestions.php: SWAT T218902 CX: Return API error on anonymous suggestions queries (duration: 00m 51s)
23:08 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT T217730 Add wikimaniawiki to another special group in Wikibase client (duration: 00m 49s)
22:33 jijiki: Restarting pdfrender on scb1003
22:26 otto@deploy1001: scap-helm eventgate-analytics finished
22:26 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
22:26 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
22:14 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=0)
22:02 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable WikimediaEditorTasks on the Beta Cluster (duration: 00m 49s)
21:56 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Add WikimediaEditorTasks labs config to InitialiseSettings-labs.php (duration: 00m 47s)
21:54 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add WikimediaEditorTasks default config to InitialiseSettings.php (duration: 00m 49s)
21:53 jijiki: Restarting pdfrender on scb1004
21:52 mholloway-shell@deploy1001: Synchronized wmf-config/extension-list: Add WikimediaEditorTasks to extension-list (duration: 00m 50s)
21:45 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
21:39 XioNoX: Ping offload - replace test IP with text-lb.codfw IP on cr1/2-codfw - T190090
21:11 XioNoX: remove peering sessions to AS7385 on cr4-ulsfo
21:08 otto@deploy1001: scap-helm eventgate-analytics finished
21:08 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
21:08 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:55 otto@deploy1001: scap-helm eventgate-analytics finished
20:55 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:55 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:43 otto@deploy1001: scap-helm eventgate-analytics finished
20:43 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:43 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:29 otto@deploy1001: scap-helm eventgate-analytics finished
20:29 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:29 otto@deploy1001: scap-helm eventgate-analytics finished
20:29 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1006.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:27 otto@deploy1001: scap-helm eventgate-analytics finished
20:27 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:27 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1006.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:26 otto@deploy1001: scap-helm eventgate-analytics finished
20:26 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:26 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1005.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:24 otto@deploy1001: scap-helm eventgate-analytics finished
20:24 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:24 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1004.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:23 otto@deploy1001: scap-helm eventgate-analytics finished
20:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1003.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:23 otto@deploy1001: scap-helm eventgate-analytics finished
20:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1001.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:22 otto@deploy1001: scap-helm eventgate-analytics finished
20:22 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:22 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1001.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:21 otto@deploy1001: scap-helm eventgate-analytics finished
20:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:21 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --set main_app.kafka_broker_list=kafka-jumbo1002.eqiad.wmnet:9092 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:03 otto@deploy1001: scap-helm eventgate-analytics finished
20:03 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:03 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
19:45 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T213483 Disable RDF output of mediainfo Wikibase entities (duration: 00m 49s)
19:40 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T213483 Read wmgWikibaseEntityTypesWithoutRdfOutput value (duration: 00m 50s)
19:39 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T213483 Set default wmgWikibaseEntityTypesWithoutRdfOutput value (duration: 00m 51s)
18:49 gehel: resetting archived settings on elasticsearch cirrus eqiad - T218879
18:41 otto@deploy1001: scap-helm eventgate-analytics finished
18:41 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:36 sbisson@deploy1001: Synchronized php-1.33.0-wmf.22/languages/Language.php: SWAT: languages: Partial revert of I8287118cf8ec01326ead9 (gerrit:498116) (duration: 00m 50s)
18:30 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
18:25 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable Welcome survey on viwiki (gerrit:498166) (duration: 00m 49s)
18:23 otto@deploy1001: scap-helm eventgate-analytics finished
18:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:17 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
18:16 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable logging for CitationUsage and CitationUsagePageLoad (gerrit:496857) (duration: 00m 49s)
18:13 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
18:12 otto@deploy1001: scap-helm eventgate-analytics finished
18:12 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:11 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable reader trust survey v2 (gerrit:494552) (duration: 00m 50s)
18:08 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
18:05 otto@deploy1001: scap-helm eventgate-analytics finished
18:05 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:05 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:01 otto@deploy1001: scap-helm eventgate-analytics finished
18:01 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:01 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
17:56 bblack: everything back to normal for lvs1002/lvs1005 (high-traffic2 @ eqiad)
17:55 bblack: restarting pybal on lvs1002
17:54 otto@deploy1001: scap-helm eventgate-analytics finished
17:54 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
17:54 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
17:49 reedy@deploy1001: Synchronized php-1.33.0-wmf.22/includes/user/User.php: Iab2492 (duration: 00m 51s)
17:43 bblack: restarting pybal on lvs1005
17:43 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable EntitySourceBasedFederation on TestCommons (duration: 00m 50s)
17:37 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
17:35 bblack: disabled puppet on lvs1002 + lvs1005 for new service rollout
17:28 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
17:27 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SDC: Add test-commons.wikimedia.org to wgCrossSiteAJAXdomains (duration: 00m 49s)
17:11 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
17:07 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable Depicts on TestCommons, with related config (duration: 00m 50s)
17:03 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
17:03 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
17:02 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
17:02 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
16:39 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
16:38 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
16:38 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
16:38 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
16:38 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
16:38 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
16:29 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
16:29 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
16:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2096 for onsite maintenance (duration: 00m 50s)
16:01 marostegui: Poweroff db2096 for onsite maintenance T218336
15:20 moritzm: rebooting flerovium/furud for kernel updates
14:35 moritzm: restarting jenkins on releases* after Java update
14:18 gtirloni: downtimed labtestweb2001 (T218881)
14:11 vgutierrez: re-enabling puppet in acme-chief clients - T218862
14:09 arturo: T218024 disabled icinga checks for labtestweb2001
14:07 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
13:58 vgutierrez: update acme-chief to version 0.15 in acmechief1001 - T218862
13:54 vgutierrez: disabling puppet in acme-chief clients - T218862
13:48 akosiaris: reboot oresrdb2001
13:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3317 (duration: 00m 51s)
13:37 elukey: upgrade openjdk-8 on an-worker1080 and restarted hadoop daemons
13:28 moritzm: installing Java security updates on notebook hosts
13:22 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.22
13:18 gtirloni: downtimed cloudcontrol*, cloudservices*, labcontrol*, labweb* (T210818)
13:06 moritzm: installing Java security updates on stat hosts
12:40 arturo: T216497 remove python-cliff from jessie-wikimedia/openstack-mitaka-jessie
12:35 jijiki: Pooling mw1339 back
12:33 jijiki: Pooling mw1290 back
12:08 arturo: T216497 add python-cliff to jessie-wikimedia/openstack-mitaka-jessie
12:02 vgutierrez: uploaded acme-chief 0.15 to apt.wikimedia.org (buster) - T218862
11:54 elukey: restart yarn node managers on an-worker10[82,89,92] - shutdown after a long yarn failover and only now downtime is expired
11:36 mutante: gerrit2001 (not the master prod server) - scheduled downtime and rebooting for upgrade
11:04 zeljkof: EU SWAT finished
11:04 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for LMU Edit-a-thon (T217929) (duration: 00m 57s)
10:57 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2004.codfw.wmnet
10:52 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
10:46 elukey: restart hadoop yarn resource managers on an-master100[1,2] to pick up new settings
10:23 moritzm: rebooting labtestcontrol2001 for kernel update
10:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3317 (duration: 00m 56s)
09:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 (duration: 00m 58s)
09:42 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=cxserver,cluster=scb,name=scb.*
09:42 jijiki: Depool scb* in codfw from serving cxserver, finishing its migration to k8s - T213195
09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 after mysql upgrade (duration: 00m 56s)
09:27 moritzm: rolling reboot of maps servers in codfw for kernel update
09:17 marostegui: Upgrade and reboot db1086
08:53 marostegui: Upgrade db1086
08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 for upgrade (duration: 00m 56s)
08:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1086 (duration: 00m 57s)
08:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 (duration: 00m 56s)
08:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1079 (duration: 00m 56s)
08:01 vgutierrez: deploying directory based certificates in acme-chief clients - T207295
07:35 _joe_: rolling restart of php-fpm to pick up some changes
07:34 marostegui: Deploy schema change on db1079, this will generate lag on labsdb:s8
07:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1079 (duration: 00m 57s)
07:03 elukey: restart pdfrender on scb1002
06:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1101:3317 (duration: 00m 56s)
06:24 marostegui: Run wmcs-wikireplica-dns on cloudcontrol1003 to get dbproxy1011 back
06:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101:3317 (duration: 01m 10s)
06:12 marostegui: Upgrade and reboot dbproxy1011
06:04 marostegui: Run wmcs-wikireplica-dns on cloudcontrol1003 to drain dbproxy1011
00:09 jforrester@deploy1001: Synchronized php-1.33.0-wmf.22/includes/parser/BlockLevelPass.php: SWAT T218817 Unbreak parser line counting for long wikitext pages I22eebb70a I55a2c4c0 I41a45266d (duration: 00m 56s)
00:08 twentyafterfour: deploying phabricator upgrade
00:01 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Move FundraisingTranslateWorkflow load to after Translate I73452ae8 (duration: 00m 56s)

2019-03-20

23:49 jforrester@deploy1001: Synchronized php-1.33.0-wmf.22/resources/lib/ooui/oojs-ui-core.js: SWAT T218722 T218830 Bring forward UBN OOUI fix (duration: 00m 57s)
23:28 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/497948/ (duration: 00m 56s)
23:10 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/490648/ (duration: 00m 56s)
22:29 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T214075 Enable federation of Wikidata items and properties on Test Commons (duration: 00m 57s)
21:37 XioNoX: apply transit-in4 term offload-ping4 with test IP to cr1/2-codfw - T190090
21:34 XioNoX: apply transit-in4 term offload-ping4 with test IP to cr2-codfw
21:00 XioNoX: apply icmp redirect on cr1-codfw:xe-5/0/2 (to cr4-ulsfo) for test IP 208.80.154.225 - T190090
20:24 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.22 (duration: 01m 46s)
20:23 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.22
20:13 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
20:13 otto@deploy1001: scap-helm eventgate-analytics finished
20:13 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
20:13 otto@deploy1001: scap-helm eventgate-analytics finished
20:13 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
20:13 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
20:07 otto@deploy1001: scap-helm eventgate-analytics finished
20:07 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:07 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
19:38 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.22
19:13 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
19:13 otto@deploy1001: scap-helm eventgate-analytics finished
19:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
19:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
19:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
19:04 zfilipin@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.22 and rebuild l10n cache (duration: 38m 29s)
18:50 jijiki: restarting pdfrender on scb1003
18:49 ottomata: hitting eventgate-analytics in eqiad with ab
18:39 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
18:39 otto@deploy1001: scap-helm eventgate-analytics finished
18:39 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
18:39 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
18:39 otto@deploy1001: scap-helm eventgate-analytics finished
18:39 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
18:37 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
18:37 otto@deploy1001: scap-helm eventgate-analytics finished
18:37 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
18:26 zfilipin@deploy1001: Started scap: testwiki to php-1.33.0-wmf.22 and rebuild l10n cache
16:44 XioNoX: disable lldp on asw2-a-eqiad:ge-8/0/10
16:25 chasemp: mkdir /srv/dumps/xmldatadumps/public/other/rook for T218587 (fyi apergos)
15:55 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=0)
15:52 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
15:35 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
15:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1098:3317 (duration: 00m 50s)
15:33 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
15:24 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
15:24 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
15:23 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
15:23 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
15:22 bawolff@deploy1001: Synchronized wmf-config/wikitech.php: Adjust account stuff at wikitech 4adc89bce4 (duration: 00m 48s)
15:20 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=0)
15:20 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
15:10 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
15:09 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
15:09 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=97)
15:08 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
14:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098:3317 (duration: 00m 56s)
14:35 zfilipin@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 03s)
14:02 moritzm: rebooting oresrdb2002 for kernel update
13:48 godog: take a snapshot of prometheus data on prometheus1004
13:44 zfilipin@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 05s)
13:37 zfilipin@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 00m 08s)
13:29 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
13:29 otto@deploy1001: scap-helm eventgate-analytics finished
13:29 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
11:51 akosiaris: re-enable puppet across fleet
11:45 Amir1: EU SWAT is done
11:44 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add wikimania as a special group to wikidata sitelinks (T217730) (duration: 00m 50s)
11:40 ladsgroup@deploy1001: Synchronized dblists/wikidataclient.dblist: SWAT: Add wikimaniawiki to wikidataclient.dblist (T217730) (duration: 00m 50s)
11:34 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Advanced Mobile Contributions mode for ar,id,es and test wikis (T217643) (duration: 00m 50s)
11:34 akosiaris: disable puppet across fleet to avoid alert spam storm
11:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Partially revert "Enable musical notation datatype in wikidata" (T218535) (duration: 00m 50s)
11:16 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Increased maxSerializedEntitySize from 2500 to 3000 (T217739) (duration: 01m 47s)
11:03 akosiaris: restart gerrit for testing https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/497727/
10:28 akosiaris: restart gerrit for merge of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/497561/
10:26 godog: reimage prometheus1003 with stretch - T205870
10:20 marostegui: Repool dbproxy1010 and running wmcs-wikireplica-dns script
10:12 marostegui: Reboot dbproxy1010 for upgrade
09:45 vgutierrez: updated acme-chief to version 0.14 in acmechief[12]001
09:32 marostegui: Deploy schema change on s7 codfw master, lag will appear on codfw
09:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1087 (duration: 00m 48s)
08:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1087 (duration: 00m 48s)
08:55 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1003.eqiad.wmnet
08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 (duration: 00m 48s)
08:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 (duration: 00m 48s)
08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1092 (duration: 00m 48s)
08:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1092 (duration: 00m 48s)
08:20 ema: cp2009, cp1071 (cp-ats): reboot for kernel upgrades
07:32 elukey: pool kafka1001 in pybal's eventbus service after yesterday's network maintenance
06:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool databases in row A - T187960 (duration: 00m 49s)
00:48 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.21/includes/Title.php: SWAT: Improve Caching in Title::loadRestrictions() (duration: 00m 51s)

2019-03-19

22:20 otto@deploy1001: Finished deploy [eventlogging/analytics@9aea626]: fix for production error where mw api is returning html instead of json schemas (duration: 00m 04s)
22:20 otto@deploy1001: Started deploy [eventlogging/analytics@9aea626]: fix for production error where mw api is returning html instead of json schemas
21:50 otto@deploy1001: scap-helm eventgate-analytics finished
21:50 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
21:50 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
21:50 otto@deploy1001: scap-helm eventgate-analytics finished
21:50 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
21:50 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
21:36 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
21:36 otto@deploy1001: scap-helm eventgate-analytics finished
21:36 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
21:23 otto@deploy1001: scap-helm eventgate-analytics finished
21:23 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
21:23 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
21:07 cdanis: cdanis@wikitech-static.wikimedia.org: apt install sshguard
21:06 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
21:06 otto@deploy1001: scap-helm eventgate-analytics finished
21:06 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:58 XioNoX: disable down ports with no description on switches
20:44 cdanis: enabling puppet on contint1001
19:54 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
19:52 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
19:47 XioNoX: disable asw2-a<->asw-a link
19:44 cdanis: icinga failed over to icinga1001 successfully
19:43 XioNoX: remove forced failover on cr1/cr2-eqiad
19:36 cdanis: failing over icinga to icinga1001
19:35 XioNoX: enable cr2-eqiad:ae1
19:29 ariel@deploy1001: Finished deploy [dumps/dumps@da66149]: move maxretries to config (duration: 00m 03s)
19:29 ariel@deploy1001: Started deploy [dumps/dumps@da66149]: move maxretries to config
19:09 ejegg: updated CiviCRM from a2316be94f to 3bfc7a762e
19:09 gtirloni: rebooted labmon1001
19:02 XioNoX: disable cr2-eqiad:ae1
18:46 XioNoX: failover cr2-eqiad:ae1 VRRP master to cr1
18:17 XioNoX: starting pybal on lvs1002
18:11 XioNoX: stopping pybal on lvs1002
18:09 XioNoX: starting pybal on lvs1001
18:01 XioNoX: stopping pybal on lvs1001
18:01 jijiki: restart pdfrender on scb1003
17:56 XioNoX: shutdown scp1001 for uplink move
17:47 Lucas_WMDE: Updated the Wikidata property suggester with data from last Monday's JSON dump and applied the T132839 workarounds (T216270)
17:33 hasharAway: contint1001 / CI going for a quick scheduled maintenance -network cable being moved-
17:33 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0 (duration: 01m 50s)
17:31 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0
17:30 mdholloway: mobileapps deploy failed for group default3, retrying
17:24 tzatziki: changing email for User:St3f
17:18 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0 (duration: 03m 47s)
17:16 addshore: started "foreachwikiindblist wiktionary extensions/Cognate/maintenance/populateCognatePages.php --batch-size 1000" in a screen on mwdebug1002 (catching up cognate after x1 readonly time)
17:14 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@64f09a0]: T150377: Bump wikimedia-page-library to 6.3.0
16:45 vgutierrez: uploaded acme-chief 0.14 to apt.wikimedia.org (buster) - T218685 T218418 T207295
16:30 elukey: stop eventlogging's mysql kafka consumers on eventlog1002, eventlogging's db replication on db1108 to ease db1107's maintenance
16:29 elukey: stop eventlogging's mysql kafka consumers on eventlog1002, eventlogging's db replication on db1108 to ease db1107's maintenance
16:15 bstorm_: downtimed labstore1003 for network moves so it doesn't page
16:10 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:10 otto@deploy1001: scap-helm eventgate-analytics finished
16:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:08 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org,service=pdns_recursor
16:02 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org,service=pdns_recursor
16:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3, take #2 (duration: 21m 01s)
15:58 tzatziki: changing password for User:St3f
15:57 XioNoX: enable pybal on lvs1006
15:55 XioNoX: disable pybal on lvs1006
15:54 XioNoX: enable pybal on lvs1005
15:52 XioNoX: disable pybal on lvs1005
15:50 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:50 otto@deploy1001: scap-helm eventgate-analytics finished
15:50 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:49 XioNoX: enable pybal on lvs1004
15:45 XioNoX: disable pybal on lvs1004
15:40 mobrovac@deploy1001: Started deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3, take #2
15:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3 (duration: 12m 27s)
15:28 mobrovac@deploy1001: Started deploy [restbase/deploy@62df8c3]: Update the docs page title; deploy v0.19.3
15:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s2 read only OFF - T187960 (duration: 00m 26s)
15:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s2 database master on read only - T187960 (duration: 00m 48s)
15:12 XioNoX: eqiad A7 servers uplink move - T187960
14:46 moritzm: rebooting icinga1001 for kernel update
14:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool databases in row A - T187960 (duration: 00m 48s)
14:41 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Reapply I49a18d from gerrit for consistency (duration: 00m 49s)
14:32 otto@deploy1001: scap-helm eventgate-analytics finished
14:32 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
14:32 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
14:32 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
14:32 otto@deploy1001: scap-helm eventgate-analytics finished
14:31 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
14:31 otto@deploy1001: scap-helm eventgate-analytics install -n production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
14:28 otto@deploy1001: scap-helm eventgate-analytics finished
14:28 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
14:28 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
14:10 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
14:10 otto@deploy1001: scap-helm eventgate-analytics finished
14:10 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
13:19 akosiaris: start zuul/zuul-merger
13:12 akosiaris: unfirewall gerrit, put service back in action
11:31 moritzm: installing php5 security updates
09:08 akosiaris: start nagios-nrpe-server on proton1002, failed due to fork() failed with error 12, bailing out...
07:25 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T218279)
07:20 twentyafterfour@deploy1001: Synchronized wmf-config/CommonSettings.php: Temporarily disable account creation on wikitech (duration: 00m 51s)
06:47 akosiaris: stop zuul and zuul-merger on contint1001
03:45 kart_: Started manual run of unpublished ContentTranslation draft purge script (T218279)
02:12 krinkle@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/EventLogging/includes/ApiJsonSchema.php: If280a4056a (duration: 00m 48s)
02:11 krinkle@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/EventLogging/includes/RemoteSchema.php: If280a4056a (duration: 00m 51s)
00:14 reedy@deploy1001: Synchronized php-1.33.0-wmf.21/tests/phpunit/includes/: Replace wgUser with RequestContext::getUser in User::getBlockedStatus (duration: 01m 00s)
00:12 reedy@deploy1001: Synchronized php-1.33.0-wmf.21/includes/user/User.php: Replace wgUser with RequestContext::getUser in User::getBlockedStatus (duration: 00m 49s)

2019-03-18

23:54 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494551/ (duration: 00m 49s)
23:45 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494551/ (duration: 00m 48s)
23:33 maxsem@deploy1001: Synchronized php-1.33.0-wmf.21/includes/EditPage.php: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/497347/ (duration: 00m 49s)
23:25 twentyafterfour: running puppet on phab1001 to get out of degraded state
23:23 XioNoX: renumber Telia transit in eqsin
23:14 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/497317/ (duration: 00m 49s)
23:07 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/496515/ (duration: 00m 48s)
22:18 greg-g: gjg@phab1001:~$ sudo /srv/phab/phabricator/bin/auth strip --all-types --user Barras # per request/verification from foks
19:57 bawolff@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable block disables login on wikitech (duration: 00m 48s)
19:56 bawolff@deploy1001: Synchronized wmf-config/wikitech.php: Adjust ldap config (duration: 00m 48s)
16:17 volans: restarting pdfrender on scb1003
16:15 volans: restarting pdfrender on scb1004
15:48 jiji@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=cxserver,cluster=scb,name=scb.*
15:45 jijiki: Depool scb* from serving cxserver on eqiad - T213195
15:06 papaul: shutting down mw2206 for memtest
14:47 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.e6-upgrade (exit_code=99)
14:46 gehel@cumin1001: START - Cookbook sre.elasticsearch.e6-upgrade
14:13 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2003.codfw.wmnet
13:42 ema: cp-ats rolling restart to apply proxy.config.cache.ram_cache.size config change T213263
13:23 mvolz@deploy1001: scap-helm citoid finished
13:22 mvolz@deploy1001: scap-helm citoid cluster codfw completed
13:22 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-codfw-values.yaml stable/citoid [namespace: citoid, clusters: codfw]
13:18 mvolz@deploy1001: scap-helm citoid finished
13:18 mvolz@deploy1001: scap-helm citoid cluster eqiad completed
13:17 mvolz@deploy1001: scap-helm citoid upgrade production -f citoid-eqiad-values.yaml stable/citoid [namespace: citoid, clusters: eqiad]
13:04 arturo: T218022 disable icinga checks for labtestservices2001.wikimedia.org
12:54 arturo: T218025 disable icinga checks for cloudnet2001-dev.codfw.wmnet
12:49 mvolz@deploy1001: scap-helm citoid finished
12:49 mvolz@deploy1001: scap-helm citoid cluster staging completed
12:49 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
12:48 mvolz@deploy1001: scap-helm citoid upgrade staging -f citoid-values-staging.yaml stable/citoid [namespace: citoid, clusters: staging]
11:45 zeljkof: EU SWAT finished
11:45 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable mobile section editing on bnwiki, hewiki, zh_yuewiki (T218375) (gerrit:496696) (duration: 00m 50s)
10:51 _joe_: testing safety checks for php-fpm on mwdebug2001
10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:497261) (duration: 00m 48s)
10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:497261) (duration: 00m 49s)
10:12 vgutierrez: uploaded acme-chief 0.12 to apt.wikimedia.org (buster) - T218543
10:12 volans: restarted irc echo on icinga2001
10:04 _joe_: hot-patching the error in php7.2-fpm config
10:02 volans: running puppet on hosts matching 'C:php::fpm' to apply I004349
10:00 volans: running puppet on failed hosts
09:57 volans: temporarily stop ircecho to avoid spam
09:40 ema: superior-cache-analyzer_3.3.7 uploaded to stretch-wikimedia T213263
09:29 godog: switch to mpm_event for prometheus apache before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/496750
08:58 vgutierrez: uploaded acme-chief 0.11 to apt.wikimedia.org (buster) - T207295
08:52 moritzm: restarting ferm on sessionstore, was stuck in resolving one of the -a records, which were only merged in a subsequent step (T215883)
08:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1104 (duration: 00m 48s)
08:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 (duration: 00m 48s)
08:34 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
08:32 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
08:32 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
08:31 ema: cp2002: repool varnish-fe to resume ATS testing T213263
08:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1101 (duration: 00m 48s)
08:22 moritzm: armed keyholder on neodymium
07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 (duration: 00m 48s)
07:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 (duration: 00m 48s)
07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1101 (duration: 00m 48s)
07:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1101 (duration: 00m 49s)
07:02 marostegui: Stop db1101 to upgrade mysql and kernel
07:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101 (duration: 00m 48s)
06:33 marostegui: Deploy schema change on s8 codfw master (db2045), this will generate lag on s8 codfw
06:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 (duration: 00m 48s)
06:08 marostegui: Deploy schema change on x1 master (db1069) with replication - T218397
06:04 marostegui: Deploy schema change on db1121 - lag will appear on labsdb:s4
06:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 (duration: 01m 04s)
03:58 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T218279)
02:00 kart_: Started manual run of unpublished ContentTranslation draft purge script (T218279)

2019-03-17

11:51 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=labswiki --force --sysop Ladsgroup
08:49 elukey: restart pdfrender on scb1004

2019-03-16

10:00 chasemp: stop apache on cobalt for maintenance
00:19 andrewbogott: restarting slapd on seaborgium

2019-03-15

22:37 shdubsh: temporarily stop ircecho on icinga2001
18:00 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/MobileFrontend: SWAT: iOS: Fix mobile editor (gerrit:496827) T218069 T218062 T218352 T211490 T218062 T211491 T172877 (duration: 00m 54s)
17:53 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
17:53 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
17:53 ema: depool cp2002's varnish-fe for the weekend T213263#5027366
17:25 arturo: acmechief2001 - armed keyholder
17:22 arturo: cumin2001 - armed keyholder
17:21 andrewbogott: updating puppet compiler facts
17:13 mutante: netmon2001 - armed keyholder for rancid
17:12 mutante: netmon1002 - armed keyholder for rancid
17:04 arturo: arm keyholder in deploy2001
17:03 arturo: arm keyholder in sarin
17:02 arturo: arm keyholder in labpuppetmaster1002
17:01 arturo: arm keyholder in deploy1001
17:00 XioNoX: clean up rigel switch port
17:00 arturo: arm keyholder in acmechief1001
16:58 arturo: arming keyholder in cumin1001
16:09 moritzm: upgrading deployment-deploy01 to component/php72
15:59 akosiaris: puppetmaster1001 rm /var/run/confd-template/.citoid*.err to remove old stale confd files that resulted from merging https://gerrit.wikimedia.org/r/494213
15:54 moritzm: rebooting labtestservices2003 for kernel update
15:47 andrewbogott: enabling puppet on seaborgium to apply new acme cert
15:47 moritzm: rebooting labtestservices2002 for kernel update
15:42 moritzm: rebooting labtestcontrol2003 for kernel update
15:38 moritzm: rebooting labtestnet2002 for kernel update
15:11 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,cluster=cache_upload,name=cp2015.codfw.wmnet
15:10 ema: cp2015: repool ATS with proxy.config.cache.ram_cache.size 1G T213263
15:07 moritzm: rebooting graphite2003 for kernel security update
15:05 ema@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,cluster=cache_upload,name=cp2015.codfw.wmnet
15:04 ema: cp2015: test ATS depool T213263
14:45 mutante: tools tools-sgebastion-07 - dpkg-reconfigure locales and adding ko_KR.EUC-KR for Korean users by request and as done in the past on former tools bastion
14:43 moritzm: rebooting etherpad1001 to pick up SSBD-enabled qemu
14:31 mutante: tools-sgebastion-07 - generating locales for user request in T130532
13:50 moritzm: rolling reboot of ores in codfw for SSBD/L1TF kernel update
13:47 akosiaris@deploy1001: scap-helm cxserver finished
13:47 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
13:47 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
11:16 godog: reenable prometheus@k8s on prometheus2004 with mod_proxy connection limits - T217715
10:31 akosiaris: add a 10s bucket to cxserver prometheus-statsd exporter mappings
10:31 akosiaris@deploy1001: scap-helm cxserver finished
10:31 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
10:31 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
10:31 akosiaris@deploy1001: scap-helm cxserver finished
10:31 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
10:31 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
10:31 akosiaris@deploy1001: scap-helm cxserver finished
10:31 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
10:31 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
10:30 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/citoid [namespace: cxserver, clusters: staging]
10:03 akosiaris@deploy1001: scap-helm citoid finished
10:03 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
10:03 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
10:03 akosiaris@deploy1001: scap-helm citoid finished
10:02 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
10:02 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
10:02 akosiaris: add a 10s bucket to citoid prometheus-statsd exporter mappings
10:02 akosiaris: remove prometheus-statsd-exporter from zotero pods
10:02 akosiaris@deploy1001: scap-helm citoid finished
10:02 akosiaris@deploy1001: scap-helm citoid cluster staging completed
10:02 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
10:01 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-values-staging.yaml staging stable/citoid [namespace: citoid, clusters: staging]
10:00 akosiaris@deploy1001: scap-helm zotero finished
10:00 akosiaris@deploy1001: scap-helm zotero cluster staging completed
10:00 akosiaris@deploy1001: scap-helm zotero upgrade --install -f zotero-values-staging.yaml staging stable/zotero [namespace: zotero, clusters: staging]
09:58 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-staging.yaml staging stable/zotero [namespace: zotero, clusters: staging]
09:53 akosiaris@deploy1001: scap-helm zotero finished
09:53 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
09:53 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
09:53 akosiaris@deploy1001: scap-helm zotero finished
09:53 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
09:52 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
09:42 godog: bounce grafana-server on grafana1001
09:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103 (duration: 00m 50s)
09:28 godog: correction, prometheus2004
09:27 godog: temporarily disable read queries to prometheus@k8s on prometheus2003
09:19 jiji@cumin1001: conftool action : set/weight=12; selector: dc=eqiad,service=cxserver,cluster=scb,name=kubernetes.*
09:18 jiji@cumin1001: conftool action : set/weight=15; selector: dc=codfw,service=cxserver,cluster=scb,name=kubernetes.*
09:17 jijiki: Ramp up cxserver k8s traffic to 50% - T213195
08:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103 (duration: 00m 50s)
08:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 (duration: 00m 47s)
08:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 49s)
07:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1091 (duration: 00m 49s)
07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 (duration: 00m 48s)
07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 (duration: 00m 48s)
07:01 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
06:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1091 (duration: 00m 48s)
06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1091 (duration: 00m 48s)
06:04 marostegui: Upgrade db1091
06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 (duration: 00m 50s)
04:01 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
01:25 ejegg: re-enabled ingenico audit parser
01:25 ejegg: updated fundraising CiviCRM from 41efa14fb0 to a2316be94f

2019-03-14

22:54 ejegg: temporarily disabled Ingenico WX audit parsing
22:05 cdanis: cdanis@icinga2001.wikimedia.org ~ % sudo systemctl restart icinga.service
21:58 cdanis: cdanis@icinga2001.wikimedia.org ~ % sudo systemctl restart nsca.service
21:01 crusnov@deploy1001: Finished deploy [netbox/deploy@090a0c3]: Another minor bugfix release for ganeti-netbox script (duration: 00m 56s)
21:00 crusnov@deploy1001: Started deploy [netbox/deploy@090a0c3]: Another minor bugfix release for ganeti-netbox script
20:26 thcipriani: gerrit live on 2.15.11
20:24 thcipriani: restarting gerrit for 2.15.11
20:23 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt (duration: 00m 02s)
20:23 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt
20:22 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 04s)
20:22 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only
20:17 ejegg: updated CiviCRM from b4e3cf16cc to 41efa14fb0
20:17 thcipriani: gerrit back to 2.15.8
20:15 thcipriani: restart gerrit on cobalt
20:14 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on cobalt (duration: 00m 07s)
20:14 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on cobalt
20:14 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 10s)
20:13 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Revert Gerrit to 2.15.11 on gerrit2001 only
20:13 bstorm_: Placed labstore1006 back in rotation for NFS and rsync
20:11 crusnov@deploy1001: Finished deploy [netbox/deploy@c6cf7d6]: Minor bugfix release for ganeti-netbox script (duration: 00m 54s)
20:10 crusnov@deploy1001: Started deploy [netbox/deploy@c6cf7d6]: Minor bugfix release for ganeti-netbox script
20:03 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/GrowthExperiments/extension.json: Hot-deploy I19414dc31 to fix dependencies on mw.Uri (duration: 00m 49s)
19:37 XioNoX: set protocols bgp group Anycast4 multihop ttl 193 on cr1/2-esams - T209989
19:25 XioNoX: merged Juniper BFD Icinga check
19:12 thcipriani: gerrit back up
19:08 thcipriani: restarting gerrit on cobalt for 2.15.11 upgrade
19:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt (duration: 00m 11s)
19:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on cobalt
19:05 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 11s)
19:05 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2bc8af0]: Gerrit to 2.15.11 on gerrit2001 only
19:02 XioNoX: set protocols bgp group Anycast4 multihop ttl 193 on cr1/2-eqiad - T209989
18:53 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/ParsoidBatchAPI/includes/ApiParsoidBatch.php: SWAT Another deprecation fix via I4936d0ce03 (duration: 00m 49s)
18:37 XioNoX: set protocols bgp group Anycast4 multihop ttl 190 on cr1-codfw - T209989
18:31 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T216730 Enable musical notation datatype on Wikidata (duration: 00m 48s)
18:29 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/GrowthExperiments/modules/help/: SWAT Ib13cf88d GrowthExperiments log fix for closes (duration: 00m 49s)
18:22 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT T217436 Add default user config for rollback confirmation (duration: 00m 48s)
18:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T217436 Set up exceptions for rollback confirmation (duration: 00m 49s)
18:08 tzatziki: change email for KStineRowe (WMF) on officewiki, collabwiki, SUL
18:05 mforns@deploy1001: Finished deploy [analytics/aqs/deploy@13203f1]: Deploying AQS for node10 upgrade (duration: 19m 40s)
17:59 jforrester@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/ParsoidBatchAPI/includes/ApiParsoidBatch.php: Hot-deploy I2842dfea to reduce deprecation spam after T206675 deploy of wmf.21 (duration: 00m 49s)
17:45 mforns@deploy1001: Started deploy [analytics/aqs/deploy@13203f1]: Deploying AQS for node10 upgrade
17:43 mforns: Deploying AQS using scap (node10 upgrade)
17:32 arlolra: Updated Parsoid to f3e2209 (T213950)
17:24 arlolra@deploy1001: Finished deploy [parsoid/deploy@8cf4107]: Updating Parsoid to f3e2209 (duration: 07m 09s)
17:17 arlolra@deploy1001: Started deploy [parsoid/deploy@8cf4107]: Updating Parsoid to f3e2209
17:15 jijiki: Pool mw1280 back - T218006
17:12 jijiki: Depool mw2206 - T215415
16:51 otto@deploy1001: scap-helm eventgate-analytics finished
16:51 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:51 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:50 crusnov@deploy1001: Finished deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 (netmon1002) - T215229 (duration: 00m 50s)
16:49 crusnov@deploy1001: Started deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 (netmon1002) - T215229
16:46 crusnov@deploy1001: Finished deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 - T215229 (duration: 00m 30s)
16:45 crusnov@deploy1001: Started deploy [netbox/deploy@59430dd]: Deploy Ganeti Sync and Upgrade to upstream v2.5.8 - T215229
16:32 XioNoX: add default deny to mr1-* junos-host policies - T218234
16:30 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/Wikibase/lib/includes/Store/Sql/TermSqlIndex.php: gerrit:496481 TermSqlIndex, track calls to getTermsOfEntities (duration: 00m 50s)
16:22 otto@deploy1001: scap-helm eventgate-analytics finished
16:22 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:22 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:08 arturo: reimaging cloudvirt1015 again
16:04 akosiaris: reboot one final time all sessionstore[12]00[123] servers
16:02 arturo: T216497 drop python-dogpile.cache from jessie-wikimedia/openstack-mitaka-jessie
14:57 marostegui: Start replication on db2070 after testing url_notes
14:53 mutante: analytics-tool1003 - stopping idle screen session
14:43 marostegui: Stop replication on db2070 to test the url_notes (will alert only on IRC)
14:21 otto@deploy1001: scap-helm eventgate-analytics finished
14:21 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
14:21 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --set main_app.version=v1.0.3-wmf0 stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
14:09 otto@deploy1001: scap-helm eventgate-analytics finished
14:09 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
14:09 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
13:54 godog: take a snapshot of data on prometheus2004
13:50 arturo: reimaging cloudvirt1015
13:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1081 into API (duration: 00m 48s)
13:15 arturo: T216497 drop libpulse0 from jessie-wikimedia/openstack-mitaka-jessie
13:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 into API (duration: 00m 49s)
13:10 arturo: T216497 drop python-mysqldb from jessie-wikimedia/openstack-mitaka-jessie
13:10 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.21
12:50 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=cxserver,cluster=scb,name=kubernetes.*
12:49 jiji@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=cxserver,cluster=scb,name=kubernetes.*
12:42 jijiki: Ramp up k8s cxserver traffic to 8% - T213195
12:22 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=cxserver,cluster=scb,name=kubernetes.*
12:21 jiji@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=cxserver,cluster=scb,name=kubernetes.*
12:17 jijiki: Send ~4% of cxserver traffic to eqiad k8s - T213195
12:14 zeljkof: EU SWAT finished
12:13 kartik@deploy1001: Synchronized wmf-config: SWAT: gerrit:496418 Revert "Correct the enable context detection configuration" (duration: 00m 56s)
12:12 arturo: T216497 drop some packages from jessie-wikimedia/openstack-mitaka-jessie: qemu-XXX
12:06 arturo: T216497 drop some packages from jessie-wikimedia/openstack-mitaka-jessie: libvirt*, librados2, librbd1, because they induce the resolver to conflict with those included in stretch
12:02 kartik@deploy1001: Synchronized wmf-config: SWAT: Revert gerrit:496412 Fix content detection config (duration: 00m 56s)
11:58 kartik@deploy1001: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
11:45 kartik@deploy1001: Synchronized php-1.33.0-wmf.21/skins/MinervaNeue: SWAT: gerrit:496364 Ensure page-actions icons are `display:block` (T218182) (duration: 00m 57s)
11:15 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:493672 Enable ExternalGuidance to all Wikipedias (T216129) (duration: 00m 57s)
10:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 (duration: 00m 57s)
10:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
10:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
10:50 ema: cp2002: pool varnish-fe to resume ATS testing T213263
10:44 moritzm: installing libsdl1.2 security updates for jessie
10:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 58s)
09:54 hashar: ci: live hacked job https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/ in attempt to capture 'core' files from hhvm | https://gerrit.wikimedia.org/r/#/c/integration/config/+/496392/ | T216689
09:02 mutante: ms-be2037 - down since a couple hours, no SAL or ticket, powercycling
08:44 marostegui: Deploy schema change on s4 codfw master (db2051), this will generate lag on codfw
08:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1088 (duration: 00m 53s)
08:21 marostegui: Upgrade s3 codfw master (db2043) there will be lag on s3 codfw
08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1088 (duration: 00m 55s)
07:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1088 (duration: 00m 55s)
07:48 akosiaris@deploy1001: scap-helm cxserver finished
07:48 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
07:48 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
07:42 marostegui: Upgrade db1088
07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1088 (duration: 00m 54s)
07:22 kartik@deploy1001: Finished deploy [cxserver/deploy@3ba57a5]: Update cxserver to b16f4a1 (T212577, T208386) (duration: 03m 50s)
07:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1098 (duration: 00m 55s)
07:18 kartik@deploy1001: Started deploy [cxserver/deploy@3ba57a5]: Update cxserver to b16f4a1 (T212577, T208386)
07:16 akosiaris@deploy1001: scap-helm cxserver finished
07:16 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
07:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
07:16 akosiaris@deploy1001: scap-helm cxserver finished
07:16 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
07:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
07:16 akosiaris@deploy1001: scap-helm cxserver finished
07:15 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
07:15 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1098 (duration: 00m 55s)
06:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1098 (duration: 00m 54s)
06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 55s)
06:50 marostegui@deploy1001: sync-file aborted: More traffic to db1097 (duration: 00m 00s)
06:46 akosiaris@deploy1001: scap-helm cxserver finished
06:46 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
06:46 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
06:40 marostegui: Upgrade mysql on dbstore2002
06:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1098:3317 (duration: 00m 55s)
06:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1098:3317 (duration: 00m 55s)
06:08 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
06:04 marostegui: Upgrade MySQL on db1098
06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098 (duration: 00m 56s)
04:01 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
01:39 ejegg: updated fundraising CiviCRM from 5c45e4c24d to b4e3cf16cc

2019-03-13

23:48 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/skins/MinervaNeue/: Remove unnecessary parameter from getHistoryPageAction (duration: 00m 56s)
23:45 catrope@deploy1001: Synchronized wmf-config/WikibaseSearchSettings.php: Fix builder class definition for WBCS (duration: 00m 56s)
23:41 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/MobileFrontend/: Fix animation when visual section editing enabled on mobile only (T218167) (duration: 00m 58s)
23:39 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/WikibaseCirrusSearch/: Fix hook return values (duration: 00m 58s)
23:30 catrope@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/GrowthExperiments/: Instrumentation fixes (T217802) (duration: 00m 57s)
22:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disabling api-request logging to eventgate-analytics for group0 wikis until we solve T218268 (duration: 00m 56s)
21:11 otto@deploy1001: scap-helm eventgate-analytics finished
21:11 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
21:11 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
21:10 otto@deploy1001: scap-helm eventgate-analytics finished
21:10 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
21:09 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
20:58 otto@deploy1001: scap-helm eventgate-analytics finished
20:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
20:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:58 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
20:58 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad,codfw]
20:35 arlolra@deploy1001: Finished deploy [parsoid/deploy@e2e44bc]: Updating Parsoid to ea80d1b (duration: 06m 38s)
20:28 arlolra@deploy1001: Started deploy [parsoid/deploy@e2e44bc]: Updating Parsoid to ea80d1b
20:25 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@5f8e4e6]: Update mobileapps to 5865552 (7074964 d6dc3cd fbc6262) (duration: 03m 35s)
20:24 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disabling api-request logging to eventgate-analytics for group1 wikis to investigate possible outage (duration: 00m 56s)
20:21 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@5f8e4e6]: Update mobileapps to 5865552 (7074964 d6dc3cd fbc6262)
20:14 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@5f8e4e6]: Update mobileapps to 5865552 (7074964 d6dc3cd fbc6262) (duration: 01m 49s)
20:03 herron: increased index.mapping.total_fields.limit to 1350 on index logstash-2019.03.13
19:46 jijiki: Pooling mw2206 - T215415
19:26 herron: performing rolling restart of eqiad logstash instances
18:51 jijiki: Depool mw1280 and mw2206 due to hardware issues - T215415 T218006
18:44 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling api-request logging to eventgate-analytics for group1 wikis (duration: 00m 58s)
18:30 robh: thumbor1004 memtest in progress via T215411
18:29 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
18:29 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
18:28 ema: cp2002: depool varnish-fe after 1 hour ATS experiment T213263
18:09 bstorm_: rebooting labstore1006 T217473
18:07 bstorm_: downtime labstore1006 for troubleshooting T217473
17:57 XioNoX: set interface description on fasw-c-codfw:ge-0/0/47
17:43 XioNoX: s/29073/202425/ on AMS-IX
17:34 XioNoX: add missing sandbox1-b-eqiad interface to ospf(3) passive on cr1/2-eqiad
17:19 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
17:19 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
17:18 ema: cp2002: pool varnish-fe for user traffic, routed through ATS backends T213263
17:05 otto@deploy1001: scap-helm eventgate-analytics finished
17:05 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
17:05 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
17:01 otto@deploy1001: scap-helm eventgate-analytics finished
17:01 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
17:01 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
16:59 otto@deploy1001: scap-helm eventgate-analytics finished
16:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:56 robh: mw2206.codfw.wmnet is being powered down for firmware update, relying on auto depool function from clean shutdown for mw api server via T215415
16:42 robh: mw2206.codfw.wmnet is being powered down for firmware update, relying on auto depool function from clean shutdown for mw api server via T215415
16:36 addshore: SWAT done
16:36 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/includes/api/ApiMain.php: SWAT: T214080 T212529 ApiMain.php api/request logging event changes gerrit:496197 (duration: 00m 57s)
16:32 akosiaris@deploy1001: scap-helm cxserver finished
16:32 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
16:32 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
16:19 akosiaris@deploy1001: scap-helm cxserver finished
16:19 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
16:19 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
16:16 akosiaris@deploy1001: scap-helm cxserver finished
16:16 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
16:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
16:16 akosiaris@deploy1001: scap-helm cxserver finished
16:16 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
16:16 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging /home/akosiaris/deployment-charts/charts/cxserver/ [namespace: cxserver, clusters: staging]
16:15 jijiki: Depool thumbor1004 to investigate memory issues - T215411
16:04 akosiaris@deploy1001: scap-helm cxserver finished
16:04 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
16:04 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
16:04 akosiaris@deploy1001: scap-helm cxserver finished
16:04 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
16:04 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
16:04 akosiaris@deploy1001: scap-helm cxserver finished
16:04 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
16:04 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
15:52 akosiaris@deploy1001: scap-helm cxserver finished
15:52 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
15:52 akosiaris@deploy1001: scap-helm cxserver finished
15:52 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml eqiad stable/cxserver [namespace: cxserver, clusters: eqiad]
15:52 akosiaris@deploy1001: scap-helm cxserver finished
15:52 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
15:52 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
15:40 akosiaris: do the first deploy of cxserver in eqiad/codfw T213195
15:39 akosiaris@deploy1001: scap-helm cxserver finished
15:39 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
15:39 akosiaris@deploy1001: scap-helm cxserver install -n production -f cxserver-eqiad-values.yaml stable/cxserver [namespace: cxserver, clusters: eqiad]
15:39 akosiaris@deploy1001: scap-helm cxserver finished
15:39 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
15:39 akosiaris@deploy1001: scap-helm cxserver install -n production -f cxserver-codfw-values.yaml stable/cxserver [namespace: cxserver, clusters: codfw]
14:27 ema: cp2002: depool varnish-fe in preparation of pointing it to ATS T213263
14:13 marostegui: Upgrade db2074 (sanitarium master)
13:42 akosiaris: upgrade kubestage to kubernetes 1.11.8
13:42 akosiaris: upgrade neon to kubernetes 1.11.8
13:28 akosiaris: upgrade kubestage1002 to kubernetes 1.11.8
13:24 godog: take a snapshot of prometheus@k8s data on prometheus2004
13:13 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.21 (duration: 01m 43s)
13:12 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.21
11:34 marostegui: Test snapshot db1117:3325 to dbstore1001 - T210292
10:55 marostegui: Upgrade db2057
10:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1085 (duration: 00m 56s)
09:52 mutante: ms-be1035 - sudo systemctl reset-failed
09:45 ema: cp1071: upgrade trafficserver to 8.0.3~rc0 for testing purposes
09:41 marostegui: Deploy schema change on db1085 with replication, there will be lag on labsdb:s6
09:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1085 (duration: 00m 55s)
09:06 moritzm: installing PHP 7.0 security updates
08:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3316 (duration: 00m 55s)
08:58 marostegui: Upgrade mysql and kernel on db2050
08:51 ema: cp3030: wipe frontend cache to get rid of large objects T216006
08:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3316 (duration: 00m 55s)
08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1093 (duration: 00m 55s)
08:09 moritzm: upgrading job runners in eqiad to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093 (duration: 00m 54s)
07:26 moritzm: upgrading remaining app servers in eqiad to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
07:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1096 (duration: 00m 58s)
07:13 marostegui: Test snapshot dbstore1001:3311 to dbstore1001 - T210292
07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1096 (duration: 00m 55s)
06:58 marostegui: Upgrade MySQL and kernel on db2036
06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1096 (duration: 00m 55s)
06:40 marostegui: Stop MySQL on db1096 for upgrade
06:24 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
06:21 marostegui: Testing snapshotting on db1117:3321 -> dbstore1001 - T210292
06:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1096 (duration: 01m 07s)
04:11 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)

2019-03-12

23:33 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/MobileFrontend/includes/specials/SpecialMobileOptions.php: SWAT: gerrit:495907 Fix: undefined locals in SpecialMobileOptions.setJsConfigVars() T218098 (duration: 00m 57s)
20:49 shdubsh: manually upgrade prometheus-icinga-exporter to 0.5 on standby icinga
19:48 eileen: civicrm revision changed from 977b9bfcf1 to 5c45e4c24d, config revision is f930677e97
19:31 herron: restarted citoid on scb1003
19:16 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling api-request logging to eventgate-analytics for group0 wikis (duration: 01m 01s)
19:14 arturo: T216497 manually delete libpam-systemd and libsystemd0 230-7~bpo8+2 from jessie-wikimedia/openstack-mitaka-jessie
19:09 arturo: T216497 manually delete systemd 230-7~bpo8+2 from jessie-wikimedia/openstack-mitaka-jessie
19:07 robh: rebooting thumbor1004 for memory troubleshooting via T215411
17:11 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Increase APC cache for PropertyInfoLookup from 15 to 20s (duration: 00m 55s)
17:10 addshore@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Increase APC cache for PropertyInfoLookup from 15 to 20s (duration: 00m 57s)
17:02 jbond42: rolling update of debdeploy
16:57 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta only (duration: 00m 53s)
16:43 addshore@deploy1001: Synchronized php-1.33.0-wmf.21/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Double on server cache for PropertyInfoStore (duration: 00m 55s)
16:42 addshore@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: T97368 Double on server cache for PropertyInfoStore (duration: 00m 57s)
16:29 moritzm: upgraded buster installation image to daily build from 12th of March (T213527)
15:45 otto@deploy1001: scap-helm eventgate-analytics finished
15:45 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
15:45 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
15:43 otto@deploy1001: scap-helm eventgate-analytics finished
15:43 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
15:43 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
15:43 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
15:42 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
15:41 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
15:39 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
15:38 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org,service=pdns_recursor
15:37 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
15:33 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
15:33 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-eqiad-values.yaml [namespace: eventgate-analytics, clusters: eqiad]
15:28 otto@deploy1001: scap-helm eventgate-analytics finished
15:28 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:28 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:26 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org,service=pdns_recursor
15:23 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org,service=pdns_^Ccursor
15:02 ppchelko@deploy1001: scap-helm eventgate-analytics finished
15:02 ppchelko@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:02 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:02 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:00 ppchelko@deploy1001: scap-helm eventgate-analytics upgrade -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
14:26 mutante: phab1002 - reboot
13:43 marostegui: Upgrade MySQL and kernel on db2094 (inactive sanitarium)
13:27 marostegui: Deploy schema change on s6 codfw, lag will be generated on s6 codfw
13:24 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.21
12:41 arturo: T215605 include python-mwclient .deb in openstack-mitaka-jessie/jessie-wikimedia in install1002
12:23 jynus: testing snapshotting on db1117:3325 -> dbstore1001 T210292
12:23 zfilipin@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.21 and rebuild l10n cache (duration: 34m 25s)
12:09 moritzm: upgrading mw1238-mw1258 to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
11:59 mutante: analytics-tool1004 - start superset service
11:48 zfilipin@deploy1001: Started scap: testwiki to php-1.33.0-wmf.21 and rebuild l10n cache
11:47 zfilipin@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 [keeping static files] (duration: 01m 40s)
11:45 zfilipin@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 [keeping static files] (duration: 01m 35s)
11:42 arturo: T215605 include python-oath .deb in stretch-wikimedia thirdparty/oath
11:41 zfilipin@deploy1001: Pruned MediaWiki: 1.33.0-wmf.16 (duration: 12m 41s)
11:39 elukey: raise mysql's max_user_connection to 1000 for the Analytics user on labsdb1012
11:36 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1077.eqiad.wmnet
11:36 ema: cp1077: repool varnish-be after service restart T217893
11:35 arturo: delete wrong stretch-wikimedia `thirdparty` component in install1002
11:12 zeljkof: EU SWAT finished
11:12 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:495842 Add campaign prefix for EG tag (T216123) (duration: 00m 49s)
11:11 moritzm: upgrading API servers/job runners servers in eqiad to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates) (T216712)
10:32 marostegui: Deploy schema change on db1082, lag will happen on s5 on labs
10:29 gtirloni: re-enabled puppet on serpens and seaborgium
10:19 gtirloni: updated slapd to version 2.4.47 on seaborgium (T217280)
10:17 moritzm: upgrading API servers/job runners servers in codfw to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates) (T216712)
10:14 gtirloni: upgrading seaborgium to slapd 2.4.47
09:39 jynus: stop db1114 and restart it empty
09:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 (duration: 00m 48s)
08:57 elukey: restart memcached on mc1019 to apply new settings - T217731
08:50 ema: cp1077 depooled again T217893
08:49 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1077.eqiad.wmnet
08:48 moritzm: upgrading app servers in codfw to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates) (T216712)
08:48 ema: restart varnish-be on cp1077 T217893
08:47 moritzm: upgrading app servers in codfw to component/php72 / PHP 7.2.16 (combined with glibc/OpenSSL updates)
08:46 ema: cp1077 repooled T217893
08:46 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1077.eqiad.wmnet
08:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 for schema change (duration: 00m 48s)
08:34 jynus: deploy core replica events to db1118
08:15 ema: cp1099: ferm.service failed to resolve prometheus1003.eqiad.wmnet. ferm restarted T202966
07:18 marostegui: Deploy schema change on db2052 (s5 codfw master), this will generate lag on codfw T71127 T51199
07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113 after schema change and upgrade (duration: 00m 49s)
07:09 marostegui: Upgrade mysql and kernel on db1113
06:40 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
06:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113 for schema change and upgrade (duration: 00m 50s)
04:04 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
02:40 ejegg: updated payments-wiki from f1a89d7045 to 7a312e371a

2019-03-11

17:55 addshore@deploy1001: Synchronized wmf-config/interwiki-labs.php: BETA ONLY https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/495723/ (duration: 00m 48s)
17:43 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BETA ONLY https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/495721/ (duration: 00m 49s)
17:23 arturo: T215605 copy python-oath from jessie-wikimedia/thirdparty to stretch-wikimedia/thirdparty in reprepro
17:03 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
17:02 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
16:31 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
16:31 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
15:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1097 (duration: 00m 48s)
15:16 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Fix syntax for MediaInfo depicts config (beta only) (duration: 00m 49s)
14:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 49s)
14:43 moritzm: upgrading mw canaries to PHP 7.2.16
14:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 48s)
14:25 hashar: contint1001: stopping zuul-merger (it is cpu or IO starving the server)
14:21 moritzm: upgrading mwdebug servers to PHP 7.2.16
14:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1097 (duration: 00m 47s)
14:09 moritzm: importing build of PHP 7.2.16 for component/php72 (T216712)
13:58 marostegui: Upgrade mysql on db1097
13:28 arturo: disable active checks in icinga for labtestvirt200[12] (T218023)
13:04 moritzm: upgrading mwdebug2002 to php 7.2.16
12:23 gtirloni: updated slapd to version 2.4.47 on serpens (T217280)
12:05 gtirloni: updating slapd on serpens/codfw to test possible fix for memory leaks
10:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097 for upgrade and schema change (duration: 00m 48s)
10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: gerrit:495650 Bumping portals to master (T128546) (duration: 00m 49s)
10:44 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: gerrit:495650 Bumping portals to master (T128546) (duration: 00m 49s)
09:56 moritzm: installing chromium security updates on remaining proton hosts
09:44 moritzm: installing chromium security updates on proton1001
09:44 elukey: roll restart of aqs on aqs100* to pick up new druid settings
08:02 marostegui: Upgrade pc1010 (spare)
07:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1099 after upgrade (duration: 00m 48s)
07:32 marostegui: Upgrade MySQL and kernel on pc2010 (spare)
07:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1099 after upgrade (duration: 00m 48s)
06:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1099 after upgrade (duration: 00m 48s)
06:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1099 after upgrade (duration: 00m 52s)
06:38 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217818)
06:37 marostegui: Power cycle mw1280 - server down
06:35 marostegui: Upgrade mysql and kernel on db1099
06:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 for upgrade (duration: 03m 01s)
06:03 effie: Restarting pdfrender on scb1003
06:02 marostegui: Upgrade MySQL on dbstore1004 (s2, s3, s4)
04:01 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217818)
03:30 kartik@deploy1001: Finished deploy [cxserver/deploy@101bebd]: Update cxserver to 5a26308 (T216044, T217878) (duration: 04m 01s)
03:26 kartik@deploy1001: Started deploy [cxserver/deploy@101bebd]: Update cxserver to 5a26308 (T216044, T217878)

2019-03-10

22:35 gtirloni: toolforge stretch: increased nscd group TTL from 60 to 300sec (T217280)
07:14 _joe_: restarting pdfrender on scb1004

2019-03-08

19:25 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta only (duration: 00m 50s)
19:21 moritzm: installing php updates on netmon1002
18:20 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta only (duration: 00m 49s)
17:30 robh: decom in progress for rdb100[123478] via T209181
16:48 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@acf2694] (stretch): UBN geoshapes services on maps1004.eqiad.wmnet (T217898) (duration: 00m 22s)
16:47 mbsantos@deploy1001: Started deploy [kartotherian/deploy@acf2694] (stretch): UBN geoshapes services on maps1004.eqiad.wmnet (T217898)
16:23 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@cc302de] (stretch): UBN geoshapes services on maps2004.codfw.wmnet (T217898) (duration: 00m 24s)
16:22 mbsantos@deploy1001: Started deploy [kartotherian/deploy@cc302de] (stretch): UBN geoshapes services on maps2004.codfw.wmnet (T217898)
16:19 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@d71df87] (stretch): UBN geoshapes services (T217898) (duration: 02m 00s)
16:17 mbsantos@deploy1001: Started deploy [kartotherian/deploy@d71df87] (stretch): UBN geoshapes services (T217898)
15:45 papaul: OS install on restbase2019 and restbase2020
15:30 gilles@deploy1001: Finished deploy [performance/coal@8766469]: (no justification provided) (duration: 00m 06s)
15:30 gilles@deploy1001: Started deploy [performance/coal@8766469]: (no justification provided)
14:34 arturo: T215605 add prometheus-rabbitmq-exporter v0.4 to stretch-wikimedia
14:16 gilles@deploy1001: Finished deploy [performance/navtiming@f2d8a5f]: (no justification provided) (duration: 00m 05s)
14:15 gilles@deploy1001: Started deploy [performance/navtiming@f2d8a5f]: (no justification provided)
13:09 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1077.eqiad.wmnet
12:47 akosiaris: depooling cp1077 just in case, high mailbox lag https://grafana.wikimedia.org/d/000000352/varnish-failed-fetches?orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus%2Fops&var-cache_type=text&var-server=All&var-layer=backend&panelId=13&fullscreen
12:47 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1077.*
12:07 jbond42: rolling security updates of sqlite3 on jessie and trusty
11:07 moritzm: uploaded tideways 4.0.7-1+wmf1 for component/php72 (T216712)
10:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080, db1110 (duration: 00m 49s)
10:14 marostegui: Reload haproxy on dbproxy1011 to repool labsdb1009
09:51 mutante: temp disabling puppet on icinga to debug an issue with elastic checks
09:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080, db1110 (duration: 00m 49s)
09:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3311,db1096:3315 (duration: 00m 49s)
08:37 marostegui: Reload haproxy on dbproxy1011 to depool labsdb1009
08:31 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
08:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099:3311,db1096:3315 (duration: 00m 48s)
08:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1076 (duration: 00m 48s)
07:59 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 00m 40s)
07:58 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
07:57 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 00m 02s)
07:57 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
07:52 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 01m 18s)
07:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 (duration: 00m 48s)
07:51 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 after mysql upgrade (duration: 00m 49s)
07:35 elukey@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster (duration: 00m 30s)
07:34 elukey@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: Test deployment for Buster
07:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 after mysql upgrade (duration: 00m 49s)
07:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1076 into API after mysql upgrade (duration: 00m 48s)
07:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 after mysql upgrade (duration: 00m 48s)
06:53 marostegui: Stop MySQL on db1076 for upgrade
06:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 for mysql upgrade (duration: 00m 49s)
06:22 marostegui: Deploy schema change on s3 db1077 with replication (lag will happen on s3 labs)
06:21 marostegui: Stop replication on s3 on labsdb1009 and labsdb1011
06:20 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010
06:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 (duration: 00m 51s)
00:23 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.20/skins/MinervaNeue/resources/skins.minerva.scripts/toc.js: SWAT: gerrit:495021 Passing page parameter to TOC toggler T217820 (duration: 00m 50s)
00:16 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: gerrit:495024 Cleanup beta cluster config T213599; gerrit:495023 Enable advanced mobile contributions mode on beta cluster beta-only (noop) sync (duration: 00m 49s)
00:01 ayounsi@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org,service=pdns_recursor

2019-03-07

23:53 XioNoX: set net.ipv4.ip_local_port_range="32768 60999" on dns2001 and repool server - T209989
23:46 XioNoX: set net.ipv4.ip_local_port_range="49152 65535" on dns2001 - T209989
23:43 ayounsi@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org,service=pdns_recursor
23:40 XioNoX: depool dns2001 - T209989
20:44 XioNoX: explicitly disable sampling on non eqiad routers
20:42 thcipriani: restarting gerrit on cobalt for 2.15.11 rollback
20:42 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on cobalt (production) (duration: 00m 07s)
20:41 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on cobalt (production)
20:40 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on gerrit2001 only (duration: 00m 10s)
20:40 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Revert "Gerrit to 2.15.11" on gerrit2001 only
20:10 thcipriani: restarting gerrit on cobalt for 2.15.11 upgrade
20:10 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on cobalt (production) (duration: 00m 11s)
20:09 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on cobalt (production)
20:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on gerrit2001 only (duration: 00m 12s)
20:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@5800deb]: Gerrit to 2.15.11 on gerrit2001 only
19:33 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Enable Priority Hints origin trial on ruwiki (duration: 00m 48s)
19:22 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Grant 'reupload-shared' to mediawiki uploaders and fix T217523 (duration: 00m 49s)
19:12 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Partial Blocks on Arabic Wikipedia T217283 (duration: 00m 50s)
19:04 arlolra: Updated Parsoid to d4e76d5 (T202905)
18:56 arlolra@deploy1001: Finished deploy [parsoid/deploy@766a920]: Updating Parsoid to d4e76d5 (duration: 05m 01s)
18:51 arlolra@deploy1001: Started deploy [parsoid/deploy@766a920]: Updating Parsoid to d4e76d5
18:39 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,name=maps2004.codfw.wmnet
18:32 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@248b8c4] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet (duration: 01m 25s)
18:30 mbsantos@deploy1001: Started deploy [kartotherian/deploy@248b8c4] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet
18:30 mbsantos@deploy1001: Finished deploy [tilerator/deploy@fac7e5e] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet (duration: 03m 46s)
18:26 mbsantos@deploy1001: Started deploy [tilerator/deploy@fac7e5e] (stretch): Updating eqiad cluster before repool of maps2004.codfw.wmnet
18:25 gehel: cleaning kernel-proposed-updates component on reprepro (install1002)
18:15 XioNoX: disable asw2-c-eqiad <-> asw-c-eqiad link - T208734
17:55 gehel: rolling upgrade of kibana on logstash clusters completed - T216052
17:48 gehel: rolling upgrade of kibana on logstash clusters - T216052
17:44 gehel: rolling upgrade of logstash on logstash clusters completed - T216052
17:36 gehel: rolling upgrade of logstash on logstash clusters - T216052
17:34 gehel@deploy1001: Finished deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052 (duration: 00m 07s)
17:34 gehel@deploy1001: Started deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052
17:34 gehel@deploy1001: Finished deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052 (duration: 00m 08s)
17:33 gehel@deploy1001: Started deploy [logstash/plugins@7c4c5ea]: upgrade logstash plugins to 5.6.14 - T216052
17:16 gehel: rolling upgrade of elasticsearch on logstash clusters completed - T216052
17:09 ariel@deploy1001: Finished deploy [dumps/dumps@3e25558]: fix broken page-content job retries (duration: 00m 04s)
17:09 ariel@deploy1001: Started deploy [dumps/dumps@3e25558]: fix broken page-content job retries
16:54 cmjohnson1: powering off cp1099 to move to different rack T202966
15:26 gehel: rolling upgrade of elasticsearch on logstash clusters - T216052
14:54 hashar: 1.33.0-wmf.20 seems all good
14:46 marostegui: Reload haproxy on dbproxy1011 to repool labsdb1009
14:15 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.20
13:47 mutante: phab1002 - removing all php-7.2 packages and letting puppet reinstall them after component change
13:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1075 after schema change and mysql upgrade (duration: 00m 55s)
13:41 marostegui: Stop mysql on labsdb1009 for upgrade (this will trigger an haproxy IRC alert)
13:39 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1009
13:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 after schema change and mysql upgrade (duration: 00m 52s)
12:59 zeljkof: EU SWAT finished
12:56 gtirloni: re-enabled puppet on seaborgium/serpens
12:55 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:493010 Enable musical notation datatype on testwikidatawiki (T216730) (duration: 00m 56s)
12:42 ariel@deploy1001: Finished deploy [dumps/dumps@3a25aa0]: handle failed xml content jobs correctly (fix regression) (duration: 00m 05s)
12:42 ariel@deploy1001: Started deploy [dumps/dumps@3a25aa0]: handle failed xml content jobs correctly (fix regression)
12:41 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:494225 Create an uploader group on mediawiki.org (T217523) (duration: 00m 55s)
12:34 zfilipin@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: gerrit:494806 Restrict local uploads on mediawiki.org, take 2 (T217523) (duration: 00m 56s)
12:24 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:492447 Restore bureaucrat rights on hi.wiktionary to default () (duration: 00m 56s)
12:08 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:494477 Enable edittag for ExternalGuidance in CX and VE (T216123) (duration: 00m 57s)
12:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1075 after schema change and mysql upgrade (duration: 00m 56s)
11:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1075 after schema change and mysql upgrade (duration: 00m 56s)
11:45 gtirloni: temporarily disabled puppet on seaborgium/serpens to try slapd config changes
11:28 gtirloni: updated seaborgium to stretch (T217280)
11:21 mutante: doc.wikimedia.org - back up, manually fixed path to php-fpm.sock to 7.0 - puppet disabled, fix coming
11:18 mutante: doc.wikimedia.org down and being worked on - package downgrade exposed an issue
11:15 marostegui: Stop MySQL on db1075 for upgrade
11:15 mutante: doc1001 - apt-get remove --purge php7.2* (the same packages with 7.0 were previously installed in parallel)
10:58 gtirloni: upgrading seaborgium to Stretch (so it's running the same distro as serpens/codfw)
10:34 moritzm: restarting HHVM/Apache on mediawiki canaries to pick up OpenSSL security update
10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 for schema change and mysql upgrade (duration: 00m 56s)
10:13 moritzm: upgrading mediawiki canaries to component/php72 (T216712)
09:47 moritzm: upgrading mwdebug servers in eqiad to component/php72 (T216712)
09:37 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=citoid,cluster=scb,name=scb.*
09:37 akosiaris: ramp up traffic to citoid kubernetes to 100%
09:37 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=citoid,cluster=scb,name=scb.*
09:21 moritzm: upgrading mwdebug servers in codfw to component/php72 (T216712)
09:15 elukey: fixed vlan-analytics1-d-eqiad members on asw2-d-eqiad - T205507
09:03 mutante: mw2151 - mkdir /var/run/nutcracker ; chown nutcracker:nutcracker /var/run/nutcracker ; systemctl start nutcracker - runs again - pooling server
08:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1122 (duration: 00m 55s)
08:54 mutante: depooled mw2151 - nutcracker failing
08:19 mutante: reloading icinga service
08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1122 (duration: 00m 55s)
07:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1122 into API (duration: 00m 55s)
07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1122 (duration: 00m 55s)
07:28 marostegui@deploy1001: sync-file aborted: Repool db1121 (duration: 00m 01s)
07:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 (duration: 00m 56s)
07:12 marostegui: Stop MySQL on db1122 to upgrade
07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 for MySQL upgrade (duration: 00m 57s)
06:40 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
06:03 marostegui: Deploy schema change on db1121, this will generate lag on labsdb:s4 - T86342
06:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 (duration: 00m 57s)
04:03 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)
01:19 twentyafterfour: phabricator update complete
01:17 twentyafterfour: starting phabricator update to tag release/2019-03-07/1 - expect momentary downtime
01:10 twentyafterfour: preparing phabricator upgrade
00:47 aaron@deploy1001: Synchronized php-1.33.0-wmf.20/includes/specials/pagers/ActiveUsersPager.php: f929e2a5069 (duration: 00m 56s)
00:43 aaron@deploy1001: Synchronized php-1.33.0-wmf.20/includes/specials/SpecialActiveusers.php: f929e2a5069 (duration: 00m 56s)
00:28 aaron@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable loading WikibaseCirrusSearch (disabled) on production wikis (duration: 00m 55s)
00:23 aaron@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Run WikibaseCirrusSearch code for search on testwikidatawiki (duration: 00m 56s)

2019-03-06

21:23 XioNoX: test ping-offload with unused IP 208.80.153.225 - T190090
20:30 hashar: 1.33.0-wmf.20 looks fine with group0 and group1
20:14 hashar@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.20 (duration: 01m 43s)
20:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.20
19:51 hashar@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/LdapAuthentication/LdapPrimaryAuthenticationProvider.php: Remove calls to no-longer-implemented methods after I2eeaeed1 - T217692 (duration: 00m 58s)
19:14 XioNoX: apply ping-offload redirect to private1-a-codfw - T190090
19:03 gtirloni: increased serpens vCPUs from 4 to 8 (T217280)
18:55 gtirloni: increased seaborgium vCPUs from 4 to 8 (T217280)
18:08 bstorm_: re-enabled puppet after observing the change works well on the partner for labstore2004 and T210818
18:07 joal@deploy1001: Finished deploy [analytics/refinery@fef9181]: Regular analytics weekly deploy train (duration: 31m 02s)
18:04 bstorm_: disabled puppet and downtimed labstore2004 while deploying a change for T210818
17:36 joal@deploy1001: Started deploy [analytics/refinery@fef9181]: Regular analytics weekly deploy train
17:34 sbisson@deploy1001: Synchronized wmf-config/throttle.php: SWAT: gerrit:494782 Added new throttle rules, removed expired (duration: 00m 55s)
17:33 sbisson@deploy1001: sync-file aborted: SWAT: gerrit:494782 Added new throttle rules, removed expired (duration: 00m 01s)
17:24 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:492448 wgCopyUploadDomains: Changed domain for mehrnews.com (duration: 00m 56s)
17:17 sbisson@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/GrowthExperiments/extension.json: SWAT: gerrit:494531 Use schema version where reading is a valid editor_interface (duration: 00m 56s)
17:10 elukey@deploy1001: Finished deploy [analytics/superset/deploy@911ad13]: First deploy to new host (duration: 00m 27s)
17:10 elukey@deploy1001: Started deploy [analytics/superset/deploy@911ad13]: First deploy to new host
17:09 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:494698 Welcome survey: send all newcomers to variation A (cs, ko) (duration: 00m 56s)
16:53 jbond42: built prometheus-openldap-exporter for stretch
16:51 ema: upgrade ATS to 8.0.2-1wm1
16:23 moritzm: imported conftool 1.0.2-1+deb10u1 for buster-wikimedia
16:10 krinkle@deploy1001: Synchronized php-1.33.0-wmf.20/includes/api/ApiBase.php: I921777 (duration: 00m 58s)
16:05 moritzm: imported scap for buster-wikimedia (T213527)
14:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 (duration: 00m 56s)
13:35 marostegui: Upgrade MySQL on db1123
13:18 jbond42: rolling security updates for file on jessie
13:02 zeljkof: EU SWAT finished
12:41 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:494668 Change links in cswiki Help Panel (T217391) (duration: 00m 55s)
12:32 oblivian@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/WikimediaEvents: SWAT: Allow directing a sample of users to PHP 7 backport to wmf.19 T216676 (duration: 00m 57s)
12:22 gtirloni: updated serpens to stretch (T217280)
12:22 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: gerrit:494669 Throttle Exception for Art+Feminism event Eindhoven 8th March (T217676) (duration: 00m 56s)
12:10 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Setting php7 sample rate for anonymous users to 0 (duration: 00m 57s)
11:32 godog: bounce prometheus@k8s on prometheus2004 to test limiting concurrent connections
11:21 gtirloni: updated and rebooted seaborgium (T217280)
11:18 gtirloni: updated and rebooted serpens (T217280)
10:56 marostegui: Deploy schema change on db1123
10:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 (duration: 00m 53s)
10:48 volans: upgraded spicerack to 0.0.20 on cumin[12]001
10:46 volans: uploaded spicerack_0.0.20-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
10:38 hashar@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/Translate/TranslateUtils.php: Revert "TranslateUtils: Avoid use of deprecated class Revision" - T217689 (duration: 00m 59s)
10:36 hashar: Deploying a hotfix for Translate https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Translate/+/494659/
10:22 ema: lvs100[12],lvs1016: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
09:11 ema: lvs200[123]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
09:05 moritzm: removed debmonitor host entry for ruthenium (T216062)
09:01 mutante: switching noc.wikimedia.org from apache to httpd module (mwmaint2001, then mwmaint1002)
08:48 akosiaris@cumin1001: conftool action : set/weight=12; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes.*
08:48 akosiaris@cumin1001: conftool action : set/weight=15; selector: dc=codfw,service=citoid,cluster=scb,name=kubernetes.*
08:48 akosiaris: increase citoid traffic to kubernetes infrastructure to 50% T213194
08:48 akosiaris: increase citoid traffic to kubernetes infrastructure to 50%
08:47 marostegui: Deploy schema change on s3 codfw, this will generate lag on codfw - T86342
08:42 ema: lvs300[12]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
08:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090 after MySQL upgrade (duration: 00m 59s)
08:15 marostegui: Stop MySQL on db1090 for mysql upgrade
08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090 for MySQL upgrade (duration: 00m 56s)
08:14 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
07:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1105 after MySQL upgrade (duration: 00m 56s)
07:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic db1105 after MySQL upgrade (duration: 00m 56s)
07:34 marostegui: Remove dbstore1002 from tendril and zarcillo T216491
07:09 elukey: raised analytics user's max_user_connection from 10 to 100 on labsdb1012 - T215231
07:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic db1105 after MySQL upgrade (duration: 00m 56s)
06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1105 after MySQL upgrade (duration: 00m 56s)
06:32 marostegui: Stop MySQL on db1105 for MySQL upgrade
06:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105 for MySQL upgrade (duration: 01m 14s)
06:27 marostegui: Add labsdb1012 to tendril and zarcillo - T215231
05:50 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
04:26 eileen: civicrm revision changed from 196493f372 to 4aac68eead, config revision is 8ca90b4c7b
04:00 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)
00:55 twentyafterfour: finished US Evening SWAT.
00:41 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494524/ for SWAT refs T217276 (duration: 00m 55s)
00:23 twentyafterfour@deploy1001: Synchronized wmf-config/mobile.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494271/ for SWAT refs T212253 (duration: 00m 56s)
00:12 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/493236/ for SWAT. refs T217080 (duration: 00m 56s)

2019-03-05

23:51 ejegg: updated payments-wiki from 4f2935ad17 to f1a89d7045
21:05 godog: temporarily stop requests to k8s instance on prometheus2004
21:00 herron: restarted apache on grafana1001
20:43 herron: restarted apache on grafana1001
19:56 hashar@deploy1001: Synchronized php-1.33.0-wmf.20/extensions/LdapAuthentication/: Stop referring to the now-killed AuthPlugin class - T217692 (duration: 00m 57s)
17:44 godog: bounce uwsgi on graphite1004
17:25 herron: restarting uwsgi-graphite-web on graphite1004
16:54 moritzm: imported logstash 1:5.6.14-1 to thirdparty/elastic56
16:52 herron: restarting uwsgi-graphite-web on graphite1004
16:43 otto@deploy1001: scap-helm eventgate-analytics finished
16:43 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:43 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics -f eventgate-analytics-staging-values.yaml [namespace: eventgate-analytics, clusters: staging]
16:20 herron: restarting uwsgi-graphite-web on graphite1004
15:53 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.20
15:35 hashar@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674 (duration: 51m 03s)
14:52 gtirloni: reprepro added bdsync_0.10-1+deb9u1 T209527
14:44 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
14:42 otto@deploy1001: scap-helm eventgate-analytics finished
14:42 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
14:42 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-eqiad-values.yaml [namespace: eventgate-analytics, clusters: eqiad]
14:41 otto@deploy1001: scap-helm eventgate-analytics finished
14:41 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
14:41 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-codfw-values.yaml [namespace: eventgate-analytics, clusters: codfw]
14:40 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=citoid,cluster=scb,name=kubernetes.*
14:35 hashar@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.BRPBtKvzZH" --verbose' returned non-zero exit status 1 (duration: 00m 20s)
14:35 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
14:34 jijiki: Ramp up citoid traffic from k8s to 25% on codfw - T213194
14:34 hashar@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.ngh6XIMz8y" --verbose' returned non-zero exit status 1 (duration: 00m 21s)
14:33 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
14:33 jiji@cumin1001: conftool action : set/weight=5; selector: dc=codfw,service=citoid,cluster=scb,name=kubernetes.*
14:27 hashar@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.JrfRQw0oDJ" --verbose' returned non-zero exit status 1 (duration: 00m 21s)
14:27 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.20 and rebuild l10n cache # T206674
14:25 hashar@deploy1001: Pruned MediaWiki: 1.33.0-wmf.14 (duration: 09m 47s)
14:20 otto@deploy1001: scap-helm eventgate-analytics finished
14:20 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
14:20 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
14:20 otto@deploy1001: scap-helm eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad,codfw]
14:17 hashar@deploy1001: scap failed: LockFailedError Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "hashar"; reason is "Pruned MediaWiki: 1.33.0-wmf.14" (duration: 00m 00s)
14:14 hashar: Applied wmf/1.33.0-wmf.20 local patches # T206674
14:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 T217591 (duration: 01m 50s)
13:31 hashar: Cutting branch wmf/1.33.0-wmf.20 # T206674
13:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 T217591 (duration: 00m 48s)
13:14 ema: lvs500[12]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
13:07 zeljkof: EU SWAT finished
12:58 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set wgArticleCountMethod=any for zhwikiversity (T214946) (gerrit:487115) (duration: 00m 49s)
12:45 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert "Enable edittag for ExternalGuidance in CX and VE" (duration: 00m 48s)
12:24 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert gerrit:493155 (duration: 00m 49s)
11:59 _joe_: upgrading scap everywhere to 3.9.2-1, T217611
11:52 ema: lvs400[56]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
11:45 _joe_: installing new scap version in codfw
11:44 oblivian@deploy1001: Synchronized README: Test deploy for new scap version (duration: 00m 48s)
11:43 _joe_: installing new scap version on deployment servers, T217611
11:22 _joe_: uploading new scap packages, T217611
10:58 ema: lvs4007/lvs5003: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
10:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 47s)
10:55 gilles@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/NavigationTiming/NavigationTiming.config.php: T187299 Fix wiki oversampling config validation (duration: 00m 48s)
10:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 48s)
10:27 jiji@cumin1001: conftool action : set/weight=4; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes.*
10:24 jijiki: Ramp up citoid traffic from k8s to 25% - T213194
10:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 47s)
10:10 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T187299 Oversample navtiming on ruwiki and eswiki (duration: 00m 47s)
10:07 gilles@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/NavigationTiming: T187299 Backport wiki oversampling config syntax change (duration: 00m 48s)
10:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1103:3312 and db1103:3314 after mysql upgrade (duration: 00m 50s)
09:56 ema: lvs200[456]: upgrade linux to 4.9.144-3.1, reboot for L1TF kernel/microcode updates T203011
09:31 marostegui: Stop MySQL on db1103:3312 and db1103:3314 for MySQL upgrade
09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 and db1103:3314 for mysql upgrade (duration: 00m 47s)
09:26 ema: lvs100[456]: reboot for L1TF kernel/microcode updates T203011
09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1081 (duration: 00m 47s)
09:16 godog: kibana refresh field list
08:58 mutante: restarting gerrit to pickup change 493963 - disable jgit gc
08:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 47s)
08:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1084 (duration: 00m 48s)
08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 in API (duration: 00m 48s)
08:32 marostegui: Optimize echo_event table on x1 codfw master (db2034) this will generate lag on x1 codfw - T217591
08:24 akosiaris: T213194 bump percentage of citoid requests reaching eqiad kubernetes cluster to 9%
08:23 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes100.*
08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1084 (duration: 00m 49s)
07:47 marostegui: Upgrade MySQL on db1084
07:18 marostegui: Stop MySQL on db1095 (backups host) to upgrade MySQL
07:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 47s)
07:08 marostegui: Start transferring data from labsdb1011 to labsdb1012 - T215231
06:56 marostegui: Reboot labsdb1012
06:55 marostegui: Defragment echo_event tables on dbstore1005:3320 T217591
06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1091 (duration: 00m 48s)
06:43 marostegui: Stop MySQL on db2035 (s2 codfw master) to upgrade MySQL
06:41 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
06:18 marostegui: Stop MySQL on dbstore2001 to upgrade MySQL
06:17 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011
06:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 (duration: 00m 51s)
03:05 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Handle TitleBlacklist errors correctly (T217382) (duration: 00m 49s)
03:03 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)
02:59 ejegg: updated payments-wiki from ca7c280f3e to 4f2935ad17
02:27 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Revert hot fix (duration: 00m 46s)
02:21 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Hot fix for T217615 (duration: 00m 47s)
02:05 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
01:33 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
01:21 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
01:18 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 47s)
01:15 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/includes/api/ApiBase.php: Logging live patch to debug T217615 (duration: 00m 49s)
01:13 tzatziki: changing password for "Force de Mots" and "שרית חייט"
00:46 XioNoX: disable unused ports of restbase1016 on asw-a
00:44 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/WikimediaEvents/: Redact title/create params and drop page_title in EditorJourney schema (T213974) (duration: 00m 49s)
00:40 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES goodfaith on itwiki (T211032) (duration: 00m 47s)
00:17 catrope@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/GrowthExperiments/includes/HelpPanel.php: Exclude help panel from main page (T215664) (duration: 00m 48s)
00:12 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES on kowiki (T161628) (duration: 00m 49s)

2019-03-04

23:09 eileen: civicrm revision changed from 316e038a69 to 196493f372, config revision is 8ca90b4c7b
22:15 arlolra: Updated Parsoid to 1660395 (T214099, T202905)
22:05 arlolra@deploy1001: Finished deploy [parsoid/deploy@bdc9e66]: Updating Parsoid to 1660395 (duration: 06m 34s)
21:59 arlolra@deploy1001: Started deploy [parsoid/deploy@bdc9e66]: Updating Parsoid to 1660395
21:58 otto@deploy1001: scap-helm eventgate-analytics finished
21:58 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
21:58 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-codfw-values.yaml [namespace: eventgate-analytics, clusters: codfw]
21:58 otto@deploy1001: scap-helm eventgate-analytics finished
21:58 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
21:58 otto@deploy1001: scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-eqiad-values.yaml [namespace: eventgate-analytics, clusters: eqiad]
21:54 otto@deploy1001: scap-helm eventgate-analytics finished
21:54 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
21:54 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics -f eventgate-analytics-staging-values.yaml [namespace: eventgate-analytics, clusters: staging]
21:54 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
21:49 ejegg: re-enabled Omnimail unsubscribe processing, disabled recipient repair job
21:46 ejegg: updated Fundraising CiviCRM from 616c58cebe to 316e038a69
21:19 XioNoX: add bgp sessions to AS137236 on cr1-eqsin
21:14 XioNoX: re-enable bgp to AS13489 on cr2-eqiad
20:44 reedy@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/Echo/: T217487 (duration: 00m 53s)
20:23 niharika29@deploy1001: Finished deploy [scholarships/scholarships@2ef7463]: Remove outdated translations (duration: 00m 02s)
20:23 niharika29@deploy1001: Started deploy [scholarships/scholarships@2ef7463]: Remove outdated translations
20:17 niharika29@deploy1001: Finished deploy [scholarships/scholarships@2ef7463]: Deploy new version of app with new translations + fix broken privacy policy link (duration: 00m 02s)
20:17 niharika29@deploy1001: Started deploy [scholarships/scholarships@2ef7463]: Deploy new version of app with new translations + fix broken privacy policy link
20:01 sbisson@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Enables maplink for geocoordinate Wikibase statements display on clients (gerrit:494289) (duration: 00m 48s)
20:00 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reader demographics survey (gerrit:494292) (duration: 00m 49s)
19:52 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Enable help panel for user and user talk NS (gerrit:493616) (duration: 00m 49s)
19:47 sbisson@deploy1001: Synchronized tests/loggingTest.php: SWAT: Add eventbus analytics logging alongside with kafka logging. (part 2) (gerrit:490668) (duration: 00m 48s)
19:46 sbisson@deploy1001: Synchronized wmf-config/: SWAT: Add eventbus analytics logging alongside with kafka logging. (part 1) (gerrit:490668) (duration: 00m 51s)
19:41 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@20badb3]: Updater and Blazegraph group to report metric domain plus GUI updates (duration: 11m 07s)
19:35 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable GrowthExperiments Homepage on testwiki (gerrit:494223) (duration: 00m 49s)
19:30 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@20badb3]: Updater and Blazegraph group to report metric domain plus GUI updates
19:03 bstorm_: dumps.wikimedia.org is now running off labstore1007 T217473
18:25 bstorm_: disabled notifications for high load on labstore1007 while failed over T217473
18:23 vgutierrez: restarting pybal on lvs5002 - T213121
18:16 XioNoX: push lvs5002 changes on cr2-eqsin - T213121
16:54 hashar: contint1001: cleaned all Docker containers, compress /var/log/zuul/ files
16:52 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001.*
16:43 marostegui: Restart MySQL on db1112 for addshore
16:33 jynus: enabling gtid replication on clouddb1002
16:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T217365: Enable VE section editing on mobile for Beta Cluster, part II (duration: 00m 48s)
16:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T217365: Enable VE section editing on mobile for Beta Cluster, part I (duration: 00m 51s)
16:18 moritzm: installing ldb security updates
16:13 jiji@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001
16:13 jiji@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001
16:13 jiji@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes.*
15:55 jijiki: Running puppet on scb* and kubernetes* - T213194
15:44 jijiki: Disabling puppet on scb* and kubernetes* - T213194
15:22 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: no-op: Remove unused legacy EventBus config settings (duration: 00m 49s)
15:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 after changing index on logging table (duration: 00m 51s)
14:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 and db1100 after changing index on logging table (duration: 00m 49s)
14:20 elukey: update puppet compiler's facts
14:20 marostegui: Change indexes on logging table on db1100 (s5) and db1097:3314 (commonswiki) - T217397
14:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097:3314, db1100 to change indexes on logging table (duration: 00m 50s)
13:57 gehel: restarting blazegraph on wdqs eqiad
12:23 moritzm: testing component/php72 on mw2224
11:04 akosiaris@deploy1001: scap-helm citoid finished
11:04 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
11:04 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
11:04 akosiaris@deploy1001: scap-helm citoid finished
11:04 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
11:04 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
11:04 akosiaris@deploy1001: scap-helm citoid finished
11:04 akosiaris@deploy1001: scap-helm citoid cluster staging completed
11:04 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
10:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More weight to db1089 (duration: 00m 48s)
10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:494191) (duration: 00m 50s)
10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (gerrit:494191) (duration: 00m 50s)
09:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight (duration: 00m 48s)
09:27 ariel@deploy1001: Finished deploy [dumps/dumps@932bf7e]: make misc dumps failure message nicer (duration: 00m 09s)
09:27 ariel@deploy1001: Started deploy [dumps/dumps@932bf7e]: make misc dumps failure message nicer
09:22 godog: temporarily stop prometheus on prometheus2004 to take a snapshot
08:45 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T216499 Undo enabling Priority Hints origin trial on ruwiki (duration: 00m 49s)
08:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 (duration: 00m 49s)
08:38 gilles@deploy1001: scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
08:29 marostegui: Change logging indexes on db1089 to leave the indexes exactly like the ones on tables.sql - T217397
08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 - T217397 (duration: 00m 49s)
07:48 ema: cp3032/cp3042: restart varnish-be due to mbox lag
07:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 for schema change (duration: 00m 49s)
07:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 (duration: 00m 53s)
07:33 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1010
07:17 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T217310)
07:13 marostegui: Remove dbstore1002 from tendril and zarcillo - T216491
07:05 marostegui: Upgrade MySQL on db2088 and db2091
06:46 marostegui: Stop MySQL on dbstore1002 for decommission T210478 T172410 T216491 T215589
06:38 marostegui: Stop MySQL on labsdb1010 for mysql upgrade
06:34 gtirloni: downtimed cloudstore1008/9 (T209527)
06:13 marostegui: Upgrade MySQL on db2041 db2049 db2056 db2095
06:06 marostegui: Run analyze table logging on db2038 and db2059 - T71222
06:05 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010
06:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094:3314 for schema change (duration: 01m 11s)
05:18 kart_: Started manual run of unpublished ContentTranslation draft purge script (T217310)

2019-03-03

off: restarted icinga on icinga2001, stale status file, too many open files
10:44 elukey: restart pdfrender on scb1003

2019-03-02

12:12 gtirloni: labstore1006 started nfsd T217473

2019-03-01

20:45 ejegg: turned off fundraising omnimail process unsubscribes job
19:40 XioNoX: pre-configure asw-a8 ports on asw2-a8-eqiad - T187960
19:32 XioNoX: pre-configure asw-a7 ports on asw2-a7-eqiad - T187960
19:29 XioNoX: pre-configure asw-a6 ports on asw2-a6-eqiad - T187960
19:17 XioNoX: pre-configure asw-a5 ports on asw2-a5-eqiad - T187960
18:53 robh: notebook1003 has unusually high load recently (23) and seemed to lag in reporting to icinga. no hardware failures, pinged about it in #wikimedia-analytics
16:33 jbond42: rolling security update of bind9 packages on jessie and trusty
15:38 ema: trafficserver_8.0.2-1wm1 uploaded to stretch-wikimedia
15:02 akosiaris: restore proton config values
14:33 hashar: Updating all debian-glue Jenkins jobs to properly take into account the BUILD_TIMEOUT parameter # T217403
13:24 moritzm: removed sca* hosts from debmonitor database
12:49 akosiaris: lower max_render_queue_size to 20 for proton on proton100{1,2}
12:32 akosiaris: restart proton1002, OOM showed up
12:31 akosiaris: restart proton on proton1001, counted 99 chromium processes left running since at least Jan 30
11:47 jbond42: rebooting labsdb1005.codfw.wmnet
11:17 jbond42: rebooting labstore2004.codfw.wmnet
11:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1094 (duration: 00m 50s)
08:52 godog: temporarily stop prometheus instances on prometheus2004 to take a snapshot
07:44 oblivian@deploy1001: Synchronized README: Test deploy for new scap configuration (duration: 00m 48s)
07:39 oblivian@deploy1001: Synchronized README: noop sync to test opcache-manager (duration: 00m 47s)
07:31 oblivian@deploy1001: Synchronized README: Test deploy for new scap configuration (duration: 00m 46s)
07:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1094 after mysql upgrade (duration: 00m 47s)
07:23 _joe_: installed php 7.2 compatible packages on deploy1001,2001
07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1094 after mysql upgrade (duration: 00m 47s)
06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1094 after mysql upgrade (duration: 00m 46s)
06:48 marostegui: Deploy schema change on s4 codfw, lag will appear on s4 codfw - T86342
06:43 marostegui: Stop MySQL on db1094 for mysql upgrade
06:40 _joe_: upgrading php extensions on deploy* to versions compatible with php7.2
05:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 51s)
00:12 XioNoX: pre-configure asw-a3 ports on asw2-a3-eqiad - T187960
00:09 thcipriani@deploy1001: Synchronized README: noop sync to test opcache-manager in scap 3.9.1-1 (duration: 00m 48s)

2019-02-28

23:44 XioNoX: pre-configure asw-a2 ports on asw2-a2-eqiad - T187960
23:31 XioNoX: pre-configure asw-a1 ports on asw2-a1-eqiad - T187960
23:27 bblack@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp107[678]\.eqiad\.wmnet
23:07 robh: decom cp1045-cp1055, all are role spare but may icinga alert for ping
22:39 ejegg: updated fundraising CiviCRM from c81fe7a4fd to 616c58cebe
22:33 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove old translate config (duration: 00m 46s)
22:29 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable some translate special pages again T217376 (duration: 00m 47s)
22:29 ottomata: replaying events from mediawiki eventbus config outage - T217385
22:03 hashar: MediaWiki 1.33.0-wmf.19 deployed on all wikis # T206673
21:59 XioNoX: disable asw2-a5 <> asw-a link - T217383
21:28 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 00m 47s)
21:09 herron: disabling logstash persisted queue
20:52 herron: cleared logstash persistent queue on logstash100[7-9]
20:13 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.19
20:02 thcipriani@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add throttle exception for Amnesty International Editathon Throttle Rules: remove "all" Add new throttle rules T216998 T217063 T217305 T217311 (duration: 00m 54s)
19:40 thcipriani@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Remove legacy eventBus config settings. (duration: 00m 53s)
19:36 _joe_: upgrading scap on all servers
19:30 thcipriani@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add throttle rule for Art+Feminism 2019 editathon T217336 (duration: 00m 54s)
19:26 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Enable WikibaseCirrusSearch on Beta Cluster (beta only change/noop sync) T215684 (duration: 00m 55s)
19:22 robh: mw1272 being worked on by onsite
19:21 robh: mw1272 unresponsive to mgmt or production interfaces
19:16 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Start help panel experiment on viwiki T215666 (duration: 03m 02s)
18:52 moritzm: installing libgd security updates on trusty
18:52 herron: migrating logstash1006 kafka to logstash1012 T213898
18:43 XioNoX: start pybal on lvs1016 - T212348
18:34 robh: cp1078 power down for network move
18:28 XioNoX: stop pybal on lvs1016 - T212348
18:28 robh: cp1077 power off for network port relocation
18:21 robh: cp1076 power down for network port move
17:51 herron: logstash1011 kafka now in sync. transitioning logstash1005 to spare system T213898
17:24 cmjohnson1: powering down sodium to move racks T212348
17:23 jynus: recreating replicas, master ops events for db1078, db1075 T213858
16:43 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1006.eqiad.wmnet
16:39 elukey: clean up old/stale zookeeper znodes from conf100[4-6] - T216979
16:28 herron: migrating kafka on logstash1005 to logstash1011 T213898
16:27 herron: migrating kafka on logstash1005 to logstash1011 T213898
16:15 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T214905 Add ReferencePreviews to allowed BetaFeatures (duration: 00m 54s)
16:08 jbond42: rebooting labstore2003
15:56 thcipriani@deploy1001: Synchronized README: noop sync to test opcache-manager in scap 3.9.1-1 (duration: 00m 53s)
15:52 jbond42: rebooting labsdb1004
15:50 thcipriani@deploy1001: Synchronized README: noop sync scap 3.9.1-1 (duration: 00m 52s)
15:49 akosiaris@deploy1001: scap-helm citoid finished
15:49 akosiaris@deploy1001: scap-helm citoid cluster staging completed
15:48 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
15:46 _joe_: install scap 3.9.1-1 on the deployment servers
15:43 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1006.eqiad.wmnet
15:43 jbond42: rebooting labsdb1007
15:37 jbond42: rebooting labsdb1006
15:36 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1005.eqiad.wmnet
15:33 jbond42: rebooting labstore2002
15:29 jbond42: rebooting labstore2001
15:23 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1005.eqiad.wmnet
15:19 jbond42: rebooting rhodium
15:15 cmjohnson1: powering off db1114 to replace motherboard T214720
15:14 _joe_: uploading scap 3.9.1-1 to {stretch,jessie}-wikimedia
14:50 jbond42: reboot cloudnet2001-dev.codfw.wmnet
14:47 hashar: mw1272 fixed by running "scap sync-l10n" from deploy host
14:46 hashar: mw1272 had /srv/mediawiki/php-1.33.0-wmf.19/includes/cache/localisation/LocalisationCache.php:475) No localisation cache found for English. Please run maintenance/rebuildLocalisationCache.php.
14:46 hashar@deploy1001: scap sync-l10n completed (1.33.0-wmf.19) (duration: 03m 33s)
14:42 jbond@cumin1001: conftool action : set/pooled=no; selector: name=rhodium.eqiad.wmnet
14:41 hashar@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.19 (duration: 00m 53s)
14:40 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.19
14:34 milimetric@deploy1001: Finished deploy [analytics/refinery@f605fad]: New sqoop logic that uses the sharded replicas (duration: 10m 00s)
14:30 akosiaris@deploy1001: scap-helm citoid finished
14:30 akosiaris@deploy1001: scap-helm citoid cluster staging completed
14:30 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
14:28 hashar@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/WikibaseMediaInfo: Move up checks to test if we should construct depicts widgets - T217285 (duration: 00m 58s)
14:24 milimetric@deploy1001: Started deploy [analytics/refinery@f605fad]: New sqoop logic that uses the sharded replicas
13:56 elukey: re-start cleanup of 20k+ zookeeper nodes on conf100[4-6] (old Hadoop Yarn state) - T216952
13:52 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1003.eqiad.wmnet
13:43 godog: depool prometheus1003.eqiad.wmnet to take a data snapshot
13:34 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2003.codfw.wmnet
12:36 zeljkof: EU SWAT finished
12:35 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add throttle rule for Day of Digital Service (T217155) (duration: 00m 52s)
12:31 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: New throttle rule for Czech Wikigap 2019 (T217270) (duration: 00m 53s)
12:18 zfilipin@deploy1001: Synchronized wmf-config/: SWAT: Show referencePreviews on group0 wikis as beta feature (T214905) (duration: 00m 56s)
11:59 jbond42: rolling openssl security updates to jessie systems
11:32 akosiaris: remove sca1003, sca1004, sca2003, sca2004 from the fleet. Celebrate!!!!
11:28 elukey: pause cleanup of 20k+ zookeeper nodes on conf100[4-6] (old Hadoop Yarn state) - T216952
10:00 _joe_: executing a rolling puppet run (2 servers at a time per cluster, per dc) in eqiad,codfw as an HHVM restart will be triggered
09:37 gilles@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/NavigationTiming/modules/ext.navigationTiming.js: T217210 Don't assume PerformanceObserver entry types are supported (duration: 00m 54s)
09:30 elukey: start cleanup of 20k+ zookeeper nodes on conf100[4-6] (old Hadoop Yarn state) - T216952
09:26 moritzm: installed php security updates on netmon1002 and people1001
09:22 marostegui: Stop MySQL on db1125 (sanitarium) to upgrade, this will generate lag on labs on: s2, s4, s6, s7
09:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 (duration: 00m 54s)
09:08 marostegui: Stop MySQL on db1121 for upgrade, this will generate lag on labsdb:s4
09:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 (duration: 00m 53s)
08:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1079 (duration: 00m 53s)
08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase API traffic db1079 after mysql upgrade (duration: 00m 53s)
08:31 elukey: roll restart of Yarn Resource Managers on an-master100[1,2] to pick up new settings
08:22 marostegui: Change abuse_filter_log indexes on s3 codfw, lag will appear on codfw - T187295
08:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1079 after mysql upgrade (duration: 00m 54s)
08:06 moritzm: installing glibc security updates for stretch
07:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1079 in API after mysql upgrade (duration: 00m 53s)
07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1079 after mysql upgrade (duration: 00m 56s)
07:08 marostegui: Stop MySQL on db1079 for mysql upgrade
06:50 marostegui: Deploy schema change on db1079, this will generate lag on s7 on labs - T86342
06:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1079 (duration: 00m 55s)
06:18 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T216983)
05:56 marostegui: Upgrade MySQL on db1124 (Sanitarium) lag will be generated on s1,s3,s5,s8
03:03 kart_: Manual run of unpublished ContentTranslation draft purge script (T216983)
02:08 bstorm_: clouddb1002 is now in place to replace labsdb1004 as replica for toolsdb but not wikilabels postgres yet T193264
01:43 twentyafterfour: phabricator upgrade completed without issues (actually completed at 01:23 UTC but I failed to hit enter and submit this message)
01:20 twentyafterfour: deploying phabricator update 2019-02-27
01:03 twentyafterfour: preparing to deploy phabricator-2019-02-27
00:55 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.19/vendor/: vendor/ruflin/Elastica: Remove scalar return type hints (duration: 01m 33s)
00:22 ebernhardson@deploy1001: Synchronized vendor/: Remove scalar type hints from ruflin/Elastica (duration: 00m 58s)
00:10 ebernhardson@deploy1001: Synchronized wmf-config/CommonSettings.php: T215725 Remove mediawikiwiki from wgCentralAuthAutoCreateWikis (duration: 00m 54s)
00:07 ebernhardson@deploy1001: Synchronized wmf-config/: T215684 Add config for switching Wikibase search to WikibaseCirrusSearch codebase (duration: 00m 55s)

2019-02-27

21:57 XioNoX: delete local pref for peering sessions in eqiad - T204281
21:44 eileen: civicrm revision is c81fe7a4fd, config revision is 050abdf9e8
21:26 XioNoX: delete local pref for peering sessions in eqord - T204281
20:53 XioNoX: delete local pref for peering sessions in codfw/eqdfw - T204281
20:50 hashar: 1.33.0-wmf.19 not rolled to group1. Pending T217285 (Wikibase raising exception on commonswiki). To be figured out during European day time.
20:50 eileen: civicrm revision changed from 224bf15206 to c81fe7a4fd, config revision is d1826e371b
20:14 hashar@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
20:04 hashar@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.19 (duration: 00m 53s)
20:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.19
19:49 bstorm_: stopped slave on labsdb1004 for T193264
19:43 bstorm_: downtimed labsdb1004 to stop mysql for transferring data for T193264
19:32 SMalyshev: repooled wdqs1005, caught up
19:26 herron: replacing kafka on logstash1004 with logstash1010 T213898
18:56 SMalyshev: depooled wdqs1005 to let it catch up
18:36 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@465673b]: Redeploy GUI for T217161 (duration: 10m 51s)
18:28 cmjohnson1: powering off mw126[3-6] one at a time to move to different rack A5 T212348
18:25 smalyshev@deploy1001: Started deploy [wdqs/wdqs@465673b]: Redeploy GUI for T217161
18:21 cmjohnson1: powering off mw1262 to move to different rack A5 T212348
18:15 cmjohnson1: powering off mw1261 to move to different rack A5 T212348
17:57 niharika29@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/Flow/: Make VisualEditor unwrap <section> tags T217206 (duration: 01m 00s)
17:56 elukey: roll restart hadoop hdfs namenodes on an-master100[1,2] to pick up the new rack config of analytics1071
17:37 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Welcome survey: add a control group to viwiki T216669 (duration: 00m 54s)
17:34 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop collecting data for CitationUsage and CitationUsagePageLoad T213969 (duration: 00m 55s)
17:22 elukey: drain + shutdown of analytics1071 to allow its move to A5 - T212348
17:19 cmjohnson1: powering off wtp1030 to move to different rack A5 T212348
17:14 cmjohnson1: powering off wtp1029 to move to different rack A5 T212348
17:06 cmjohnson1: powering off wtp1029 to move to different rack A5 T212348
17:05 RoanKattouw: Running foreachwikiindblist dblists/echo.dblist extensions/Echo/maintenance/removeOrphanedEvents.php on mwmaint1002
16:58 hashar@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/Score: Revert "beautify lilypond error message output" - T217241 (duration: 00m 56s)
16:49 jijiki: Deploy LVS for eventgate-analytics - T211247
16:26 volans: temporarily disabled puppet on icinga[12]001 to deploy g/493171
16:21 volans: force-rebooting icinga1001 (to test some puppet changes) - T214760
15:34 jbond42: rolling openssl security updates to jessie canary servers
14:26 marostegui: Deploy schema change on abuse_filter_log on s7 codfw - lag will be generated on codfw - T187295
14:01 marostegui: Change indexes on abuse_filter_log on db1089 - T187295
14:00 moritzm: uploaded openssl 1.0.2r to jessie-wikimedia
12:08 jbond42: correction: rolling updates of apache on mw api servers *not* jobrunners
12:04 jbond42: rolling updates of apache on mw jobrunners
11:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1077 after MySQL upgrade (duration: 00m 53s)
11:28 godog: cleanup log4j from lvs eqiad / ipvsadm -D -t logstash.svc.eqiad.wmnet:4560
11:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1077 after MySQL upgrade (duration: 00m 54s)
11:17 godog: roll-restart pybal after removing logstash log4j service
10:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1077 after MySQL upgrade (duration: 00m 54s)
10:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 with low weight after MySQL upgrade (duration: 00m 53s)
09:55 marostegui: Stop MySQL on db1077 for mysql upgrade - this will generate lag on labsdb:s3
09:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 for MySQL upgrade (duration: 00m 53s)
09:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1082 (duration: 00m 54s)
09:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool in API db1082 after mysql upgrade (duration: 00m 53s)
09:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1082 after mysql upgrade (duration: 00m 54s)
09:05 marostegui: Stop MySQL on db1082 for mysql upgrade
08:41 godog: enable mmjsonparse by default on kafka outputs - T213189
08:40 marostegui: Deploy schema change on db1082 - will generate lag on labsdb:s5 - T86342
08:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 for mysql upgrade (duration: 00m 54s)
08:26 marostegui: Retroactive log, T216444 Global rename of Дагиров Умар → Takhirgeran Umar was done by alanajjar
08:02 marostegui: Global rename of HeavyTony → QTHCCAN by alanajjar - T217222
07:01 marostegui: Deploy schema change on s5 codfw master (db2052), this will generate lag on codfw - T86342
06:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1076 (duration: 00m 55s)
06:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 (duration: 01m 08s)
05:05 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T216983)
04:58 SMalyshev: repooled wdqs1006
03:09 kart_: Manual run of unpublished ContentTranslation draft purge script (T216983)
00:38 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: [cirrus] decrease regex timeouts by 25% and drop timeout hack (duration: 00m 53s)
00:30 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.19/skins/MinervaNeue/resources/skins.minerva.scripts/errorLogging.js: MinervaNeue: Allow us to distinguish errors for logged in users (duration: 00m 53s)
00:30 bd808: Re-enabled puppet on labweb100[12]
00:23 bd808: Disabled puppet on labweb100[12]
00:15 bd808: Manually changed logging level and restarted Horizon on labweb100[12]
00:15 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] autocomplete: enable subphrase matching for officewiki (2/2) (duration: 00m 54s)
00:14 ebernhardson@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
00:07 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] autocomplete: enable subphrase suggester builds on officewiki (1/2) (duration: 00m 54s)
00:03 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: noop sync for labs files gerrit:493103 (duration: 00m 54s)

2019-02-26

23:39 tgr: T217203 running mwscript ~/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'LaurenceKingPublishing' 'Fiona at Laurence King Publishing'
23:37 tgr: T217203 running mwscript ~/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'Citycarclubfi' 'Urbaanimies'
23:16 SMalyshev: depooled wdqs1006 to see if it catches up
22:43 akosiaris@deploy1001: scap-helm eventgate-analytics finished
22:43 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
22:43 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-staging-values.yaml staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
22:43 akosiaris@deploy1001: scap-helm eventgate-analytics finished
22:43 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
22:43 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
22:43 akosiaris@deploy1001: scap-helm eventgate-analytics finished
22:43 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
22:43 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
22:42 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
22:42 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
22:42 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
22:13 XioNoX: delete local pref for peering sessions in eqsin - T204281
19:12 reedy@deploy1001: Synchronized php-1.33.0-wmf.19/extensions/EventBus/: T217145 (duration: 00m 54s)
18:24 arlolra: Updated Parsoid to e82347d (T204608, T214099, T217093)
18:17 arlolra@deploy1001: Finished deploy [parsoid/deploy@ae76aa2]: Updating Parsoid to e82347d (duration: 11m 03s)
18:06 arlolra@deploy1001: Started deploy [parsoid/deploy@ae76aa2]: Updating Parsoid to e82347d
16:38 cdanis: cdanis@krypton sudo apt-get remove grafana
16:35 otto@deploy1001: scap-helm eventgate-analytics finished
16:35 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
16:35 otto@deploy1001: scap-helm eventgate-analytics install -n production -f eventgate-analytics-codfw-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
16:35 otto@deploy1001: scap-helm eventgate-analytics finished
16:35 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
16:35 otto@deploy1001: scap-helm eventgate-analytics install -n production -f eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
16:34 otto@deploy1001: scap-helm eventgate-analytics install -n production eventgate-analytics-eqiad-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
16:24 jijiki: Restarting memcached on mc1028 - T208844
16:14 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.19 # T206673
16:09 herron: elasticsearch stopped on logstash100[456] T213898
16:07 otto@deploy1001: scap-helm eventgate-analytics finished
16:07 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:07 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics --set main_app.version=v1.0.0-rc2 [namespace: eventgate-analytics, clusters: staging]
16:01 otto@deploy1001: scap-helm eventgate-analytics finished
16:01 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:01 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:00 herron: re-enabling ircecho
16:00 hashar@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.19 and rebuild l10n cache # T206673 (duration: 58m 17s)
15:47 akosiaris@deploy1001: scap-helm mathoid finished
15:47 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
15:47 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
15:47 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
15:43 akosiaris@deploy1001: scap-helm mathoid finished
15:43 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
15:43 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
15:21 godog: force puppet run on failed agents in codfw
15:17 herron: stopped ircecho to squelch puppet run alerts
15:13 godog: poweroff ms-be2030 - T204567
15:02 hashar@deploy1001: Started scap: testwiki to php-1.33.0-wmf.19 and rebuild l10n cache # T206673
15:02 otto@deploy1001: scap-helm eventgate-analytics finished
15:02 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:02 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
14:58 otto@deploy1001: scap-helm eventgate-analytics finished
14:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
14:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
14:36 hashar@deploy1001: Pruned MediaWiki: 1.33.0-wmf.19 (duration: 04m 42s)
14:20 hashar: Applied 1.33.0-wmf.19 security patches | T206673
14:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1122 (duration: 00m 45s)
13:37 hashar: cutting deployment branch 1.33.0-wmf.19
13:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 (duration: 00m 46s)
12:14 moritzm: uploaded php7.2 7.2.15-1+0~20190209065123.16+stretch~1.gbp3ad8c0+wmf1 to component/php72 (T216712)
11:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Full repool db1074 (duration: 00m 46s)
11:12 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 38s)
11:12 jijiki: Pooling thumbor2004 - T214597
11:11 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
11:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1074 into API (duration: 00m 45s)
10:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1074 (duration: 00m 46s)
10:35 marostegui: Stop MySQL on db1074 for upgrade
10:20 marostegui: Deploy schema change on db1074, this will generate lag on labsdb for s2 - T86342
10:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1074 (duration: 00m 46s)
10:12 godog: bounce gerrit on gerrit2001 and cobalt after https://gerrit.wikimedia.org/r/c/operations/puppet/+/492633 - T213899
09:10 jynus: temporarily stop dbstore1001:s1 replication to perform new backup system test
09:04 jijiki: Pooling thumbor1003
08:48 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 49s)
08:47 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
08:45 moritzm: installing elfutils security updates
08:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 (duration: 00m 45s)
08:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 00m 46s)
08:08 jijiki: Depool and reimage thumbor2004 - T214597
08:07 jijiki: Pooling thumbor2003 - T214597
08:04 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 30s)
08:04 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
07:54 elukey: removed /rmstore-analytics-test-hadoop from zookeeper main-eqiad - T216952
07:45 _joe_: publishing golang:1.11.5-1 docker image
07:44 moritzm: installing tiff security updates
07:02 marostegui: Deploy schema change on s2 codfw (this will generate lag on s2 codfw) T86342
06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 (duration: 00m 45s)
06:50 jijiki: Depool and reimage thumbor1003 and thumbor2003 - T214597
06:46 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 07s)
06:46 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
06:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097:3314 (duration: 00m 45s)
06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1088 (duration: 00m 45s)
06:41 jijiki: Pooling thumbor1002
06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1083 (duration: 00m 46s)
06:34 tgr: T215107 running mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'The_Photographer' 'Wilfredor'
06:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1088 T86342 (duration: 00m 48s)
06:17 marostegui: Change abuse_filter_log indexes on db1083 - T187295
06:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 T187295 (duration: 00m 51s)
06:10 tgr: T215107 running mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'The_Photographer' 'Wilfredor'
04:25 kart_: Finished manual run of unpublished ContentTranslation draft purge script (T216983)
04:24 eileen-sorting-k: civicrm revision changed from d1fc603677 to 224bf15206, config revision is d1826e371b
03:06 kart_: Manual run of unpublished ContentTranslation draft purge script (T216983)

2019-02-25

23:50 XioNoX: Re-enabled BGP to Zayo on cr2-codfw - T215193
23:15 herron: service restarts to make logstash101[012] master eligible are taking longer than expected, leaving elasticsearch on logstash100[456] enabled overnight T213898
22:56 mholloway-shell@deploy1001: Started restart [mobileapps/deploy@1ac3c38]: Restarting mobileapps on scb2003
21:54 eileen: update process-control config revision is d1826e371b
21:14 arlolra@deploy1001: Finished deploy [parsoid/deploy@cb62482]: Updating Parsoid to a8fe45e (duration: 04m 19s)
21:11 herron: turning down elasticsearch service on logstash100[456] (data has been migrated to logstash101[012]) T213898
21:10 arlolra@deploy1001: Started deploy [parsoid/deploy@cb62482]: Updating Parsoid to a8fe45e
21:09 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1ac3c38]: Update mobileapps to c3871cc (duration: 03m 48s)
21:05 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1ac3c38]: Update mobileapps to c3871cc
19:58 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Use EventBus multi endpoint configuration for eventbus configs (duration: 00m 45s)
19:53 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Swat! (duration: 00m 45s)
19:46 reedy@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: Disable MFSpecialCaseMainPage for srwiki and enwikivoyage (duration: 00m 46s)
19:41 vgutierrez: restarting pybal on lvs5003 - T213121
19:35 reedy@deploy1001: Synchronized php-1.33.0-wmf.18/extensions/Renameuser: T215107 (duration: 00m 46s)
19:31 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: labs! (duration: 00m 46s)
18:44 krinkle@deploy1001: Synchronized php-1.33.0-wmf.18/includes/libs/objectcache/WANObjectCache.php: 79a1593cae48 / T203786 (duration: 00m 48s)
18:18 jijiki: Pooling thumbor2001
18:18 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 01m 09s)
18:16 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
18:13 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@4c27682]: New GUI, Updater & Blazegraph builds (duration: 09m 53s)
18:04 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@4c27682]: New GUI, Updater & Blazegraph builds
17:59 jijiki: Depooling and reimaging thumbor1002 to stretch - T214597
17:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
17:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
17:42 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
17:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
17:26 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
17:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
16:48 thcipriani@deploy1001: Synchronized README: noop sync for scap 3.9.0-1 (duration: 00m 46s)
16:43 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
16:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
16:41 jijiki: Pooling thumbor1001
16:40 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 04s)
16:40 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
16:23 chasemp: reset 2fa for JBennett on phab with video confirmation
16:21 jijiki: Depooling and reimaging thumbor2001 - T214597
16:17 fsero: upload envoy 1.9.0 to stretch-wikimedia T215810
15:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool in API db1085 after MySQL upgrade (duration: 00m 45s)
15:35 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
15:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
15:34 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
15:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
15:33 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
15:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
15:30 akosiaris@deploy1001: scap-helm citoid finished
15:30 akosiaris@deploy1001: scap-helm citoid cluster staging completed
15:30 akosiaris@deploy1001: scap-helm citoid install -n staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
15:28 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 07s)
15:28 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
15:27 jiji@deploy1001: deploy aborted: (no justification provided) (duration: 00m 04s)
15:27 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
15:20 vgutierrez: shutting down certcentral VMs for decommission - T207389
15:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase API traffic for db1085 after MySQL upgrade (duration: 00m 45s)
15:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1085 after MySQL upgrade (duration: 00m 45s)
14:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1085 after MySQL upgrade (duration: 00m 45s)
14:49 jiji@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 15s)
14:49 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
14:47 jiji@deploy1001: deploy aborted: (no justification provided) (duration: 00m 19s)
14:46 jiji@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
14:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool into API db1085 after MySQL upgrade (duration: 00m 45s)
14:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1085 after MySQL upgrade (duration: 00m 45s)
14:04 marostegui: Stop MySQL on db1085 for mysql upgrade
13:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1085 for MySQL upgrade and schema change (duration: 00m 46s)
13:32 akosiaris: upgrade etherpad-lite to 1.7.5
12:38 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 07s)
12:38 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
12:27 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 05s)
12:27 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
12:22 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 01m 15s)
12:21 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
11:49 moritzm: rolling out intel-microcode 3.20180807a.2 on all jessie/stretch servers, tests on a number of previously unsupported servers with Westmere CPU were successful and I've verified that all other microcode files are identical compared to the current 3.20180807a.1 microcode
11:19 jijiki: Reimaging thumbor1001 - T214597
10:40 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546, T202497) (duration: 00m 46s)
10:39 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546, T202497) (duration: 00m 46s)
10:32 gtirloni: labstore1004 restarted nfsd and killed stuck rpc.mountd.real processes (T216988)
10:16 jijiki: Depooling thumbor1001 to reimage - T214597
09:54 marostegui: Deploy schema change on db1074, this will generate lag on labsdb:s2 - T187295
09:07 marostegui@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Increase ParserCache TTL from 24 days to 30 - T210992 (duration: 00m 46s)
08:52 marostegui: Deploy schema change on s2 on codfw master - lag will happen on s2 codfw - T187295
08:49 _joe_: generating mcrouter certificate for mw2151 T192457
07:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1104 after MySQL upgrade (duration: 00m 45s)
06:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1104 in API after MySQL upgrade (duration: 00m 45s)
06:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1104 after MySQL upgrade (duration: 00m 45s)
06:02 marostegui: Stop MySQL on db1104 for mysql upgrade
06:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 for MySQL upgrade (duration: 00m 50s)

2019-02-24

21:49 eileen: civicrm revision changed from 1b5d974569 to d1fc603677, config revision is 00f9c08766
18:20 elukey: clean up 2017/2018 log files in /var/log/jmxtrans on kafka1013-22 - root partitions filling up
18:15 elukey: clean up 2017/2018 log files in /var/log/jmxtrans - root partition almost filled up on kafka1012
10:22 elukey: force remount of /mnt/hdfs on an-coord1001 (fuse-hdfs stuck)

2019-02-22

18:02 gehel: rolling upgrade on elasticsearch / cirrus / eqiad completed - T215931
18:00 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
18:00 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
17:33 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
17:33 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
17:33 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
17:33 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
17:33 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
17:14 bblack: cp5007: repooling into service - T216716
17:13 bblack: cp5006: repooling into service - T216717
17:06 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
17:06 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
16:29 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
16:29 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
15:33 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
15:32 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
15:15 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
15:15 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
14:23 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
14:22 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
14:03 moritzm: removed labvirt1008 from debmonitor (T216661)
14:02 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
14:02 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
13:54 akosiaris: reboot helium for kernel/microcode updates
13:25 moritzm: installing wireshark security updates
13:19 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
13:17 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
13:09 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
13:09 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
13:01 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
13:00 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
12:56 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
12:48 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
12:43 moritzm: rebooting auth1002 for kernel update
12:17 moritzm: rebooting tungsten to pick up updated microcode to address SSBD/L1TF
12:13 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
12:12 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
12:12 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
11:54 moritzm: various reboots of servers with Westmere-EP CPUs to pick up updated microcode to address SSBD/L1TF
11:41 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
11:41 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
11:34 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
11:34 moritzm: rebooting cp1008 for some microcode test
11:33 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
11:32 jijiki: Pooling thumbor2002 after upgrade - T214597
11:20 moritzm: imported intel-microcode 3.20180807a.2 for jessie-wikimedia (T216802)
11:01 godog: swift eqiad set thumbor write ACLs for wikipedia-meta-local-thumb
10:37 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
10:36 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
10:35 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
10:15 jijiki: Pooling thumbor1004 after upgrade - T214597
09:55 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
09:51 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
09:51 moritzm: fixed package state on mw2167
09:38 akosiaris@deploy1001: scap-helm citoid install -n staging -f citoid-staging-values.yaml stable/citoid [namespace: citoid, clusters: staging]
09:33 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
09:33 moritzm: installing tor security update on torrelay1001
09:33 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
09:32 _joe_: set pooled=inactive on mw1272, T211668
09:26 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
09:22 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 16s)
09:22 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
09:22 moritzm: updated tor packages to 0.3.5.8-1~d90.stretch+1
09:18 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
09:16 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 14s)
09:16 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
09:16 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
09:16 gehel: starting rolling upgrade on elasticsearch / cirrus / eqiad - T215931
08:52 godog: force ftpsync run on sodium after debian mirror update
08:19 moritzm: installing uriparser security updates
08:18 godog: temporarily stop prometheus global on prometheus2004 to take a snapshot
07:47 moritzm: installing krb5 updates for jessie
07:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1013 after MySQL upgrade (duration: 00m 46s)
07:28 elukey: manually delete WANCache:v:metawiki:translate-groups from memcache on mc1022 to test fix for T203786
07:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to es1013 after MySQL upgrade (duration: 00m 45s)
07:15 _joe_: deactivating mw1272, memory problems
07:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1013 after MySQL upgrade (duration: 00m 45s)
06:51 marostegui: Power cycle mw1272 as it crashed - T211668
06:49 marostegui: Stop MySQL on es1013 to upgrade MySQL
06:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1013 for MySQL upgrade (duration: 02m 50s)
06:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1087 after MySQL upgrade (duration: 02m 51s)
06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1087 for MySQL upgrade (duration: 02m 53s)
06:15 marostegui: Stop MySQL on db1087 for kernel and mysql upgrade
03:26 XioNoX: delete old gr-1/0/0 from cr1-eqsin - T213121
01:58 XioNoX: power-down cp5007 - T216716
01:40 XioNoX: power-down cp5006 - T216717
00:57 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Noop sync of labs settings (duration: 00m 44s)
00:46 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T215931 [cirrus] Switch production search traffic to codfw (2/2) (duration: 00m 46s)
00:45 ebernhardson@deploy1001: sync-file aborted: T215931 [cirrus] Switch production search traffic to codfw (2/2) (duration: 00m 05s)
00:39 ebernhardson@deploy1001: Synchronized wmf-config/Wikibase.php: Deploy WikibaseCirrusSearch: Part III, Wikibase.php (duration: 00m 45s)
00:27 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy WikibaseCirrusSearch: Part II, InitialiseSettings.php (duration: 00m 46s)
00:23 ebernhardson@deploy1001: Synchronized wmf-config/extension-list: Deploy WikibaseCirrusSearch: Part I, extensionlist (duration: 00m 46s)
00:21 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T215931 [cirrus] Switch production search traffic to codfw (1/2) (duration: 00m 45s)
00:18 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T215931 [cirrus] Switch production search traffic to codfw (1/2) (duration: 00m 46s)
00:17 ebernhardson@deploy1001: sync-file aborted: T215931 (duration: 00m 00s)

2019-02-21

22:25 tzatziki: change pw for NazarSusP
22:17 volans: forcing a puppet run on A:ganeti
20:35 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
20:18 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.18
20:06 ladsgroup@deploy1001: Finished deploy [ores/deploy@5d937b1]: Drop accepting pickle altogether (T206333) (duration: 13m 17s)
19:58 bblack: eqsin: repooling user traffic
19:52 ladsgroup@deploy1001: Started deploy [ores/deploy@5d937b1]: Drop accepting pickle altogether (T206333)
19:35 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Drop obsolete Wikibase configs (T213713), Part II (duration: 00m 53s)
19:33 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Drop obsolete Wikibase configs (T213713), Part I (duration: 00m 52s)
19:32 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
19:32 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
19:25 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
19:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Set wmgWikibaseRepoIdGeneratorSeparateDbConnection to true for wikidata (T215147) (duration: 00m 56s)
18:59 ladsgroup@deploy1001: Finished deploy [ores/deploy@2d84709]: Change default task serializer of celery from pickle to json (T206333) (duration: 16m 54s)
18:46 jynus: shutting down db1114 T214720
18:42 ladsgroup@deploy1001: Started deploy [ores/deploy@2d84709]: Change default task serializer of celery from pickle to json (T206333)
18:33 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
18:30 robh: ignore icinga1001 alerts, rebooting it into hardware tests via T214760
18:29 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
18:28 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
18:28 ladsgroup@deploy1001: Finished deploy [ores/deploy@5d50713]: (no justification provided) (duration: 14m 37s)
18:13 ladsgroup@deploy1001: Started deploy [ores/deploy@5d50713]: (no justification provided)
17:54 robh: cp5007 rebooting into bios update and hardware testing via T216716
17:47 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
17:11 bblack: eqsin: restarting all varnish frontends to wipe cache after purge loss (site currently depooled) (skipping 5006/7 since they're being rebooted for bios flashing anyways)
17:10 robh: rebooting cp5006 to flash bios in memory troubleshooting steps via T216717
16:50 bblack: eqsin: restarting all varnish backends to wipe cache after purge loss (site currently depooled)
16:41 volans: applied hot band-aid patch to spicerack/remote.py on cumin2001 ( https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/481858 )
16:38 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
16:23 herron: updated phabricator.wikimedia.org spf record T216714
16:22 fsero: uploading scap3 3.9.0.1 package to trusty, jessie and stretch T216666
16:20 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
16:18 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
16:17 fsero: uploading scap3 3.9.0.1 package to trusty, jessie and stretch
16:17 fsero: updating scap3 to 3.9.0-1
15:57 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
15:52 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
15:23 moritzm: installing krb5 updates for jessie
15:07 herron: migrating ES shards away from logstash100[456] with "cluster.routing.allocation.exclude._name" : "logstash1004-production-logstash-eqiad,logstash1005-production-logstash-eqiad,logstash1006-production-logstash-eqiad" T214608
14:50 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
14:50 gehel@cumin2001: START - Cookbook sre.elasticsearch.force-shard-allocation
14:41 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@600e689]: Update to 0bb0a07 (duration: 04m 59s)
14:37 bblack: restart vhtcpd on cp5002 to debug multicast loss
14:36 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@600e689]: Update to 0bb0a07
13:57 godog: depool and reimage logstash1007 - T213898
13:25 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
13:20 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 16s)
13:19 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
13:19 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
13:19 jbond42: restarting hhvm and updating apache on deploy1001.eqiad.wmnet
13:18 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
13:18 gehel: restarting rolling upgrade on elasticsearch / cirrus / codfw - T215931
12:50 jbond42: restarting hhvm and updating apache on mwmaint1002.eqiad.wmnet
12:44 zeljkof: EU SWAT finished
12:42 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add img.raremaps.com at wgCopyUploadsDomains (T216638) (duration: 00m 52s)
12:40 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 20s)
12:39 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
12:38 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Throttle rule for National Gallery of Canada Library and Archives edit-a-thon (T216642) (duration: 00m 53s)
12:33 arturo: disable puppet in cloudnet2001-dev to test T216497
12:31 akosiaris@deploy1001: scap-helm mathoid finished
12:31 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
12:30 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
12:30 akosiaris@deploy1001: scap-helm mathoid upgrade --recreate-pods -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
12:27 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: (no justification provided) (duration: 00m 38s)
12:26 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: (no justification provided)
12:24 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: l thumbor2002.codfw.wmnet (duration: 00m 04s)
12:24 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: l thumbor2002.codfw.wmnet
12:24 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: l thumbor2002 (duration: 00m 08s)
12:24 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: l thumbor2002
12:23 arturo: importing openstack mitaka packages to reprepro @ install1002 (T216497)
12:17 arturo: enable puppet in install1002 (done testing T216497)
12:14 zfilipin@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: Disable mobile main page special casing on huwiki (T216563) (duration: 00m 54s)
12:13 gilles@deploy1001: Finished deploy [3d2png/deploy@ca39432]: Updating repo (duration: 00m 29s)
12:13 gilles@deploy1001: Started deploy [3d2png/deploy@ca39432]: Updating repo
12:10 arturo: T216497 import reprepro key 7638D0442B90D010 (debian archive automatic signing key (8/jessie))
12:01 arturo: disable puppet in install1002 to test T216497
11:13 volans: upgraded spicerack to 0.0.19 on cumin[12]001
11:11 volans: uploaded spicerack_0.0.19-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
10:55 akosiaris: upgrade mathoid staging+production to latest helm chart
10:47 akosiaris@deploy1001: scap-helm mathoid finished
10:47 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
10:47 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
10:47 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
10:29 akosiaris@deploy1001: scap-helm mathoid finished
10:29 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
10:29 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
10:28 akosiaris@deploy1001: scap-helm mathoid finished
10:28 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
10:28 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
10:27 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml stable/mathoid [namespace: mathoid, clusters: staging]
10:26 akosiaris@deploy1001: scap-helm list finished
10:26 akosiaris@deploy1001: scap-helm list cluster codfw completed
10:26 akosiaris@deploy1001: scap-helm list cluster eqiad completed
10:26 akosiaris@deploy1001: scap-helm list [namespace: list, clusters: eqiad,codfw]
10:23 godog: on boron unblock trusty builds with umount /var/cache/pbuilder/base-trusty-amd64.cow/dev/ptmx
10:04 akosiaris: create citoid namespace on kubernetes eqiad codfw staging clusters T213194
10:04 akosiaris: create cxserver namespace on kubernetes eqiad codfw staging clusters T213195
09:35 volans: force rebooting unresponsive icinga1001 T214760
09:29 marostegui: Deploy schema change on s3 primary master (db1078) - T210713
09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 T210713 (duration: 00m 52s)
09:14 moritzm: temporarily stop prometheus@labs.service on labmon for journald restarts (part of security update)
08:40 marostegui: Deploy schema change on db1075 - T210713
08:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 T210713 (duration: 00m 54s)
08:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 T210713 (duration: 00m 53s)
07:44 moritzm: rolling out remaining systemd security updates on jessie
07:12 marostegui: Deploy schema change on db1077 - this will generate lag on labsdb:s3 T210713
07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 T210713 (duration: 00m 56s)
07:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 T210713 (duration: 00m 55s)
06:22 marostegui: Deploy schema change on db1123 - T210713
06:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 T210713 (duration: 00m 57s)
05:46 bblack: repooling cp5010 - T214274
05:42 bblack: removing cp5010 downtimes from icinga - T214274
05:34 bblack: rebooting cp5010 for device name on swapped disk (depooled) - T214274
04:30 kart_: Finished: Fifth manual run of unpublished draft purge script for ContentTranslation (T216470)
04:16 XioNoX: Unplug Tata/NTT/PCCW from cr1-eqsin - T213121
03:21 XioNoX: replace cp5010 disk 1 - T214274
03:15 kart_: Fifth manual run of unpublished draft purge script for ContentTranslation (T216470)
02:44 XioNoX: depool eqsin - T213121
02:31 twentyafterfour: phabricator upgrade finished, service appears to be returned to normal
01:43 twentyafterfour: running phabricator database schema changes
01:38 twentyafterfour: now taking phabricator offline for upgrade
01:15 twentyafterfour: Taking phabricator offline momentarily for upgrade
01:01 twentyafterfour: set downtime in icinga for phab100*
00:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable partial blocks on metawiki and mediawikiwiki (T216065) (duration: 00m 54s)

2019-02-20

23:59 ppchelko@deploy1001: Finished deploy [changeprop/deploy@5e4486a]: Purge varnish on revision restrictions (duration: 01m 23s)
23:57 ppchelko@deploy1001: Started deploy [changeprop/deploy@5e4486a]: Purge varnish on revision restrictions
21:48 eileen: civicrm revision changed from 165fbf5894 to 1b5d974569, config revision is ccefa3716b
21:46 arlolra: Updated Parsoid to 9b204a0 (T153080, T169975, T215824)
21:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@c4574d1]: Updating Parsoid to 9b204a0 (duration: 09m 33s)
21:19 arlolra@deploy1001: Started deploy [parsoid/deploy@c4574d1]: Updating Parsoid to 9b204a0
21:08 _joe_: rolling restart of php-fpm to catch up with the tideways change
20:35 thcipriani@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.18 (duration: 00m 53s)
20:14 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.18/extensions/EventBus/includes/EventBusRCFeedEngine.php: Check for eventServiceName in config before accessing T216561 (duration: 00m 55s)
18:30 fdans@deploy1001: Finished deploy [analytics/refinery@ccf837e]: deploying refinery for new wikis and changes in scripts (duration: 11m 13s)
18:24 mobrovac@deploy1001: Finished deploy [restbase/deploy@80f518c]: Remove VE request logging - T215956 (duration: 20m 19s)
18:19 fdans@deploy1001: Started deploy [analytics/refinery@ccf837e]: deploying refinery for new wikis and changes in scripts
18:04 mobrovac@deploy1001: Started deploy [restbase/deploy@80f518c]: Remove VE request logging - T215956
17:22 sbisson@deploy1001: Synchronized php-1.33.0-wmf.18/extensions/Flow/modules/mw.flow.Initializer.js: SWAT: Unbreak reply clicks with existing widget (duration: 00m 58s)
17:08 hashar: contint1001: fix broken root ownership on zuul git deploy repo: sudo find /etc/zuul/wikimedia/.git -not -user zuul -exec chown zuul:zuul {} +
16:49 herron: migrating es shards away from logstash100[56] with "cluster.routing.allocation.exclude._name" : "logstash1005-production-logstash-eqiad,logstash1006-production-logstash-eqiad" T214608
16:40 twentyafterfour: started phd again, seems to be working now without killing the db
16:38 bblack: multatuli: upgrade gdnsd to 3.0.0-1~wmf1
16:36 godog: depool and reimage logstash1008 with stretch - T213898
16:26 twentyafterfour: stopped phd on phab1001 and scheduled downtime in icinga
16:24 bblack: authdns1001: upgrade gdnsd to 3.0.0-1~wmf1
16:19 twentyafterfour: stopped phd on phab1002
16:03 ottomata: removing spark 1 from Analytics cluster - T212134
15:55 bblack: authdns2001: upgrade gdnsd to 3.0.0-1~wmf1
15:37 fsero: restarting docker-registry service on systemd
15:35 moritzm: temporarily stop prometheus instances on prometheus1004 for systemd upgrade/journald restart
14:43 gehel@cumin2001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
14:35 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
14:35 volans: upgraded spicerack to 0.0.18 on cumin[12]001
14:34 volans: uploaded spicerack_0.0.18-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
14:00 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)
14:00 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-upgrade
13:59 gehel: rolling upgrade of elasticsearch / cirrus / codfw to 5.6.14 - T215931
13:51 godog: prometheus on prometheus2004 crashed/exited after journald upgrade -- starting up again now
13:00 jbond42: rolling restarts for hhvm in eqiad
12:28 volans: upgraded spicerack to 0.0.17 on cumin[12]001
12:25 volans: uploaded spicerack_0.0.17-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
12:08 moritzm: restarted ircecho on kraz.wikimedia.org
11:46 jbond42: rolling restarts for hhvm in codfw
11:28 akosiaris: rebuild and re-upload rsyslog_8.38.0-1~bpo9+1wmf1_amd64.changes to apt.wikimedia.org/stretch-wikimedia to have mmkubernetes package
10:36 marostegui: Deploy schema change on db1095:3313 - T210713
10:04 marostegui: Deploy schema change on dbstore1004:3313 - T210713
09:57 moritzm: installing systemd security updates on jessie hosts
09:33 marostegui: Deploy schema change on db2043 (s3 codfw master), lag will be generated on s3 codfw - T210713
09:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1109 (duration: 00m 52s)
08:48 moritzm: powercycling rdb1001 for a test
07:45 moritzm: installing gnupg2 updates on stretch
07:14 marostegui: Deploy schema change on s1 primary master (db1067) - T210713
07:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 T210713 (duration: 00m 52s)
07:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1109 after kernel upgrade (duration: 00m 52s)
06:54 oblivian@deploy1001: Synchronized wmf-config/profiler.php: Fix the tideways setup (duration: 00m 52s)
06:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1109 after kernel upgrade (duration: 00m 52s)
06:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 T210713 (duration: 00m 51s)
06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1119 T210713 (duration: 00m 51s)
06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1109 after kernel upgrade (duration: 00m 52s)
06:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1109 after kernel upgrade (duration: 00m 52s)
06:18 marostegui: Stop MySQL on db1109 for kernel and mysql upgrade
06:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 for kernel and mysql upgrade (duration: 00m 52s)
06:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1119 T210713 (duration: 01m 05s)
04:45 XioNoX: add avoid-paths WIRESTAR-OPTICALTEL to cr2-eqdfw
02:15 mobrovac@deploy1001: Finished deploy [restbase/deploy@751dc5c]: Temporarily collect VE request logs for T215956 (duration: 22m 37s)
01:52 mobrovac@deploy1001: Started deploy [restbase/deploy@751dc5c]: Temporarily collect VE request logs for T215956
00:24 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.17/skins/MinervaNeue/resources/skins.minerva.content.styles/lists.less: Revert switch to outside list style from ordered lists (duration: 00m 52s)
00:23 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.18/skins/MinervaNeue/resources/skins.minerva.content.styles/lists.less: Revert switch to outside list style from ordered lists (duration: 00m 59s)
00:05 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: SWAT T215969 Return cirrussearch master timeout back to the default value (duration: 00m 57s)

2019-02-19

23:51 ebernhardson: restarted ferm on relforge1001
23:50 ebernhardson: temporarily stop ferm on relforge1001 to test where a connection is being blocked
20:49 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.18
20:34 thcipriani@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.18 and rebuild l10n cache (duration: 30m 31s)
20:07 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
20:04 thcipriani@deploy1001: Started scap: testwiki to php-1.33.0-wmf.18 and rebuild l10n cache
20:01 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
19:57 thcipriani: restarting ci-jenkins for plugin update
19:49 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.13 (duration: 11m 52s)
19:39 gtirloni: re-pooled labsdb1011 T216481
19:09 andrewbogott: rebooting cloudvirt1009 to poke around in the bios
18:20 thcipriani: starting branch-cut for 1.33.0-wmf.18
17:55 herron: temporarily increased eqiad logstash elasticsearch low disk watermark to 87% (will restore to 85% when eqiad expansion hosts are fully online)
17:52 jijiki: Restarting memcache on mc1027 - T208844
17:00 hashar: Offlined compiler1002.puppet-diffs.eqiad.wmflabs from Jenkins. Its disk is corrupt | T216513
16:39 gtirloni: depooled labsdb1011 T216481
16:33 moritzm: installing libssh update from stretch point release
16:28 jforrester@deploy1001: Synchronized php-1.33.0-wmf.17/includes/specials/pagers/ActiveUsersPager.php: T216200 Hot deploy variable name fix for ActiveUsersPager query (duration: 00m 48s)
16:26 herron: enabling elasticsearch on new eqiad hosts logstash101[0-2]
16:18 gtirloni: re-pooled labsdb1010 T216481
16:07 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
16:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
15:47 jijiki: Reimaging thumbor2002 to stretch - T214597
15:38 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
15:32 hashar: apt-get upgrade on compiler1001 and compiler1002.puppet-diffs.eqiad.wmflabs
15:27 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
15:25 hashar: Started instance compiler1002.puppet-diffs.eqiad.wmflabs via Horizon. It was in shutoff state | T216513
15:10 _joe_: uploading tideways-xhprof_5.0.0~beta3 to reprepro T176916
15:09 gtirloni: depooled labsdb1010 T216481
14:53 jynus: stopping db2089 for hw maintenance T216240
14:41 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
14:40 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
14:36 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
14:35 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
14:31 gehel@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
14:30 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
14:30 gehel: rolling upgrade of elasticsearch on relforge - T215931
14:16 jynus: stop db2090 for reboot testing T216240
14:04 gtirloni: running `maintain-views --all-databases --replace-all --clean --debug` on labsdb1010 (T216481)
13:44 Amir1: mwscript maintenance/createAndPromote.php --wiki=testwikidatawiki --force --interface-admin Ladsgroup
13:43 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=testwikidatawiki --force --sysop Ladsgroup (T215919)
13:31 moritzm: installing rssh update for jessie
13:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1118 T210713 (duration: 00m 46s)
13:23 gtirloni: running `maintain-views --all-databases --replace-all --clean --debug` on labsdb1009 (T216481)
12:57 zeljkof: EU SWAT finished
12:56 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for Kickstarter Edit-a-thon (T215839) (duration: 00m 43s)
12:50 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgArticleCountMethod = any on fiwikinews (T216333) (duration: 00m 45s)
12:37 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add namespace Додатак on srwiktionary (T216343) (duration: 00m 46s)
12:29 _joe_: creating gerrit repo operations/debs/tideways-xhprof T176916
12:28 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for WikiProject Women in red, enwiki (T215295) (duration: 00m 47s)
12:19 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Modifying configuration about Chinese Wikiversity (T212919) (duration: 00m 48s)
11:59 marostegui: Deploy schema change on db1118 - T210713
11:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1118 T210713 (duration: 00m 46s)
11:53 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 (duration: 00m 46s)
11:49 moritzm: installing ruby-rack security updates
11:26 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T187299 Launch performance perception survey on eswiki (duration: 00m 46s)
11:22 jynus: stop and restart db1064
11:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1083 T210713 (duration: 00m 46s)
11:10 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 (duration: 00m 46s)
11:05 marostegui: Deploy schema change on dbstore1002
10:25 marostegui: Deploy schema change on db1083 - T210713
10:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 T210713 (duration: 00m 46s)
10:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1093 after kernel upgrade (duration: 00m 46s)
09:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1093 after kernel upgrade (duration: 00m 46s)
09:42 mforns@deploy1001: Finished deploy [analytics/refinery@0d7ec19]: deploying refinery to update EL sanitization whitelist (duration: 07m 49s)
09:34 mforns@deploy1001: Started deploy [analytics/refinery@0d7ec19]: deploying refinery to update EL sanitization whitelist
09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 T210713 (duration: 00m 45s)
09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1093 on API after kernel upgrade (duration: 00m 46s)
09:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1093 after kernel upgrade (duration: 00m 46s)
08:56 _joe_: experimenting with php-fpm configuration on mwdebug1001 for T176916
08:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093 for kernel upgrade (duration: 00m 45s)
08:55 hashar: Cleaning contint1001 / partition
08:50 marostegui: Deploy schema change on db1089 - T210713
08:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 T210713 (duration: 00m 46s)
08:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106 T210713 (duration: 00m 49s)
07:51 marostegui: Drop ep_* tables on s1 - T174802
07:50 moritzm: installing systemd security updates on stretch
07:46 marostegui: Reboot db1106 for kernel upgrade (and remove debug from kernel) T216240 T216273
07:21 marostegui: Drop ep_* tables on s3 - T174802
06:56 marostegui: Deploy schema change on db1106 - this will generate lag on labsdb:s1 T210713
06:56 marostegui: Deploy schema change on db1106 - T210713
06:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 T210713 (duration: 00m 52s)
05:31 XioNoX: delete local pref for peering sessions in ulsfo - T204281
05:17 XioNoX: deleted previously deactivated BGP_community_actions terms - T204281
00:01 XioNoX: disable BGP to Zayo on cr2-codfw for intrusive testing - T215193

2019-02-18

20:19 gtirloni: icinga2001 ran puppet ahead of schedule (enable tools-checker-toolsdb monitor)
18:26 jynus: setting clouddb1001 in read_write mode
18:14 volans: upgraded to spicerack 0.0.16-1 cumin[12]001
18:12 volans: uploaded spicerack_0.0.16-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
18:08 jynus: killing mysql on labsdb1005
18:08 jynus: disabled puppet and edited my.cnf on labsdb1005
17:56 jynus: restarting labsdb1004
17:53 jynus: set clouddb1001 in read_only=1
17:50 jijiki: Reimaging thumbor1004 to stretch - T214597
15:41 jynus: performing es2 & es3 backups into es2002
15:21 jynus: move logical backups to subdirectory T210292
14:29 moritzm: rebooting mw2167 for kernel tests
13:59 marostegui: Drop ep_* tables from s7 - T174802
13:25 jijiki: Depooling thumbor1004 to check if the rest of our hosts can handle the load without it - T214597
12:34 moritzm: installing brltty bugfix update from stretch point release
12:31 moritzm: upgrading stat1005 to buster
12:28 XioNoX: update clouddb_return term from cloud-in4 on cr1/2-eqiad - T216353
11:53 moritzm: installing hdparm bugfix update from stretch point release
11:36 moritzm: installing uriparser security updates
11:11 moritzm: installing c3p0 security updates
10:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 T210713 (duration: 00m 46s)
10:54 jijiki: Reimaging thumbor2002 to stretch - T214597
10:40 marostegui: Drop tables ep_* from s2 (cswiki nlwiki ptwiki svwiki) T174802
09:50 marostegui: Deploy schema change on db1105:3311 T210713
09:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105:3311 T210713 (duration: 00m 46s)
09:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3311 T210713 (duration: 00m 46s)
09:28 marostegui: Drop ep_* from s6 (ruwiki) - T174802
09:16 marostegui: Deploy schema change on db1099:3311 - T210713
09:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099:3311 T210713 (duration: 00m 48s)
09:08 marostegui: Deploy schema change on dbstore1003:3311 and dbstore1001:3311 - T210713
08:27 marostegui: Drop ep_* tables from s5 (srwiki) - T174802
08:23 marostegui: Deploy schema change on s1 codfw master (db2048), lag will be generated on s1 codfw - T210713
07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1119 after mysql upgrade (duration: 00m 46s)
06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1119 into API service after mysql upgrade (duration: 00m 46s)
06:49 marostegui: Reboot db2085 to disable debug mode on kernel T216273
06:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1119 after mysql upgrade (duration: 00m 46s)
06:29 marostegui: Stop MySQL on db1119 for mysql and kernel upgrade
06:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1119 for mysql upgrade (duration: 01m 01s)
05:55 marostegui: Deploy schema change on s8 primary master (db1071) - T210713
05:52 marostegui: Set dbstore1002 on read only to start the migration T210478 T215589

2019-02-17

21:20 bstorm_: The slave of labsdb1005.eqiad.wmnet is now clouddb1001.clouddb-services.eqiad.wmflabs
13:14 XioNoX: add term labsdb_return to cloud-in4 - T216353

2019-02-16

16:26 ariel@deploy1001: Finished deploy [dumps/dumps@8f83eea]: fix up multistream index file recombines for large files; better errors for misc dumps failures (duration: 00m 03s)
16:25 ariel@deploy1001: Started deploy [dumps/dumps@8f83eea]: fix up multistream index file recombines for large files; better errors for misc dumps failures
14:21 arturo: T194855 cloudvirt1020 is poweroff, waiting for disk setup before installing
00:20 XioNoX: add port 22 in cloud-in4 term labsdb

2019-02-15

20:40 andrewbogott: enabled virtualization (all three settings) on cloudvirt1019
19:41 arturo: T193264 reimaging cloudvirt1019 to get mitaka/stretch
18:51 arturo: T193264 icinga downtime cloudvirt1019 for 1 week
18:44 bstorm_: stopped replication and then mariadb on labsdb1004
16:52 cdanis: correction, needed to increment version; adding backported rasdaemon 0.6.0-1.2+deb8u2 to jessie-wikimedia
16:48 cdanis: adding backported rasdaemon 0.6.0-1.2+deb8u1 to jessie-wikimedia
16:29 bblack: reprepro: uploaded gdnsd-3.0.0-1~wmf1 to stretch-wikimedia
15:45 moritzm: rebooting auth1001 for kernel security update
14:50 moritzm: installing unbound update from stretch point release
14:45 moritzm: removed labvirt1012 from debmonitor (got renamed to cloudvirt1012) (T216190)
14:06 moritzm: rebooting mwlog1001 for kernel security update
13:54 moritzm: rebooting mwlog2001 for kernel security update
13:46 jbond42: install tar security updates
13:19 moritzm: rolling reboot of mwdebug servers in eqiad to pick up SSBD-enabled qemu
13:12 gtirloni: reboot cloudvirt1020
13:11 arturo: T216239 labvirt1019 has been drained of any workload
13:06 moritzm: installing NSS security updates
12:42 moritzm: installing squid3 security updates
12:30 jynus: stop db2089 mysql instances for reboot testing T216240
12:30 arturo: T216239 schedule 1week of icinga downtime for labvirt1019
10:48 akosiaris: upgrade docker on contint2001 to 18.06.2 T216236
10:42 akosiaris: upgrade docker on contint1001 to 18.06.2 T216236
10:35 gtirloni: reboot cloudvirt1019
09:44 gehel: repool maps100[12]
09:33 moritzm: imported php-defaults debs to thirdparty/php72
08:42 akosiaris: restart gerrit to pick up https://gerrit.wikimedia.org/r/490640 T177868
08:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 (duration: 00m 46s)
08:28 moritzm: rolling restart of apertium to pick up Python 3.4 security update
07:55 godog: bounce prometheus@ops on prometheus2004 to take a snapshot
06:41 marostegui: Stop puppet on labsdb1005 to leave "max_user_connections" on my.cnf - T216170 T216208
06:39 marostegui: Restart labsdb1005 with max_user_connections = 20 T216208
06:17 marostegui: Deploy schema change on db1109 - T210713
06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 (duration: 00m 49s)
06:13 marostegui: Reload haproxy on dbproxy11 to repool labsdb1009
00:39 mutante: puppetmaster1001: sudo puppet node clean bast3003.wikimedia.org ; sudo puppet node deactivate bast3003.wikimedia.org (T216199)
00:15 jynus: setting labsdb1005 back into read-write

2019-02-14

23:47 jynus: restarting labsdb1005 mysql in read only mode
23:37 niharika29@deploy1001: Finished deploy [scholarships/scholarships@25ea138]: Update app with updated dependencies to mitigate PHPMailer error T215302 (duration: 00m 02s)
23:37 niharika29@deploy1001: Started deploy [scholarships/scholarships@25ea138]: Update app with updated dependencies to mitigate PHPMailer error T215302
22:07 andrewbogott: rebuilding labvirt1012 as cloudvirt1012, T216190
20:38 bstorm_: Restarted mariadb on labsdb1005 for https://wikitech.wikimedia.org/wiki/Incident_documentation/20190214-labsdb1005
20:09 ejegg: updated fundraising CiviCRM from 02ea871b88 to 165fbf5894
19:42 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.17/extensions/GrowthExperiments/modules/help: SWAT: Help Panel: Fix IME broken in help panel search T216131 (duration: 00m 54s)
19:14 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Stop NavPopups gadget conflict with PagePreviews on Wikivoyage T214878 (duration: 00m 54s)
19:01 mutante: scandium - deleting parsoid clone dir and running puppet one more time, to fix permissions to allow wikidev
18:52 mutante: scandium - deleting parsoid clone dir and running puppet one more time, to fix permissions to allow wikidev
18:12 mutante: scandium - deleting parsoid clone dir and running puppet
18:03 fsero: upgrading tiller to 2.12.2 on eqiad
17:34 godog: bounce rsyslog on wezen/lithium, tls listener timeout in icinga
16:59 moritzm: restarting apertium-apy on scb1001 to pick up Python security update
16:39 marostegui: Depool labsdb1009 - T210713
16:26 fsero: upgrading tiller on codfw
16:11 fsero: updating tiller version on staging cluster
16:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2085 - T214840 (duration: 00m 52s)
15:50 fsero: building and publishing new tiller docker image on boron
15:50 END: (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) (volans@cumin1001)
15:43 START: - Cookbook sre.hosts.upgrade-and-reboot (volans@cumin1001)
15:28 volans: upgraded spicerack to v0.0.15 on cumin[12]001
15:26 volans: uploaded spicerack_0.0.15-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
15:12 marostegui: Clear idrac logs from db2085 - T214840
14:45 godog: depool and stop logstash1009 for stretch reimage - T213898
14:20 marostegui: Stop MySQL on db2085 for on-site maintenance - T214840
14:12 jijiki: Enabling puppet on thumbor* servers - T214597
13:39 arturo: T215892 icinga downtime cloudvirt1024 for 2 weeks
12:22 zeljkof: EU SWAT finished
12:21 zfilipin@deploy1001: Synchronized php-1.33.0-wmf.17/extensions/ExternalGuidance/: SWAT: Fix the eventlogging schema definition as per manifest_version=2 (duration: 00m 55s)
11:43 _joe_: restarting hhvm on mw1338, hot tc exhausted T216084
11:04 _joe_: upgrading python3-etcd on stretch T209136
11:03 jbond42: rolling security updates for curl
11:02 jijiki: Disabling puppet on thumbor* servers - T214597
10:59 moritzm: installing python3.4 security updates
10:53 godog: bounce prometheus instances on prometheus2004 to take a snapshot
08:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106 T214840 (duration: 00m 52s)
07:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1087 T210713 (duration: 00m 54s)
07:36 marostegui: Stop MySQL on db1106 for reboot - T214840
06:10 marostegui: Deploy schema change on db1087 with replication, lag will be generated on labsdb:s8 T210713
06:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1087 T210713 (duration: 00m 55s)
01:52 mutante: scandium - removing parsoid deploy dir and letting puppet re-clone it after merging gerrit fix 484602 - replace manual clone with proper puppetization (T201366)
01:52 mutante: scandium - removing parsoid deploy dir and letting puppet re-clone it after merging gerrit fix 484602 - replace manual hack with proper puppet
01:15 mutante: phab1001 - phabricator mail config converted to cluster.mailers to adjust to upstream change (T212989)
00:36 bd808@deploy1001: Finished deploy [scholarships/scholarships@1d89fe2]: Live hack PHPMailer namespace T215302 (duration: 00m 02s)
00:36 bd808@deploy1001: Started deploy [scholarships/scholarships@1d89fe2]: Live hack PHPMailer namespace T215302
00:32 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ORES (damaging only) on itwiki (T211032) (duration: 00m 53s)
00:24 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable help panel search on cswiki and kowiki (T209301) (duration: 00m 55s)

2019-02-13

23:42 niharika29@deploy1001: Finished deploy [scholarships/scholarships@1d89fe2]: Update scholarships app for 2019 cycle T215302 (duration: 00m 02s)
23:42 niharika29@deploy1001: Started deploy [scholarships/scholarships@1d89fe2]: Update scholarships app for 2019 cycle T215302
21:31 jijiki: Restarting nutcracker on scb100*.eqiad.wmnet
20:54 mutante: ruthenium - shell access for parsoid-testers revoked by puppet, please use scandium.eqiad.wmnet (T201366)
20:44 otto@deploy1001: Started restart [eventstreams/deploy@07033d4]: bouncing eventstreams to apply page-links-change stream config
20:43 mutante: ms-be2021 - powercycling
20:09 thcipriani@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.17 (duration: 00m 53s)
20:08 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.17
19:55 mforns@deploy1001: Finished deploy [analytics/refinery@5f1461e]: Deploying analytics refinery with refinery-source v0.0.85 jars (duration: 07m 36s)
19:48 mforns@deploy1001: Started deploy [analytics/refinery@5f1461e]: Deploying analytics refinery with refinery-source v0.0.85 jars
18:13 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es1014 (duration: 00m 52s)
18:06 godog: reimage prometheus2003 - T187987
18:01 krinkle@deploy1001: Synchronized php-1.33.0-wmf.17/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Id70fdfa62ef / T215611 (duration: 00m 55s)
17:49 marostegui: Stop MySQL on db1114 for onsite maintenance - T214720
17:25 jijiki: Pooling mw1299 back - T215569
17:06 cmjohnson1: db1106, troubleshooting idrac issue and updating f/w
16:58 otto@deploy1001: scap-helm eventgate-analytics finished
16:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
16:30 elukey: reimage stat1005 to Debian Buster (again)
16:22 otto@deploy1001: scap-helm list finished
16:22 otto@deploy1001: scap-helm list cluster staging completed
16:22 otto@deploy1001: scap-helm list [namespace: list, clusters: staging]
16:13 otto@deploy1001: scap-helm eventgate-analytics finished
16:13 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
16:13 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:46 marostegui: Stop MySQL on db1106 for onsite maintenance - this will generate lag on s1 labs - T214840
15:28 jynus: stop and upgrade es1014
15:27 otto@deploy1001: scap-helm eventgate-analytics finished
15:27 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:27 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:27 otto@deploy1001: scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad,codfw]
15:17 akosiaris@deploy1001: scap-helm eventgate-analytics finished
15:17 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:17 akosiaris@deploy1001: scap-helm eventgate-analytics install -f /srv/scap-helm/eventgate/eventgate-analytics-staging-values.yaml --set service.port=31193 ../ [namespace: eventgate-analytics, clusters: staging]
15:16 moritzm: updated thirdparty/php72 component to PHP 7.2.15
15:10 akosiaris@deploy1001: scap-helm eventgate-analytics finished
15:10 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:10 akosiaris@deploy1001: scap-helm eventgate-analytics install -f /srv/scap-helm/eventgate/eventgate-analytics-staging-values.yaml --set service.port=31193 ../ [namespace: eventgate-analytics, clusters: staging]
15:09 akosiaris@deploy1001: scap-helm eventgate-analytics install -f /srv/scap-helm/eventgate/eventgate-analytics-staging-values.yaml ../ [namespace: eventgate-analytics, clusters: staging]
15:08 akosiaris@deploy1001: scap-helm eventgate-analytics finished
15:08 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:08 akosiaris@deploy1001: scap-helm eventgate-analytics install --dry-run --debug -f /srv/scap-helm/eventgate/eventgate-analytics-staging-values.yaml ../ [namespace: eventgate-analytics, clusters: staging]
15:05 akosiaris@deploy1001: scap-helm eventgate-analytics finished
15:05 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
15:05 akosiaris@deploy1001: scap-helm eventgate-analytics install --dry-run --debug -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
15:05 akosiaris@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics --dry-run --debug [namespace: eventgate-analytics, clusters: staging]
14:53 otto@deploy1001: scap-helm eventgate-analytics finished
14:53 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
14:53 otto@deploy1001: scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
14:25 elukey: reimage stat1005 back to stretch to test GPU drivers
14:06 godog: cancel https://integration.wikimedia.org/ci/job/operations-mw-config-composer-test-docker/12236 to unblock test-prio zuul queue
14:05 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120, depool es1014 (duration: 00m 52s)
12:34 arturo: T216030 icinga downtime cloudvirt1018 for 2 weeks
12:32 arturo: T216030 T216004 rebooting cloudvirt1018
11:55 moritzm: installing avahi security updates
11:49 jynus: stop and upgrade db1120
11:43 moritzm: installing golang updates on jessie
11:41 volans: upgraded spicerack on cumin[12]001 to v0.0.14
11:38 volans: uploaded spicerack_0.0.14-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
11:33 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1120 (duration: 00m 53s)
11:11 moritzm: installing postgis security updates
09:46 moritzm: installing golang security updates
09:33 gtirloni: labsdb1005 rebooted server
09:26 gtirloni: labsdb1005 stopped mysql
09:22 marostegui: Stop MySQL on db1106 - T214840
09:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 T214840 (duration: 00m 53s)
08:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1104 (duration: 00m 53s)
06:46 vgutierrez: uploaded acme-chief 0.10 to apt.wikimedia.org (buster) - T215925
06:18 marostegui: Deploy schema change on db1104 - T210713
06:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 (duration: 01m 07s)
06:12 marostegui: Stop MySQL on db2085 to keep debugging kernel issues - T214840
01:31 thcipriani@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Add ExternalGuidance extension T213076 (part 3) (duration: 00m 53s)
01:30 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add ExternalGuidance extension T213076 (part 2) (duration: 00m 53s)
01:15 thcipriani@deploy1001: Finished scap: SWAT: Add ExternalGuidance extension T213076 (part I: build l10n and sync code) (duration: 27m 51s)
00:47 thcipriani@deploy1001: Started scap: SWAT: Add ExternalGuidance extension T213076 (part I: build l10n and sync code)
00:41 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.17/extensions/Thanks/modules/ext.thanks.mobilediff.css: SWAT: Follow ups to I807f729c1b1a9e9b5952685bb18f540f81d70f47 (duration: 00m 55s)
00:27 XioNoX: merge VRRP Icinga Check

2019-02-12

23:14 jforrester@deploy1001: Finished scap: Another full scap, hoping to find the new i18n in RL for T214482 T215471 T215472 (duration: 06m 01s)
23:09 foks: removed 4 files for legal compliance
23:08 jforrester@deploy1001: Started scap: Another full scap, hoping to find the new i18n in RL for T214482 T215471 T215472
22:47 jforrester@deploy1001: Finished scap: Full scap for new i18n and code for T214482 T215471 T215472 (duration: 18m 03s)
22:29 jforrester@deploy1001: Started scap: Full scap for new i18n and code for T214482 T215471 T215472
21:38 robh: icinga1001 in hardware testing, don't mess with it T214760
21:10 robh: working on troubleshooting icinga1001 via T214760
20:58 jforrester@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/Wikibase/view/resources/resources.php: Hot-deploy I74f6389ae for other code, file 2 (duration: 00m 52s)
20:57 jforrester@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/Wikibase/view/lib/resources.php: Hot-deploy I74f6389ae for other code, file 1 (duration: 00m 51s)
20:52 jforrester@deploy1001: Synchronized php-1.33.0-wmf.16/resources/Resources.php: Hot-deploy If0d7b687e for other code (duration: 00m 54s)
20:06 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: Group0 to 1.33.0-wmf.17
19:59 thcipriani@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.17 and rebuild l10n (duration: 18m 54s)
19:40 thcipriani@deploy1001: Started scap: testwiki to php-1.33.0-wmf.17 and rebuild l10n
19:37 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.12 (duration: 03m 10s)
19:32 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.9 (duration: 10m 05s)
18:48 thcipriani: make-wmf-branch 1.33.0-wmf.17
17:54 chaomodus: notebook1003 - restarted nagios-nrpe-server T212824
17:04 marostegui: Start MySQL again on db2085 for s1 and s8 - T214840
16:18 akosiaris: refresh kubernetes default egress policy T211247
15:58 akosiaris@deploy1001: scap-helm eventgate-analytics finished
15:58 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
15:58 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
15:58 akosiaris@deploy1001: scap-helm eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad,codfw]
15:46 akosiaris: create namespaces for eventgate-analytics on eqiad/codfw/staging cluster T211247 T213194
15:45 moritzm: rebooting db2085 for some tests
15:38 marostegui: Stop MySQL on db2085 - T214840
15:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2085 - T214840 (duration: 00m 47s)
15:30 otto@deploy1001: scap-helm --help finished
15:30 otto@deploy1001: scap-helm --help cluster codfw completed
15:30 otto@deploy1001: scap-helm --help cluster eqiad completed
15:30 otto@deploy1001: scap-helm --help [namespace: --help, clusters: eqiad,codfw]
15:03 ejegg: updated fundraising CiviCRM from a541a83cb2 to 02ea871b88
14:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1092 (duration: 00m 46s)
14:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More api traffic to db1092 (duration: 00m 44s)
14:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1092 (duration: 00m 46s)
13:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some api traffic to db1092 (duration: 00m 46s)
13:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1092 (duration: 00m 47s)
13:39 vgutierrez: uploaded acme-chief 0.9 to apt.wikimedia.org (stretch) - T207389 T213737
12:57 moritzm: installing openssl1.0 security updates
12:30 zeljkof: EU SWAT finished
12:30 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add https://polona.pl/ to $wgCopyUploadsDomains (T215501) (duration: 00m 46s)
12:19 moritzm: install ghostscript security updates on scb*
12:15 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create extendedconfirmed user group for viwiki (T215493) (duration: 00m 47s)
12:10 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Rollbackers User Group Right on azwiki (T215200) (duration: 00m 47s)
12:03 marostegui: Stop MySQL on db1092 to upgrade mysql and kernel
11:27 moritzm: rebooting stat1005
11:20 moritzm: installing ghostscript security updates on remaining thumbor hosts
10:25 marostegui: Deploy schema change on db1092 T210713
10:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1092 (duration: 00m 46s)
10:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1101:3318 (duration: 00m 46s)
10:00 moritzm: installing ghostscript security updates on thumbor1001
09:36 moritzm: reimaging stat1005 to buster
08:20 marostegui: Deploy schema change on db1101:3318 - T210713
08:20 marostegui: Depool db1101:3318 - T210713
08:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101:3318 (duration: 00m 46s)
08:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3318 (duration: 00m 49s)
07:49 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@125354e]: maintain symlink for old venv path with new virtualenv deploy script (duration: 03m 55s)
07:46 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@125354e]: maintain symlink for old venv path with new virtualenv deploy script
07:40 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@125354e]: testing simplified virtualenv deploy (take 2) (duration: 04m 14s)
07:35 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@125354e]: testing simplified virtualenv deploy (take 2)
07:31 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@125354e]: testing simplified virtualenv deploy (duration: 01m 07s)
07:30 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@125354e]: testing simplified virtualenv deploy
07:26 elukey: update analytics-in4 term mysql-dbstore on cr1/cr2 eqiad
07:09 marostegui: Rename ep_* tables on db1089 (s1) - T174802
06:33 kart_: Finished fourth manual run of unpublished draft purge script (T203059)
06:14 marostegui: Deploy schema change on db1099:3318 T210713
06:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099:3318 (duration: 00m 52s)
06:04 kart_: Fourth manual run of unpublished draft purge script (T203059)
02:18 thcipriani: restarting gerrit due to high load
00:49 ebernhardson@deploy1001: Finished scap: SWAT: full sync for gerrit:489309 i18n (duration: 18m 20s)
00:30 ebernhardson@deploy1001: Started scap: SWAT: full sync for gerrit:489309 i18n
00:28 ebernhardson@deploy1001: Synchronized wmf-config/WikibaseSearchSettings.php: gerrit:489780 T214515 Promote new wbsearchentities profiles to default in de, fr, es (duration: 00m 46s)
00:13 jforrester@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/CentralNotice/: SWAT Merge branch 'master' into wmf_deploy I8e52d222eb (duration: 00m 49s)
00:05 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Stop setting wgSessionsInObjectCache, it's being removed from MW I2946b5b9a (duration: 00m 47s)

2019-02-11

23:22 cdanis: T214760 icinga2001% sudo killall nsca
22:53 cdanis: icinga.w.o-->icinga2001 DNS change deployed T214760
22:40 cdanis: icinga1001 now passive T214760
22:34 cdanis: failing over icinga to icinga2001
21:33 arlolra: Updated Parsoid to b4b9603 (T208901, T215537, T213468, T215638)
21:24 arlolra@deploy1001: Finished deploy [parsoid/deploy@4e9b142]: Updating Parsoid to b4b9603 (duration: 09m 33s)
21:22 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: Use newer RCFeed config for EventBus based recentchange event - T215834 (duration: 00m 47s)
21:20 ottomata: deploying mediawiki-config change for update to EventBus RCFeed config (no-op)
21:16 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@c6a6285]: Weekly GUI deploy (duration: 11m 54s)
21:14 arlolra@deploy1001: Started deploy [parsoid/deploy@4e9b142]: Updating Parsoid to b4b9603
21:13 mobrovac@deploy1001: Finished deploy [citoid/deploy@0b91bea]: Use Zotero for DOIs and pass it the A-L header - T214766 T210806 T215755 (duration: 03m 47s)
21:09 mobrovac@deploy1001: Started deploy [citoid/deploy@0b91bea]: Use Zotero for DOIs and pass it the A-L header - T214766 T210806 T215755
21:04 smalyshev@deploy1001: Started deploy [wdqs/wdqs@c6a6285]: Weekly GUI deploy
20:08 ppchelko@deploy1001: Finished deploy [changeprop/deploy@bdb4740]: Update dependencies, minor refactor, safer deduplication, T207329 (duration: 01m 37s)
20:07 ppchelko@deploy1001: Started deploy [changeprop/deploy@bdb4740]: Update dependencies, minor refactor, safer deduplication, T207329
19:42 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106, db1118 with full weight (duration: 00m 46s)
19:34 catrope@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: Remove main page special casing from lawiki (T215709) (duration: 00m 46s)
19:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgRestrictionLevels on Serbian projects (T215653) (duration: 00m 46s)
19:16 catrope@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/GrowthExperiments/: Help panel search instrumentation (T211166) (duration: 00m 47s)
19:08 catrope@deploy1001: Synchronized wmf-config/throttle.php: Lift account creation cap for edit-a-thon (T215069) (duration: 00m 47s)
19:08 jijiki: Repooled thumbor1004 - T215411
18:50 robh: thumbor1004 rebooted and updated firmware T215411
18:50 robh: thumbor1004 rebooted and updated firmware
16:49 jynus: stop, upgrade and restart db1106
16:36 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
16:31 marostegui: Reverse password for globaldev user on dbstore1002 - T200801
16:29 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 (duration: 00m 52s)
15:49 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1118 (duration: 00m 48s)
15:24 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BETA ONLY (duration: 00m 47s)
15:23 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011 - https://phabricator.wikimedia.org/T212308
15:21 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase.php, add conditional setting of useEntitySourceBasedFederation (duration: 00m 47s)
15:20 marostegui: Repool labsdb1010 - T212308
15:19 jynus: add missing grants to db1118
15:07 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Revert, second try (duration: 00m 47s)
15:00 addshore@deploy1001: sync-file aborted: Wikibase.php, add conditional setting of useEntitySourceBasedFederation (duration: 00m 01s)
14:55 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Revert (duration: 00m 45s)
14:53 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1118 for the first time (duration: 00m 47s)
14:51 mbsantos@deploy1001: Finished deploy [tilerator/deploy@d546183] (stretch): Updating maps2004 tilerator for the stretch migration work (duration: 00m 39s)
14:50 mbsantos@deploy1001: Started deploy [tilerator/deploy@d546183] (stretch): Updating maps2004 tilerator for the stretch migration work
14:48 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@173adbe] (stretch): Updating maps2004 kartotherian for the stretch migration work (duration: 00m 21s)
14:48 mbsantos@deploy1001: Started deploy [kartotherian/deploy@173adbe] (stretch): Updating maps2004 kartotherian for the stretch migration work
14:47 moritzm: installing curl security updates on trusty
14:21 marostegui: Remove staging from dbstore1003 - T210478
14:16 godog: depool and take a snapshot of prometheus data for all instances on prometheus2003 - T187987
14:09 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010 - T212308
14:08 marostegui: Deploy schema change on db1116:3318 - T210713
12:21 godog: bounce rsyslogd on lithium / wezen, syslog tls listener stuck
12:19 zeljkof: EU SWAT finished
12:18 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: New throttle rule for Senior Citizens Write Wikipedia course (T215618) (duration: 00m 48s)
12:14 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Clean expired throttle rules (duration: 00m 48s)
10:47 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 46s)
10:46 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 48s)
10:41 jynus: upgrading mariadb client on cumin* hosts
10:27 mvolz@deploy1001: scap-helm zotero finished
10:27 mvolz@deploy1001: scap-helm zotero cluster codfw completed
10:27 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
10:24 mvolz@deploy1001: scap-helm zotero finished
10:24 mvolz@deploy1001: scap-helm zotero cluster eqiad completed
10:24 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
10:19 marostegui: Add dbstore1005:3350 to tendril and zarcillo - T210478
10:17 mvolz@deploy1001: scap-helm zotero finished
10:17 mvolz@deploy1001: scap-helm zotero cluster staging completed
10:17 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml --version=0.0.1 stable/zotero [namespace: zotero, clusters: staging]
10:17 jynus: restart db1114
09:38 marostegui: Stop all mysql instances on dbstore1005 for reboot
09:11 marostegui: Stop all mysql instances on dbstore1003 for reboot
08:17 moritzm: removed cloudcontrol2001-dev.codfw.wmnet from debmonitor (actual hostname in use is cloudcontrol2001-dev.wikimedia.org)
08:07 marostegui: Deploy schema change on s8 codfw master (db2045) - this will generate lag on codfw T210713
07:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1100 (duration: 00m 46s)
07:39 marostegui: Deploy schema change on s7 primary master (db1062) - T210713
07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give api traffic to db1100 (duration: 00m 46s)
07:18 marostegui: Stop all mysql instances on dbstore1004 for a reboot
07:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 with low weight (duration: 00m 46s)
07:06 marostegui: Upgrade MySQL on db1100
07:06 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 for mysql upgrade (duration: 00m 47s)
07:00 marostegui: Restart icinga on icinga1001 - checks went awol
06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1079 (duration: 00m 48s)
06:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1079 (duration: 00m 48s)
06:14 marostegui@deploy1001: sync-file aborted: Depool db0179 (duration: 00m 01s)
04:23 TimStarling: on mwmaint1002: running normalizeThrottleParameters.php --dry-run on all wikis (T209565)
04:19 tstarling@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/AbuseFilter/maintenance/normalizeThrottleParameters.php: maintenance script update for new dry run (duration: 00m 47s)
04:19 tstarling@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/WikimediaEvents/tests/phpunit/PageViewsTest.php: test-only undeployed change (duration: 00m 46s)
04:18 tstarling@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/NavigationTiming/tests/ext.navigationTiming.test.js: test-only undeployed change (duration: 00m 51s)
04:10 tstarling@deploy1001: sync-file aborted: test-only undeployed change (duration: 00m 12s)
03:05 kartik@deploy1001: Finished deploy [cxserver/deploy@ee4a15a]: Update cxserver to 8928852 (T213256) (duration: 04m 08s)
03:01 kartik@deploy1001: Started deploy [cxserver/deploy@ee4a15a]: Update cxserver to 8928852 (T213256)

2019-02-10

off: force rebooting mw1299, stuck again - T215569
off: forcing reboot of icinga1001 because it's stuck again (no ping, no ssh, CPU stuck messages on console) - T214760
09:25 marostegui: Disable notifications for lag checks on dbstore1002 - T210478

2019-02-09

21:42 Reedy: running `foreachwiki refreshImageMetadata.php --mediatype BITMAP --mime image/vnd.djvu --force` on mwmaint1002 T215635
21:41 Reedy: refreshImageMetadata.php for commonswiki done T215635
16:51 Jeff_Green: restarted icinga process on icinga1001 because of passive check alert-storm

2019-02-08

23:23 Reedy: running `refreshImageMetadata.php --mediatype BITMAP --mime image/vnd.djvu --force` against commonswiki on mwmaint1002 T215635 (this time we mean it)
22:56 Reedy: running `refreshImageMetadata.php --mediatype BITMAP --mime image/vnd.djvu` against commonswiki on mwmaint1002 T215635
21:25 reedy@deploy1001: Synchronized multiversion/MWMultiVersion.php: Move variable (duration: 00m 49s)
19:50 krinkle@deploy1001: Synchronized w/touch.php: Ia1e610a5f (duration: 00m 46s)
19:49 krinkle@deploy1001: Synchronized w/robots.php: Ia1e610a5f (duration: 00m 46s)
19:48 krinkle@deploy1001: Synchronized w/favicon.php: Ia1e610a5f (duration: 00m 46s)
19:47 krinkle@deploy1001: Synchronized w/extract2.php: Ia1e610a5f (duration: 00m 48s)
18:14 gtirloni: T213527 graphite2002 disabled puppet and commented prometheus_puppet_agent_stats cronjob due to cronspam
18:08 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase weight for s1 rc slaves (duration: 00m 49s)
17:55 mutante: phab1001 - restart aphlict service
17:52 mutante: phab1001 - restarting phd service
17:49 arturo: T215605 add prometheus-openstack-exporter 0.0.8-4 to stretch-wikimedia
17:47 mutante: phab1001 - restarting apache2 service for library upgrade
17:42 mutante: graceful reload of apache on phabricator prod server (phab1001)
17:27 XioNoX: merge Icinga: add ping check for ulsfo PDUs
16:50 ejegg: updated payments-wiki-staging from 52a271e681 to 31647bc97e
16:09 jynus: stopping s1 replication on dbstore1001 to speed up cloning T214720
16:08 moritzm: imported git-fat 0.1.3-2+deb10u1 to buster-wikimedia (T213527)
15:46 marostegui: Repool labsdb1009 - T212308
15:33 _joe_: apt-get upgrade on mwmaint2001 to fix the php installation T215376
15:31 moritzm: imported debmonitor 0.1.5-1+deb10u1 to buster-wikimedia (T213527)
15:31 _joe_: upgraded all php extensions to php 7.2 compatible versions on mwmaint1002
15:10 jijiki: Upgrading php-redis 4.1.1 to mwmaint1002 - T215376
14:51 marostegui: Reload haproxy on dbproxy1011 to depool labsdb1009 - https://phabricator.wikimedia.org/T212308
13:56 moritzm: updated firmware-enriched buster netboot image to 20190208 daily build, the alpha5 image no longer works as Linux 4.19.16-1 bumped the ABI and migrated to testing yesterday
13:45 jynus: racadm serveraction powercycle db1114
13:39 onimisionipe: starting osm-initial-import for maps2004, the master newly migrated to stretch - T198622
13:37 elukey: roll restart of aqs on aqs1* to pick up new druid backend changes
13:05 arturo: T209029 reimaging cloudelastic1004
12:54 ejegg: updated fundraising CiviCRM from 3a1bb82373 to a541a83cb2
12:51 jynus: disabling notifications on db1114
12:44 elukey@deploy1001: Synchronized wmf-config/db-eqiad.php: depooling db1114, host down (duration: 00m 47s)
11:36 moritzm: reimage graphite2002 to buster
11:08 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1083 fully (duration: 00m 47s)
10:50 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099 (duration: 00m 47s)
10:27 jijiki: Restarting memcached on mc1026 to apply '-R 200' - T208844
10:23 godog: swift codfw-prod: more weight to ms-be2047 - T209395 T209921
10:15 jynus: stop and upgrade db1099
10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3317 (duration: 00m 47s)
09:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3317 (duration: 00m 46s)
09:28 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 (duration: 00m 46s)
09:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 (duration: 00m 46s)
09:16 moritzm: installing rssh security updates
09:06 moritzm: installing libarchive security updates
09:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1086 (duration: 00m 47s)
08:53 moritzm: reimage graphite2002 to buster
08:50 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1083 with low load (duration: 00m 46s)
08:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 (duration: 00m 47s)
08:24 jynus: stop and upgrade db1083
08:23 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 (duration: 00m 47s)
08:15 marostegui: Upgrade MySQL on db1086
08:05 marostegui: Upgrade MySQL on db1086 and deploy schema change
08:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 (duration: 00m 46s)
07:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Full repool db1094 (duration: 00m 47s)
07:45 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1299.eqiad.wmnet
07:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1094 (duration: 02m 55s)
07:27 marostegui@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1299.eqiad.wmnet
07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1094 (duration: 02m 56s)
07:12 marostegui: Upgrade mysql and kernel on db1094
06:58 marostegui: Deploy schema change on db1094 T210713
06:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 46s)
06:54 marostegui: Take a mysqldump from staging on dbstore1003 from dbstore1002 - T210478
06:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1098:3317 (duration: 00m 49s)
06:29 marostegui: powercycle mw1299 - T215569
06:21 marostegui: Deploy schema change on db1098:3317
06:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098:3317 (duration: 02m 58s)
06:07 marostegui: Drop staging.mep_word_persistence from dbstore1002 T215450 T213706
02:34 ejegg: updated fundraising CiviCRM from 08be00e87f to 3a1bb82373
01:37 dzahn@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1299.eqiad.wmnet
01:10 mutante: mw1299 has been down about 8 hours, does it need deployment.. depooling
01:08 mutante: powercycle crashed mw1299 via mgmt (garbled console output) (T215569)
00:22 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT gerrit:488588 phab:T214515 Turn off wikidata wbsearchentities ab test in de, fr, es (duration: 02m 55s)
00:16 ebernhardson: scap sync timed out on mw1299.eqiad.wmnet
00:15 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT gerrit:483044 T209873 Give protect right to centralnoticeadmin on Meta (duration: 02m 56s)

2019-02-07

23:29 XioNoX: restart ps1-22-ulsfo
23:23 reedy@deploy1001: Synchronized tests/dblistTest.php: Sync test (duration: 02m 55s)
23:18 reedy@deploy1001: Synchronized README: must be up to date (duration: 02m 54s)
22:48 reedy@deploy1001: Synchronized dblists/: alphasort dblists (duration: 02m 56s)
21:43 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.33.0-wmf.16 refs T206670
21:38 robh: updating firmware on ps1-23-ulsfo via T209101; ps1-22-ulsfo update completed
21:22 robh: updating firmware on ps1-22-ulsfo via T209101
20:55 twentyafterfour: train status: deploying 1.33.0-wmf.16 to group2
20:19 sbisson@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/WikibaseLexeme/src/DataAccess/Search/LexemeFulltextResult.php: SWAT: Fix fatal error - EmptySet does not exist anymore (duration: 03m 03s)
19:45 sbisson@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/GrowthExperiments/: SWAT: Help Panel: Fix iOS scroll bug (duration: 03m 02s)
19:28 sbisson@deploy1001: sync-file aborted: SWAT: GrowthExperiments: Enable search for help panel on testwiki (duration: 02m 22s)
19:25 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Enable search for help panel on testwiki (duration: 03m 04s)
18:32 mutante: LDAP - adding raz-shuty to group nda (T214488)
17:06 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1085 (duration: 03m 03s)
16:03 jynus: restart db1085, temporary s6 lag on wikireplicas
15:55 gehel: starting reimage of maps2004 - T198622
15:51 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1085 (duration: 00m 58s)
15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on wikitech for T215464. This may cause lag in codfw.
15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 8 wikis for T215464. This may cause lag in codfw.
15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 7 wikis for T215464. This may cause lag in codfw.
15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 6 wikis for T215464. This may cause lag in codfw.
15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 5 wikis for T215464. This may cause lag in codfw.
15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 4 wikis for T215464. This may cause lag in codfw.
15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on remaining section 3 wikis for T215464. This may cause lag in codfw.
15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 2 wikis for T215464. This may cause lag in codfw.
15:16 anomie@mwmaint1002: Fixing log_search after migrateActors.php on section 1 wikis for T215464. This may cause lag in codfw.
15:07 anomie@mwmaint1002: Fixing log_search after migrateActors.php on test wikis and mediawikiwiki for T215464. This may cause lag in codfw.
15:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1101 (duration: 00m 55s)
14:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1101 after alter and mysql upgrade (duration: 00m 55s)
14:34 jbond42: deploying security updates for libgd3
12:42 Amir1: EU SWAT is done
12:42 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Set EntityUsageTable addUsage batch size to 300, Part II (duration: 00m 54s)
12:42 marostegui: Set dbstore1002 as IDEMPOTENT - T213670
12:39 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set EntityUsageTable addUsage batch size to 300 (T215146), Part I (duration: 00m 55s)
12:34 marostegui: Powercycle mw1299 as it is down and not responding
12:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1101 after alter and mysql upgrade (duration: 03m 02s)
12:26 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: SWAT: Update interwiki cache to have yuewiktionary instead of zh-yue (T214400) (duration: 03m 04s)
12:06 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4026.ulsfo.wmnet
12:03 arturo: T214448 reimaging again cloudvirt200[1-3]-dev.codfw.wmnet
11:55 marostegui: Stop MySQL on db1101:3317 and db1101:3318 for mysql upgrade
11:37 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2055 (duration: 03m 02s)
11:17 fsero: upgrade helm to 2.12.2 on deploy{1001,2001} and contint{1001,2001} T215244
11:16 fsero: upgrade helm to 2.12.2 on deploy{1001,2001} and contint{1001,2001}
10:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101 for alter and mysql upgrade (duration: 00m 56s)
10:43 marostegui: Run mysqldump from dbstore1003 to dump dbstore1002:staging.mep_word_persistence - T215450
09:49 marostegui: Deploy schema change on db1116 - T210713
09:41 akosiaris: reboot mwdebug1001, mwdebug1002, mwdebug2001, mwdebug2002 for VCPU upgrade. T212955
09:23 jynus: running alter table on db2055 for performance testing T212092
09:15 fsero: uploading helm and tiller 2.12.2 deb package to stretch and jessie
08:53 marostegui: Deploy schema change on s7 codfw master (db2047), this will generate lag on s7 codfw - T210713
08:34 godog: swift codfw-prod: more weight to ms-be2047 - T209395 T209921
08:14 marostegui: Deploy schema change on s4 primary master (db1068) - T210713
08:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1081 (duration: 00m 54s)
07:50 marostegui: Deploy schema change on db1081
07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 53s)
07:48 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 20s)
07:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1084 (duration: 00m 53s)
07:42 reedy@deploy1001: Synchronized dblists/: Wikimania T215486 (duration: 00m 54s)
07:03 marostegui: Deploy schema change on db1084 - T210713
07:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1084 (duration: 00m 55s)
06:48 marostegui: Restore consistency options on db2051
06:14 marostegui: Ease consistency options on db2051 (s4 master) to let it catch up on replication
04:35 tstarling@deploy1001: Synchronized wmf-config/set-time-limit.php: (no justification provided) (duration: 00m 54s)
04:00 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable EP namespaces on wikis with no EP pages (duration: 00m 57s)
01:31 eileen: civicrm revision changed from c5aec3ae76 to 08be00e87f, config revision is 306b4de48f
01:24 eileen: civicrm revision changed from 6161a021c0 to c5aec3ae76, config revision is 306b4de48f
01:05 twentyafterfour: US Evening SWAT is complete
01:04 twentyafterfour: no phabricator deployment tonight
01:04 eileen: civicrm revision changed from 613b388916 to 6161a021c0, config revision is 306b4de48f
00:57 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT config change for Bug: T214003 (duration: 00m 53s)
00:53 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/VisualEditor/: SWAT f89e12f to fix bug: T209610 (duration: 00m 55s)
00:48 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/MobileFrontend/: SWAT dd8654a (duration: 01m 00s)
00:47 twentyafterfour: syncing commit dd8654a for Bug: T209052
00:24 twentyafterfour: running `mwscript migrateUserGroup.php commonswiki extended-uploader autopatrolled` on deploy1001

2019-02-06

23:58 mutante: restarting icinga on icinga1001 to pick up new check command ?
22:22 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.16 refs T206670 (duration: 00m 53s)
22:22 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.16 refs T206670
21:45 mutante: LDAP - adding brennen to wmf, releng, ciadmin - Welcome Brennen Bearnes, Software Engineer in Release Engineering (T215365 T214556)
21:05 arlolra@deploy1001: Finished deploy [parsoid/deploy@a4acfa6]: Updating Parsoid to fb67a71 (duration: 03m 43s)
21:04 Krinkle: krinkle@webperf1002 Kill xenon-log (pid 449). It seems its Redis TCP socket to mwlog1001 has been stuck since Dec 13, causing the process to indefinitely hang on listen()/socket.recv()
21:01 arlolra@deploy1001: Started deploy [parsoid/deploy@a4acfa6]: Updating Parsoid to fb67a71
20:49 mutante: LDAP - adding h78na to wmf - welcome Hana Worku, developer on the multimedia team (T215352)
20:40 mutante: LDAP - adding egardner to wmf - welcome Eric Gardner, software engineer in Audiences (T214654)
20:35 twentyafterfour: 1.33.0-wmf.16 has a significantly higher rate of "entire web request took longer than 60 seconds and timed out"
20:03 twentyafterfour: Resuming the MediaWiki train for version 1.33.0-wmf.16. Will deploy Group0 wikis first and then catch up to group1 after a few minutes monitoring logs for stability.
19:50 robh: updated firmware on cp4026 and re-seated (already well seated) dimm b3. errors have cleared for now T214516
19:24 milimetric@deploy1001: Finished deploy [analytics/refinery@cd413dd]: Small bug fix for history checker (duration: 12m 45s)
19:13 robh: taking cp4026 offline to flash firmware and reseat dimm for testing on T214516
19:12 milimetric@deploy1001: Started deploy [analytics/refinery@cd413dd]: Small bug fix for history checker
19:11 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@3272a46]: Add healthcheck plugin (no restart) cobalt T214326 (duration: 00m 09s)
19:11 thcipriani@deploy1001: Started deploy [gerrit/gerrit@3272a46]: Add healthcheck plugin (no restart) cobalt T214326
19:09 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@3272a46]: Add healthcheck plugin (no restart) gerrit2001 first (duration: 00m 10s)
19:09 mutante: LDAP - adding afandian2 and toddleroux to nda (T214727)
19:09 thcipriani@deploy1001: Started deploy [gerrit/gerrit@3272a46]: Add healthcheck plugin (no restart) gerrit2001 first
19:04 jforrester@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/Flow/includes/Conversion/Utils.php: I405dd193 Update Parsoid Accept header to 2.0.0 so service can deploy (duration: 00m 54s)
19:03 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/Flow/includes/Conversion/Utils.php: I405dd193 Update Parsoid Accept header to 2.0.0 so service can deploy (duration: 00m 56s)
18:03 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Removed namespace Коментар, added namespace Портал on srwikinews T214561 T214563 (duration: 00m 53s)
18:01 mutante: LDAP - adding alaasarhan to wmde (T215066)
17:57 thcipriani@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Changed wgImportSources for srwikinews T214562 (duration: 00m 53s)
17:53 thcipriani@deploy1001: Synchronized dblists/s3.dblist: SWAT: dblists/s3.dblist: Fix sorting of list of wikis per alphabetical order (duration: 00m 54s)
17:49 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/MobileFrontend: SWAT: VE: Load HTML in parallel with modules T209052 (duration: 00m 57s)
17:40 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.16/extensions/MobileFrontend: SWAT: EditorOverlay: Pass constructor of itself to VisualEditorOverlay, not instance T215408 (duration: 00m 57s)
17:10 jynus: setting db1111 in read-write mode
16:24 moritzm: reimaging graphite2002 to buster
16:19 jynus: running alter table on db2055 T93564
16:14 gehel@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
15:44 papaul: powering down thumbor2002 for disk replacement
15:42 moritzm: installing spice security updates
15:41 andrewbogott: rebooting cloudvirt1015 to make sure that nothing drastic changes once libguestfs is installed T215423
15:11 moritzm: installing libav security updates
15:08 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2055 for performance testing T93564 (duration: 00m 55s)
14:50 moritzm: draining restbase1018 for eventual reboot for kernel security update (bundled with Java update)
14:36 moritzm: draining restbase1017 for eventual reboot for kernel security update (bundled with Java update)
14:29 elukey: add term mysql-dbstore to analytics-in4/6 on cr1/2-eqiad to allow tcp connections to dbstore100[3-5] - T210478
12:30 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3032.esams.wmnet
12:29 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3032.esams.wmnet
12:28 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3033.esams.wmnet
12:26 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3033.esams.wmnet
12:25 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3040.esams.wmnet
12:24 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3040.esams.wmnet
12:22 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3041.esams.wmnet
12:22 Amir1: EU SWAT is done
12:21 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3041.esams.wmnet
12:20 vgutierrez: restarting varnish-fe safely across esams/text cluster - T215389
12:19 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Use separate DB connection for ID insertions on testwikidatawiki (T215147), Part II (duration: 00m 54s)
12:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Use separate DB connection for ID insertions on testwikidatawiki (T215147), Part I (duration: 00m 55s)
11:58 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3042.esams.wmnet
11:57 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3042.esams.wmnet
11:56 vgutierrez: restarting varnish-fe in cp3042 - T215389
11:02 _joe_: restarting nginx safely across the appserver fleets in order to be able to run puppet without errors
10:41 marostegui: Revoke access to testreduce from ruthenium on m5 - https://phabricator.wikimedia.org/T214740
10:04 moritzm: reimaging graphite2002 to buster
10:01 akosiaris: restart varnish-frontend on cp3030 T215389
10:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1091 (duration: 00m 52s)
09:33 marostegui: Remove wikiuser from dbstore1003-dbstore1005 T210478
09:15 godog: swift codfw-prod: more weight for ms-be2047 - T209395 T209921
09:00 marostegui: Create research_role on dbstore1003-1005 on all instances - T214469
08:49 marostegui: Deploy schema change on db1091 - T210713
08:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 (duration: 00m 53s)
08:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1121 (duration: 00m 53s)
07:51 marostegui: Deploy schema change on db1121 - this will generate lag on s4 labs - also upgrade MySQL on db1121 T210713
07:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1121 (duration: 00m 54s)
07:47 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 (duration: 00m 54s)
07:19 marostegui: Deploy schema change on wikitech T210713
07:14 marostegui: Stop 's4' slave on dbstore1002
07:13 marostegui: Deploy schema change on db1103:3314 (db1097:3314 was also done previously) - T210713
07:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 (duration: 00m 53s)
07:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 (duration: 00m 56s)
06:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097:3314 (duration: 01m 06s)
04:39 mutante: reloaded icinga service, can't find new check command definition
03:14 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.33.0-wmf.16 refs T206670 (duration: 04m 18s)
03:09 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.33.0-wmf.16 refs T206670
03:05 mutante: actinium - gzipping and rotating some access logs
03:01 twentyafterfour@deploy1001: Synchronized scap/plugins/updateinterwikicache.py: (no justification provided) (duration: 00m 55s)
02:47 mutante: actinium - blocking a bad domain and restarting squid3
02:40 twentyafterfour@deploy1001: Finished scap: sync and update localization for 1.33.0-wmf.16 (duration: 15m 50s)
02:32 XioNoX: push firewall rule to pfw3-eqiad - T215364
02:27 mutante: actinium - apt-get clean for 8% more disk space after icinga alert
02:25 twentyafterfour@deploy1001: Started scap: sync and update localization for 1.33.0-wmf.16
02:16 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.14 refs T206670
02:12 eileen: civicrm revision changed from 6042acb363 to 613b388916, config revision is 306b4de48f
02:02 twentyafterfour@deploy1001: scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
01:22 XioNoX: remove peering4/6 prefix-list from routers
01:07 XioNoX: add maintenance and rollback to junos operations class
00:47 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.33.0-wmf.16 refs T206670
00:33 niharika29@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/MobileFrontend/: EditorOverlay: captcha/abusefilter weren't being shown correctly T215101, T202374 (duration: 00m 50s)
00:24 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Demystify Logstash debug level behavior (duration: 00m 51s)
00:23 niharika29@deploy1001: Synchronized wmf-config/logging.php: Demystify Logstash debug level behavior (duration: 00m 46s)
00:18 niharika29@deploy1001: Synchronized wmf-config/logging.php: Add PHP version to MW logs T215350 (duration: 00m 46s)
00:16 niharika29@deploy1001: Synchronized wmf-config/CommonSettings.php: Preserve Composer's include paths - T215126, T215224 (duration: 01m 40s)

2019-02-05

18:56 arlolra@deploy1001: Finished deploy [parsoid/deploy@a4acfa6]: (no justification provided) (duration: 02m 06s)
18:53 arlolra@deploy1001: Started deploy [parsoid/deploy@a4acfa6]: (no justification provided)
18:39 arlolra@deploy1001: Finished deploy [parsoid/deploy@a4acfa6]: Updating Parsoid to fb67a71 (duration: 09m 54s)
18:29 arlolra@deploy1001: Started deploy [parsoid/deploy@a4acfa6]: Updating Parsoid to fb67a71
18:26 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@2959e12]: Update mobileapps to 107c1b1 (T214714) (duration: 04m 43s)
18:21 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@2959e12]: Update mobileapps to 107c1b1 (T214714)
18:17 mutante: contint1001/contint2001 - manually deleting crontab lines unpuppetized in gerrit:488019 (T209361)
18:13 Jeff_Green: authdns-update to deploy 7fee817fd3
17:22 mutante: scandium - restart parsoid-vd service
17:21 mutante: scandium -- copy /srv/visualdiff/testrecude/testrun.ids from ruthenium to the same location
15:15 godog: force curator action 'replicas' to set older logstash indices to 1 replica - T213078
14:30 marostegui: Deploy schema change on s4 codfw master with replication, lag will be generated on s4 codfw - T210713
14:26 Jeff_Green: authdns-update for payments dev/testing hostname
14:12 marostegui: Deploy schema change on db1066 (s2 master) - T210713
14:05 marostegui: Delete unused grants from dbstore1002: log, warehouse, project_illustration, cognate_wiktionary, datasets - T212487 T210478
13:55 godog: swift codfw-prod: add ms-be2047 - T209395 T209921
12:18 addshore: swat done
12:18 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable confirmation prompt on rollback by default T215019 (duration: 00m 47s)
11:35 moritzm: added firmware-enriched buster netboot image (T213546)
11:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1074 (duration: 00m 46s)
10:43 marostegui: Deploy schema change on db1074 with replication, lag will be generated on s2 - T210713
10:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1074 (duration: 00m 47s)
10:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3312 (duration: 00m 46s)
09:42 hashar: contint1001: docker image prune -f
09:34 marostegui: Deploy schema change on db1090:3312 - T210713
09:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3312 (duration: 00m 45s)
09:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1076 (duration: 00m 46s)
09:11 marostegui: Start all slaves on dbstore1002 - T213670
08:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 (duration: 00m 45s)
08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1076 (duration: 00m 46s)
08:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 (duration: 00m 45s)
07:56 marostegui: Upgrade MySQL and kernel on db1076
07:44 marostegui: Deploy schema change on db1076 - T210713
07:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 T210713 (duration: 00m 47s)
07:13 marostegui: Taking mysqldump from dbstore1002.staging - T210478
07:05 marostegui: Reboot mysql on db1117:3323 (this will make the dbproxies complain) T214248
02:24 XioNoX: remove BGP session to as6412 on cr2-eqiad (gone from IX)
02:21 XioNoX: delete 2nd as9121 router on cr2-esams
00:47 XioNoX: add BGP sessions to AS64050 on cr1-eqsin
00:24 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/486405/ (duration: 00m 46s)
00:11 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 46s)

2019-02-04

22:05 mutante: scandium - systemctl start parsoid-vd (T201366)
20:01 herron: manually ran puppet on mc1023
19:50 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Clean-up: Stop setting wgParsoidWikiPrefix, unused since the Parsoid extension (duration: 00m 45s)
19:45 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Clean-up: Stop setting wgFlowEventLogging, unread (duration: 00m 45s)
19:39 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean-up: Stop setting values for wgEcho*FooterNotice*, unread (duration: 00m 46s)
19:32 James_F: Manually purged atjwiki*.png logos for T215122.
19:28 jforrester@deploy1001: Synchronized static/images/project-logos/atjwiki.png: SWAT: Milestone logo for atjwiki T215122, 1x (duration: 00m 46s)
19:27 jforrester@deploy1001: Synchronized static/images/project-logos/atjwiki-1.5x.png: SWAT: Milestone logo for atjwiki T215122, 1.5x (duration: 00m 45s)
19:26 jforrester@deploy1001: Synchronized static/images/project-logos/atjwiki-2x.png: SWAT: Milestone logo for atjwiki T215122, 2x (duration: 00m 44s)
19:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: T191039 Enable wgAbuseFilterRuntimeProfile on all wikis (duration: 00m 47s)
19:19 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@8b2f078]: Weekly GUI deploy (duration: 09m 47s)
19:09 smalyshev@deploy1001: Started deploy [wdqs/wdqs@8b2f078]: Weekly GUI deploy
18:31 XioNoX: adding Papaul to root@wiki
18:22 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Clean-up: Drop reading for wgEcho*FooterNotice*, unread (duration: 00m 46s)
18:18 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Clean-up: Stop setting wgEchoConfig, unused since 2016 (duration: 00m 48s)
18:11 jforrester@deploy1001: Synchronized dblists/: T213504: Finally, drop the wikidatarepo dblist (duration: 00m 45s)
18:09 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T213504: Stop telling CommonSettings about the wikidatarepo dblist (duration: 00m 45s)
18:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T213504: Unconfigure the wikidatarepo dblist (duration: 00m 46s)
18:05 XioNoX: manually rotate log file wtmp on csw2-esams
18:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T213504: Configure wikibaserepo dblist just like the wikidatarepo one (duration: 00m 46s)
17:58 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T213504: Tell CommonSettings about the new wikibaserepo dblist (duration: 00m 47s)
17:56 jforrester@deploy1001: Synchronized dblists/wikibaserepo.dblist: T213504: Create the new wikibaserepo dblist (duration: 00m 47s)
17:25 papaul: powering down thumbor2002 for disk replacement
17:10 XioNoX: revert ospf metrics to normal values on esams-eqiad Level3 link
16:50 Lucas_WMDE: deployed patch for T212118
12:41 Lucas_WMDE: EU SWAT done
12:40 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix Wikidata base URI in client config (T198946) (duration: 00m 46s)
12:34 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Populate wmgWikibaseRepoSpecialSiteLinkGroups for commonswiki (T213975) (duration: 00m 51s)
11:04 moritzm: installing ghostscript security updates
08:48 jynus: fixing dbstore1002 x1 replication
07:56 vgutierrez: uploaded certcentral 0.8 to apt.wikimedia.org (stretch) - T209980 T213820 T213301

2019-02-03

20:25 elukey: powercycle mw1272 - no ssh, no tty available via com2 - DIMM correctable errors + OEM errors registered in getsel
18:56 elukey: started a tmux session on dbstore1002 to migrate all the tokudb tables of mediawikiwiki to InnoDB - (s3 replication broken)
17:53 elukey: start all slaves on dbstore1002 (After a crash + recovery) + moved mediawikiwiki.revision_actor_temp to Innodb to unblock s3 slave replication (still broken though)
04:55 legoktm@deploy1001: Synchronized wmf-config/extension-list: Remove WikibaseQuality from extensions-list (T208499) (duration: 00m 51s)
01:10 elukey: powercycle mw1299 - can't ssh nor get a tty via console - racadm getsel shows "An OEM diagnostic event occurred."

2019-02-02

20:42 chaomodus: restarted pdfrender on scb1003
20:41 chaomodus: restarted pdfrender on scb1004
20:06 chaomodus: parsoid had failed on scandium and was alerting; the parsoid-vd service was restarted and appears to have come back
05:44 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/VisualEditor/lib/ve/src/ui/dialogs/ve.ui.FindAndReplaceDialog.js: b/src/ui/dialogs/ve.ui.FindAndReplaceDialog.js T214963 Hot-deploy VE fix to stop hitting user pref writes without debounce (duration: 01m 02s)

2019-02-01

23:16 vgutierrez: restart pdfrender on scb1004
21:57 ejegg: updated payments-wiki-staging from 7767c7027e to 52a271e681
21:25 ejegg: updated payments-wiki-staging to fundraising/REL1_31 branch
07:13 bawolff_: reset 2FA on wikitech for User:Cicalese

2019-01-31

17:44 jynus: running alter table on metawiki.revision_actor_temp, trying to fix TokuDB horrible bugs
15:54 jynus: stop, upgrade and restart db1117
13:34 mvolz@deploy1001: scap-helm zotero finished
13:34 mvolz@deploy1001: scap-helm zotero cluster codfw completed
13:34 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
13:31 mvolz@deploy1001: scap-helm zotero finished
13:31 mvolz@deploy1001: scap-helm zotero cluster eqiad completed
13:31 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
13:19 mvolz@deploy1001: scap-helm zotero finished
13:19 mvolz@deploy1001: scap-helm zotero cluster staging completed
13:19 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml --version=0.0.1 stable/zotero [namespace: zotero, clusters: staging]
13:18 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml stable/zotero [namespace: zotero, clusters: staging]
12:54 jynus: stop, upgrade and restart db2044
12:12 jynus: apply new grants to m5-master with replication T214740
11:30 arturo: T215012 icinga downtime cloudvirt1015 for 4h while investigating issues
11:24 arturo: T215012 reboot cloudvirt1015
11:24 jynus: restart eventstreams on scb1002,3,4
11:22 jynus: restart eventstreams on scb1001
10:22 jynus: resetting to defaults innodb consistency options for db2048 T188327
10:00 jynus: restarting pdfrender on scb1002,3,4
09:54 jynus: restarting pdfrender on scb1001
02:01 gtirloni: T215004 restarted gerrit (using 1200% cpu, 71% mem)

2019-01-30

20:28 bawolff_: reset 2FA@wikitech for User:deigo
18:25 ladsgroup@deploy1001: Finished deploy [ores/deploy@ad160b0]: (no justification provided) (duration: 12m 46s)
18:12 ladsgroup@deploy1001: Started deploy [ores/deploy@ad160b0]: (no justification provided)
18:03 jynus: reducing innodb consistency options for db2048 T188327
17:36 XioNoX: deactivate/activate cr2-esams:xe-0/1/3
17:28 akosiaris: restart pdfrender on scb1003, scb1004
16:19 akosiaris: restart proton on proton1002
15:52 jynus: stop, upgrade and restart db2037
15:24 jynus: stop, upgrade and restart db2042
14:27 jynus: stop, upgrade and restart db2034, this will cause some lag on x1-codfw
13:53 jynus: stop, upgrade and restart db2069
11:20 jynus: stop, upgrade and restart db2045, this will cause some lag on s8-codfw
10:54 jynus: stop, upgrade and restart db2079
10:33 jynus: stop, upgrade and restart db2039, this will cause some lag on s6-codfw
10:03 jynus: stop, upgrade and restart db2052, this will cause some lag on s5-codfw
09:31 jynus: stop, upgrade and restart db2089 (s5/s6)
08:58 jynus: stop, upgrade and restart db2051, this will cause some lag on s4-codfw
08:44 jynus: stop, upgrade and restart db2090

2019-01-29

21:52 jijiki: Depooling thumbor2002 due to disc failure - T214813
16:51 arturo: T214499 update Netbox status for cloudvirt1023/1024/1025/1026/1027 from PLANNED to ACTIVE. These servers are actually providing services already.
10:05 jynus: stop, upgrade and restart db2065
09:28 jynus: stop, upgrade and restart db2058
09:12 jynus: stopping, upgrading and restarting db2035, this will cause lag on codfw-s2
08:58 jynus: stop, upgrade and restart db2041
08:38 jynus: stop, upgrade and restart db2056
08:17 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1114 after crash (duration: 00m 52s)
03:32 XioNoX: bump cr2-esams-cr2-eqiad ospf cost to 2000 for level3 link flapping

2019-01-28

23:51 vgutierrez: restarting cp2014 - T214872
21:02 Zoranzoki21: Done wikitext export of content of database for education program on srwiki - T174802 (duration: 8 minutes)
20:54 Zoranzoki21: Starting wikitext export of content of database for education program on srwiki - T174802 (21:54 UTC+1)
19:55 brion: running final pass of requeueTranscodes.php on all wikis to make sure stray missing VP9 transcodes are cleaned up (on mwmaint1002 in a tmux session)
16:41 hashar: contint1001: cleaning up disk space on / (docker images)
16:36 jynus: remove backups dir at dbstore2001 T214831
15:22 thcipriani: restarting jenkins for update
14:16 jynus: stop, upgrade and reboot db2048, this will cause general lag/read only on enwiki/s1-codfw for some minutes
13:52 jynus: stop, upgrade and reboot db2092
12:55 jynus: stop, upgrade and reboot db2085
12:45 jynus: powercycle ms-be1034
12:42 onimisionipe: restarting all elasticsearch instances on relforge1002 to test spicerack command
11:21 jynus: stop, upgrade and reboot db2062
10:45 jynus: stop, upgrade and reboot db2055

2019-01-27

16:22 godog: powercycle ms-be1020 - T214778
03:28 marostegui: Fix x1 on dbstore1002 - T213670
02:24 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: Hot-deploy Ic2b08cb27 in WBMI to fix Commons File page display (duration: 00m 49s)

2019-01-26

11:06 volans: force rebooting icinga1001 (no ping, no ssh, stuck console)
03:23 marostegui: Convert all tables on incubatorwiki to innodb to fix s3 thread - T213670
00:03 XioNoX: split member-range ge-3/0/0 to ge-3/0/38 on asw-b-codfw

2019-01-25

22:45 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@5e859c4]: Update mobileapps to a8834e8 (T214728) (duration: 03m 27s)
22:42 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@5e859c4]: Update mobileapps to a8834e8 (T214728)
21:56 krinkle@deploy1001: Synchronized wmf-config/flaggedrevs.php: I95c37d628557c (duration: 00m 46s)
21:44 krinkle@deploy1001: Synchronized wmf-config/: Idb695dd033d42 (duration: 00m 46s)
21:43 krinkle@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: Idb695dd033d42 (duration: 00m 47s)
21:05 robh: cleared sel on db1068, it had a power redundancy loss event (old and resolved) that was triggering the icinga check
20:04 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1106 as an extra api host (duration: 00m 46s)
19:36 jynus: powercycle db1114 T214720
19:21 jynus: disabling notifications on db1114
19:21 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1114 (duration: 00m 46s)
18:32 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@94b76f5]: Update mobileapps to 4c42e3d (T214714) (duration: 03m 33s)
18:28 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@94b76f5]: Update mobileapps to 4c42e3d (T214714)
17:17 chaomodus: notebook1003 restarted nagios-nrpe-server due to oom - T212824
14:43 hashar: contint1001: stopping zuul-merger for cleanup duties
09:48 marostegui: Add dbstore1005:3318 to tendril T210478
08:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1105 (duration: 00m 45s)
08:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1105:3312 (duration: 00m 45s)
07:51 elukey: restart yarn/hdfs daemons on analytics1056 to pick up new disk settings - T214057
07:40 elukey: drain + reboot analytics1054 after disk swap (verify reboot + restore correct fstab mountpoints) - T213038
07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1105:3312 (duration: 00m 45s)
07:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1105 (duration: 00m 47s)
06:53 marostegui: Stop MySQL on db1105 to upgrade MySQL
06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully depool db1105 (duration: 00m 46s)
06:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1122 T210713 (duration: 00m 47s)
06:13 marostegui: Deploy schema change on db1122 - T210713
06:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 T210713 (duration: 00m 48s)
06:04 marostegui: Compress dbstore1002: staging.mep_word_persistence from Aria to InnoDB - T213706
05:42 kartik@deploy1001: Finished deploy [cxserver/deploy@a5d7181]: Update cxserver to 356f0a1 (T213257, T213275) (duration: 04m 09s)
05:38 kartik@deploy1001: Started deploy [cxserver/deploy@a5d7181]: Update cxserver to 356f0a1 (T213257, T213275)
03:12 mutante: scandium sudo chgrp -R wikidev /srv/deployment/parsoid/deploy/ ; sudo chmod -R g+w /srv/deployment/parsoid/deploy/ (T201366)
03:03 mutante: scandium - apt-get -t stretch-backports install npm ; run puppet ; remove manually created /apt/preferences.d/npm.pref ; puppet created npm_stretch_backports.pref ; puppet run without errors again (T201366)
01:33 crusnov@deploy1001: Finished deploy [netbox/deploy@7770453]: Cleanup deploy - T212524 (duration: 00m 11s)
01:33 crusnov@deploy1001: Started deploy [netbox/deploy@7770453]: Cleanup deploy - T212524
01:28 crusnov@deploy1001: Finished deploy [netbox/deploy@7770453]: Upgrade netbox to 2.5.3 - T212524 Try 2 (duration: 00m 31s)
01:27 crusnov@deploy1001: Started deploy [netbox/deploy@7770453]: Upgrade netbox to 2.5.3 - T212524 Try 2
01:26 crusnov@deploy1001: Finished deploy [netbox/deploy@7770453]: Upgrade netbox to 2.5.3 - T212524 (duration: 07m 43s)
01:18 crusnov@deploy1001: Started deploy [netbox/deploy@7770453]: Upgrade netbox to 2.5.3 - T212524
00:46 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T214515 gerrit:486154: Turn on wbsearchentities ab test in de, fr, es (duration: 00m 46s)
00:37 ebernhardson@deploy1001: Synchronized wmf-config/WikibaseSearchSettings.php: SWAT T214515 gerrit:484334: Add wbsearchentities profiles for de, fr, es (duration: 00m 45s)
00:34 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/MobileFrontend/: SWAT T214606 gerrit:486392: MobileFrontend if wikidata description exists, set it as tagline (duration: 00m 47s)
00:29 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.14/includes/Title.php: SWAT T210739 gerrit:486369: Clone the Title object to prevent mutation (duration: 00m 47s)
00:20 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: SWAT T212788 gerrit:485609: autocomplete subphrase matching on wikitech and mw.org 2 of 2 (duration: 00m 45s)
00:14 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T212788 gerrit:485608: autocomplete subphrase matching on wikitech and mw.org (duration: 00m 46s)
00:01 arlolra: Updated Parsoid to 4772f44 (T214649, T214648)

2019-01-24

23:54 arlolra@deploy1001: Finished deploy [parsoid/deploy@f9ef630]: Updating Parsoid to 4772f44 (duration: 11m 58s)
23:42 arlolra@deploy1001: Started deploy [parsoid/deploy@f9ef630]: Updating Parsoid to 4772f44
22:21 mutante: wikitech-static splitting apache2 config files into one file per vhost to make it possible for certbot to detect them
22:11 mutante: wikitech-static attempted to use certbot with --authenticator webroot and --installer apache to make it properly work with certbot renew in the future. it created account in /etc/letsencrypt/ made backup in /root/; challenge fails though because all domains need to serve out of a webroot and there is status.wikimedia.org here as well. (T21640)
22:08 mutante: wikitech-static - certbot was already installed but it wasn't used to generate the existing certs so just running certbot renew did not work, attempted to use certbot to renew but apache plugin missing, installed python-certbot-apache (T214640)
21:40 twentyafterfour: Finished MediaWiki train for 1.33.0-wmf.14 (T206668) - there is no train next week so I'll be back with wmf.16 (T206670) in two weeks.
21:16 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/AbuseFilter/includes/Views/AbuseFilterView.php: sync I67ca47 refs T206668 (duration: 00m 47s)
20:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.14 refs T206668
20:11 jforrester@deploy1001: Finished scap: Post-SWAT full sync for new i18n for T208097 (duration: 33m 54s)
19:59 mutante: temp disabled puppet on phab1001 , applying ferm change to allow deployment servers to http to phab servers
19:37 jforrester@deploy1001: Started scap: Post-SWAT full sync for new i18n for T208097
19:35 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T213356 Enable WelcomeSurvey experiment 2 on viwiki (duration: 00m 53s)
19:33 akosiaris: delete 8505 tickets from OTRS with customerID Mailer-Daemon@wizengo.ds.planet-work.net T214604 - correction
19:32 akosiaris: delete 5076 tickets from OTRS with customerID Mailer-Daemon@wizengo.ds.planet-work.net T214604
19:32 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: SWAT T213885 Don't add mw:mediainfoView on File pages with no captions either (duration: 00m 51s)
19:26 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/WikimediaMessages/i18n/wikimedia/en.json: SWAT T208097 WikimediaMessages: Add message for BlockAttacker password policy (duration: 00m 50s)
19:25 arlolra: Updated Parsoid to f1d717f (T187958, T205337, T214103)
19:23 akosiaris: delete 5076 tickets from OTRS with customerID MAILER-DAEMON@ubuntu.member.linode.com T214604
19:23 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/AbuseFilter/includes/AbuseFilter.php: SWAT AbuseFilter Optionally pass the filter ID to checkConditions for error reporting I8510319c (duration: 00m 53s)
19:19 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/GrowthExperiments/GrowthExperiments.alias.php: SWAT T213356 Add Special:WelcomeSurvey Vietnamese alias (duration: 00m 54s)
19:12 marostegui: Convert dbstore1002 staging.organic_link from Aria to InnoDB - T213706
19:03 arlolra@deploy1001: Finished deploy [parsoid/deploy@f2384f0]: Updating Parsoid to f1d717f (duration: 09m 41s)
19:02 cdanis: T214529: cdanis@cp4026.ulsfo.wmnet ~ % sudo apt-get --purge remove edac-utils libsysfs2 libedac1
18:53 arlolra@deploy1001: Started deploy [parsoid/deploy@f2384f0]: Updating Parsoid to f1d717f
18:53 mutante: notebook1003 - restarted nagios-nrpe-server... T212824
18:52 chaomodus: notebook1002: restarted nagios-nrpe-server due to oom
18:49 cdanis: cp4026: T214529: apt-get install'ing edac-utils with new deps libedac1 libsysfs2
18:37 onimisionipe: pooling maps1003 - stretch migration is complete. T198622
18:22 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@26a8bbd] (stretch): Updating maps1001 to reflect latest changes (duration: 01m 24s)
18:21 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@26a8bbd] (stretch): Updating maps1001 to reflect latest changes
18:19 mutante: deploying polygerrit (new gerrit UI) theme change to roughly match MediaWiki timeless theme (gerrit:482379) (shoutouts: paladox, thcipiriani)
18:07 XioNoX: re-activate ping offload redirect for ping1001 restart
18:03 moritzm: rebooting ping1001 to pick up SSBD-enabled qemu
18:01 XioNoX: deactivate ping offload redirect for ping1001 restart
17:58 moritzm: rebooting ping2001 to pick up SSBD-enabled qemu
17:50 akosiaris: restart exim on mendelevium T214604
17:44 akosiaris: block specific IPv4, IPv6 address on mx1001, mx2001 T214604
17:35 akosiaris: freeze all current info@wikipedia.org emails on mx1001, mx2001 T214604
17:31 moritzm: rebooting seaborgium to pick up SSBD-enabled qemu
17:01 akosiaris: stop exim on mendelevium
16:25 moritzm: rebooting serpens to pick up SSBD-enabled qemu
15:45 reedy@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/WikimediaEvents/: Revive wgPoweredByHHVM (duration: 00m 55s)
15:14 moritzm: rebooting pollux to pick up SSBD-enabled qemu
14:50 godog: roll restart prometheus after https://gerrit.wikimedia.org/r/c/operations/puppet/+/486251 - T187987
14:45 ariel@deploy1001: Finished deploy [dumps/dumps@25358e7]: fix up web links to multistream dump files (duration: 00m 03s)
14:45 ariel@deploy1001: Started deploy [dumps/dumps@25358e7]: fix up web links to multistream dump files
14:31 andrew@deploy1001: Finished deploy [horizon/deploy@94f3ec1]: Rolling out an upgraded proxy dashboard -- now use designate v2 API (duration: 03m 21s)
14:28 andrew@deploy1001: Started deploy [horizon/deploy@94f3ec1]: Rolling out an upgraded proxy dashboard -- now use designate v2 API
14:23 marostegui: Stop replication on all threads in dbstore1002 - T213706
13:13 zeljkof: EU SWAT finished
13:10 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure $wgSitename and $wgMetaNamespace for ur.wiktionary, ur.wikibooks and ur.wikiquote (T214290) (duration: 00m 53s)
13:02 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Assign "suppressredirect" to rollbacker on newiki (T214012) (duration: 00m 53s)
13:00 zeljkof: extending EU SWAT for 5-10 minutes
12:53 reedy@deploy1001: Synchronized private/PrivateSettings.php: fix minor typo (duration: 00m 52s)
12:46 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change $wgUploadNavigationUrl for the Persian (fa) Wikisource to Commons (T214048) (duration: 00m 53s)
12:36 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add few domains at $wgCopyUploadsDomains and cleanup inline comments (T213961 T213632 T213649 T213924) (duration: 00m 53s)
12:32 zfilipin@deploy1001: sync-file aborted: SWAT: Enable reference previews on beta (T213415) (duration: 00m 01s)
12:28 jbond42: restarting pdns-recursor and ntp on dns1001 and dns1002 for a security update
12:25 zfilipin@deploy1001: Synchronized wmf-config/: SWAT: Enable reference previews on beta (T213415) (duration: 00m 54s)
12:17 zfilipin@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: Enable $wgAbuseFilterProfile on every wiki (T191039) (duration: 00m 54s)
12:12 onimisionipe: initializing postgres replication for maps1001
11:55 moritzm: installing memcached updates on dbmonitor*
11:41 moritzm: installing polarssl security updates
11:38 gehel: restart elasticsearch on elastic20205 to validate configuration change
11:27 gehel: restarting blazegraph + updater on wdqs* for jvm upgrade
11:26 moritzm: installing xen security updates (only some client libs are used)
11:12 marostegui: Add dbstore1005:3318 to zarcillo - T210478
11:08 moritzm: installing Java security updates on wdqs hosts
10:59 arturo: T214299 additional reboot for cloudnet1004
10:51 marostegui: Compress innodb tables on dbstore1005:3318 - T210478
10:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 (duration: 00m 53s)
10:37 moritzm: installing libsndfile security updates
10:37 gehel: starting stretch upgrade on maps1001 - T198622
10:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1075 (duration: 00m 52s)
10:13 moritzm: installing libav security updates
10:03 arturo: T214299 reimage cloudnet1004 to debian stretch
09:58 moritzm: installing tiff security updates on trusty
09:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3312 T210713 (duration: 00m 53s)
09:43 marostegui: Deploy schema change on db1095:3312 - T210713
09:30 marostegui: Deploy schema change on db1103:3312 - T210713
09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 T210713 (duration: 00m 53s)
09:24 godog: temp stop prometheus@global on prometheus2003 to grab a snapshot
08:51 dcausse: elasticsearch: deleting indices moved out of the search-chi@(eqiad|codfw) cluster (T214052)
08:49 marostegui: Transfer s8 from db1116:3318 to dbstore1005:3318 T210478
08:40 marostegui: Deploy schema change on s2 codfw master (db2035). this will generate lag on codfw - T210713
08:30 marostegui: Deploy schema change on db1070 (s5 master) - T210713
08:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1110 T210713 (duration: 00m 52s)
08:18 marostegui: Deploy schema change on db1110 - T210713
08:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1110 T210713 (duration: 00m 53s)
08:08 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Whitelist the php7 beta feature (duration: 00m 54s)
07:58 marostegui: Compress innodb on dbstore1004 s2 and s3 - T210478
07:53 marostegui: Deploy schema change on db1102:3315
07:50 marostegui: Compress InnoDB tables on dbstore1005:3316 - T210478
07:43 marostegui: Add dbstore1005:3316 to tendril and zarcillo - T210478
07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 T210713 (duration: 00m 52s)
07:18 marostegui: Transfer s6 from dbstore1001 to dbstore1005 using mariadbbackup - T210478
07:09 marostegui: Compress Aria tables to InnoDB on dbstore1002 staging database - T213706
07:07 marostegui: Deploy schema change on db1082, this will generate lag on labsdb s5 - T210713
07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 T210713 (duration: 00m 52s)
07:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3315 T210713 (duration: 00m 53s)
06:55 marostegui: Transfer x1 from dbstore1001 to dbstore1005 using mariadbbackup - T210478
06:51 marostegui: Deploy schema change on db1113:3315 - T210713
06:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3315 T210713 (duration: 00m 53s)
06:43 marostegui: Add dbstore1005:3320 to tendril and zarcillo - T210478
06:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 T210713 (duration: 00m 52s)
06:27 marostegui: Deploy schema change on db1100 - T210713
06:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 T210713 (duration: 00m 53s)
06:14 marostegui: Reboot dbstore1005 - T210478
06:10 marostegui: Add dbstore1003:3311 to tendril - T210478
05:03 tstarling@deploy1001: Synchronized wmf-config/profiler.php: gerrit 478137 (duration: 00m 53s)
05:01 tstarling@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: gerrit 478137 (duration: 00m 53s)
04:53 tstarling@deploy1001: Synchronized wmf-config/PhpAutoPrepend-labs.php: gerrit 477957 (duration: 00m 53s)
04:52 tstarling@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: gerrit 477957 (duration: 00m 52s)
04:51 tstarling@deploy1001: Synchronized wmf-config/LabsServices.php: gerrit 477957 (duration: 00m 52s)
04:50 tstarling@deploy1001: Synchronized wmf-config/ProductionServices.php: gerrit 477957 (duration: 00m 56s)
01:35 krinkle@deploy1001: Synchronized errorpages/: Ic093c3122f - rm php-fatal-error.html (duration: 00m 54s)
01:01 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: 477956 and Aaron's 486134 (duration: 00m 52s)
00:59 tstarling@deploy1001: Synchronized errorpages/hhvm-fatal-error.php: (no justification provided) (duration: 00m 53s)
00:58 tstarling@deploy1001: Synchronized multiversion/MWRealm.php: (no justification provided) (duration: 00m 52s)
00:57 tstarling@deploy1001: Synchronized src/ServiceConfig.php: gerrit 477956 (duration: 00m 53s)
00:45 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.14/skins/MinervaNeue/includes/skins/minerva.mustache: SWAT: Restore banners to Wikivoyage project (duration: 00m 52s)
00:42 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/MobileFrontend: SWAT: Explicitly pass in parseHTML T214451 (duration: 00m 55s)
00:34 thcipriani@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/MobileFrontend: SWAT: Explicitly pass in parseHTML T214451 (duration: 00m 57s)

2019-01-23

23:32 crusnov@deploy1001: Finished deploy [netbox/deploy@aa3c342]: Upgrade netbox to 2.5.3 - T212524 (duration: 04m 46s)
23:28 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.14 refs T206668 (duration: 00m 52s)
23:28 crusnov@deploy1001: Started deploy [netbox/deploy@aa3c342]: Upgrade netbox to 2.5.3 - T212524
23:26 chaomodus: scap deploy netbox 2.5.3
23:13 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/Translate/TranslateHooks.php: T214517 T214358 Hot-deploy Ic9d85fec1 to un-block train, hopefully (duration: 00m 53s)
23:00 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/WikimediaEvents/includes/WikimediaEventsHooks.php: Hot-deploy I81165bf00 to use the right name and value for the cookie (duration: 00m 53s)
22:08 chaomodus: proton1001 restarted nagios-nrpe-server which died from oom
21:30 mutante: scandium - removing npm and nodejs*, testing puppetization to reinstall them
20:50 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.13 refs T206668 (duration: 00m 52s)
20:50 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.13 refs T206668
20:43 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.14 refs T206668 (duration: 00m 52s)
20:42 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.14 refs T206668
20:33 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.33.0-wmf.14 refs T206668
20:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.33.0-wmf.13 refs T206668
20:21 twentyafterfour: rolling back because error rate increased significantly after promoting
20:10 twentyafterfour: twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: group0 wikis to 1.33.0-wmf.14 refs T206668
19:33 moritzm: rebooting dubnium to pick up SSBD-enabled qemu
19:03 moritzm: rebooting puppetdb2001 to pick up SSBD-enabled qemu
18:46 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Disable showing 'depicts' statements on Commons for now via I66d97031 (duration: 00m 52s)
18:44 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/WikimediaEvents/includes/WikimediaEventsHooks.php: Hot-deploy Ief9c9155c to avoid auto-opting new accounts into PHP7 (duration: 00m 53s)
18:35 anomie@deploy1001: Synchronized php-1.33.0-wmf.13/includes/page/WikiPage.php: Add even more temporary logging for T210739 (duration: 00m 54s)
18:26 moritzm: rebooting mendelevium/ticket.wikimedia.org to pick up SSBD-enabled qemu
18:10 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Reapply Enable the Welcome survey on viwiki (duration: 00m 53s)
18:09 sbisson@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/GrowthExperiments/: SWAT: Help panel: ResourceLoaderHelpPanelModule handle help panel disabled (duration: 00m 54s)
18:03 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5002.wikimedia.org
17:57 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5002.wikimedia.org
17:24 dcausse@deploy1001: Finished deploy [search/mjolnir/deploy@a141ad3]: fix retry_on_conflict (duration: 04m 21s)
17:20 dcausse@deploy1001: Started deploy [search/mjolnir/deploy@a141ad3]: fix retry_on_conflict
16:57 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5001.wikimedia.org
16:53 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5001.wikimedia.org
16:50 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4002.wikimedia.org
16:44 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns4002.wikimedia.org
16:43 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4001.wikimedia.org
16:36 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns4001.wikimedia.org
16:31 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
16:14 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
16:13 jbond42: rolling restarts of PDNS recursors/ntpd in codfw/esams/ulsfo/eqsin to pick up openssl security update
16:02 jbond42: restarting ntpd on dns2001
16:00 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org
15:57 jynus: adding dbstore1004:s2 to tendril
15:43 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org
15:20 marostegui: Truncate wmf_checksum table on dbstore1002 - T213670
14:55 marostegui: Compress InnoDB on a few tables on dbstore1002 to gain some extra space - T213670
14:18 marostegui: Convert tokudb tables into innodb on dbstore1002 - T213706
13:47 marostegui: Convert a bunch of Aria tables to InnoDB on dbstore1002
13:38 onimisionipe: repooling maps1002
13:32 gehel: restarting kartotherian on maps100[234]
13:30 gehel: restarting kartotherian on maps1003
13:27 marostegui: Migrate some tokudb tables to innodb on dbstore1002 - T213706
13:18 gehel: running cumin 'P{O:cache::upload} and A:eqiad' 'run-puppet-agent'
13:10 zeljkof: EU SWAT finished
12:36 zfilipin@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/AbuseFilter: SWAT: Re-fix the throttle script (T209565) (duration: 00m 55s)
12:32 zfilipin@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/AbuseFilter/: SWAT: Re-fix the throttle script (T209565) (duration: 00m 54s)
12:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add new namespace abbreviation for Swedish (sv) (T214329) (duration: 00m 53s)
12:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix project talk namespace alias of Persian Wikipedia (T213733) (duration: 00m 53s)
12:09 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Define ImportSources for nywiki (duration: 00m 54s)
11:44 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: T214456 (duration: 00m 53s)
11:04 arturo: T214299 reboot cloudnet2001-dev, cloudnet2002-dev and cloudnet1003 for new interface names
11:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1097:3315 T210713 (duration: 00m 52s)
10:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097:3315 T210713 (duration: 00m 52s)
10:39 arturo: updating puppet catalog compiler facts: `PUPPET_COMPILER=compiler1002.puppet-diffs.eqiad.wmflabs modules/puppet_compiler/files/compiler-update-facts`
10:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1096:3315 T210713 (duration: 00m 52s)
10:33 Amir1: Deployed patch for T207814 on wmf.14
10:31 Amir1: Deployed patch for T207814 on wmf.13
10:12 marostegui: Deploy schema change on db1096:3315 - T210713
10:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1096:3315 T210713 (duration: 00m 53s)
09:39 akosiaris: upgrade mathoid in eqiad and codfw to latest chart version
09:38 akosiaris@deploy1001: scap-helm mathoid finished
09:38 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
09:38 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
09:38 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
09:30 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1003.eqiad.wmnet
09:23 akosiaris@deploy1001: scap-helm mathoid finished
09:23 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
09:23 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml --set resources.replicas=1 staging stable/mathoid [namespace: mathoid, clusters: staging]
09:22 akosiaris@deploy1001: scap-helm mathoid finished
09:22 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
09:22 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
08:55 marostegui: Deploy schema change on s5 codfw master with replication, lag will be generated - T210713
08:44 addshore: addshore@mwmaint1002:~$ mwscript extensions/Cognate/maintenance/populateCognatePages.php --wiki yuewiktionary --batch-size 1000 // T214400
08:28 marostegui: Deploy schema change on db1061 (s6 primary master) - T210713
08:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1088 T210713 (duration: 00m 55s)
08:19 marostegui: Add dbstore1004:3314 to tendril - T210478
08:18 marostegui: Add dbstore1004:3314 to zarcillo - T210478
08:12 marostegui: Deploy schema change on db1088 T210713
08:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1088 T210713 (duration: 00m 52s)
08:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1093 T210713 (duration: 00m 52s)
07:51 marostegui: Compress tables on dbstore1004:3314 - T210478
07:48 marostegui: Deploy schema change on db1093 - T210713
07:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093 T210713 (duration: 00m 54s)
07:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1085 T210713 (duration: 00m 52s)
07:13 marostegui: Deploy schema change on db1085, this will generate lag on s6 labs - T210713
07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1085 T210713 (duration: 00m 53s)
07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3316 T210713 (duration: 00m 52s)
06:53 marostegui: Deploy schema change on db1113:3316 - T210713
06:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3316 T210713 (duration: 00m 53s)
06:25 marostegui: Stop s4 on db1102 to clone dbstore1004 - T210478
06:16 marostegui@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Increase parsercache TTL keys from 22 to 24 days T210992 (duration: 01m 06s)
04:05 tstarling@deploy1001: Finished scap: gerrit 480419 (duration: 19m 33s)
03:45 tstarling@deploy1001: Started scap: gerrit 480419
03:44 tstarling@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: gerrit 480419 (duration: 00m 52s)
03:41 tstarling@deploy1001: Synchronized wmf-config/profiler.php: gerrit 480419 (duration: 00m 54s)
03:40 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit 480419 (duration: 00m 54s)
03:38 tstarling@deploy1001: scap failed: average error rate on 9/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
03:36 tstarling@deploy1001: Synchronized wmf-config/arclamp.php: gerrit 480419 (duration: 00m 54s)
03:32 tstarling@deploy1001: Synchronized php-1.33.0-wmf.13/LocalSettings.php: gerrit 480419 (duration: 00m 54s)
03:29 tstarling@deploy1001: Synchronized php-1.33.0-wmf.14/LocalSettings.php: gerrit 480419 (duration: 00m 52s)
03:27 tstarling@deploy1001: Synchronized src/XWikimediaDebug.php: gerrit 480419 (duration: 00m 55s)
03:22 TimStarling: manually edited LocalSettings.php in php-1.33.0-wmf.13 and php-1.33.0-wmf.14 to use a relative path, like in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/480695/
03:09 tstarling@deploy1001: Scap failed!: Call to mwscript eval.php returned: None
01:15 mutante: scandium - puppet run now without errors for the first time for the parsoid testing role on stretch instead of jessie. nodejs 10. - @subbu @arlolra you can start using it to replace ruthenium (T201366)
01:12 mutante: scandium - git cloning parsoid from gerrit - mediawiki/services/parsoid/deploy to /srv/deployment/parsoid/deploy ; still needs https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/484602/ (T201366)
01:05 mutante: scandium - deleting /etc/apt/preferences.d/stretch_backports.pref ; apt-get remove nodejs ; apt-get install -t stretch-backports npm ; now has nodejs 10 and npm from backports installed (T201366)
00:58 mutante: scandium - deleting /etc/apt/preferences.d/stretch_backports.pref ; apt-get remove nodejs
00:52 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/ContentTranslation/scripts/purge-unpublished-drafts.php: SWAT T203059 ContentTranslation: Remove waitForReplication for dry-run (duration: 00m 55s)
00:40 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T213851 Cirrus: Setup archive index shard/replica counts (duration: 00m 54s)
00:05 gtirloni: T209527 disabled notifications for cloudstore100{8,9}

2019-01-22

23:09 cstone: Updated payments-wiki from 7d4cd165d9 to ca7c280f3e
22:22 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.33.0-wmf.14 refs T206668 (duration: 43m 00s)
21:39 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.33.0-wmf.14 refs T206668
21:31 twentyafterfour@deploy1001: Synchronized wmf-config/CommonSettings.php: deploy I91e902 (duration: 01m 39s)
20:26 gehel: resetting cassandra authentication on maps / eqiad
20:25 milimetric@deploy1001: Finished deploy [analytics/refinery@d806b62]: Update jar versions on modified jobs (duration: 06m 48s)
20:19 milimetric@deploy1001: Started deploy [analytics/refinery@d806b62]: Update jar versions on modified jobs
20:07 onimisionipe@deploy1001: deploy aborted: Updating maps1002 to reflect latest changes (duration: 00m 01s)
20:07 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@e847e7b] (stretch): Updating maps1002 to reflect latest changes
20:06 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
20:06 volans: running cumin 'P{O:cache::upload} and A:eqiad' 'run-puppet-agent'
20:03 gehel: running nodetool repair on system_auth for maps / eqiad servers
19:30 arturo: T214299 additional reboot for cloudnet1003
19:03 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@e847e7b] (stretch): Updating maps1002 to reflect latest changes (duration: 01m 02s)
19:02 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@e847e7b] (stretch): Updating maps1002 to reflect latest changes
18:56 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@0bcdd3f]: Update mobileapps to 0aac268 (fix pronunciation detection in mobile-sections T214338) (duration: 04m 00s)
18:52 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@0bcdd3f]: Update mobileapps to 0aac268 (fix pronunciation detection in mobile-sections T214338)
18:36 arturo: T214299 reimaging cloudnet1003 as debian stretch
18:00 milimetric@deploy1001: Finished deploy [analytics/refinery@b07451e]: Denormalized job updates for actor/comment refactor (duration: 17m 24s)
17:43 milimetric@deploy1001: Started deploy [analytics/refinery@b07451e]: Denormalized job updates for actor/comment refactor
17:42 milimetric@deploy1001: Finished deploy [analytics/refinery@372c0b6]: Denormalized job updates for actor/comment refactor (duration: 02m 11s)
17:40 milimetric@deploy1001: Started deploy [analytics/refinery@372c0b6]: Denormalized job updates for actor/comment refactor
17:30 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@afca813]: Add the constraintsRunCheck job definition T204031 (duration: 00m 55s)
17:29 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@afca813]: Add the constraintsRunCheck job definition T204031
16:12 XioNoX: deactivate local pref for peering sessions in es/knams - T204281
15:45 akosiaris: upgrade zotero to latest chart version
15:44 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=zotero
15:43 akosiaris@deploy1001: scap-helm zotero finished
15:43 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
15:43 akosiaris@deploy1001: scap-helm zotero install -f zotero-values-eqiad.yaml -n production stable/zotero [namespace: zotero, clusters: eqiad]
15:42 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
15:34 addshore: addshore@mwmaint1002:~$ mwscript extensions/Cognate/maintenance/populateCognatePages.php --wiki yuewiktionary // T214400 (1 row)
15:32 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=zotero
15:31 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=zotero
15:30 addshore: addshore@mwmaint1002:~$ mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki yuewiktionary --site-group wiktionary // T214400
15:30 akosiaris@deploy1001: scap-helm zotero finished
15:30 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
15:30 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
15:29 addshore: addshore@mwmaint1002:~$ mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki yuewiktionary --site-group wiktionary
15:14 godog: turn on partitions.auto for rsyslog output to kafka - T214309
15:14 marostegui: Add dbstore1003:3317 to tendril - T210478
15:13 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@bb30697] (stretch): monkey patching geoshapes service for maps100[3-4] (duration: 01m 45s)
15:11 mbsantos@deploy1001: Started deploy [kartotherian/deploy@bb30697] (stretch): monkey patching geoshapes service for maps100[3-4]
15:11 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=zotero
15:11 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=zotero
15:08 akosiaris@deploy1001: scap-helm zotero finished
15:08 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
15:08 akosiaris@deploy1001: scap-helm zotero install -n production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
15:05 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=zotero
14:56 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@6cdece9]: Remove reviewers-by-blame from deployment cobalt no restart required (duration: 00m 11s)
14:56 anomie@deploy1001: Synchronized php-1.33.0-wmf.13/includes/page/WikiPage.php: Add more temporary logging for T210739 (duration: 00m 47s)
14:56 thcipriani@deploy1001: Started deploy [gerrit/gerrit@6cdece9]: Remove reviewers-by-blame from deployment cobalt no restart required
14:54 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@6cdece9]: Remove reviewers-by-blame from deployment gerrit2001 no restart required (duration: 00m 10s)
14:54 thcipriani@deploy1001: Started deploy [gerrit/gerrit@6cdece9]: Remove reviewers-by-blame from deployment gerrit2001 no restart required
14:45 onimisionipe: starting init of postgres replication on maps1002 - T198622
14:34 gehel: monkey patch kartotherian configuration to re-add proxy on maps100[34] - T214350
14:18 akosiaris@deploy1001: scap-helm mathoid finished
14:18 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
14:18 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
14:18 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
14:17 akosiaris: upgrade mathoid to the latest chart version (0.0.15)
14:17 akosiaris: upgrade blubberoid to the latest chart version (0.0.5)
14:17 akosiaris@deploy1001: scap-helm mathoid finished
14:17 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
14:17 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml --set resources.replicas=1 staging stable/mathoid [namespace: mathoid, clusters: staging]
14:15 akosiaris@deploy1001: scap-helm mathoid finished
14:15 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
14:15 akosiaris@deploy1001: scap-helm mathoid install -n staging -f mathoid-values.yaml --version=0.0.12 stable/mathoid [namespace: mathoid, clusters: staging]
14:14 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
14:10 akosiaris@deploy1001: scap-helm blubberoid finished
14:10 akosiaris@deploy1001: scap-helm blubberoid cluster staging completed
14:10 akosiaris@deploy1001: scap-helm blubberoid install -n staging -f blubberoid-values.yaml stable/blubberoid [namespace: blubberoid, clusters: staging]
14:04 akosiaris@deploy1001: scap-helm blubberoid finished
14:04 akosiaris@deploy1001: scap-helm blubberoid cluster codfw completed
14:04 akosiaris@deploy1001: scap-helm blubberoid cluster eqiad completed
14:04 akosiaris@deploy1001: scap-helm blubberoid install -n production -f blubberoid-values.yaml stable/blubberoid [namespace: blubberoid, clusters: eqiad,codfw]
14:04 akosiaris@deploy1001: scap-helm blubberoid upgrade -f blubberoid-values.yaml production stable/blubberoid [namespace: blubberoid, clusters: eqiad,codfw]
13:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1098:3316 T210713 (duration: 00m 45s)
13:55 godog: bump logstash kafka consumer threads - T214309
13:41 marostegui: Stop replication in sync on dbstore1001:3316 and db1098:3316
13:35 Amir1: running extensions/Wikibase/lib/maintenance/populateSitesTable.php on all.dblist (T211530)
13:30 Amir1: EU SWAT is finished
13:29 ladsgroup@deploy1001: Synchronized langlist: SWAT: Add yue to langlist (T211530) (duration: 00m 46s)
13:26 moritzm: installing apt security updates for jessie
13:19 Amir1: ladsgroup@mwmaint1002:~$ mwscript namespaceDupes.php fawiki --fix (T213733)
13:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add new synonyms for namespaces in Persian (fa) (T213733) (duration: 00m 47s)
13:13 moritzm: installing apt security updates for trusty
13:07 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable page issues improvements on English Wikipedia (T210554) (duration: 00m 46s)
12:52 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Use new logos in IS.php (T150618) (duration: 00m 47s)
12:40 gehel: start stretch upgrade for maps1002 - T198622
12:36 zfilipin@deploy1001: Synchronized static/images/project-logos/: SWAT: Upload HD logos for several projects (T150618) (duration: 00m 46s)
12:29 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove ability for bureaucrats on outreachwiki to remove bureaucrat flag (T214133) (duration: 00m 46s)
12:21 moritzm: installing apt security updates for stretch
12:20 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create extra namespace in kawiktionary (T212956) (duration: 00m 46s)
12:13 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable transwiki user group on ne.wikipedia (T214036) (duration: 00m 47s)
12:09 jynus: running mariabackup on dbstore1001:s1
12:02 Lucas_WMDE: tried and failed to deploy patch for T212118
10:55 marostegui: Deploy schema change on db1098:3316 - T210713
10:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098:3316 T210713 (duration: 00m 45s)
10:20 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T204031 wikidata: post edit constraint jobs on 25% of edits (duration: 00m 45s)
10:15 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209504 Decrease WBQualityConstraintsTypeCheckMaxEntities from 300 to 150 (duration: 00m 47s)
10:08 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T204031 wikidata: post edit constraint jobs on 10% of edits (duration: 00m 47s)
09:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1096:3316 T210713 (duration: 00m 47s)
09:56 gehel@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,name=maps1003.eqiad.wmnet
09:55 gehel: repooling maps1003 after upgrade to stretch - T198622
09:40 marostegui: Deploy schema change on db1096:3316 - T210713
09:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1096:3316 T210713 (duration: 00m 48s)
09:23 jynus: stop upgrade and restart db1097
08:55 dcausse: elasticsearch: closing indices in search-chi@(eqiad|codfw) moved to other elastic instances (T214052)
08:53 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1097 (duration: 00m 45s)
08:42 moritzm: installing policykit-1 security updates on trusty
08:26 marostegui: Deploy schema change on dbstore1001:3316 - T210713
08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1090:3317 T210478 (duration: 00m 48s)
08:14 marostegui: Compress s7 on dbstore1003 - T210478
06:42 marostegui: Deploy schema change on db1078 (s3 master) - T85757
06:36 marostegui: Stop MySQL on db1090:3317 to clone dbstore1003 - T210478
06:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1090:3317 T210478 (duration: 00m 49s)
05:45 kartik@deploy1001: Finished deploy [cxserver/deploy@e0ca16b]: Update cxserver to c5ff0bf (duration: 04m 15s)
05:40 kartik@deploy1001: Started deploy [cxserver/deploy@e0ca16b]: Update cxserver to c5ff0bf
02:17 onimisionipe: restarting tilerator on maps100[1-2]
00:38 chaomodus: stat1007 nagios-nrpe-server was off and alerted, restarting fixed it

2019-01-21

22:33 krinkle@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/TemplateData/includes/api/ApiTemplateData.php: I7647ddfc47 - T213953 (duration: 00m 47s)
19:35 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2040 (duration: 00m 45s)
19:23 jynus: mysql.py -h db1115 zarcillo -e "UPDATE masters SET instance = 'db2047' WHERE section = 's7' and dc = 'codfw'" T214264
18:55 jynus: stop and upgrade db2040 T214264
18:52 onimisionipe: pool maps1003 - postgresql replication lag issues have been fixed
18:24 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2040, promote db2047 to s7 master (duration: 00m 46s)
17:51 jynus: stop and apply puppet changes to db2047 T214264
17:44 jynus: stop replication on db2040 for master switch T214264
17:16 jynus: stop and upgrade db2054
16:03 arturo: T214303 reimaging/renaming labtestneutron2002.codfw.wmnet (jessie) to cloudnet2002-dev.codfw.wmnet (stretch)
15:58 onimisionipe: reinitializing slave replication (postgres) on maps1003
15:52 jynus: stop and upgrade db2061
15:19 dcausse: closing frwikiquote_* indices on elasticsearch search-chi@codfw (T214052)
15:11 dcausse: closing frwikiquote_* indices on elasticsearch search-chi@eqiad (T214052)
13:58 marostegui: Compress enwiki on dbstore1003:3311 - T210478
12:36 jijiki: Restarting memcached on mc1025 to apply '-R 200' - T208844
11:25 onimisionipe: depool maps1003 to fix replication lag issues
10:51 elukey: disable puppet fleetwide to ease the merge/deploy of a puppet admin module change - T212949
10:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 - T85757 (duration: 00m 44s)
10:33 jynus: upgrade and restart db2047 T214264
10:26 addshore@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/ArticlePlaceholder/includes/AboutTopicRenderer.php: T213739 Pass a usageAccumulator to SidebarGenerator (duration: 00m 47s)
10:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1089 (duration: 00m 45s)
09:57 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1089 (duration: 00m 45s)
09:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly Repool db1089 T210478 (duration: 00m 45s)
09:30 marostegui: Compress a few tables on dbstore1003:3315 - T210478
08:35 marostegui: Stop replication db1077 to deploy schema change - T85757
08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 - T85757 (duration: 00m 46s)
08:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 - T85757 (duration: 00m 48s)
08:10 moritzm: installing OpenSSL security updates
07:39 marostegui: Stop replication on db1124:3313 to fix triggers - T85757
07:00 marostegui: Stop MySQL on db1089 to clone dbstore1003 - T210478
07:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 T210478 (duration: 00m 47s)
06:54 marostegui: Deploy schema change on db1123 - T85757
06:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 - T85757 (duration: 00m 50s)
06:47 marostegui: Drop tag_summary table from db1023, db1077, db1075 and db1078 T212255
06:45 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5010.eqsin.wmnet
06:32 marostegui: Drop tag_summary table from db1095:3313 - T212255
06:27 marostegui: Drop tag_summary table from dbstore1002:s3 - T212255
06:12 marostegui: Drop tag_summary table from s3 codfw - T212255
06:09 marostegui: Drop tag_summary table from s8 - T212255

2019-01-20

15:13 marostegui: Force WriteBack on db2040 - T214264
01:07 cdanis: cdanis@wdqs1004.eqiad.wmnet /var/log/wdqs % sudo service wdqs-blazegraph restart

2019-01-19

22:12 ariel@deploy1001: Finished deploy [dumps/dumps@ab79bbb]: multistream dumps in parallel, recombine gz and multistream without decompression (duration: 00m 03s)
22:12 ariel@deploy1001: Started deploy [dumps/dumps@ab79bbb]: multistream dumps in parallel, recombine gz and multistream without decompression
20:34 gtirloni: upgraded and rebooted labstore200{3,4}
12:34 onimisionipe: pool maps1003 - stretch migration is complete T198622
12:08 elukey: run 'start all slaves' on dbstore1002 after crash
08:42 marostegui: Fixing dbstore1002 x1 replication T213670
07:36 elukey: restart pdfrender on scb1004
05:55 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step (duration: 00m 14s)
05:55 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step
05:55 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step (duration: 00m 15s)
05:55 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step
05:46 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step (duration: 00m 13s)
05:46 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@af21320]: test swapping venv build to scap fetch/script step
05:25 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@af21320]: bump discovery analytics to latest (duration: 00m 17s)
05:25 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@af21320]: bump discovery analytics to latest
05:18 legoktm@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/JsonConfig/includes/JCCache.php: Revert "JCCache: Explicit load the main slot to avoid API warnings" - T214179 (duration: 00m 58s)

2019-01-18

23:57 mobrovac@deploy1001: Finished deploy [restbase/deploy@f24d681]: Deploy latest version to restbase1016 (was out of rotation), take #3 (duration: 01m 01s)
23:56 mobrovac@deploy1001: Started deploy [restbase/deploy@f24d681]: Deploy latest version to restbase1016 (was out of rotation), take #3
23:55 mobrovac@deploy1001: Finished deploy [restbase/deploy@f24d681]: Deploy latest version to restbase1016 (was out of rotation), take #2 (duration: 00m 18s)
23:54 mobrovac@deploy1001: Started deploy [restbase/deploy@f24d681]: Deploy latest version to restbase1016 (was out of rotation), take #2
23:53 mobrovac@deploy1001: Finished deploy [restbase/deploy@f24d681]: Deploy latest version to restbase1016 (was out of rotation) - T212418 (duration: 00m 34s)
23:53 mobrovac@deploy1001: Started deploy [restbase/deploy@f24d681]: Deploy latest version to restbase1016 (was out of rotation) - T212418
20:47 mobrovac: restbase/cassandra bootstrap restbase1016-c - T212418
20:47 mobrovac: restbase/cassandra bootstrap restbase1016-c
17:24 godog: bootstrap cassandra-b on restbase1016 - T212418
17:06 marostegui: Reload haproxy on dbproxy1009 after rack a2 maintenance
16:14 arturo: T214167 reimage+rename labtestneutron2001.codfw.wmnet (jessie) to cloudnet2001-dev.codfw.wmnet (stretch)
15:36 moritzm: rebooting mwdebug servers in codfw to pick up SSBD-enabled qemu
15:27 moritzm: rebooting elnath to pick up SSBD-enabled qemu
13:41 marostegui: reload haproxy on dbproxy1004
13:18 godog: start cassandra-a on restbase1016 - T212418
13:07 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0d11a2b] (stretch): Updating stretch instance with latest code, maps1003 has wrong dependencies installed (duration: 00m 45s)
13:06 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0d11a2b] (stretch): Updating stretch instance with latest code, maps1003 has wrong dependencies installed
12:50 moritzm: uploaded ferm 2.4-1+wmf1 to buster-wikimedia (T213527)
11:46 moritzm: copied prometheus-rsyslog-exporter from stretch-wikimedia to buster-wikimedia
11:09 marostegui: Deploy schema change on db2039 (s6 codfw master) - T210713
10:54 marostegui: Deploy schema change on dbstore2001:3316 - T210713
10:42 jynus: killing and removing data from db1118
10:41 marostegui: Deploy schema change on db2076 - T210713
10:29 vgutierrez: restarting pybal in lvs2002 - T214072
10:23 marostegui: Deploy schema change on db2087:3316 - T210713
10:23 vgutierrez: restarting pybal in lvs2005 - T214072
10:02 marostegui: Add dbstore1003:3315 to zarcillo - T210478
09:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3315 - T210478 (duration: 00m 45s)
09:57 marostegui: Add dbstore1003:3315 to tendril - T210478
09:53 marostegui: Deploy schema change on db2089 - T210713
09:35 marostegui: Deploy schema change on db2067 - T210713
09:29 _joe_: uploading python{,3}-pygerrit2 to stretch-wikimedia, T214149
09:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Add migrated wikis from s3 to s5 to codfw config T184805 (duration: 00m 45s)
09:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3316 after mysql upgrade (duration: 00m 46s)
08:12 godog: depool and take snapshots of prometheus data on prometheus2003 to test v2 conversion - T187987
07:31 moritzm: rolling restart of AQS to pick up OpenSSL security updates for nodejs
07:30 marostegui: Stop MySQL on db1113:3315 and db1113:3316 to clone dbstore1003 and for mysql and kernel upgrade
07:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3316 for mysql upgrade (duration: 00m 45s)
07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3315 - T210478 (duration: 00m 46s)
07:16 moritzm: installing OpenSSL security updates
06:54 marostegui: Drop table tag_summary from s7 - T212255
06:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1075 and db1103 after DC hw maintenance (duration: 00m 44s)
06:46 marostegui: Deploy schema change on dbstore1002:s3 - T85757
06:29 marostegui: Deploy schema change on db1075 - T85757
06:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool DBs on A2 rack T213748 (duration: 00m 47s)
00:00 ejegg: updated payments-wiki from c455bbc6bb to 7d4cd165d9

2019-01-17

23:02 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@6b344ca]: Update mobileapps to 258d76b page summary changes, 2nd try (duration: 02m 03s)
23:00 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@6b344ca]: Update mobileapps to 258d76b page summary changes, 2nd try
19:29 catrope@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/GrowthExperiments/: Make welcome survey C unescapable (T213958) (duration: 00m 52s)
19:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update groupOverrides for Serbian wikis (T213055, T213059, T213063, T213065, T213679, T213680, T213681, T213682, T213684, T213685, T213686, T213687, T213824, T213825, T213826, T213827, T213828, T213829, T213830, T213832) (duration: 00m 53s)
19:02 ppchelko@deploy1001: Finished deploy [restbase/deploy@f24d681]: Update recommendation api endpoints (duration: 20m 26s)
18:42 ppchelko@deploy1001: Started deploy [restbase/deploy@f24d681]: Update recommendation api endpoints
18:22 vgutierrez: running ipvsadm -D -t 10.2.1.29:1968 in lvs2003 - T214041
18:19 vgutierrez: running ipvsadm -D -t 10.2.1.29:1968 in lvs2006 - T214041
18:18 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@5ba7582]: Update to I25c97e (duration: 05m 36s)
18:12 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@5ba7582]: Update to I25c97e
17:52 elukey: re-enable eventlogging mysql clients and db1108's el replication after db1107 maintenance
17:38 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ConstraintsCheckJobs on wikidatawiki (5% of edits) T204031 (duration: 00m 52s)
17:25 dcausse: restarting mjolnir services on all elastic* nodes
17:19 dcausse@deploy1001: Finished deploy [search/mjolnir/deploy@85aec7a]: fix multi-instances support (duration: 03m 42s)
17:15 dcausse@deploy1001: Started deploy [search/mjolnir/deploy@85aec7a]: fix multi-instances support
16:57 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/VisualEditor/modules/ve-mw/: T213922: Revert 48db45df7602 for wmf.12 (duration: 00m 52s)
16:56 jforrester@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/VisualEditor/modules/ve-mw/: T213922: Revert 48db45df7602 for wmf.13 (duration: 00m 51s)
16:46 dcausse@deploy1001: Finished deploy [search/mjolnir/deploy@42414ca]: add support for multi-instances setup (duration: 04m 59s)
16:45 paravoid: updating ps1-a3-eqiad's SNMP communities to the new ones
16:41 dcausse@deploy1001: Started deploy [search/mjolnir/deploy@42414ca]: add support for multi-instances setup
16:28 fsero: uncordoned kubernetes1001
16:27 fsero@puppetmaster1001: conftool action : set/pooled=yes; selector: name=kubernetes1001.eqiad.wmnet
16:19 moritzm: rebooting roentgenium (failoid node in eqiad) to enable SSBD-enabled qemu
16:18 cmjohnson1: ps1-a2-eqiad removing redundant power from side A to replace blown fuse
16:16 moritzm: rebooting tureis (failoid node in codfw) to enable SSBD-enabled qemu
15:12 moritzm: rebooting archiva1001 (archiva.wikimedia.org) to enable SSBD-enabled qemu
14:49 moritzm: rebooting darmstadtium (docker registry) to enable SSBD-enabled qemu
14:36 jbond42: rolling out update for debdeploy 0.0.99.6-1 -> 0.0.99.7-1 T207845
14:24 anomie: Restarting migrateActors.php on s3
14:19 marostegui: Drop empty frimpressions database from m2 - T213973
14:04 vgutierrez: running ipvsadm -D -t 10.2.2.29:1968 in lvs1016 - T214041
14:03 vgutierrez: running ipvsadm -D -t 10.2.2.29:1968 in lvs1006 - T214041
14:01 gehel: pooling maps1004 (first time after stretch upgrade) - T198622
13:46 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: dc=.*,service=.*,cluster=kubernetes,name=kubernetes1001.eqiad.wmnet
13:38 gehel: starting upgrade to stretch for maps1003 - T198622
12:59 addshore: swat done!
12:58 fsero@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes1001.eqiad.wmnet
12:58 addshore@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/Wikibase/view/resources/jquery/wikibase/jquery.wikibase.badgeselector.js: T213998 Fix js type error when adding badges to items (duration: 00m 53s)
12:53 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T210381: [cirrus] Enable CirrusSearchCrossClusterSearch (duration: 00m 51s)
12:46 dcausse@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/UploadWizard/: T214007: Don't reuse existing input object (duration: 00m 53s)
12:41 gtirloni: imported nfsd-ldap_1.2+deb9u1 in stretch-wikimedia (T209527)
12:41 fsero: poweroff kubernetes1001 - T213859
12:40 dcausse@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/CirrusSearch/: Hack around cross cluster search bug (duration: 00m 59s)
12:34 gehel: shutting down relforge1001 for PDU swap - T213859
12:33 akosiaris@deploy1001: Finished deploy [citoid/deploy@269c9c7]: (no justification provided) (duration: 00m 48s)
12:32 akosiaris@deploy1001: Started deploy [citoid/deploy@269c9c7]: (no justification provided)
12:29 dcausse@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/CirrusSearch/: Hack around cross cluster search bug (duration: 01m 00s)
12:25 godog: poweroff restbase1010 / restbase1011 before A3 maint - T213859
12:19 jynus: killing migrateActors.php --wiki=ptwiki on mwmaint, was using outdated db config T188327
12:17 jijiki: poweroff rdb1005.eqiad.wmnet before A3 maint - T213859
12:11 godog: poweroff ms-be1019 / ms-be1044 / ms-be1045 before A2 maint - T213748
12:09 mvolz@deploy1001: scap-helm zotero finished
12:09 mvolz@deploy1001: scap-helm zotero cluster codfw completed
12:09 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
12:08 elukey: stop mariadb and shutdown db1107 to ease rack a2 maintenance
12:04 mvolz@deploy1001: scap-helm zotero finished
12:04 mvolz@deploy1001: scap-helm zotero cluster eqiad completed
12:04 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
11:56 mvolz@deploy1001: scap-helm zotero finished
11:56 mvolz@deploy1001: scap-helm zotero cluster staging completed
11:56 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml --version=0.0.1 stable/zotero [namespace: zotero, clusters: staging]
11:55 arturo: T209527 copy nfsd-ldap between jessie-wikimedia and stretch-wikimedia in reprepro. It will require a rebuild though bc updated build-deps/deps
11:55 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml stable/zotero [namespace: zotero, clusters: staging]
11:43 marostegui: Poweroff db1082 db1081 db1080 db1079 db1075 db1074 es1012 es1011 - T213748
11:36 mvolz@deploy1001: scap-helm zotero finished
11:36 mvolz@deploy1001: scap-helm zotero cluster codfw completed
11:36 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
11:16 onimisionipe: shutdown elastic103[0-5] to prepare for T213859
11:09 elukey: stop eventlogging on eventlog1002 and eventlogging replication on db1108 as prep step for db1107 maintenance
10:55 marostegui: Lag will be generated on labs due to maintenance on sanitarium db masters
10:54 marostegui: Stop MySQL on db1082 db1081 db1080 db1079 db1075 db1074 es1012 es1011 - T213748
10:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool DBs on A2 rack T213748 (duration: 00m 54s)
10:39 moritzm: installing libcaca security updates
10:30 arturo: T213859 icinga downtime cloudservices1004 for 1 day
10:29 moritzm: installing ruby-loofah security updates
10:09 marostegui: Stop MySQL on db1103:3312 and db1103:3314, also poweroff the server - T213859
10:08 moritzm: installing krb5 security updates on trusty
10:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103 - T213859 (duration: 00m 53s)
09:59 marostegui: Poweroff dbproxy1001 dbproxy1002 dbproxy1003 for a3 maintenance - T213859
09:25 marostegui: Poweroff dbstore1003 for hw maintenance T213859
09:24 moritzm: power off graphite1003 for later hw maintenance (T213859)
09:18 marostegui: Deploy schema change on db1095:3313 - T85757
09:02 vgutierrez: rolling NIC firmware upgrade cp[1081-1090] - T203194
08:42 jijiki: Enabling puppet on rdb1005 and switch redis::misc::master to rdb1006 - T213859
08:37 moritzm: installing remaining systemd security updates on stretch
08:32 jijiki: Restarting nutcracker on scb100* for 484572 - T213859
08:32 jynus: stop, upgrade and restart db1075
08:31 marostegui: Deploy schema change on s3 codfw, lag will be generated - T85757
08:28 marostegui: Drop table tag_summary from enwiki - T212255
08:24 jijiki: Disabling puppet on rdb1005 and switch redis::misc::master to rdb1006 - T213859
07:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase weight for db1123 (duration: 00m 53s)
07:20 marostegui: Change thread_pool_stall_limit on db1075 and db1078 - T213858
07:18 marostegui: Enable GTID on db1075 - T213858
07:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove s3 read-only T213858 (duration: 00m 30s)
07:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover s3master eqiad from db1075 to db1078 T213858 (duration: 00m 30s)
07:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s3 on read-only T213858 (duration: 00m 31s)
07:00 marostegui: Start s3 failover T213858
06:30 marostegui: Disable puppet on db1075 and db1078 - T213858
06:26 marostegui: Enable GTID back on all hosts but db1075 db1078 - T213858
06:19 marostegui: Change s3 topology to get ready for s3 failover - T213858
06:14 marostegui: Disable gtid on s3 hosts - T213858
06:10 marostegui: Downtime s3 hosts for 2 hours - T213858
04:12 ppchelko@deploy1001: Finished deploy [mobileapps/deploy@89c4d8d]: revert new summary (duration: 01m 55s)
04:10 ppchelko@deploy1001: Started deploy [mobileapps/deploy@89c4d8d]: revert new summary
04:02 cdanis@deploy1001: Started restart [parsoid/deploy@4b82683]: (no justification provided)

2019-01-16

23:25 ppchelko@deploy1001: Finished deploy [recommendation-api/deploy@0ff39e2]: Deployment attempt with decreased worker count (duration: 04m 08s)
23:21 ppchelko@deploy1001: Started deploy [recommendation-api/deploy@0ff39e2]: Deployment attempt with decreased worker count
23:10 Krinkle: krinkle@tungsten:/srv/: rm -rf xhprof; for T196406
21:35 ppchelko@deploy1001: Finished deploy [recommendation-api/deploy@c1b6b32]: Rollback update to 1a1f824 (duration: 01m 59s)
21:33 ppchelko@deploy1001: Started deploy [recommendation-api/deploy@c1b6b32]: Rollback update to 1a1f824
21:29 ppchelko@deploy1001: deploy aborted: log (duration: 00m 02s)
21:29 ppchelko@deploy1001: Started deploy [recommendation-api/deploy@da83637]: log
21:28 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@da83637]: Update to 1a1f824 (duration: 06m 14s)
21:22 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@da83637]: Update to 1a1f824
21:17 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@6b344ca]: Update mobileapps to 258d76b page summary changes (duration: 06m 31s)
21:10 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@6b344ca]: Update mobileapps to 258d76b page summary changes
20:20 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.13 (duration: 00m 51s)
20:19 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.13
19:48 gehel: switching wdqs categories traffic to new second instance, puppet will be disabled during the operation on all wdqs nodes - T213212
19:29 thcipriani: restarting ci jenkins for upgrade
19:13 thcipriani: restarting gerrit on cobalt for 2.15.8 upgrade
19:12 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@cec7995]: Gerrit to 2.15.8 on cobalt (duration: 00m 10s)
19:12 thcipriani@deploy1001: Started deploy [gerrit/gerrit@cec7995]: Gerrit to 2.15.8 on cobalt
19:09 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@cec7995]: Gerrit to 2.15.8 on gerrit2001 only (duration: 00m 11s)
19:09 thcipriani@deploy1001: Started deploy [gerrit/gerrit@cec7995]: Gerrit to 2.15.8 on gerrit2001 only
19:04 thcipriani: starting gerrit upgrade to 2.15.8
18:56 mutante: upgraded jenkins version for jessie and stretch in apt.wikimedia.org to latest LTS
18:16 addshore: deploy slot done
18:13 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ConstraintsCheckJobs enabled on wikidatawiki (1% of edits) T204031 (duration: 00m 51s)
18:07 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@0aa107a]: Re-deploy for fixing vars.sh (duration: 11m 49s)
18:03 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ConstraintsCheckJobs enabled on testwikidatawiki T204031 (duration: 00m 52s)
17:55 smalyshev@deploy1001: Started deploy [wdqs/wdqs@0aa107a]: Re-deploy for fixing vars.sh
17:53 jynus: stop, upgrade and restart db1111
17:36 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: [cirrus] Start using replica group settings (take 2) (T210381) (duration: 00m 51s)
17:35 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] Start using replica group settings (take 2) (T210381) (duration: 00m 51s)
17:22 vgutierrez: rolling NIC firmware upgrade cp[1077-1080] - T203194
17:18 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EditorJourney: Enable data collection for viwiki T213348 (duration: 00m 52s)
17:07 anomie@deploy1001: Synchronized php-1.33.0-wmf.12/includes/page/WikiPage.php: Add temporary logging for T210739 (duration: 00m 53s)
17:05 vgutierrez: upgrading NIC firmware in cp1076 - T203194
17:01 gehel@deploy1001: Finished deploy [wdqs/wdqs@6685dc0]: multi instance fixes (duration: 00m 27s)
17:01 gehel@deploy1001: Started deploy [wdqs/wdqs@6685dc0]: multi instance fixes
16:58 gehel@deploy1001: Finished deploy [wdqs/wdqs@6685dc0]: multi instance fixes (duration: 10m 29s)
16:53 jynus: stop, upgrade and restart db1112
16:47 gehel@deploy1001: Started deploy [wdqs/wdqs@6685dc0]: multi instance fixes
16:45 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 (duration: 00m 52s)
16:45 vgutierrez: upgrading NIC firmware on cp1075 - T203194
16:08 jynus: upgrade and stop db1123
16:02 jbond42: Import new debdeploy 0.0.99.7 packages for trusty T207845
15:59 jbond42: Import new debdeploy 0.0.99.7 packages for buster T207845
15:59 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 (duration: 00m 52s)
15:58 otto@deploy1001: Finished deploy [analytics/superset/deploy@f73b897]: bump to 0.26.3-wikimedia2 with chart format string fix (duration: 00m 36s)
15:57 otto@deploy1001: Started deploy [analytics/superset/deploy@f73b897]: bump to 0.26.3-wikimedia2 with chart format string fix
15:56 jbond42: Import new debdeploy 0.0.99.7 packages for jessie T207845
15:41 jbond42: Import new debdeploy 0.0.99.7 packages for stretch T207845
15:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 T209815 (duration: 00m 52s)
15:12 addshore: addshore@mwmaint1002:~$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=labswiki Matthias_Geisler // T213928
14:56 jynus: stop and upgrade db1125 (this may cause temp. lag on labsdb hosts for s7, s6, s4, s2)
14:35 otto@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: attempt to deploy 0.26.3-wikimedia1
14:29 jynus: stop and upgrade db1124 (this may cause temp. lag on labsdb hosts for s1, s3, s5, s8)
14:20 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es1019 fully (duration: 00m 52s)
14:05 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es1019 with low load (duration: 00m 52s)
13:15 marostegui: Stop MySQL on db1078 and power it off for firmware update - T209815
13:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 T209815 (duration: 00m 52s)
13:12 dcausse: eu SWAT done
13:06 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 fully (duration: 00m 52s)
12:41 addshore@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/WikibaseQualityConstraints: gerrit:484654 T204031 T204022 Fix constraintsRunCheck Job class & test (duration: 00m 54s)
12:40 addshore@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/WikibaseQualityConstraints: gerrit:484654 T204031 T204022 Fix constraintsRunCheck Job class & test (duration: 00m 57s)
12:25 reedy@deploy1001: Synchronized wmf-config/throttle.php: T213848 (duration: 00m 53s)
12:21 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Deploy the FileExporter as a beta feature on all Wikimedia wikis (T213425) (duration: 00m 53s)
12:12 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Partial Blocks on itwiki (T210444) (duration: 00m 53s)
12:12 jynus: upgrade and restart db1095
11:02 fsero: draining kubernetes1001 for maintenance T213859
10:59 addshore: slot done
10:59 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgWBQualityConstraintsEnableConstraintsCheckJobs false (duration: 00m 51s)
10:53 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgWBQualityConstraintsEnableConstraintsCheckJobs true wd (duration: 00m 52s)
10:48 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgWBQualityConstraintsEnableConstraintsCheckJobs true testwd (duration: 00m 52s)
10:38 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 1% T204031 gerrit:484621 (duration: 00m 52s)
10:28 godog: restart rsyslog on wezen, tls listener stuck
10:25 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 with low load (duration: 00m 51s)
10:19 elukey: executed kafka preferred-replica-election on the logging Kafka cluster as an attempt to spread load more uniformly
10:19 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: testwikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 100 T204031 gerrit:484621 (duration: 00m 52s)
10:18 addshore@deploy1001: sync-file aborted: testwikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 100 T204031 gerrit:484621 (duration: 00m 02s)
10:14 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: testwikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 50 T204031 gerrit:484621 (duration: 00m 52s)
10:13 addshore@deploy1001: sync-file aborted: testwikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 50 T204031 gerrit:484621 (duration: 00m 00s)
10:03 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BETA ONLY, gerrit:484621 (duration: 00m 52s)
09:53 godog: upgrade controller firmware on ms-be1016 - T213856
09:47 jynus: upgrade and restart db1077
09:42 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 (duration: 00m 52s)
09:29 marostegui: Stop s3 actor-migration script in order to allow s3 to catch up and to avoid lag during the failover - T188327 T213858
09:17 godog: powercycle ms-be1016 - T213856
09:16 marostegui: Stop replication in sync on dbstore1002:x1 and db2034 - T213670
09:10 dcausse: T210381: elasticsearch search cluster, creating completion suggester indices on psi&omega elastic instances in eqiad&codfw
09:00 godog: test roll-restart rsyslog on mw hosts in eqiad - T211124
08:58 akosiaris@deploy1001: scap-helm zotero finished
08:58 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
08:58 akosiaris@deploy1001: scap-helm zotero install -n production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
08:57 marostegui: Re-point m3-master from dbproxy1003 to dbproxy1008 - T213865
08:53 moritzm: installing systemd security updates for stretch
08:53 akosiaris: depool zotero eqiad for helm release cleanup
08:47 akosiaris: repool zotero in codfw
08:42 filippo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Default to new logging infrastructure - T211124 (duration: 01m 05s)
08:40 akosiaris@deploy1001: scap-helm zotero finished
08:40 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
08:40 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
08:30 akosiaris@deploy1001: scap-helm zotero finished
08:30 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
08:30 akosiaris@deploy1001: scap-helm zotero install -n production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
08:25 akosiaris@deploy1001: scap-helm zotero finished
08:25 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
08:25 akosiaris@deploy1001: scap-helm zotero install -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
08:24 marostegui: Drop table tag_summary from s4 - T212255
08:19 elukey: convert aria tables to innodb on dbstore1002 - T213706
08:18 akosiaris: depool codfw zotero for helm release cleanups
08:15 marostegui: Upgrade MySQL on db2043 (s3 codfw master)
08:11 elukey: drop unneeded tables from the staging db on dbstore1002 according to T212493#4883535
07:36 vgutierrez: powercycling cp1088 - T203194
07:27 marostegui: Drop table tag_summary from s2 - T212255
07:14 marostegui: Upgrade MySQL on db2050 and db2036
06:07 SMalyshev: started transfer wdqs2005->2006
06:06 marostegui: Deploy schema change on db1067 (s1 primary master) - T85757
06:01 SMalyshev: depooling wdqs2005 and wdqs2006 for T213854
01:02 SMalyshev: repooled wdqs200[45] for now, 2006 still not done, will get to it later today
00:15 mobrovac@deploy1001: Finished deploy [restbase/deploy@a04ebdd]: Restart RESTBase to pick up the fact that restbase1016 is not there - T212418 (duration: 21m 34s)

2019-01-15

23:54 mobrovac@deploy1001: Started deploy [restbase/deploy@a04ebdd]: Restart RESTBase to pick up the fact that restbase1016 is not there - T212418
22:53 tzatziki: removing one file for legal compliance
22:50 jforrester@deploy1001: Synchronized php-1.33.0-wmf.13/extensions/WikibaseMediaInfo/resources/filepage/CaptionsPanel.js: Hot-deploy Ibb1f763f to unbreak setting captions on WikibaseMediaInfo (duration: 00m 51s)
22:39 SMalyshev: repooled wdqs1008
21:49 XioNoX: re-activate BGP to Zayo on cr1-eqiad - T212791
21:39 SMalyshev: depooling wdqs2005 for T213854
21:23 mutante: contint1001 rmdir /srv/org/wikimedia/integration/coverage ; rmdir /srv/org/wikimedia/integration/logs (T137890)
21:21 mutante: doc.wikimedia.org httpd config has been removed from contint1001, is now on doc1001
21:13 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.13
21:09 dduvall@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.13 and rebuild l10n cache (duration: 32m 42s)
20:36 dduvall@deploy1001: Started scap: testwiki to php-1.33.0-wmf.13 and rebuild l10n cache
20:33 dduvall@deploy1001: Pruned MediaWiki: 1.33.0-wmf.8 (duration: 03m 04s)
20:30 dduvall@deploy1001: Pruned MediaWiki: 1.33.0-wmf.6 (duration: 09m 15s)
19:36 SMalyshev: started copying wdqs1008->wdqs2004 for T213854
19:28 SMalyshev: depooling wdqs1008 and wdqs2004 for DB copying for T213854
18:52 bblack: authdns-update for https://gerrit.wikimedia.org/r/c/operations/dns/+/484546 (make normal git stuff match manual changes already in place)
18:44 hashar: [2019-01-15 18:44:06,959] [main] INFO com.google.gerrit.pgm.Daemon : Gerrit Code Review 2.15.6-5-g4b9c845200 ready
18:43 hashar: Restarting Gerrit to catch up with a DNS change with the database
18:43 volans: restarted debmonitor on debmonitor1001
18:40 bblack: DNS manually updated for m1-master -> dbproxy1006 and m2-master -> dbproxy1007
17:26 godog: roll-restart logstash in eqiad - T213081
17:21 godog: depool logstash1007 before restarting logstash - T213081
17:13 godog: set partitions to 3 for existing kafka-logging topics - T213081
17:06 XioNoX: move back cr1-eqiad:xe-4/1/3 to xe-3/3/1 - T212791
16:57 XioNoX: move cr1-eqiad:xe-3/3/1 to xe-4/1/3 - T212791
16:52 jynus: stop db1115 for hw maintenance
16:50 godog: roll-restart kafka-logging in eqiad to apply new topic defaults - T213081
16:00 jynus: stop es1019 for hw maintenance T213422
15:53 dcausse: T210381: elastic search clusters, catching up updates since first import on new psi&omega clusters in eqiad&codfw (from mwmaint1002)
15:10 fdans@deploy1001: Finished deploy [analytics/superset/deploy@UNKNOWN]: reverting deploy of 0.26.3-wikimedia1 (duration: 00m 32s)
15:10 fdans@deploy1001: Started deploy [analytics/superset/deploy@UNKNOWN]: reverting deploy of 0.26.3-wikimedia1
15:02 fdans@deploy1001: Finished deploy [analytics/superset/deploy@9d6156a]: reverting deploy of 0.26.3-wikimedia1 (duration: 06m 06s)
15:01 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103 (duration: 00m 48s)
14:56 fdans@deploy1001: Started deploy [analytics/superset/deploy@9d6156a]: reverting deploy of 0.26.3-wikimedia1
14:41 fdans@deploy1001: Finished deploy [analytics/superset/deploy@408a30e]: deploying 0.26.3-wikimedia1 (duration: 00m 36s)
14:40 fdans@deploy1001: Started deploy [analytics/superset/deploy@408a30e]: deploying 0.26.3-wikimedia1
14:14 moritzm: rebooting acamar
13:53 marostegui: Downtime db1115 and es1019 for 4 hours - T196726 T213422
13:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1119 T85757 (duration: 00m 46s)
13:15 marostegui: Deploy schema change on db1119 - T85757
13:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1119 T85757 (duration: 00m 46s)
13:00 elukey: restart memcached on mc1024 to pick up new settings (-R 200) - T208844
12:47 dcausse: EU SWAT done
12:36 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T210381: [cirrus] Start writing to psi & omega (take 2) (2/2) (duration: 00m 45s)
12:33 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T210381: [cirrus] Start writing to psi & omega (take 2) (1/2) (duration: 00m 45s)
12:15 onimisionipe: starting upgrading of prometheus-elasticsearch-exporter for eqiad T210592
12:14 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change links of wgGEHelpPanelLinks for kowiki T209467 (duration: 00m 46s)
12:09 dcausse@deploy1001: Synchronized wmf-config/CommonSettings.php: [cirrus] Add cirrussearch-big-indices tag T210381 (duration: 00m 46s)
12:06 jynus: upgrade and restart db1103
12:03 onimisionipe: starting upgrading of prometheus-elasticsearch-exporter for codfw T210592
11:50 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103 (duration: 00m 45s)
11:44 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1091 fully (duration: 00m 45s)
11:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1083 T85757 (duration: 00m 45s)
11:02 jynus: dropping database test on db1124:s5 with replication
11:01 elukey: run 'apt-get purge tmpreaper' on mw1297,1298,2150,2151,2244,2245 (all role spare) to avoid daily cronspam
10:58 END: (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) (volans@cumin2001)
10:57 marostegui: Deploy schema change on db1083 - T85757
10:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 T85757 (duration: 00m 46s)
10:53 START: - Cookbook sre.hosts.upgrade-and-reboot (volans@cumin2001)
10:49 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1091 with low load (duration: 00m 45s)
10:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 T85757 (duration: 00m 45s)
10:20 marostegui: Deploy schema change on db1080 - T85757
10:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 T85757 (duration: 00m 45s)
10:19 jynus: upgrade and restart db1091
10:16 moritzm: installing zeromq3 security updates on stretch (jessie/trusty not affected)
10:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1114 T85757 (duration: 00m 45s)
09:51 marostegui: Deploy schema change on db1114 - T85757
09:51 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1114 T85757 (duration: 00m 45s)
09:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106 T85757 (duration: 00m 46s)
09:25 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1091 (duration: 00m 46s)
09:20 addshore: deploy slot done
09:18 jynus: upgrade and restart db2078
09:10 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgWBQualityConstraintsTypeCheckMaxEntities 300, T209504 (duration: 00m 46s)
09:06 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209922 Add WikibaseQualityConstraints configs in testwikidatawiki (duration: 00m 47s)
08:38 marostegui: Stop replication on s1 on all labs hosts - T85757
08:28 marostegui: Deploy schema change on db1106 - T85757
08:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 T85757 (duration: 00m 45s)
08:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 T85757 (duration: 00m 46s)
08:02 marostegui: Deploy schema change on db1089 - T85757
08:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 T85757 (duration: 00m 45s)
07:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3311 T85757 (duration: 00m 46s)
07:28 marostegui: Drop tag_summary from wikitech - T212255
07:20 marostegui: Drop tag_summary from s5 - T212255
07:07 marostegui: Deploy schema change on db1099:3311 - T85757
07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099:3311 T85757 (duration: 00m 45s)
06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool pc1007 in pc1 - T208383 (duration: 00m 49s)
02:12 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@c920aec]: Re-deploy namespace script (duration: 08m 42s)
02:04 smalyshev@deploy1001: Started deploy [wdqs/wdqs@c920aec]: Re-deploy namespace script
01:54 mutante: wdqs1009 - icinga alerts about Blazegraph process for wdqs categories. starting wdqs blazegraph... already running
01:12 mutante: cp1078 - bnxt_en - TX timeout detected - Host cp1078 is DOWN - powercycled via mgmt (T203194)
00:44 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Welcome survey experiment 2: 50% variation A, 50% variation C (duration: 00m 46s)
00:37 catrope@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/GrowthExperiments/: Make welcome survey config use array_plus_2d (duration: 00m 46s)
00:34 catrope@deploy1001: Synchronized php-1.33.0-wmf.12/resources/lib/ooui/oojs-ui-core.js: OOUI backport (T213544) (duration: 00m 46s)
00:08 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Improve list of privileged groups (duration: 00m 46s)

2019-01-14

23:49 gehel@deploy1001: Finished deploy [wdqs/wdqs@59d5f40]: New wdqs startup script for multi-instance (duration: 09m 53s)
23:39 gehel@deploy1001: Started deploy [wdqs/wdqs@59d5f40]: New wdqs startup script for multi-instance
23:30 mutante: doc1001 - disabling puppet, testing apache config change 483775
23:12 ejegg: updated fundraising CiviCRM from 5580f0b11c to 6042acb363
22:39 andrewbogott: upgraded packages and MW version on wikitech-static
21:30 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@89c4d8d]: Update mobileapps to f2658de (fix ITN explore feed for dawiki) (duration: 03m 51s)
21:26 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@89c4d8d]: Update mobileapps to f2658de (fix ITN explore feed for dawiki)
20:37 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/resources/Resources.php: Hot-deploy I18193b19 to add missing message for OOUI v0.30.0 (duration: 00m 47s)
20:27 gehel@deploy1001: Finished deploy [wdqs/wdqs@f71131e]: upgrading wdqs1010 to latest version (duration: 00m 24s)
20:27 gehel@deploy1001: Started deploy [wdqs/wdqs@f71131e]: upgrading wdqs1010 to latest version
20:08 gehel: disabling puppet on all wdqs servers to deploy T213234
19:58 dcausse: Morning SWAT done
19:37 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean-up: Explain why WBMI wikis don't need wmgWikibaseRepoEntityNamespaces set (duration: 00m 46s)
19:32 XioNoX: re-deactivate BGP to Zayo on cr1-eqiad - T212791
19:29 dcausse@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/GrowthExperiments/includes/WelcomeSurvey.php: Welcome survey: ignore check confirmed email (duration: 00m 45s)
19:28 XioNoX: re-activate BGP to Zayo on cr1-eqiad - T212791
19:19 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1081 with low load (duration: 00m 47s)
19:09 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T204016: Remove old ArticleCreationWorkflows config (duration: 00m 46s)
18:48 jynus: stop, upgrade and restart db1081
18:45 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1081 (duration: 00m 46s)
18:18 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@f71131e]: Category script and GUI updates, blazegraph launcher updates and moved RWStore from scap to puppet (duration: 10m 56s)
18:07 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@f71131e]: Category script and GUI updates, blazegraph launcher updates and moved RWStore from scap to puppet
17:25 addshore: deploy slot done
17:22 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T201831 T201838 wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter fully on (duration: 00m 46s)
17:13 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T201831 T201838 wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter 3000 (duration: 00m 46s)
17:11 addshore@deploy1001: sync-file aborted: T201831 T201838 wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter 3000 (duration: 00m 01s)
17:09 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: T201831 T201838 Introduce wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter PT 2/2 (duration: 00m 45s)
17:08 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T201831 T201838 Introduce wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter PT 1/2 (duration: 00m 47s)
16:56 ejegg: re-enabled fundraising scheduled jobs
16:43 mobrovac@deploy1001: scap-helm -h finished
16:43 mobrovac@deploy1001: scap-helm -h cluster codfw completed
16:43 mobrovac@deploy1001: scap-helm -h cluster eqiad completed
16:43 mobrovac@deploy1001: scap-helm -h [namespace: -h, clusters: eqiad,codfw]
16:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 T85757 (duration: 00m 45s)
16:02 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db1105:3311 T85757 (duration: 00m 46s)
15:57 akosiaris@deploy1001: scap-helm zotero finished
15:57 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
15:57 akosiaris@deploy1001: scap-helm zotero [namespace: zotero, clusters: eqiad]
15:45 anomie: Running cleanupUsersWithNoIds.php on labswiki and labtestwiki, apparently they were left out when that was done for all other wikis (and so caused issues with the migrateActors.php run).
15:44 fsero: downscaling old zotero-production-645dccfb64 replicaset on eqiad
15:33 vgutierrez: rolling restart of cp1076-cp1090 to upgrade to kernel 4.9.144 - T203194
15:17 ejegg: disabled fundraising scheduled jobs
15:16 marostegui: Deploy schema change on db1105:3311 - T85757
15:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db1105:3311 T85757 (duration: 00m 46s)
15:08 volans: testing switchdc cookbooks in DRY-RUN mode w/ latest spicerack T205884 (no real changes expected)
15:04 akosiaris: upgrade zotero pods to 2019-01-14-115905-candidate in eqiad T213693
15:04 akosiaris@deploy1001: scap-helm zotero finished
15:04 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
15:04 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
15:02 moritzm: imported debdeploy 0.0.99.6-1+deb10u1 for buster-wikimedia (T213527)
15:02 vgutierrez: upgrading kernel in cp1075 to 4.9.144-1 - T203194
15:00 moritzm: ran systemctl reset-failed on relforge1001
14:57 marostegui: Drop table tag_summary from s6 - T212255
14:52 akosiaris: upgrade zotero pods to 2019-01-14-115905-candidate in codfw T213693
14:51 akosiaris@deploy1001: scap-helm zotero finished
14:51 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
14:51 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
14:42 anomie@mwmaint1002: Running migrateActors.php on wikitech for T188327. This may cause lag in codfw.
14:42 anomie@mwmaint1002: Running migrateActors.php on section 8 wikis for T188327. This may cause lag in codfw.
14:42 anomie@mwmaint1002: Running migrateActors.php on section 7 wikis for T188327. This may cause lag in codfw.
14:42 anomie@mwmaint1002: Running migrateActors.php on section 6 wikis for T188327. This may cause lag in codfw.
14:42 anomie@mwmaint1002: Running migrateActors.php on section 5 wikis for T188327. This may cause lag in codfw.
14:42 anomie@mwmaint1002: Running migrateActors.php on section 4 wikis for T188327. This may cause lag in codfw.
14:42 anomie@mwmaint1002: Running migrateActors.php on section 2 wikis for T188327. This may cause lag in codfw.
14:41 anomie@mwmaint1002: Running migrateActors.php on section 1 wikis for T188327. This may cause lag in codfw.
14:41 anomie@mwmaint1002: Running migrateActors.php on remaining section 3 wikis for T188327. This may cause lag in codfw.
14:39 volans: updated python3-phabricator on cumin[12]001 T205884
14:36 volans: uploaded python{,3}-phabricator 0.7.0-2~wmf1 to apt.w.o T205884 (upstream removes egg files)
14:18 dcausse: elasticsearch (search cluster): pre-populating omega & psi clusters in eqiad & codfw (from mwmaint1002 and mwmaint2001 respectively) (T210381)
14:13 akosiaris@deploy1001: scap-helm zotero finished
14:13 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
14:13 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
14:11 akosiaris@deploy1001: scap-helm zotero upgrade production --debug -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
14:10 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
14:04 marostegui: Add pc1007 to tendril and zarcillo - T208383
13:51 akosiaris@deploy1001: scap-helm zotero finished
13:51 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
13:51 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
13:49 Jeff_Green: authdns update for T210445
13:48 dcausse: creating testcommonswiki index in the omega search-elastic cluster (eqiad & codfw)
13:42 akosiaris@deploy1001: scap-helm zotero finished
13:42 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
13:42 akosiaris@deploy1001: scap-helm zotero upgrade production --dry-run --debug -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
13:41 akosiaris: rollback zotero codfw deployment
13:37 akosiaris@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
13:37 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
13:10 jijiki: Restarted nrpe on proton1002
13:03 zeljkof: eu swat finished
13:03 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add http://mbc.cyfrowemazowsze.pl to $wgCopyUploadsDomains (T212469) (duration: 00m 46s)
12:56 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Localisation of Babel categories on nap.wikipedia.org (T123188) (duration: 00m 44s)
12:48 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure $wgImportSources for ne.wiktionary (T213023) (duration: 00m 45s)
12:44 zfilipin@deploy1001: sync-file aborted: SWAT: Configure $wgNamespaceAliases for yue.wiktionary (T212678) (duration: 00m 01s)
12:37 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure $wgNamespaceAliases for yue.wiktionary (T212678) (duration: 00m 45s)
12:27 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Configure $wgAddGroups, $wgRemoveGroups and $wgImportSources for ur.wiki (T212612) (duration: 00m 46s)
12:19 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add suppressredirect user right to patroller user group at zh.wikivoyage (T212272) (duration: 00m 46s)
12:10 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create Portal namespace on shn.wikipedia (T212992) (duration: 00m 46s)
12:05 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for Berklee College of Music library (T213311) (duration: 00m 52s)
11:20 volans: installed spicerack 0.0.13 on cumin1001 - T205884
10:39 moritzm: start installing systemd security updates for stretch
10:13 volans: installed spicerack 0.0.13 on cumin2001 for final testing - T205884
10:11 volans: uploaded spicerack_0.0.13-1_amd64.deb to apt.wikimedia.org stretch-wikimedia T205884
10:07 moritzm: install tmpreaper security updates on remaining hosts
09:51 marostegui: Running aria_chk for all myisam tables on dbstore1002 T213670
09:37 marostegui: Running aria_chk for all linter tables on dbstore1002 - T213670
08:44 marostegui: Stop mysql on dbstore1002 - T213670
08:38 marostegui: Stop MySQL on pc2010 to clone pc1007 - T208383
07:48 elukey: executed bmc-device --debug --cold-reset on dbstore1002 - "No more sessions available" for mgmt

2019-01-13

16:33 hoo: Updated operations/dumps/dcat (559dee37452..a86285f4e7) on snapshot1008

2019-01-12

21:46 akosiaris: restart all zotero pods in eqiad
16:12 moritzm: rebooting mw2167 for a test
02:16 legoktm@deploy1001: Synchronized docroot/mediawiki.org/keys: Add Mukunda's new subkey that was used for the 1.32 release - T213521 (duration: 00m 47s)

2019-01-11

21:56 jforrester@deploy1001: Finished scap: Full scap sync to update wmf.12 i18n for the weekend Idf2a67860f (duration: 19m 12s)
21:37 jforrester@deploy1001: Started scap: Full scap sync to update wmf.12 i18n for the weekend Idf2a67860f
18:43 legoktm@deploy1001: Synchronized wmf-config/CommonSettings.php: Update ExtensionDistributor for 1.32 release - https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/483735 (duration: 00m 46s)
18:07 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2060 T210713 (duration: 00m 46s)
17:10 marostegui: Deploy schema change on db2060 - T210713
16:55 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2060 T210713 (duration: 00m 46s)
16:53 marostegui: Defragment change_tag table on db2060 - T210713
14:37 jynus: upgrade and restart db2091 (s2, s4)
14:12 jynus: updating mariadb client packages on cumin* hosts
11:36 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1018 fully (duration: 00m 46s)
11:21 jynus: stop, upgrade and reboot es2017
11:04 jynus: stop, upgrade and reboot es2016
10:51 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: repool es1018 with low load (duration: 00m 46s)
10:31 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: repool es2013 (duration: 00m 45s)
10:30 jynus: upgrade and restart es1018
09:58 jynus: upgrade and reboot es2013
09:53 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: depool es2013 (duration: 00m 45s)
09:49 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: depool es2013 (duration: 00m 47s)
09:32 jynus: reset iLo on db2053
08:49 moritzm: installing tmpreaper security updates
02:40 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Ib87407165382 (duration: 00m 46s)
01:20 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T211993 Enable GrowthExperiments help panel for 50% of new users on cswiki and kowiki (duration: 00m 46s)
01:05 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT T211993 Enable GrowthExperiments help panel on cswiki and kowiki (duration: 00m 45s)
01:03 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/WikimediaEvents/includes/PageViews.php: SWAT: T213186 GrowthExperiments: Support templates for help desk title (duration: 00m 46s)
00:50 XioNoX: bump prefix limit for AS6939 in eqsin
00:18 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/AbuseFilter/includes/AbuseFilterHooks.php: T213453: Use slot in onEditFilterMergedContent and newVariableHolderForEdit in AbuseFilter (duration: 00m 47s)
00:12 James_F: 482373 is live on mwdebug1002 for extensive checks.
00:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Help panel: Set help desk page correctly on kowiki Ia94cfc571 (duration: 00m 46s)

2019-01-10

23:45 Krinkle: krinkle@tungsten: upgrade xhgui to include upstream f039fb9f99f - T213218
23:45 Krinkle: upgraded xhgui to upstream 2965240c91e52 (current upstream master) - T213218
23:36 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: T213497 [Commons, TestCommons] Don't use Wikibase entity search (duration: 00m 46s)
22:57 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/Wikibase/repo/includes/EditEntity/MediawikiEditFilterHookRunner.php: T213453: Pass slotrole into EditFilterMergedContent hook in Wikibase repo (duration: 00m 47s)
20:47 marxarelli: both mediawiki error rates and 500 response rates have subsided back to pre-deploy levels
20:19 marxarelli: seeing increase in "60 second timed out" error rate and rise in 503 rate, as was the case with group1 deployment. continuing to monitor
20:11 gehel: restart blazegraph on wdqs1009 to validate new config
20:02 tgr@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/WikimediaEvents/modules/ve-wme/campaigns.js: SWAT: Remove unnecessary addPlugin wrapper (T213338) (duration: 00m 53s)
19:50 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove AICaptcha settings (T186244) (duration: 00m 52s)
19:47 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Whitelist *.*.archive.org in wgCopyUploadsDomains (T207581) (duration: 00m 53s)
19:41 tgr: ran mwscript namespaceDupes.php bnwikibooks --fix (238 links fixed)
19:41 volans: installed spicerack 0.0.12-1 on cumin2001 T205884
19:39 volans: uploaded spicerack_0.0.12-1_amd64.deb to apt.wikimedia.org stretch-wikimedia T205884
19:39 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Note that namespaceDupes.php maintenance script run will be needed after the deployment. (T203534) (duration: 00m 53s)
19:14 marostegui: Deploy schema change on dbstore1001 - T85757
19:13 marostegui: Deploy schema change on dbstore1002 - T85757
18:57 tzatziki: deleting three files for legal compliance
18:52 anomie@mwmaint1002: Running migrateActors.php on test wikis and mediawikiwiki for T188327. This may cause lag in codfw.
18:47 marostegui: Deploy schema change on s1 codfw master (db2048) with replication, this will generate lag on s1 codfw - T85757
18:46 marostegui: Stop replication on s1 codfw master for a schema change - T85757
18:37 marostegui: Stop replication on s8 codfw master for a schema change - T85757
18:30 marostegui: Upgrade mysql and kernel on db2060
18:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2053, db2060 for kernel and mysql upgrade (duration: 00m 51s)
18:13 marostegui: Stop MySQL on db2046 for kernel upgrade
18:12 marostegui: The above change was db2053 and not db2060
18:11 marostegui: Stop MySQL on db2053 and db2060 for mysql and kernel upgrade
18:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2053, db2060 for kernel and mysql upgrade (duration: 00m 53s)
17:50 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: repool es2015 (duration: 00m 53s)
17:49 marostegui: Deploy schema change on db2053 - T210713
17:33 marostegui: Deploy schema change on db2046 - T210713
16:59 jynus: stop and upgrade es2015
16:52 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: depool es2015 (duration: 00m 52s)
16:41 onimisionipe: data transfer from wdqs1004 -> wdqs1006 completed! - T213361
16:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T159708 Enable Structured Data on Commons, captions-only (duration: 00m 53s)
16:17 James_F: T180981 Placed patch to enable WBMI on Commons on mwdebug1002
16:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T180981 Add Commons to wikis with WikibaseMediaInfo installed (duration: 00m 52s)
16:11 jforrester@deploy1001: Synchronized dblists/wikidatarepo.dblist: T180981 Add Commons to wikis with WikibaseRepo installed (duration: 00m 54s)
16:04 James_F: T180981 Placed patch to install but not enable WBMI on Commons on mwdebug1002
15:56 marostegui: Deploy schema change on db1068 (s4 master) - T86338
15:31 fsero: rolling back last zotero codfw deployment
15:27 marostegui: Deploy schema change on db1067 (s1 master) - T86338 T202167
15:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 T86338 T202167 (duration: 00m 49s)
15:24 addshore: T208330, MariaDB [testcommonswiki]> TRUNCATE TABLE wb_terms; # Was https://phabricator.wikimedia.org/P7973
15:22 fsero@deploy1001: scap-helm zotero upgrade production -f /srv/scap-helm/zotero/zotero-values-codfw.yaml /srv/deployment-charts/charts/zotero-0.0.1.tgz [namespace: zotero, clusters: codfw]
15:21 fsero@deploy1001: scap-helm zotero upgrade -f /srv/scap-helm/zotero/zotero-values-codfw.yaml /srv/deployment-charts/charts/zotero-0.0.1.tgz [namespace: zotero, clusters: codfw]
15:20 addshore@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/Wikibase/repo/includes/Content: T208330 dont write to wb_terms for mediainfo (duration: 00m 54s)
15:12 addshore@deploy1001: Synchronized php-1.33.0-wmf.9/extensions/Wikibase/repo/includes/Content: T208330 dont write to wb_terms for mediainfo (duration: 00m 55s)
14:59 marostegui: Deploy schema change on db1080 - T86338 T202167
14:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 T86338 T202167 (duration: 00m 52s)
14:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1114 T86338 T202167 (duration: 00m 52s)
14:42 fsero@deploy1001: scap-helm zotero finished
14:42 fsero@deploy1001: scap-helm zotero cluster staging completed
14:42 fsero@deploy1001: scap-helm zotero upgrade staging -f /srv/scap-helm/zotero/zotero-values-staging.yaml /srv/deployment-charts/charts/zotero-0.0.1.tgz [namespace: zotero, clusters: staging]
14:36 fsero@deploy1001: scap-helm zotero finished
14:36 fsero@deploy1001: scap-helm zotero cluster staging completed
14:36 fsero@deploy1001: scap-helm zotero upgrade staging -f /srv/scap-helm/zotero/zotero-values-staging.yaml /srv/deployment-charts/charts/zotero-0.0.1.tgz [namespace: zotero, clusters: staging]
14:35 fsero@deploy1001: scap-helm zotero upgrade staging -f /srv/scap-helm/zotero/zotero-values-staging.yaml [namespace: zotero, clusters: staging]
14:33 fsero@deploy1001: scap-helm -h finished
14:33 fsero@deploy1001: scap-helm -h cluster staging completed
14:33 fsero@deploy1001: scap-helm -h [namespace: -h, clusters: staging]
14:33 marostegui: Deploy schema change on db1114 - T86338 T202167
14:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1114 T86338 T202167 (duration: 00m 53s)
14:14 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool es1019 (duration: 00m 53s)
13:51 arturo: T212302 icinga downtime for 2h cloudvirt[1013,1024,1026-1030].eqiad.wmnet bc wrong puppet code
13:24 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: depool es1018 (duration: 00m 52s)
13:10 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool es2012 (duration: 00m 52s)
13:01 zeljkof: EU SWAT finished
13:01 zfilipin@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: Remove main page special casing from ruwikibooks and ruwikiquote (T212849) (duration: 00m 52s)
12:58 zfilipin@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: Remove main page special casing from eswiki (T212849) (duration: 00m 53s)
12:53 zfilipin@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: Turn off main page special casing for svwiki (T213018) (duration: 00m 52s)
12:46 zfilipin@deploy1001: Synchronized dblists/flow.dblist: SWAT: Disable unused Flow extension on ur.wikibooks (T207627) (duration: 00m 55s)
12:42 onimisionipe: starting data transfer from wdqs1004 -> wdqs1006 - T213361
12:34 onimisionipe: starting data transfer from wdqs1003 -> wdqs1006 - T213361 - aborted (nodes are in different cluster)
12:28 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Re-enable QuickSurveys extension on enwiki (T209882) (duration: 00m 52s)
12:20 jynus: stop and upgrade es2012
12:12 zfilipin@deploy1001: Synchronized dblists/flow.dblist: SWAT: Reverted "Revert "Disable unused Flow extension on de.wikiversity"" (T207626) (duration: 00m 53s)
12:01 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool es2012 (duration: 00m 52s)
11:54 onimisionipe: starting data transfer from wdqs1003 -> wdqs1006 - T213361
10:59 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209857 Increase CPU benchmark sampling rate (duration: 00m 53s)
10:58 fsero: uploaded docker-registry_2.7.0~rc0~wmf1-1 debian package to reprepro for stretch-wikimedia (done yesterday at 17:21 UTC; forgot to log it)
10:26 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209857 Run CPU benchmark for a portion of navtiming pageloads (duration: 00m 52s)
10:10 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T209857 Run CPU benchmark for a portion of navtiming pageloads (duration: 00m 53s)
09:52 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T187299 Decrease ruwiki navtiming rate (duration: 00m 52s)
09:45 gilles@deploy1001: Synchronized tests/InitialiseSettingsTest.php: T211395 T211529 tests: Assert that extra namespaces have correspondent talk namespaces (duration: 00m 56s)
09:34 moritzm: updated thirdparty/php72 component for stretch-wikimedia to 7.2.13
01:41 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make GrowthExperiments config wmf.12-proof (duration: 00m 52s)
01:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert latest config patch (caused fatal errors on kowiki) (duration: 00m 52s)
00:58 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure help desk page for help panel correctly on kowiki (T213186) (duration: 00m 53s)
00:56 cstone: updated fundraising tools from 5f44d9dd43 to da82ed111d
00:34 catrope@deploy1001: Synchronized php-1.33.0-wmf.12/includes/MovePage.php: Fix missing ATOMIC_CANCELABLE in MovePage::move() (T213168) (duration: 00m 53s)
00:20 catrope@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/GrowthExperiments/: Help panel fixes (T212973, T212890, T213186) (duration: 00m 54s)
00:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EventLogging for GrowthExperiments help panel (T211991) (duration: 00m 54s)

2019-01-09

23:51 mutante: thumb1004 - still needs broken RAM replaced, expired downtime, re-ACKed (T207721)
23:39 mutante: mw2151 - change netbox status from active to staged - it's not actually active, it's role(spare) and was jessie (T192457)
23:34 mutante: reinstalling mw2151.codfw.wmnet because it was the very last mw* host on jessie
21:20 bblack: multatuli (ns2) - upgrade gdnsd to 9949 beta release
21:04 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@bfa9241]: Increase concurrency for categoryMembershipJob T192691 (duration: 00m 45s)
21:04 James_F: Creating Wikibase repo tables on Commons for T68108
21:03 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@bfa9241]: Increase concurrency for categoryMembershipJob T192691
21:00 James_F: Running rebuildall on TestCommons
20:53 bblack: authdns1001 (ns0) - upgrade gdnsd to 9949 beta release
20:45 James_F: Created Wikibase repo tables on TestCommons
20:11 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.12 (duration: 00m 53s)
20:10 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.12
19:28 crusnov@deploy1001: Finished deploy [netbox/deploy@7fe39e1]: Deploy Django security upgrade (duration: 04m 33s)
19:23 crusnov@deploy1001: Started deploy [netbox/deploy@7fe39e1]: Deploy Django security upgrade
19:01 ejegg: updated standalone SmashPig deploy from 25713ca232 to 78b92b7fef
18:43 bblack: authdns2001 (ns1) - upgrade gdnsd to 9949 beta release
18:26 XioNoX: add bgp sessions to AS31800 on cr1-eqsin
18:19 marostegui: Rename table tag_summary on enwiki on db1089 - T212255
18:18 XioNoX: add bgp sessions to AS38895 on cr1-eqsin
18:04 marostegui: Drop valid_tag from s3 master (db1075) - T212254
17:39 tarrow: That last one was SWAT: T209504 Increase PHP constraint check entities to 150
17:36 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 00m 53s)
17:28 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011 - T86338
17:18 James_F: Ran `namespaceDupes.php --wiki=bewikibooks` on mwmaint1002, no change
17:16 bblack: uploaded gdnsd-2.99.9949-beta-1+wmf1 to reprepro for stretch-wikimedia
17:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1083 T86338 T202167 (duration: 00m 52s)
16:29 marostegui: Deploy schema change on db1083 - T86338 T202167
16:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 T86338 T202167 (duration: 00m 53s)
16:17 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 with full weight (duration: 00m 53s)
16:11 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/Wikibase/repo/RepoHooks.php: T213227 RepoHooks::onApiCheckCanExecute: Only fail if the edit is for our entity's slot (duration: 00m 54s)
15:50 marostegui: Drop valid_tag tables from db1095 (s3) - T212254
15:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1106 T86338 T202167 (duration: 00m 51s)
15:23 jijiki: restarting scb* pdfrender
15:10 marostegui: Deploy schema change on db1106 (sanitarium s1 master) with replication, lag will be generated on s1 labs - T86338 T202167
15:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1106 T86338 T202167 (duration: 00m 52s)
14:39 elukey: restart Hadoop HDFS namenodes on an-master100[1,2] to complete decom of analytics1028->41
14:36 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1077 T212254 (duration: 00m 53s)
14:36 volans@deploy1001: Finished deploy [debmonitor/deploy@0f096de]: Deploy Django security upgrade (duration: 01m 50s)
14:34 volans@deploy1001: Started deploy [debmonitor/deploy@0f096de]: Deploy Django security upgrade
14:28 marostegui: Drop valid_tag table on db1077 with replication (lag will be generated on labs s3) - T212254
14:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1077 T212254 (duration: 00m 52s)
13:32 urandom: forcing removal of restbase1016-c (host down way too long to salvage) -- T212418
13:29 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1082 with low weight (duration: 00m 52s)
13:26 zeljkof: EU SWAT finished
13:22 zfilipin@deploy1001: Synchronized php-1.33.0-wmf.9/: SWAT: Fix order of arguments in ChangeTags::getPrevTags ([T212703]) (duration: 05m 50s)
13:08 zfilipin@deploy1001: Synchronized php-1.33.0-wmf.12/: SWAT: Fix order of arguments in ChangeTags::getPrevTags ([T212703]) (duration: 06m 54s)
13:00 zeljkof: extending eu swat for 5-10 minutes
12:51 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable signature button in toolbar for the "Arbitration" namespace in ruwiki (T213049) (duration: 00m 52s)
12:44 moritzm: installing OpenSSL 1.0.2 security updates for stretch
12:40 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable reader trust survey (T209882) (duration: 01m 07s)
12:02 gehel: repool wdqs100[78] - data import complete - T213210
11:55 jynus: enabling gtid on db1124:s5
11:54 jynus: enabling gtid on db1082
11:23 jynus: stopping db1082 and db2052 s5 replication in sync to migrate db1124:s5 master
10:30 moritzm: fixed package installation status on db2062
10:01 volans: upgraded spicerack to 0.0.11 on cumin2001 T205884
10:00 volans: uploaded spicerack_0.0.11 to apt.wikimedia.org stretch-wikimedia T205884
09:44 hashar: Some CI npm jobs are broken due to a faulty node module. https://phabricator.wikimedia.org/T213249
09:38 banyek: repooling labsdb1010 - T210693
09:26 banyek: dropping materialized views on labsdb1010 - T210693
09:26 banyek: depooled labsdb1010
08:28 moritzm: installing openssl security updates on stretch-based DB servers
07:55 moritzm: installing libseccomp updates from stretch point release
07:43 hashar: contint1001: restarted Zuul to take into account the SMTP configuration | https://gerrit.wikimedia.org/r/376739 | T93414
06:03 kartik@deploy1001: Finished deploy [cxserver/deploy@1098942]: Update cxserver to 656c468 (duration: 04m 08s)
05:59 kartik@deploy1001: Started deploy [cxserver/deploy@1098942]: Update cxserver to 656c468
01:15 jforrester@deploy1001: Synchronized php-1.33.0-wmf.12/extensions/Wikibase/repo/RepoHooks.php: T213227 Don't have onApiCheckCanExecute die for inactive entity types (duration: 00m 53s)
01:04 jforrester@deploy1001: Synchronized docroot/: T187716 Remove mobilelanding.php, no longer pointed to by Apache (duration: 00m 52s)
00:58 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [Wikimania] Add 2019 content to default search (duration: 00m 53s)
00:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T202683 [Wikimania] Create year namespaces for each Wikimania, 2005–2019 (duration: 00m 53s)
00:34 tgr@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Make password policy and logging code saner (duration: 00m 52s)
00:33 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Make password policy and logging code saner (duration: 00m 55s)

2019-01-08

23:44 SMalyshev: repooled wdqs1004
23:35 eileen: process-control config revision is 9dc6e63fcd
23:00 XioNoX: Update pfw3-codfw/eqiad security policies - T213100
22:39 XioNoX: deactivate policy-statement BGP_fundraising_aggregates term nat on pfw3-eqiad/codfw - T211028
22:29 gehel: starting data copy from wdqs1007 to wdqs1008 (both will be depooled) - T213217
22:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TestCommons: Add default search NSes (duration: 00m 51s)
22:22 James_F: Ran /docroot/noc/createTxtFileSymlinks.sh for new dblist
22:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use new wikidatarepo dblist where appropriate (duration: 00m 52s)
22:20 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: dblists: Load wikibaserepo (duration: 00m 52s)
22:15 jforrester@deploy1001: scap failed: average error rate on 9/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
22:14 jforrester@deploy1001: Synchronized dblists/wikidata.dblist: dblists: Remove testcommons from wikidata list (duration: 00m 52s)
22:13 jforrester@deploy1001: Synchronized dblists/wikidatarepo.dblist: dblists: Add wikidatarepo list (duration: 00m 53s)
22:12 urandom: forcing removal of restbase1016-b (host down way too long to salvage) -- T212418
22:08 marostegui: Drop valid_tag table from db2043 with replication (s3 codfw master - lag will be generated) - T212254
22:03 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: cleanup - Idfa129a65a41 (duration: 00m 53s)
21:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 T212254 (duration: 00m 52s)
21:49 marostegui: Drop valid_tag table from db1078 (s3) - T212254
21:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 T212254 (duration: 00m 53s)
21:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1123 T212254 (duration: 00m 53s)
21:38 marostegui: Drop valid_tag table from db1123 (s3) - T212254
21:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1123 T212254 (duration: 00m 53s)
21:31 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.12
21:03 dduvall@deploy1001: Finished scap: testwiki to php-1.33.0-wmf.12 and rebuild l10n cache (duration: 39m 22s)
20:42 ejegg: updated payments-wiki from b8acb95a2a to c455bbc6bb
20:24 dduvall@deploy1001: Started scap: testwiki to php-1.33.0-wmf.12 and rebuild l10n cache
20:24 gehel: starting data copy from wdqs1004 to wdqs1007 (both will be depooled) - T213217
20:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TestCommons: Don't enable entities, we're not Wikidata.org (duration: 01m 44s)
20:11 XioNoX: change BGP_fundraising_aggregates term nat from static to aggregate on pfw3-eqiad - T211028
19:51 ejegg: updated fundraising CiviCRM from b8e3a71845 to 5580f0b11c
19:48 krinkle@deploy1001: Finished deploy [performance/navtiming@68fd54d]: (no justification provided) (duration: 00m 05s)
19:48 krinkle@deploy1001: Started deploy [performance/navtiming@68fd54d]: (no justification provided)
19:48 dduvall@deploy1001: Pruned MediaWiki: 1.33.0-wmf.12 (duration: 06m 26s)
19:11 arlolra: Updated Parsoid to 2c5dc7b (T197616, T205491, T209772, T199926, T209194, T204622)
19:06 marostegui: Drop valid_tag table from s1 - T212254
19:00 arlolra@deploy1001: Finished deploy [parsoid/deploy@4b82683]: Updating Parsoid to 2c5dc7b (duration: 10m 40s)
18:54 XioNoX: make pfw3-codfw source NAT similar to pfw3-eqiad - T211028
18:54 ejegg: updated SmashPig standalone install from fb3268897b to 25713ca232
18:50 marostegui: Drop valid_tag table from s4 - T212254
18:50 XioNoX: add NAT workaround to pfw3-eqiad - T211028
18:49 arlolra@deploy1001: Started deploy [parsoid/deploy@4b82683]: Updating Parsoid to 2c5dc7b
18:38 XioNoX: temporarily permit ssh from frpm1001 to pfw3-eqiad on pfw3-eqiad
18:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 T86338 T202167 (duration: 00m 45s)
18:27 jynus: restarting s5 replication on labsdb1009/10/11
17:41 moritzm: installing libseccomp updates from stretch point release
17:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@503b29c]: Add test-commons and nap.wikisource, take #2 (duration: 02m 29s)
17:38 mobrovac@deploy1001: Started deploy [restbase/deploy@503b29c]: Add test-commons and nap.wikisource, take #2
17:37 mobrovac@deploy1001: Finished deploy [restbase/deploy@503b29c]: Add test-commons and nap.wikisource - T210752 T197616 (duration: 96m 50s)
17:33 _joe_: applying the new apache configuration to jobrunners in eqiad
17:24 elukey: roll restart of aqs on aqs100* to pick up new Druid settings
17:20 _joe_: depooling mw1299 for testing of the apache change
17:16 SMalyshev: restarted Blazegraph wdqs1006 due to unresponsiveness (caused by load?)
16:56 urandom: forcing removal of restbase1016-a (host down way too long to salvage) -- T212418
16:56 jynus: changing db1124:s5 replication to db2066
16:55 marostegui: Deploy schema change on db1105:3311 T86338 T202167
16:54 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1105:3311 T86338 T202167 (duration: 00m 44s)
16:54 jynus: stopping s5 replication on labsdb1009/10/11 to prevent undoable mistakes
16:34 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool es2019 - T212833 (duration: 02m 51s)
16:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 T86338 T202167 (duration: 00m 45s)
16:12 XioNoX: add BGP sessions to AS64050 in AMS-IX
16:04 marostegui: Drop valid_tag table from s7 - T212254
16:00 mobrovac@deploy1001: Started deploy [restbase/deploy@503b29c]: Add test-commons and nap.wikisource - T210752 T197616
15:59 marostegui: Deploy schema change on db1089 T86338 T202167
15:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 T86338 T202167 (duration: 00m 45s)
15:45 marostegui: Drop valid_tag table from s2 - T212254
15:32 marostegui: Stop MySQL on es2019 for upgrade - T212833
15:23 godog: briefly stop carbon daemons on graphite1004 to move /srv/whisper -> /srv/carbon/whisper
15:17 marostegui: Increase connections from 10 to 50 for recommendationapiservice on m2 - T212154
15:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool es2019 - T212833 (duration: 00m 44s)
15:04 hashar: Restarted CI Jenkins
13:02 zeljkof: EU SWAT finished
12:59 jynus: transferring db1102:s5 mariadb datadir to db1082
12:57 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Give all users (including IPs) the pagequality right in plwikisource (T212478) (duration: 00m 45s)
12:45 akosiaris@deploy1001: scap-helm zotero finished
12:45 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
12:45 akosiaris@deploy1001: scap-helm zotero install --name production2 -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
12:44 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Allow ptwikis bureaucrats to grant/revoke rollbacker user group (T212735) (duration: 00m 45s)
12:39 akosiaris@deploy1001: scap-helm zotero upgrade production2 -f zoterov2-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
12:29 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Use localized wgMetaNamespace and wgMetaNamespaceTalk in satwiki (T211294) (duration: 00m 45s)
12:23 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: New throttle rule for students writing Wikipedia program (T212226) (duration: 00m 44s)
12:14 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: New throttle rule for University of Southern California editathon (T212917) (duration: 00m 45s)
12:07 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T212768 [cirrus] re-enable HHVM connection pooling (duration: 00m 45s)
12:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@503b29c] (dev-cluster): Add test-commons and nap.wikisource (duration: 12m 38s)
11:49 mobrovac@deploy1001: Started deploy [restbase/deploy@503b29c] (dev-cluster): Add test-commons and nap.wikisource
11:46 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Increase time out on the MW side to 60s - T204183 (duration: 00m 51s)
11:36 akosiaris@deploy1001: scap-helm zotero finished
11:36 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
11:36 akosiaris@deploy1001: scap-helm zotero upgrade production -f zoterov2-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
11:35 akosiaris@deploy1001: scap-helm zotero finished
11:35 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
11:35 akosiaris@deploy1001: scap-helm zotero upgrade production -f zoterov2-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
11:33 mobrovac@deploy1001: Started restart [electron-render/deploy@94d27d7]: Electron struggling, restart - T213154
11:29 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=codfw
11:24 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=zotero,name=codfw
11:07 jynus: stopping and restarting db1102 (s5, s4) for upgrade
11:04 moritzm: rebooting mw1261
10:48 moritzm: installing libseccomp updates from stretch point release
10:34 dcausse: elastic@eqiad setting crosscluster conf on production search cluster (T213150)
10:25 banyek: executing schema change on db1062 - T85757
09:39 foks: reset user email for Zergiorubio
09:26 akosiaris@deploy1001: scap-helm zotero finished
09:26 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
09:26 akosiaris@deploy1001: scap-helm zotero install --name production2 -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
09:22 jynus: stop replication on db1124:s5 T213108
09:21 akosiaris@deploy1001: scap-helm zotero finished
09:21 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
09:21 akosiaris@deploy1001: scap-helm zotero install --name production2 -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
09:19 hashar: gerrit: resaved configuration for All-Projects by changing "Max Reviewers" from 3 to 4. Might enable adding reviewers automatically based on git blame. See task for config diff # T101131
09:12 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@f91cf04]: Increase the concurrency of categoryMembershipJob - T192691 (duration: 00m 59s)
09:12 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@f91cf04]: Increase the concurrency of categoryMembershipJob - T192691
05:39 SMalyshev: restarted some Blazegraph servers as precaution against corruption issues
04:26 onimisionipe: depooling wdqs1008 - T213134
03:23 kartik@deploy1001: Finished deploy [cxserver/deploy@b669f95]: Update cxserver to d6b1d6f (duration: 05m 00s)
03:18 kartik@deploy1001: Started deploy [cxserver/deploy@b669f95]: Update cxserver to d6b1d6f
00:22 gehel: restarting tilerator on all maps servers
00:06 gehel: depooling wdqs1007 (something looks like DB corruption)

2019-01-07

23:56 eileen: update civicrm revision changed from bcb4b7a7d1 to b8e3a71845, config revision is 260be32d0a
22:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TestCommons: Re-enable uploading of files, accidentally prevented (duration: 00m 44s)
21:19 XioNoX: push NAT changes to pfw3-eqiad - T211028
21:16 awight@deploy1001: Finished deploy [ores/deploy@9253beb]: T212530: new ORES models; revscoring 2.3.0 (duration: 15m 28s)
21:13 mforns@deploy1001: Finished deploy [analytics/refinery@faac592]: deploying analytics/refinery to account with refinery-source v0.0.83 (duration: 06m 52s)
21:06 mforns@deploy1001: Started deploy [analytics/refinery@faac592]: deploying analytics/refinery to account with refinery-source v0.0.83
21:00 awight@deploy1001: Started deploy [ores/deploy@9253beb]: T212530: new ORES models; revscoring 2.3.0
20:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: TestCommons: Final go-switch for WBMI Ie52b8af006ba (duration: 00m 45s)
19:52 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove redundant namespace talk definitions (T206952) (duration: 00m 44s)
19:46 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set $wgMetaNamespace for bewikibooks (T212665) (duration: 00m 45s)
19:43 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikibaseRepo and WikibaseMediaInfo on testcommonswiki (duration: 00m 44s)
19:42 XioNoX: push firewall change to pfw3-codfw/eqiad - T211712
19:40 catrope@deploy1001: Synchronized wmf-config/Wikibase.php: Set empty clientDbList for testcommonswiki (duration: 00m 44s)
19:38 catrope@deploy1001: Synchronized dblists/wikidata.dblist: Enable Wikidata on testcommonswiki (duration: 00m 44s)
19:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add importupload to sysops on testcommons (duration: 00m 45s)
19:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Flow beta feature on viwikisource (T212929) (duration: 00m 45s)
19:13 catrope@deploy1001: Synchronized dblists/flow.dblist: Enable Flow on viwikisource (T212929) (duration: 00m 45s)
19:11 RoanKattouw: Ran emptyUserGroup.php for autoreview, reviewer and editor groups on srwikinews (T212058)
18:51 XioNoX: re-deactivate bgp sessions to Zayo on cr1-eqiad - T212791
18:20 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@d8f911c]: new GUI, Updater & Blazegraph build (duration: 10m 13s)
18:18 XioNoX: activate bgp sessions to Zayo on cr1-eqiad - T212791
18:10 jynus: manually creating tables on es1015, es1017 with replication for testcommonswiki
18:10 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@d8f911c]: new GUI, Updater & Blazegraph build
18:07 onimisionipe@deploy1001: deploy aborted: (no justification provided) (duration: 00m 04s)
18:06 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@d8f911c]: (no justification provided)
18:05 XioNoX: deactivate bgp sessions to Zayo on cr1-eqiad T212791
17:35 akosiaris: restart pdfrender on scb1004
17:35 akosiaris: restart pdfrender
17:23 kartik@deploy1001: Finished deploy [cxserver/deploy@594420b]: Update cxserver to 7632c43 (duration: 04m 06s)
17:19 kartik@deploy1001: Started deploy [cxserver/deploy@594420b]: Update cxserver to 7632c43
16:24 jynus: shutting down mariadb again and rebooting db1107
16:15 jynus: starting mariadb on db1107
16:12 onimisionipe: starting inplace reindexing for enwiki - T212224
16:07 volans: powercycle db1107
16:03 elukey: stop eventlogging mysql consumers on eventlog1002 and eventlogging replication on db1108 due to issues with db1107
16:02 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1082 (duration: 00m 45s)
15:46 cmjohnson1: replacing bad fuse on the PDU rack A2 eqiad
14:19 moritzm: added jbond to WMF-LDAP group in Phabricator (T213079)
13:56 ariel@deploy1001: Finished deploy [dumps/dumps@acd9bca]: logging and quiet mode for adds-changes and other dumps (duration: 00m 05s)
13:56 ariel@deploy1001: Started deploy [dumps/dumps@acd9bca]: logging and quiet mode for adds-changes and other dumps
13:02 zeljkof: EU SWAT finished
13:01 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: cirrus: increase number of shards (T212224) (duration: 00m 44s)
12:48 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Restrict moving categories for users at srwiki (T213050) (duration: 00m 44s)
12:40 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Cleanup old throttle rules (duration: 00m 44s)
12:34 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: To lift a cap on account creation from IP for mrwiki community (T212921) (duration: 00m 43s)
12:30 Zoranzoki21: tools.zoranzoki21wiki Archived https://www.mediawiki.org/w/index.php?title=Extension:Woopra (https://www.wikidata.org/wiki/Q21679347) - T212994
12:29 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reader trust survey (T209882) (duration: 00m 45s)
12:21 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Quiz extension on ru.wikibooks (T212622) (duration: 00m 45s)
12:15 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add suppressredirect user right to editor user group at pl.wikisource (T212655) (duration: 00m 44s)
12:11 gtirloni: disabled notifications for cloudvirt0124 (T212360)
12:11 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable extendedmover user group at en.wiktionary (T212662) (duration: 00m 46s)
12:07 kartik@deploy1001: Finished deploy [cxserver/deploy@2d54a64]: Deploy Google Translation (T90208) (duration: 05m 07s)
12:02 kartik@deploy1001: Started deploy [cxserver/deploy@2d54a64]: Deploy Google Translation (T90208)
10:36 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1079 after schema change - T85757 (duration: 00m 44s)
10:31 filippo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move group1 to new logging infrastructure - T211124 (duration: 00m 45s)
10:30 banyek: repooling db1079 after schema change - T85757
10:27 banyek: restarting replication on db1079 - T85757
09:55 banyek: executing schema change on db1079 with replication enabled - T85757
09:53 banyek: stopping replication on db1079 - T85757
09:47 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1079 for schema change - T85757 (duration: 01m 02s)
09:36 banyek: depooling db1079 for schema change - T85757
08:30 moritzm: rolling restart of swift backend servers to pick up OpenSSL security update
07:24 elukey: restart pdfrender on scb1002

2019-01-06

14:50 ariel@deploy1001: Finished deploy [dumps/dumps@cb30b6c]: check xml files for closing mediawiki tag (duration: 00m 06s)
14:50 ariel@deploy1001: Started deploy [dumps/dumps@cb30b6c]: check xml files for closing mediawiki tag

2019-01-05

20:23 elukey: manual cleanup of big logs under /var/log/.. on analytics-tool1002 due to root partition almost filled up

2019-01-04

23:07 mutante: scandium apt-get remove nodejs nodes-legacy ; puppet agent -tv - after merging gerrit:482150 this fixed "you have held broken packages" issue, now we are at a puppet dependency cycle with apt::pin T201366
15:42 bawolff@deploy1001: Synchronized private/PrivateSettings.php: T212667 - More aggressive anti-spam measures for account creation on kowiki (duration: 00m 48s)
14:08 moritzm: rebooting etcd1001-1003 to pick up SSBD-enabled qemu
13:52 moritzm: rebooting etcd1004-1006 to pick up SSBD-enabled qemu
13:33 moritzm: rebooting kubernetes staging etcd hosts to pick up SSBD-enabled qemu
13:11 moritzm: rebooting kubernetes staging master to pick up SSBD-enabled qemu
12:57 moritzm: rebooting kubernetes staging workers for kernel security update
11:58 moritzm: installing libsndfile security updates
11:33 moritzm: installing jasper security updates
11:31 moritzm: installing libdatetime-timezone-perl updates for recent tz changes
10:47 arturo: T212898 reimaging cloudvirt1024 as stretch
10:46 moritzm: rolling restart of swift proxies to pick up OpenSSL update
09:57 jijiki: restarting thumbor services to pick up 481141
09:50 onimisionipe: restarting nginx on all wdqs hosts
09:40 banyek: executing schema change on dbstore1002 - T85757
09:13 moritzm: restarting nginx on puppetdb hosts to pick up new OpenSSL
09:03 banyek: executing schema change on db1116 - T85757
08:44 moritzm: restarting nginx on francium to pick up new OpenSSL
08:16 elukey: restart eventlogging daemons on eventlog1002 to pick up openssl updates
07:56 moritzm: installing OpenSSL security updates
00:07 mutante: an-coord1001 - apt-get clean to free disk space, reacting to Icinga alert for running out of disk

2019-01-03

23:08 volans: restarted pdfrender on scb1004
22:29 volans: restarted all slaves on dbstore1002 (relayed from banyek)
22:14 banyek: stopping all slaves on dbstore1002 (NOT labsdb)
22:14 banyek: stopping all slaves on labsdb1002
20:50 reedy@deploy1001: Synchronized multiversion/MWMultiVersion.php: Fix error for testcommons (duration: 00m 44s)
20:46 reedy@deploy1001: Synchronized dblists/group0.dblist: Add testcommonswiki to group0 (duration: 00m 43s)
20:43 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Updating interwiki cache (duration: 02m 05s)
20:24 reedy@deploy1001: Synchronized wmf-config/db-codfw.php: T197616 (duration: 00m 44s)
20:23 reedy@deploy1001: Synchronized wmf-config/db-eqiad.php: T197616 (duration: 00m 44s)
20:13 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T197616 (duration: 00m 44s)
20:12 reedy@deploy1001: Synchronized multiversion/MWMultiVersion.php: T197616 (duration: 00m 44s)
20:11 reedy@deploy1001: rebuilt and synchronized wikiversions files: T197616
20:09 reedy@deploy1001: Synchronized dblists/: T197616 (duration: 00m 45s)
18:51 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@1182b3b]: Update mobileapps to f6ad0e5: Set timeout for backend /page/html requests, part 2 (duration: 05m 27s)
18:46 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@1182b3b]: Update mobileapps to f6ad0e5: Set timeout for backend /page/html requests, part 2
18:37 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@c470ed2]: Update mobileapps to f6ad0e5: Set timeout for backend /page/html requests (duration: 04m 11s)
18:33 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@c470ed2]: Update mobileapps to f6ad0e5: Set timeout for backend /page/html requests
18:21 volans: restart pdfrender on scb1003
17:58 ariel@deploy1001: Finished deploy [dumps/dumps@10dc8ad]: return properly if commands failed (duration: 00m 08s)
17:58 ariel@deploy1001: Started deploy [dumps/dumps@10dc8ad]: return properly if commands failed
16:32 XioNoX: remove old 10.64.22.0/24 IPs from cloud-instance-transport1-b-eqiad - T207663
16:22 moritzm: rebooting kubernetes workers in eqiad for kernel security update
16:02 arturo: reimaging cloudvirt1013 cloudvirt1026-1028 to stretch
15:48 moritzm: restart parsoid on wtp1025 to pick up OpenSSL update for nodejs
15:43 jijiki: Enabled puppet on mw servers after merging 481796 - T197616
15:31 jijiki: Disabling puppet on mw servers to test 481796 - T197616
15:14 ejegg: updated Fundraising CiviCRM from b33dcd3c94 to bcb4b7a7d1
14:37 moritzm: rebooting kubernetes workers in codfw for kernel security update
14:37 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1101:3317 after schema change - T85757 (duration: 00m 44s)
14:32 banyek: repooling db1101:3317 after schema change - T85757
14:21 moritzm: rebooting kubernetes masters in eqiad to pick up SSBD-enabled qemu
14:14 moritzm: rebooting kubernetes masters in codfw to pick up SSBD-enabled qemu
14:05 arturo: T209616 reimage cloudvirt1029 as debian stretch
13:43 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1101:3317 for schema change - T85757 (duration: 00m 44s)
13:41 banyek: depooling db1101:3317 for schema change - T85757
13:38 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1098:3317 after schema change - T85757 (duration: 00m 44s)
13:34 banyek: repooling db1098:3317 after schema change - T85757
13:24 kartik@deploy1001: Finished deploy [cxserver/deploy@3b2ede7]: Update cxserver to 2369a18 (duration: 04m 30s)
13:20 kartik@deploy1001: Started deploy [cxserver/deploy@3b2ede7]: Update cxserver to 2369a18
12:58 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1098:3317 for schema change - T85757 (duration: 00m 45s)
12:55 banyek: depooling db1098:3317 for schema change - T85757
12:54 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1094 after schema change - T85757 (duration: 00m 45s)
12:49 banyek: repooling db1094 after schema change - T85757
12:41 arturo: T212302 reimaging again cloudvirt1030 to test final puppet code
12:33 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1094 for schema change - T85757 (duration: 00m 46s)
12:28 banyek: depooling db1094 for schema change - T85757
12:27 moritzm: restarting tor on torrelay1001 to pick up OpenSSL security update
11:02 _joe_: manually reloading icinga to pick up changes to commands.cfg
10:55 moritzm: installing apache updates on puppetmasters
10:22 moritzm: installing ghostscript security updates on jessie
09:51 elukey: restart memcached on mc1023 to apply -R 200 - T208844
09:46 moritzm: remove imagemagick remnants from ATS hosts (obsoleted by upstream packaging change which dropped the webp plugin)
09:39 moritzm: installing nginx updates on puppetdb*
09:26 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: repool es2019 - T212833 (duration: 01m 33s)
09:18 banyek: repooling es2019 - T212833
08:46 moritzm: rolling restart of proton to pick up OpenSSL update
08:35 banyek: depooled es2019 as host was unresponsive - T212833
08:35 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: depool es2019, host is unresponsive - T212833 (duration: 00m 49s)
08:11 moritzm: installing OpenSSL security updates
00:21 mutante: notebook1004 - started nagios-nrpe-server one more time

2019-01-02

23:59 mutante: notebook1004 still keeps running out of memory from some user actions and that kills nagios-nrpe-server and that causes a bunch of Icinga alerts
23:39 mutante: notebook1004 - systemctl start nagios-nrpe-server
23:39 mutante: notebook1004 - systemctl status nagios-nrpe-server
20:59 herron@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,service=parsoid,name=wtp1028.eqiad.wmnet
20:59 herron: repooling wtp1028 T212624
20:52 herron: rebooting wtp1028 — looking for POST errors T212624
20:05 Krinkle: mwmaint1002: foreachwikiindblist s5 deleteEqualMessages.php
20:04 Krinkle: mwmaint1002: foreachwikiindblist s2 deleteEqualMessages.php
18:35 volans: restarting icinga on icinga1001 T212669
16:50 XioNoX: create BGP sessions to AS3214 in AMS-IX
16:46 XioNoX: remove BGP sessions to AS42949 in AMS-IX (leaving the IX)
16:43 XioNoX: remove BGP sessions to AS6866 in AMS-IX (leaving the IX)
16:33 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1090:3317 after schema change - T85757 (duration: 00m 46s)
16:30 arturo: reimaging cloudvirt1030 with stretch, server cleanup after puppet refactoring
16:29 moritzm: restarting Superset to pick up openssl security update
16:25 moritzm: restarting Hue to pick up openssl security update
16:23 arturo: T212302 re-enable puppet in all {cloud,lab}virt* servers, all was fine
16:22 banyek: repooling db1090:3317 after schema change (T85757)
16:11 arturo: T212302 disable puppet in all {cloud,lab}virt* servers to merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/481194/
15:39 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1090:3317 for schema change - T85757 (duration: 00m 44s)
15:34 moritzm: installing OpenSSL security updates
15:31 banyek: depooling db1090:3317 for schema change (T85757)
15:13 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1086 after schema change - T85757 (duration: 00m 44s)
15:07 banyek: repooling db1086 after schema change (T85757)
14:49 banyek: executing schema change on db1086 - T85757
14:48 moritzm: installing ghostscript security update for jessie
14:47 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1086 for schema change - T85757 (duration: 00m 45s)
14:38 banyek: depooling db1086 for schema change (T85757)
14:15 ema: cp hosts: upgrade OpenSSL from 1.1.0f to 1.1.0j
13:39 moritzm: installing ghostscript update for stretch
13:33 moritzm: installing libav security updates
13:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1119 T86338 T202167 (duration: 00m 44s)
13:17 moritzm: installing openjpeg2 security updates
13:17 banyek: executing schema change on db2040 (s7 codfw master) replication lag could be expected on codfw - T85757
13:13 banyek: stopping replication on db2077 prior to executing schema change on codfw s7 master (db2040) - T85757
13:06 marostegui: Deploy schema change on db1119 - T86338 T202167
13:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1119 T86338 T202167 (duration: 00m 45s)
13:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3311 T86338 T202167 (duration: 00m 47s)
12:00 moritzm: rebooting labtestpuppetmaster2001 for kernel security update
11:53 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1006.eqiad.wmnet
11:51 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1006.eqiad.wmnet
11:50 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1006.codfw.wmnet
11:46 ema: replace TLS certificates on ms-fe eqiad hosts T212215
11:41 moritzm: rebooting labtestweb2001 for kernel security update
11:24 marostegui: Deploy schema change on db1099:3311 - T86338 T202167
11:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099:3311 T86338 T202167 (duration: 00m 45s)
11:17 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2006.codfw.wmnet
11:10 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe2006.codfw.wmnet
10:59 ema: replace TLS certificates on ms-fe codfw hosts T212215
10:52 moritzm: rebooting centrallog1001 for kernel security update
10:48 volans: testing the new spicerack package on cumin2001, in the unlikely event you need to use spicerack cookbooks today please use cumin1001
10:45 godog: ms-be2018 Flashing Smart Array P840 in Slot 3 [ 3.00 -> 6.60 ]
10:43 moritzm: removed labvirt1013 from debmonitor, got renamed in T212513
10:42 volans: uploaded spicerack_0.0.10-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
10:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2096 (duration: 00m 44s)
09:50 marostegui: Stop MySQL on db2096 for kernel and mysql upgrade
09:49 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2096 (duration: 00m 45s)
09:48 marostegui@deploy1001: sync-file aborted: Depool db2096 (duration: 00m 01s)
09:18 moritzm: installing c3p0 security updates
09:07 Zoranzoki21: Drop valid_tag from s8 by Marostegui - T212254
09:06 godog: eqiad-prod: final weight for ms-be10[44-50].eqiad.wmnet - T209618
08:56 moritzm: installing libarchive security updates
07:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 - T212692 (duration: 00m 46s)
07:30 marostegui: Fix login.logging table on db1078 - T212692
07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 - T212692 (duration: 00m 47s)
07:01 marostegui: Deploy schema change on s1 codfw master (lag will be generated on s1 codfw) - T202167 T86338
06:54 marostegui: Drop empty valid_tag table from labswiki labtestwiki - T212254
06:49 marostegui: Drop empty valid_tag table from s5 - T212254
06:25 marostegui: Drop valid_tag from s6 - T212254
06:15 marostegui: Fix last chunks on db1124:338 - T212574

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec